HanLP semantic similarity: please expose sentence embeddings so they can be stored, to improve efficiency #1792

yuxulingche · 2022-11-16T01:27:01Z

Describe the feature and the current behavior/state.
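Currently sts takes a pair of sentences as input. When a large number of sentences must be compared, this is too slow; batching helps, but it is still not efficient enough.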

Will this change the current api? How?
An additional output could be added to sts.

Who will benefit with this feature?
Users of sts.

Are you willing to contribute it (Yes/No):
No

System information

Any other info
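HanLP's semantic similarity comparison works well, and many thanks to the author for the contribution. However, I now have a large number of sentences to compare. I hope HanLP can add a way to output sentence embeddings, so that they can be computed once and stored, and the cosine distance computed at query time. This would make comparison much more efficient in practical use.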
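To illustrate the requested workflow, here is a minimal sketch of "embed once, store, compare by cosine at query time". It does not use any HanLP API. The encoder `toy_embed` is a hypothetical stand-in (hashed character-bigram counts) that only marks where a real sentence-embedding model would be plugged in, and `build_index` and `most_similar` are illustrative names, not existing functions.

```python
import math
import zlib
from typing import Callable, List, Tuple

Vector = List[float]


def toy_embed(sentence: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in encoder: hashed character-bigram counts.

    This is NOT a semantic model; it only marks the place where a real
    sentence encoder would be called.
    """
    vec = [0.0] * dim
    padded = f" {sentence} "
    for i in range(len(padded) - 1):
        # crc32 is deterministic across runs, unlike the built-in hash().
        bucket = zlib.crc32(padded[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalize so that a plain dot product equals cosine similarity.
    return [x / norm for x in vec] if norm else vec


def build_index(
    sentences: List[str], embed: Callable[[str], Vector] = toy_embed
) -> List[Tuple[str, Vector]]:
    """Embed every sentence once; the result can be stored and reused."""
    return [(s, embed(s)) for s in sentences]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))


def most_similar(
    query: str,
    index: List[Tuple[str, Vector]],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Embed only the query, then rank the stored vectors by cosine."""
    q = embed(query)
    scored = [(cosine(q, v), s) for s, v in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    corpus = ["今天天气很好", "我喜欢吃苹果", "HanLP is an NLP toolkit"]
    index = build_index(corpus)  # done once, then stored
    for score, sentence in most_similar("我喜欢吃苹果", index):
        print(f"{score:.3f}\t{sentence}")
```

With N stored sentences, each query costs one encoder call plus N dot products, whereas a pairwise model has to run once for every (query, sentence) pair.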

hankcs · 2022-11-16T01:35:01Z

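Hi, the current STS model needs a pair of sentences as input to compute their similarity, and it does not support outputting embeddings. We are developing sentence embeddings for retrieval; please stay tuned for future updates.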

yfq512 · 2023-04-04T00:57:20Z

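I am also looking forward to a more efficient method. For now, simhash or BERT can be used, but simhash is only moderately accurate, and BERT is computationally expensive.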

zhangyifei1 · 2024-10-25T06:26:38Z

Is this supported now?

shenwenxin · 2024-10-30T03:07:24Z

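Quoting hankcs: "Hi, the current STS model needs a pair of sentences as input to compute their similarity, and it does not support outputting embeddings. We are developing sentence embeddings for retrieval; please stay tuned for future updates."

@hankcs Hello, is this supported now?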

yuxulingche added the "feature request" (Suggest an idea for this project) label Nov 16, 2022

yuxulingche assigned hankcs Nov 16, 2022
