Streaming inference startup speed #1873

Closed
ZxnSnowy opened this issue Dec 18, 2024 · 1 comment

Comments

@ZxnSnowy

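With streaming inference, the first generation is much slower than the second. What causes this? The first inference prints some warnings; are they related to the slowdown?
Model loading:
image
Output of the first run:
image
Second run, no warnings:
image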

@yitenghao

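This is not specific to streaming inference; it happens whenever the model is loaded. My understanding is that loading the model does not fully initialize it: some variables and objects are only created during the first inference, so later inferences skip that creation step. So I modified api_v2.py to run one inference on an arbitrary piece of text before starting the API service, and after that the API calls take the normal amount of time.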
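
The warm-up workaround described above could look roughly like the sketch below. It is not the actual change made to api_v2.py: `LazyPipeline`, `stream`, and `warm_up` are hypothetical names, and the lazy setup is only a stand-in for whatever the real model creates during its first inference.

```python
import time
from typing import Callable, Iterable, Iterator


class LazyPipeline:
    """Hypothetical stand-in for a TTS pipeline that builds some state on first use."""

    def __init__(self) -> None:
        self.init_count = 0  # how many times the one-off setup ran
        self._ready = False

    def _lazy_init(self) -> None:
        # Stands in for whatever is created during the first inference
        # (caches, buffers, compiled kernels); the real cost is not modelled.
        self.init_count += 1
        self._ready = True

    def stream(self, text: str) -> Iterator[bytes]:
        if not self._ready:
            self._lazy_init()
        for word in text.split():
            yield word.encode("utf-8")  # placeholder for an audio chunk


def warm_up(stream_fn: Callable[[str], Iterable[bytes]], text: str = "warm up") -> float:
    """Run one throwaway streaming inference, drain it, and return elapsed seconds."""
    start = time.perf_counter()
    for _ in stream_fn(text):
        pass  # discard the output; only the side effects matter
    return time.perf_counter() - start


pipeline = LazyPipeline()
warm_up(pipeline.stream)  # call once before starting the API server
chunks = list(pipeline.stream("hello world"))  # first real request, setup already done
```

If the real inference function is a generator, as streaming functions often are, the warm-up has to iterate over the result: calling it without consuming the output would not run any of the inference code.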
