-
Hi, there are both asyncio and openai examples in: https://github.com/ggerganov/llama.cpp/tree/master/examples/server/tests

Generated tokens will only be received after all prompt tokens have been processed. Please check the figures in the […]. Also, if you need additional help, please share the command and the model you use to start the server.
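Not the exact code from those tests, but a minimal sketch of the async streaming pattern they cover, assuming a local llama.cpp server with the OpenAI-compatible endpoint reachable at http://localhost:8080 (the base URL, API key, and model name below are placeholders):

```python
import asyncio
from openai import AsyncOpenAI

# Placeholder address: point this at the --host/--port you started the llama.cpp server with.
client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

async def main() -> None:
    # stream=True makes the request return an async iterator of chunks
    # instead of a single completed response.
    stream = await client.chat.completions.create(
        model="local-model",  # placeholder; the server uses whichever model it loaded
        messages=[{"role": "user", "content": "Explain token streaming in one sentence."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

asyncio.run(main())
```

The first chunk only arrives once the whole prompt has been processed; after that, tokens should print one by one.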
-
Hi,
I'm using the OpenAI Python library with the llama.cpp HTTP Server in a Jupyter Notebook. I'm trying to stream the output from the API response instead of receiving the full output at once, but I'm having trouble getting streaming to work properly.
Here's the code I'm using:
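(The snippet below is a minimal sketch of what I'm doing rather than my exact notebook cell; the base URL, API key, and model name are placeholders for my local setup.)

```python
from openai import OpenAI

# Placeholder address for the local llama.cpp HTTP server.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

stream = client.chat.completions.create(
    model="local-model",  # placeholder; the server serves whichever model it was started with
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

# Print each delta as it arrives; flush so Jupyter shows output immediately.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```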
I've set stream=True in the client.chat.completions.create() method, but the output is not being streamed as expected. Instead, it seems to wait for the entire response before printing it.
I'm looking for any references, examples, or guidance on how to properly implement streaming with the OpenAI Python library when using the llama.cpp HTTP Server. I want to be able to display the generated text in real-time as it is being produced by the API.
Any help or insights would be greatly appreciated. Thank you!