Hi there!
I'm having some trouble when using the `libllama` API from Python via `ctypes`, and I'd really appreciate it if anybody could help me figure this out. I'll try to explain the issue as effectively as I can within this post, but if you'd like to look at the full code, it's these two files:

- `libllama.py`
- `llama.py`
Setup
The model I'm using is a q6_K GGUF quant of Llama-3.1-8B-Instruct. The quant is confirmed working with `llama-cli` and other llama.cpp examples. I'm loading it with `n_ctx` = 8192 and `n_batch` = 2048.

The prompt that I'm using to test the model is as follows. The `\n` characters are actual newlines, not a literal `"\n"` string. I'm using the Llama 3 instruct template:
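Putting this together, the setup looks roughly like the sketch below. (This is simplified: `lib` is the ctypes handle to libllama, the wrapper names mirror the functions in llama.h and may differ slightly between builds, `tokenize` stands in for the call through my `llama_tokenize` wrapper, and the prompt wording is paraphrased; it just asks for a silly story about two potatoes in love.)

```python
lib.llama_backend_init()

# load the model (newer llama.h builds name this llama_model_load_from_file)
mparams = lib.llama_model_default_params()
model = lib.llama_load_model_from_file(b"/path/to/Llama-3.1-8B-Instruct-Q6_K.gguf", mparams)

# create the context with n_ctx = 8192 and n_batch = 2048
cparams = lib.llama_context_default_params()
cparams.n_ctx = 8192
cparams.n_batch = 2048
ctx = lib.llama_new_context_with_model(model, cparams)

# Llama 3 instruct template; the \n are real newline characters
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Please write a silly story about two potatoes in love."
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

# tokenize() stands in for my llama_tokenize wrapper (special tokens parsed)
tokens: list[int] = tokenize(model, prompt)
```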
This tokenizes to the expected token IDs. 👍 So far so good. This `list[int]` of tokens is stored in a variable called `tokens` from here on out.

Next, I set up the batch:
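In outline it does something like this (a simplified sketch; the fields are the ones from the `llama_batch` struct in llama.h, accessed through ctypes):

```python
n_tokens = len(tokens)

# one sequence, no embeddings, room for the whole prompt
batch = lib.llama_batch_init(n_tokens, 0, 1)
batch.n_tokens = n_tokens

for i, tok in enumerate(tokens):
    batch.token[i]     = tok   # token ID
    batch.pos[i]       = i     # position in the sequence
    batch.n_seq_id[i]  = 1
    batch.seq_id[i][0] = 0     # everything belongs to sequence 0
    batch.logits[i]    = 0     # no logits for prompt tokens ...

batch.logits[n_tokens - 1] = 1  # ... except the very last one
```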
Then I call `llama_decode` with this batch:
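That call is essentially just (with `ctx` being the context handle from the setup sketch):

```python
ret = lib.llama_decode(ctx, batch)
print(ret)  # prints 0
```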
`llama_decode` returns 0 - all good. No errors or warnings in the terminal. 👍

The problem
The problem is that after calling `llama_decode`, the logits are not what I expect. Whether I get the logits via `llama_get_logits` or `llama_get_logits_ith`, or even if I initialize and use a greedy sampler, the top token ID is always 1839. This token is `' href'`, which does not make sense as the first word in a story about two potatoes in love. Using this model, and using the above tokens as the only context, I consistently get `' href'` as the most likely next token, no matter how many times I try.

I would expect the output to be something like "Once" (as in "Once upon a time..."), "Sure" (as in "Sure, here's a silly story..."), or something like that. Not " href".
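Concretely, the kind of check I'm doing looks like this (a simplified sketch; the function names come from llama.h and go through the ctypes wrappers):

```python
n_vocab = lib.llama_n_vocab(model)  # vocab size

# logits for the last position (the only one with its logits flag set);
# index -1 means "the last output" in llama_get_logits_ith
logits_ptr = lib.llama_get_logits_ith(ctx, -1)
logits = [logits_ptr[i] for i in range(n_vocab)]

top_id = max(range(n_vocab), key=lambda i: logits[i])
print(top_id)  # always 1839, i.e. ' href'
```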
What I've tried
- Using `logits_all = True`
- Using `logits_all = False`
- Using a `llama_sampler`, like so (a sketch is below this list), which prints the same ' href' token
- Using a different model: replacing Llama-3.1-8B-Instruct with Llama-3.2-1B-Instruct
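The sampler variant looks roughly like this (simplified sketch, again going through the ctypes wrappers; the sampler chain API is the one from llama.h):

```python
sparams = lib.llama_sampler_chain_default_params()
chain = lib.llama_sampler_chain_init(sparams)
lib.llama_sampler_chain_add(chain, lib.llama_sampler_init_greedy())

# -1 = sample from the last position that had its logits flag set
tok_id = lib.llama_sampler_sample(chain, ctx, -1)
print(tok_id)  # 1839 (' href')
```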
My question
Why is this happening? I think I'm setting up the batch correctly. `llama_decode` returns 0. No errors in the terminal. Several different methods all yield the same nonsensical result. I've been trying to fix this for like 2 days straight, and I'm just not sure what else to try at this point.

I'm hoping someone smarter than me will be able to chime in and point me in the right direction. To that end, please excuse the following behaviour: @compilade @slaren @ngxson @JohannesGaessler @bartowski1182. I think you all know the codebase better than I do, and if you could spare a few minutes to look over this issue, I'd be really grateful. If not, feel free to ignore. In either case, thank you for reading, and have a nice day. :)