
Eval bug: GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected") failed Could not attach to process. #10799

Open
hissain opened this issue Dec 12, 2024 · 3 comments


hissain commented Dec 12, 2024

Name and Version

version: 4310 (5555c0c)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

GeForce RTX 4090 with 24 GB VRAM

Models

Hugging Face: Orpo-Llama-3.2-1B-15k

Problem description & steps to reproduce

I was following this tutorial (https://github.com/hissain/ml/blob/main/codes/quantization_example.ipynb), and the quantization step below fails:

$ llama.cpp/bin/llama-quantize ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf ./Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf Q4_K_M
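
The assert in the log below fires inside llama-quantize when the number of attn_v.weight tensors found in the file does not match the expected count of one per attention layer. Notably, the loader reports only 1 tensor while llama.block_count is 16, so the F16 GGUF appears to contain no attention weights at all. Here is a minimal, self-contained C++ sketch of that consistency check (paraphrased from the assert text in the log, not llama.cpp's actual code; the tensor list here is hypothetical):

// check_wv.cpp -- minimal sketch of the consistency check behind the assert.
// Not the actual llama.cpp code; names are paraphrased from the log below.
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Hypothetical tensor-name list standing in for the GGUF contents.
    // The failing file holds a single f32 tensor (see "type f32: 1 tensors"
    // in the log), so no "attn_v.weight" names are present at all.
    std::vector<std::string> tensor_names = { "output_norm.weight" };

    const int n_attn_layer = 16;  // llama.block_count from the metadata dump

    // Count the per-layer value-projection weights actually present.
    int n_attention_wv = 0;
    for (const std::string & name : tensor_names) {
        if (name.find("attn_v.weight") != std::string::npos) {
            ++n_attention_wv;
        }
    }

    std::printf("n_attention_wv = %d, expected = %d\n", n_attention_wv, n_attn_layer);
    // Mirrors GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && ...) in src/llama.cpp.
    assert(n_attention_wv == n_attn_layer && "n_attention_wv is unexpected");
    return 0;
}

If this reading is right, the assertion is a symptom rather than the cause: the conversion step likely produced a GGUF with metadata but without the model weights, so re-checking the convert step before quantizing would be the first thing to try.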

First Bad Commit

No response

Relevant log output

$ llama.cpp/bin/llama-quantize ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf ./Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf Q4_K_M
main: build = 4310 (5555c0c1)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing './Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf' to './Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 40 key-value pairs and 1 tensors from ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B
llama_model_loader: - kv   3:                       general.organization str              = Meta Llama
llama_model_loader: - kv   4:                           general.finetune str              = 1b
llama_model_loader: - kv   5:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   6:                         general.size_label str              = 0.03K
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Llama 3.2 1B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Meta Llama
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv  12:                      general.dataset.count u32              = 1
llama_model_loader: - kv  13:                     general.dataset.0.name str              = Orpo Dpo Mix 40k
llama_model_loader: - kv  14:             general.dataset.0.organization str              = Mlabonne
llama_model_loader: - kv  15:                 general.dataset.0.repo_url str              = https://huggingface.co/mlabonne/orpo-...
llama_model_loader: - kv  16:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv  17:                          llama.block_count u32              = 16
llama_model_loader: - kv  18:                       llama.context_length u32              = 131072
llama_model_loader: - kv  19:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  20:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  21:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  22:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  23:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  24:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  25:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  26:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  27:                          general.file_type u32              = 1
llama_model_loader: - kv  28:                           llama.vocab_size u32              = 128258
llama_model_loader: - kv  29:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  30:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  31:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  32:                      tokenizer.ggml.tokens arr[str,128258]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  33:                  tokenizer.ggml.token_type arr[i32,128258]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  34:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  35:                tokenizer.ggml.bos_token_id u32              = 128256
llama_model_loader: - kv  36:                tokenizer.ggml.eos_token_id u32              = 128257
llama_model_loader: - kv  37:            tokenizer.ggml.padding_token_id u32              = 128257
llama_model_loader: - kv  38:                    tokenizer.chat_template str              = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv  39:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:    1 tensors
/home/hissain/github/samsung/ml/quantization/llama.cpp/src/llama.cpp:18812: GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected") failed
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
arch-btw (Contributor) commented

There's another report of this here: #10793

ggerganov (Owner) commented

There is probably some mismatch in the tensor names. Provide the full log, including -lv 1.
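
For reference, the rerun would presumably look like the following (assuming llama-quantize accepts -lv as a trailing option, which is not guaranteed; if it rejects it, the tool's usage output lists the supported options):

$ llama.cpp/bin/llama-quantize ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf ./Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf Q4_K_M -lv 1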

ajitwadekar commented

@ggerganov, could you please let me know the command to get the full log?
