Name and Version

version: 4310 (5555c0c)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

GeForce RTX 4090 with 24 GB VRAM

Models

Hugging Face / Orpo-Llama-3.2-1B-15k

Problem description & steps to reproduce

I was trying to follow this tutorial (https://github.com/hissain/ml/blob/main/codes/quantization_example.ipynb), and the run crashes at the quantization step:

```
$ llama.cpp/bin/llama-quantize ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf ./Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf Q4_K_M
```

First Bad Commit

No response

Relevant log output

```
main: build = 4310 (5555c0c1)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing './Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf' to './Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 40 key-value pairs and 1 tensors from ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 1B
llama_model_loader: - kv 3: general.organization str = Meta Llama
llama_model_loader: - kv 4: general.finetune str = 1b
llama_model_loader: - kv 5: general.basename str = Llama-3.2
llama_model_loader: - kv 6: general.size_label str = 0.03K
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.base_model.count u32 = 1
llama_model_loader: - kv 9: general.base_model.0.name str = Llama 3.2 1B
llama_model_loader: - kv 10: general.base_model.0.organization str = Meta Llama
llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/meta-llama/Lla...
llama_model_loader: - kv 12: general.dataset.count u32 = 1
llama_model_loader: - kv 13: general.dataset.0.name str = Orpo Dpo Mix 40k
llama_model_loader: - kv 14: general.dataset.0.organization str = Mlabonne
llama_model_loader: - kv 15: general.dataset.0.repo_url str = https://huggingface.co/mlabonne/orpo-...
llama_model_loader: - kv 16: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 17: llama.block_count u32 = 16
llama_model_loader: - kv 18: llama.context_length u32 = 131072
llama_model_loader: - kv 19: llama.embedding_length u32 = 2048
llama_model_loader: - kv 20: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 21: llama.attention.head_count u32 = 32
llama_model_loader: - kv 22: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 23: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 24: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 25: llama.attention.key_length u32 = 64
llama_model_loader: - kv 26: llama.attention.value_length u32 = 64
llama_model_loader: - kv 27: general.file_type u32 = 1
llama_model_loader: - kv 28: llama.vocab_size u32 = 128258
llama_model_loader: - kv 29: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 30: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 31: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 32: tokenizer.ggml.tokens arr[str,128258] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 33: tokenizer.ggml.token_type arr[i32,128258] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 34: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 35: tokenizer.ggml.bos_token_id u32 = 128256
llama_model_loader: - kv 36: tokenizer.ggml.eos_token_id u32 = 128257
llama_model_loader: - kv 37: tokenizer.ggml.padding_token_id u32 = 128257
llama_model_loader: - kv 38: tokenizer.chat_template str = {% for message in messages %}{{'<|im_...
llama_model_loader: - kv 39: general.quantization_version u32 = 2
llama_model_loader: - type f32: 1 tensors
/home/hissain/github/samsung/ml/quantization/llama.cpp/src/llama.cpp:18812: GGML_ASSERT((qs.n_attention_wv == n_attn_layer) && "n_attention_wv is unexpected") failed
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
```
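Note the failing check against the tensor count in the log: the loader reports only 1 tensor (llama_model_loader: - type f32: 1 tensors) while the metadata says llama.block_count = 16, and the assert appears to compare the number of attn_v.weight tensors found (qs.n_attention_wv) against the number of attention layers expected (n_attn_layer). A minimal diagnostic sketch for checking what the GGUF actually contains, assuming the gguf Python package that ships with llama.cpp (pip install gguf); the file path is the one from the report:

```python
# List the tensors stored in the GGUF and count the attention V weights,
# which the quantizer checks against the number of attention layers.
from gguf import GGUFReader

reader = GGUFReader("./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf")

names = [t.name for t in reader.tensors]
print(f"total tensors: {len(names)}")  # the log above suggests this is 1
print(f"attn_v.weight tensors: {sum(1 for n in names if 'attn_v.weight' in n)}")
for n in names:
    print(n)
```

If the file really holds a single tensor, the export went wrong before quantization was ever reached; the general.size_label of 0.03K in the metadata points the same way, and the assert is simply the first place the inconsistency is caught.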
There's another report of this here: #10793
There is probably some mismatch in the tensor names. Provide the full log, including -lv 1.
@ggerganov, can you please let me know the command to get the full log?
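For reference, a sketch of the requested invocation, assuming llama-quantize accepts -lv 1 as a leading option as suggested above; piping through tee keeps a copy of the full log in a file:

```
$ llama.cpp/bin/llama-quantize -lv 1 \
    ./Orpo-Llama-3.2-1B-15k/Llama-3.2-0.03K-1b-F16.gguf \
    ./Orpo-Llama-3.2-1B-15k/Orpo-Llama-3.2-1B-15k-Q4_K_M.gguf \
    Q4_K_M 2>&1 | tee quantize.log
```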