Name and Version
version: 3411 (e02b597)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

Operating systems
Linux

GGML backends
CPU

Hardware
NA

Models
llama-3.2-3B

Problem description & steps to reproduce
When I run ./llama-server -m <model_name>, I get this error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 255, got 254
The model was converted using python convert_hf_to_gguf.py.

First Bad Commit
No response

Relevant log output
llama_model_loader: - type f32: 58 tensors
llama_model_loader: - type f16: 197 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_layer = 28
llm_load_print_meta: n_head = 24
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 3
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 3.21 B
llm_load_print_meta: model size = 5.98 GiB (16.00 BPW)
llm_load_print_meta: general.name = 0cb88a4f764b7a12671c53f0838cd831a0843b95
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: ggml ctx size = 0.12 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 255, got 254
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/0cb88a4f764b7a12671c53f0838cd831a0843b95-3.2B-0cb88a4f764b7a12671c53f0838cd831a0843b95-F16.gguf'
ERR [load_model] unable to load model | tid="139850917829632" timestamp=1733839943 model="./models/0cb88a4f764b7a12671c53f0838cd831a0843b95-3.2B-0cb88a4f764b7a12671c53f0838cd831a0843b95-F16.gguf"
Segmentation fault (core dumped)
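When the loader and the converter disagree about the tensor count like this, it can help to check what the GGUF file itself declares before loading it. Below is a minimal Python sketch of that check, assuming the standard GGUF v3 little-endian header layout (4-byte "GGUF" magic, u32 version, u64 tensor count, u64 metadata KV count); this is not llama.cpp code, just a diagnostic helper.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: (version, tensor_count, metadata_kv_count)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # u32 version, u64 tensor count, u64 metadata key-value count, little-endian
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(4 + 8 + 8))
    return version, n_tensors, n_kv
```

If the header reports 254 tensors, the file really was written without the tensor the newer loader expects, and re-converting with a current convert_hf_to_gguf.py is the fix.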
That is quite an old build; I'd highly recommend updating. I assume this build predates the addition of the rope tensor, which is a common cause of this error on old builds.
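An off-by-one tensor count like "expected 255, got 254" usually means exactly one tensor name the loader wants is absent from the file; for Llama 3.x rope scaling that tensor is typically named rope_freqs.weight, though that name is an assumption here, not something shown in the log. A hypothetical helper for diffing the loader's expected tensor names against the names actually present (e.g. as dumped by GGUF tooling):

```python
def missing_tensors(expected, found):
    """Return tensor names that are expected by the loader but absent from the file."""
    return sorted(set(expected) - set(found))

# Hypothetical illustration: the file lacks the rope frequencies tensor.
expected = ["token_embd.weight", "rope_freqs.weight", "output_norm.weight"]
found = ["token_embd.weight", "output_norm.weight"]
# missing_tensors(expected, found) -> ["rope_freqs.weight"]
```

Once the missing name is identified, updating llama.cpp so that both the converter and the server come from the same revision keeps the two lists in sync.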