Replies: 2 comments
-
llama.cpp provides a quantize tool to convert an fp16 checkpoint to lower precision.
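For reference, a minimal sketch of that step, assuming an f16 GGUF has already been produced by the convert script and that the quantize binary was built from the llama.cpp source tree (file names are placeholders):

# quantize an f16 GGUF down to 8-bit; usage: quantize <input.gguf> <output.gguf> <type>
./quantize ./models/starcoder-1b-f16.gguf ./models/starcoder-1b-q8_0.gguf q8_0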
-
I cloned StarCoder, then converted it to the GGUF file format using convert-hf-to-gguf.py from llama.cpp.
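Roughly, the conversion command looked like the following sketch (the local model directory is a placeholder, and the exact flags may differ between llama.cpp versions):

# convert the Hugging Face checkpoint to an f16 GGUF
python convert-hf-to-gguf.py ./starcoderbase-1b --outfile startcoder1b.gguf --outtype f16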
Finally, when I tried to run StarCoder-1B with llama.cpp, it failed.
Log:
Log start
main: build = 1699 (b9f4795)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1703645466
llama_model_loader: loaded meta data with 17 key-value pairs and 292 tensors from startcoder1b.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = starcoder
llama_model_loader: - kv 1: general.name str = StarCoder
llama_model_loader: - kv 2: starcoder.context_length u32 = 8192
llama_model_loader: - kv 3: starcoder.embedding_length u32 = 2048
llama_model_loader: - kv 4: starcoder.feed_forward_length u32 = 8192
llama_model_loader: - kv 5: starcoder.block_count u32 = 24
llama_model_loader: - kv 6: starcoder.attention.head_count u32 = 16
llama_model_loader: - kv 7: starcoder.attention.head_count_kv u32 = 1
llama_model_loader: - kv 8: starcoder.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: general.file_type u32 = 1
llama_model_loader: - kv 10: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,49152] = ["<|endoftext|>", "<fim_prefix>", "<f...
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,49152] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 13: tokenizer.ggml.merges arr[str,48891] = ["Ġ Ġ", "ĠĠ ĠĠ", "ĠĠĠĠ ĠĠ...
llama_model_loader: - kv 14: tokenizer.ggml.bos_token_id u32 = 0
llama_model_loader: - kv 15: tokenizer.ggml.eos_token_id u32 = 0
llama_model_loader: - kv 16: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - type f32: 194 tensors
llama_model_loader: - type f16: 98 tensors
llm_load_vocab: special tokens definition check successful ( 19/49152 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = starcoder
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 49152
llm_load_print_meta: n_merges = 48891
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 1
llm_load_print_meta: n_layer = 24
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 16
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 1.14 B
llm_load_print_meta: model size = 2.12 GiB (16.01 BPW)
llm_load_print_meta: general.name = StarCoder
llm_load_print_meta: BOS token = 0 '<|endoftext|>'
llm_load_print_meta: EOS token = 0 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<|endoftext|>'
llm_load_print_meta: LF token = 145 'Ä'
llm_load_tensors: ggml ctx size = 0.11 MiB
error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'startcoder1b.gguf'
main: error: unable to load model
So, why did it fail? How did you convert StarCoder to GGUF?
-
Hello,
After reviewing the documentation on the model registry, I understand that Tabby has been using llama.cpp for inference since version 0.5.0, which supports GGUF format model files. However, I've noticed that Tabby ships 8-bit quantized models, whereas the convert-hf-to-gguf.py script provided with llama.cpp only supports float16 and float32.
Could you please explain how the StarCoderBase-1B model was converted into the 8-bit quantized format q8_0.v2.gguf?
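My guess, based on the earlier comment about the quantize tool, is a two-step flow along these lines (a sketch only; the model directory and output file names are assumptions):

# step 1: convert the Hugging Face checkpoint to an f16 GGUF
python convert-hf-to-gguf.py ./starcoderbase-1b --outfile starcoderbase-1b-f16.gguf --outtype f16
# step 2: quantize the f16 GGUF down to 8-bit with the llama.cpp quantize tool
./quantize starcoderbase-1b-f16.gguf starcoderbase-1b-q8_0.gguf q8_0

Is that how the published q8_0.v2.gguf was produced?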
Thank you for your assistance.