Prerequisites
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Microsoft has released a new Phi-4 14B model. So far it is available only on Azure AI Foundry; it will appear on HuggingFace in a few days.
Motivation
The model is advertised as having strong reasoning abilities despite its relatively small size. It would be great to have it supported in llama.cpp.
Possible Implementation
The model uses Phi3ForCausalLM architecture that is already supported in llama.cpp. The differences I noticed that cause problems are:
It uses the GPT2Tokenizer tokenizer_class, not LlamaTokenizer like the previous Phi models. The convert_hf_to_gguf.py script expects Phi3ForCausalLM-based models to have a SentencePiece tokenizer.model file and throws an exception if it's not present, so it has to be modified to support Phi-4 (a possible converter change is sketched below).
The model has the sliding_window parameter set to null in config.json. The Phi-4 Technical Report says:
The phi-4 model is based on a decoder-only transformer architecture with 14B parameters and a default context length of 4096. This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium
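For the first problem, one way the converter could be adapted is to fall back to the GPT-2/BPE vocab path when no tokenizer.model file is present. This is only a rough sketch; the Phi3MiniModel class and the _set_vocab_gpt2 helper names are my assumptions about the current convert_hf_to_gguf.py, not a verified patch:

```python
# Rough sketch of a possible convert_hf_to_gguf.py change (class/helper names
# such as Phi3MiniModel and _set_vocab_gpt2 are assumptions about the current
# converter, not a verified patch).
@Model.register("Phi3ForCausalLM")
class Phi3MiniModel(Model):
    model_arch = gguf.MODEL_ARCH.PHI3

    def set_vocab(self):
        # Phi-4 ships a GPT-2/tiktoken-style tokenizer.json instead of a
        # SentencePiece tokenizer.model, so fall back to the BPE vocab path
        # when the SentencePiece file is missing.
        if not (self.dir_model / "tokenizer.model").is_file():
            self._set_vocab_gpt2()
            return
        # ... existing SentencePiece handling for Phi-3 models stays as-is ...
```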
As for the second problem, I manually changed the sliding_window parameter value to the maximum context length (16384) in config.json before conversion, which allowed me to test the model. I suppose the final implementation should detect the presence of a Phi-4 model and build a full KQ mask instead of a sliding-window KQ mask.
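For anyone who wants to reproduce the workaround, the config.json tweak can be scripted roughly like this (the directory path is an example; 16384 matches the model's max_position_embeddings):

```python
# Pre-conversion workaround sketch: replace the null sliding_window in Phi-4's
# config.json with the full context length so the current Phi-3 conversion
# path accepts the model. The directory path is an example.
import json
from pathlib import Path

cfg_path = Path("phi-4/config.json")  # local HF model directory
cfg = json.loads(cfg_path.read_text())

if cfg.get("sliding_window") is None:
    # Use the model's full context length (16384 for Phi-4) as the window,
    # effectively disabling sliding-window attention during conversion.
    cfg["sliding_window"] = cfg.get("max_position_embeddings", 16384)
    cfg_path.write_text(json.dumps(cfg, indent=2))
```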