Feature Request: Add support for Phi-4 model #10814

fairydreaming opened this issue Dec 13, 2024 · 0 comments · May be fixed by #10817
Labels: enhancement (New feature or request)


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Microsoft has released a new Phi-4 14B model. So far it's available only on Azure AI Foundry; in a few days it will appear on HuggingFace.

Motivation

The model is advertised as having strong reasoning abilities despite its relatively small size. It would be great to have it supported in llama.cpp.

Possible Implementation

The model uses the Phi3ForCausalLM architecture, which is already supported in llama.cpp. The differences I noticed that cause problems are:

  1. It uses the GPT2Tokenizer tokenizer_class, not LlamaTokenizer like the previous Phi models. The convert_hf_to_gguf.py script expects Phi3ForCausalLM-based models to have a SentencePiece tokenizer.model file and throws an exception if it's not present. It has to be modified to support Phi-4.
  2. The model has the sliding_window parameter set to null in config.json. The Phi-4 Technical Report says:

The phi-4 model is based on a decoder-only transformer architecture with 14B parameters and a default context length of 4096. This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium.

My initial solution for the first problem was:

diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index c63d929c..1ae37b83 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -2129,6 +2129,9 @@ class Phi3MiniModel(Model):
     model_arch = gguf.MODEL_ARCH.PHI3
 
     def set_vocab(self):
+        if self.metadata.name == "Phi 4":
+            return self._set_vocab_gpt2()
+
         from sentencepiece import SentencePieceProcessor
 
         tokenizer_path = self.dir_model / 'tokenizer.model'
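
For illustration only, a hypothetical variant of the same idea could key off the GPT2Tokenizer tokenizer_class in tokenizer_config.json instead of the model name. This is just a sketch under that assumption, not necessarily what #10817 ends up doing:

    def set_vocab(self):
        # Hypothetical sketch: detect the GPT-2 style tokenizer from
        # tokenizer_config.json instead of matching on the model name.
        import json

        tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
        if tokenizer_config_file.is_file():
            with open(tokenizer_config_file, encoding='utf-8') as f:
                tokenizer_config = json.load(f)
            if tokenizer_config.get('tokenizer_class') == 'GPT2Tokenizer':
                return self._set_vocab_gpt2()

        from sentencepiece import SentencePieceProcessor

        tokenizer_path = self.dir_model / 'tokenizer.model'
        # ... existing SentencePiece-based vocabulary handling ...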

As for the second problem, I manually changed the sliding_window parameter value to the maximum context length (16384) in config.json before conversion. This allowed me to test the model. I suppose the final implementation should detect the presence of a Phi-4 model and build a full KQ mask instead of a sliding-window KQ mask.
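
For reference, that manual workaround can also be scripted; the sketch below simply patches config.json before running convert_hf_to_gguf.py (the "phi-4" directory path is only an example, and 16384 is the 16K extended context length mentioned in the report):

import json
from pathlib import Path

# Replace the null sliding_window in config.json with the maximum context
# length, since the converter currently expects an integer value here.
config_path = Path("phi-4") / "config.json"  # example path to the downloaded model
config = json.loads(config_path.read_text())

if config.get("sliding_window") is None:
    # Effectively disables the sliding window by covering the full context.
    config["sliding_window"] = 16384

config_path.write_text(json.dumps(config, indent=2))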
