Feature Request: Add support for Phi-4 model #10814

fairydreaming opened this issue Dec 13, 2024 · 0 comments · May be fixed by #10817
Labels: enhancement (New feature or request)


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Microsoft has released a new Phi-4 14B model. So far it's available only on Azure AI Foundry; in a few days it will appear on HuggingFace.

Motivation

The model is advertised as having strong reasoning abilities despite its relatively small size. It would be great to have it supported in llama.cpp.

Possible Implementation

The model uses the Phi3ForCausalLM architecture, which is already supported in llama.cpp. The differences I noticed that cause problems are:

  1. It uses the GPT2Tokenizer tokenizer_class, not LlamaTokenizer like the previous Phi models. The convert_hf_to_gguf.py script expects Phi3ForCausalLM-based models to have a SentencePiece tokenizer.model file and throws an exception if it's not present. It has to be modified to support Phi-4.
  2. The model has the sliding_window parameter set to null in config.json. The Phi-4 Technical Report says:

The phi-4 model is based on a decoder-only transformer architecture with 14B parameters and a default context length of 4096. This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium.

My initial solution for the first problem was:

diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index c63d929c..1ae37b83 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -2129,6 +2129,9 @@ class Phi3MiniModel(Model):
     model_arch = gguf.MODEL_ARCH.PHI3
 
     def set_vocab(self):
+        if self.metadata.name == "Phi 4":
+            return self._set_vocab_gpt2()
+
         from sentencepiece import SentencePieceProcessor
 
         tokenizer_path = self.dir_model / 'tokenizer.model'
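
For illustration only, a hypothetical variant of the same idea could key off the GPT2Tokenizer tokenizer_class in tokenizer_config.json instead of the model name. This is just a sketch under that assumption, not necessarily what #10817 ends up doing:

    def set_vocab(self):
        # Hypothetical sketch: detect the GPT-2 style tokenizer from
        # tokenizer_config.json instead of matching on the model name.
        import json

        tokenizer_config_file = self.dir_model / 'tokenizer_config.json'
        if tokenizer_config_file.is_file():
            with open(tokenizer_config_file, encoding='utf-8') as f:
                tokenizer_config = json.load(f)
            if tokenizer_config.get('tokenizer_class') == 'GPT2Tokenizer':
                return self._set_vocab_gpt2()

        from sentencepiece import SentencePieceProcessor

        tokenizer_path = self.dir_model / 'tokenizer.model'
        # ... existing SentencePiece-based vocabulary handling ...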

As for the second problem, I manually changed the sliding_window parameter value to the maximum context length (16384) in config.json before conversion. This allowed me to test the model. I suppose the final implementation should detect the presence of a Phi-4 model and build a full KQ mask instead of a sliding-window KQ mask.
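
For reference, that manual workaround can also be scripted; the sketch below simply patches config.json before running convert_hf_to_gguf.py (the "phi-4" directory path is only an example, and 16384 is the 16K extended context length mentioned in the report):

import json
from pathlib import Path

# Replace the null sliding_window in config.json with the maximum context
# length, since the converter currently expects an integer value here.
config_path = Path("phi-4") / "config.json"  # example path to the downloaded model
config = json.loads(config_path.read_text())

if config.get("sliding_window") is None:
    # Effectively disables the sliding window by covering the full context.
    config["sliding_window"] = 16384

config_path.write_text(json.dumps(config, indent=2))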
