Hello, I recently read the Domino paper. I tried running Domino by following this blog https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-domino/README.md and this README https://github.com/microsoft/DeepSpeedExamples/blob/master/training/DeepSpeed-Domino/README.md, launching the script https://github.com/microsoft/DeepSpeedExamples/blob/master/training/DeepSpeed-Domino/pretrain_llama_7b.sh. However, I hit the following error:

I think the cdb in DeepSpeed is supposed to be set by the client, such as Megatron? I am therefore confused about the relationship between DeepSpeed and Megatron. If I want to run Domino, should I use https://github.com/microsoft/DeepSpeedExamples/blob/master/training/DeepSpeed-Domino/ or Megatron-DeepSpeed? Also, according to the paper, Domino is tensor-parallel only? I would like to know whether Domino supports ZeRO-3, since I found that the optimizer comes from Megatron rather than from DeepSpeed.
Hi @yingtongxiong, thanks for reporting this error, and we will fix it soon.
To run the code now, replace deepspeed.comm with torch.distributed in deepspeed/runtime/domino/transformer.py, and use https://github.com/microsoft/DeepSpeedExamples/blob/master/training/DeepSpeed-Domino/.
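For anyone hitting the same error before the fix lands, here is a minimal sketch of what that replacement might look like inside deepspeed/runtime/domino/transformer.py. The import alias `dist`, the helper name `tp_all_reduce`, and the call shown are illustrative assumptions rather than the file's literal contents; the point is only that torch.distributed exposes the same collective API that deepspeed.comm wraps.

```python
import torch
import torch.distributed as dist  # was: import deepspeed.comm as dist

def tp_all_reduce(tensor: torch.Tensor, tp_group=None) -> torch.Tensor:
    """All-reduce a tensor across the tensor-parallel group.

    deepspeed.comm.all_reduce and torch.distributed.all_reduce take the
    same arguments here, so swapping the import is usually enough, as long
    as torch.distributed has already been initialized by the client
    (e.g. Megatron's distributed setup).
    """
    if dist.is_initialized() and dist.get_world_size(group=tp_group) > 1:
        dist.all_reduce(tensor, group=tp_group)
    return tensor
```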
Domino is tensor-parallel only for now, but we plan to support ZeRO-3 in the future.
Hi, @yingtongxiong
Thanks for the question.
Short answer: for now you should use https://github.com/microsoft/DeepSpeedExamples/blob/master/training/DeepSpeed-Domino/, which is a minimal-dependency version of the Megatron code that we maintain.
Right now we have not integrated with ZeRO-3, but it is on our roadmap. Thanks @hwchen2017, and please follow up to help here. Thanks.
@hwchen2017 @GuanhuaWang Thank you very much