[BUG] When fine-tuning an LLM, the following error occurs after training for some time: self.optimizer.param_groups[param_group_id]['params'] = [] IndexError: list index out of range #6857

Open
tdtgi opened this issue Dec 12, 2024 · 2 comments
Labels
bug Something isn't working training

tdtgi commented Dec 12, 2024

File "/share/home/xxx/miniconda3/envs/qwen2_vl_FT/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 2095, in step
[rank2]: self._optimizer_step(sub_group_id)
[rank2]: File "/share/home/xxx/miniconda3/envs/qwen2_vl_FT/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 980, in _optimizer_step
[rank2]: self.optimizer.param_groups[param_group_id]['params'] = []
[rank2]: IndexError: list index out of range
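
The last frame of the traceback indexes `self.optimizer.param_groups` with `param_group_id`, so the error means that index is at or past the end of the list when `step` runs. As a rough sketch of that failing condition only: the helpers below are hypothetical (they are not part of DeepSpeed or LLaMA Factory) and use plain dicts in place of a real optimizer's parameter groups.

```python
def check_param_group_index(param_groups, param_group_id):
    """Return True if param_group_id is a valid non-negative index into param_groups."""
    return 0 <= param_group_id < len(param_groups)


def clear_group_params(param_groups, param_group_id):
    """Mimic the failing statement, but report both the index and the group count."""
    if not check_param_group_index(param_groups, param_group_id):
        raise IndexError(
            f"param_group_id={param_group_id} but optimizer has only "
            f"{len(param_groups)} param group(s)"
        )
    param_groups[param_group_id]["params"] = []


# Stand-in for optimizer.param_groups: a list of dicts with a 'params' key.
groups = [{"params": ["w0", "w1"]}]
clear_group_params(groups, 0)      # valid index: the group's params are cleared
try:
    clear_group_params(groups, 1)  # index past the end: same error class as in the log
except IndexError as exc:
    print(exc)
```

Logging `len(optimizer.param_groups)` next to the offending index in a real run would show whether the group list is shorter than the sub-group mapping expects.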

@tdtgi tdtgi added bug Something isn't working training labels Dec 12, 2024
jomayeri (Contributor) commented

Can you provide reproduction code?

@jomayeri jomayeri self-assigned this Dec 12, 2024
tdtgi (Author) commented Dec 13, 2024

> Can you provide reproduction code?

I am fine-tuning the LLM using LLaMA Factory.
