
[BUG] The higher the zero_stage, the more GPU memory is consumed. #6936

Open
yangshenchang opened this issue Jan 9, 2025 · 2 comments
Labels: bug (Something isn't working), training

Comments

@yangshenchang

Thank you for your wonderful work. When I tested DeepSpeed's ZeRO strategy with a small model, the higher the stage, the more GPU memory was consumed. Is there another setting that needs to be enabled?
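(The issue does not include the config used; for context, the stage is normally selected via the `zero_optimization` block of the DeepSpeed config JSON. The values below are only a hypothetical sketch, not the reporter's actual settings:)

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2
  }
}
```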
The model used is resnet152. [screenshot attached]

@yangshenchang added the bug and training labels on Jan 9, 2025
@tjruwase
Contributor

tjruwase commented Jan 9, 2025

@yangshenchang, ZeRO is designed for data-parallel training of multi-billion parameter models that are larger than GPU memory. ZeRO allocates some extra buffers for its memory optimizations. In other words, ZeRO is not recommended for small models like resnet152.
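Why the savings are invisible here is easy to see from the ZeRO paper's memory model for model states. A minimal sketch, assuming mixed-precision Adam (2 bytes/param for fp16 weights, 2 for fp16 gradients, 12 for optimizer states) and deliberately ignoring the communication/gather buffers and activations that dominate on a small model:

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU memory (GB) for *model states only* under ZeRO.

    Mixed-precision Adam: 2 bytes/param fp16 weights, 2 bytes/param fp16
    gradients, 12 bytes/param optimizer states (fp32 master weights,
    momentum, variance). Stage 1 shards optimizer states across GPUs,
    stage 2 additionally shards gradients, stage 3 additionally shards
    the parameters themselves. Buffers and activations are NOT counted.
    """
    params = 2.0 * num_params
    grads = 2.0 * num_params
    opt = 12.0 * num_params
    if stage >= 1:
        opt /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return (params + grads + opt) / 2**30

# resnet152 has roughly 60M parameters: even at stage 0 the model states
# fit in well under 1 GB, so ZeRO's fixed buffer overhead (not modeled
# here) can easily outweigh the sharding savings you measure.
for s in range(4):
    print(f"stage {s}: {zero_model_state_gb(60e6, 8, s):.3f} GB/GPU")
```

Under this model, per-GPU memory only goes down as the stage increases; the increase the reporter observed comes from the extra buffers the sketch excludes, which dwarf the small savings on a ~60M-parameter model.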

@yangshenchang
Author

OK, thanks a lot.
