
[BUG] The higher the zero_stage, the more GPU memory is consumed. #6936

Open
yangshenchang opened this issue Jan 9, 2025 · 2 comments
Labels: bug (Something isn't working), training

Comments

@yangshenchang

Thank you for your wonderful work. When I tested DeepSpeed's ZeRO strategy with a small model, the higher the stage, the more GPU memory was consumed. Is there another setting that needs to be enabled?
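(The issue does not include the config used; for context, the stage is normally selected via the `zero_optimization` block of the DeepSpeed config JSON. The values below are only a hypothetical sketch, not the reporter's actual settings:)

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2
  }
}
```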
The model used is resnet152. [screenshot attached]

@yangshenchang added the bug and training labels on Jan 9, 2025
@tjruwase
Contributor

tjruwase commented Jan 9, 2025

@yangshenchang, ZeRO is designed for data-parallel training of multi-billion parameter models that are larger than GPU memory. ZeRO allocates some extra buffers for its memory optimizations. In other words, ZeRO is not recommended for small models like resnet152.
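Why the savings are invisible here is easy to see from the ZeRO paper's memory model for model states. A minimal sketch, assuming mixed-precision Adam (2 bytes/param for fp16 weights, 2 for fp16 gradients, 12 for optimizer states) and deliberately ignoring the communication/gather buffers and activations that dominate on a small model:

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU memory (GB) for *model states only* under ZeRO.

    Mixed-precision Adam: 2 bytes/param fp16 weights, 2 bytes/param fp16
    gradients, 12 bytes/param optimizer states (fp32 master weights,
    momentum, variance). Stage 1 shards optimizer states across GPUs,
    stage 2 additionally shards gradients, stage 3 additionally shards
    the parameters themselves. Buffers and activations are NOT counted.
    """
    params = 2.0 * num_params
    grads = 2.0 * num_params
    opt = 12.0 * num_params
    if stage >= 1:
        opt /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return (params + grads + opt) / 2**30

# resnet152 has roughly 60M parameters: even at stage 0 the model states
# fit in well under 1 GB, so ZeRO's fixed buffer overhead (not modeled
# here) can easily outweigh the sharding savings you measure.
for s in range(4):
    print(f"stage {s}: {zero_model_state_gb(60e6, 8, s):.3f} GB/GPU")
```

Under this model, per-GPU memory only goes down as the stage increases; the increase the reporter observed comes from the extra buffers the sketch excludes, which dwarf the small savings on a ~60M-parameter model.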

@yangshenchang
Author

OK, thanks a lot.
