bug: inference of fine-tuning model is not working #143

Open
KayMKM opened this issue Jan 9, 2025 · 1 comment

KayMKM commented Jan 9, 2025

I followed this document to fine-tune a model and deploy an inference endpoint: https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/finetune.md

But the inference endpoint fails with an error:

2025-01-09T06:46:43.857955385Z Traceback (most recent call last):
2025-01-09T06:46:43.857984860Z   File "/mount/inference/utils.py", line 55, in load_model
2025-01-09T06:46:43.875525329Z     model = AutoModelForCausalLM.from_pretrained(
2025-01-09T06:46:43.875556908Z             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875562920Z   File "/opt/conda/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
2025-01-09T06:46:43.875625156Z     return model_class.from_pretrained(
2025-01-09T06:46:43.875632990Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.875660385Z   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4000, in from_pretrained
2025-01-09T06:46:43.876266321Z     dispatch_model(model, **device_map_kwargs)
2025-01-09T06:46:43.876299283Z   File "/opt/conda/lib/python3.11/site-packages/accelerate/big_modeling.py", line 498, in dispatch_model
2025-01-09T06:46:43.876418477Z     model.to(device)
2025-01-09T06:46:43.876484914Z   File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2849, in to
2025-01-09T06:46:43.876775446Z     raise ValueError(
2025-01-09T06:46:43.876788781Z ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
2025-01-09T06:46:43.876792589Z 
2025-01-09T06:46:43.876795624Z During handling of the above exception, another exception occurred:
2025-01-09T06:46:43.876797928Z 
2025-01-09T06:46:43.876801214Z Traceback (most recent call last):
2025-01-09T06:46:43.876829868Z   File "/mount/inference/./gradio_chat.py", line 42, in <module>
2025-01-09T06:46:43.895541690Z     model = load_model(model_name, torch_dtype, quant_type)
2025-01-09T06:46:43.895572127Z             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-09T06:46:43.895577717Z   File "/mount/inference/utils.py", line 70, in load_model
2025-01-09T06:46:43.908149431Z     raise RuntimeError(f"Error loading model: {e}")
2025-01-09T06:46:43.908180659Z RuntimeError: Error loading model: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

I tried upgrading the transformers package to version 4.47.1, but it did not help.
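
For context, this ValueError means `.to(device)` ended up being called on a model loaded in 4-bit/8-bit, either directly or (as in the traceback, via accelerate's `dispatch_model`) from inside `from_pretrained`. Below is a minimal sketch of the load pattern that avoids the error; the model name and quantization settings are illustrative assumptions, not the actual values in the toolkit's utils.py:

```python
# Sketch only: assumes a hypothetical model name and 4-bit settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "microsoft/Phi-3-mini-4k-instruct"  # hypothetical example

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# With a quantization config, from_pretrained + device_map handles device
# placement itself; the weights come back already on the right GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# model.to("cuda")  # calling .to() on a 4-/8-bit model raises the same ValueError
```

If the toolkit's utils.py calls `.to(device)` unconditionally after loading, guarding that call when `quant_type` is 4-bit or 8-bit would avoid the crash.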

microsoft-github-policy-service bot added the needs attention label Jan 9, 2025
swatDong added the needs more info label Jan 9, 2025
swatDong (Contributor) commented Jan 9, 2025

Could you please share more info, such as which model and which parameters you used?
