You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently covered are Models, Playground, and Finetune.
Bulk Run and evaluation are both missing.
Note while the evaluations appear to be standard, it won't be clear how they apply specifically in this case or may have conflicting documentation if a google search lands someone for example at a description of F1 score, where the harmonic mean of precision and recall isn't helpful in understanding how that applies to LLM results.
Of specific interest: Which of the evaluations require use of a model? (gpt4o-mini was the only choice .. but which eval caused that choice to be presented (I am assuming METEOR?) was unclear
The text was updated successfully, but these errors were encountered:
Thanks for your feedback, we'll update the documentation to cover topics you mentioned.
For you question about the requirement of judge model: currently, only LLM-based evaluators require use of a model. At this time, only Azure OpenAI models and those compatible with OpenAI API (including OpenAI models and GitHub models) are supported.
You'll first need to add model from Model Catalog, then you'll have the choice to select this model while creating a new evaluation.
Currently covered are Models, Playground, and Finetune.
Bulk Run and evaluation are both missing.
Note while the evaluations appear to be standard, it won't be clear how they apply specifically in this case or may have conflicting documentation if a google search lands someone for example at a description of F1 score, where the harmonic mean of precision and recall isn't helpful in understanding how that applies to LLM results.
Of specific interest: Which of the evaluations require use of a model? (gpt4o-mini was the only choice .. but which eval caused that choice to be presented (I am assuming METEOR?) was unclear
The text was updated successfully, but these errors were encountered: