
Documentation on Evaluation is not present on the landing page for the Documentation link in the Help and Feedback pane. #135

SteveJSteiner opened this issue Dec 17, 2024 · 1 comment

@SteveJSteiner

Currently covered are Models, Playground, and Finetune.
Bulk Run and Evaluation are both missing.

Note that while the evaluations appear to be standard, it won't be clear how they apply specifically in this case, and a web search may even surface conflicting documentation; for example, a search could land someone on a description of F1 score as the harmonic mean of precision and recall, which isn't helpful for understanding how that applies to LLM results.
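(For context: a common way F1 is applied to generated text is token-overlap F1, as in SQuAD-style QA evaluation. Whether this matches the toolkit's evaluator is an assumption; a minimal sketch:)

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between generated and reference text (SQuAD-style).
    Shown only to illustrate how precision/recall map onto LLM output;
    the toolkit's actual F1 evaluator may tokenize or normalize differently."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # share of generated tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat on the mat", "a cat sat on the mat"))  # ~0.83
```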

Of specific interest: which of the evaluations require the use of a model? (gpt-4o-mini was the only choice, but it was unclear which evaluator caused that choice to be presented; I assumed METEOR.)

@qinezh
Contributor

qinezh commented Dec 25, 2024

Thanks for your feedback; we'll update the documentation to cover the topics you mentioned.

For your question about the judge-model requirement: currently, only LLM-based evaluators require the use of a model. At this time, only Azure OpenAI models and models compatible with the OpenAI API (including OpenAI models and GitHub models) are supported.

You'll first need to add a model from the Model Catalog; then you'll be able to select that model when creating a new evaluation.
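
For illustration only, the general LLM-as-judge pattern behind such evaluators looks roughly like the sketch below, calling an OpenAI-compatible endpoint. The prompt wording, rating scale, and model name are assumptions, not the toolkit's actual implementation:

```python
from openai import OpenAI

# Works with any OpenAI-compatible endpoint (Azure OpenAI, OpenAI, GitHub models);
# OPENAI_API_KEY (and optionally base_url) must be configured. The prompt, 1-5
# scale, and "gpt-4o-mini" default below are illustrative assumptions.
client = OpenAI()

def judge_coherence(question: str, answer: str, model: str = "gpt-4o-mini") -> str:
    prompt = (
        "Rate the coherence of the following answer on a scale of 1 to 5. "
        "Reply with only the number.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(judge_coherence("What does BLEU measure?",
                      "BLEU measures n-gram overlap between output and references."))
```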


LLM-based:

  • Coherence
  • Fluency
  • Relevance
  • Similarity

Code-based:

  • BLEU
  • GLEU
  • F1 Score
  • METEOR
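
The code-based metrics above, by contrast, run locally and need no judge model. A minimal sketch using NLTK (assuming nltk and its wordnet data are available; the toolkit's own implementation may differ):

```python
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

reference = "the cat sat on the mat".split()
hypothesis = "a cat sat on the mat".split()

# BLEU: n-gram precision against the reference; smoothing avoids zero scores
# on short sentences.
bleu = sentence_bleu([reference], hypothesis,
                     smoothing_function=SmoothingFunction().method1)

# METEOR: unigram matching with stemming and synonyms; needs the wordnet corpus.
nltk.download("wordnet", quiet=True)
meteor = meteor_score([reference], hypothesis)

print(f"BLEU: {bleu:.3f}  METEOR: {meteor:.3f}")
```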
