This is Just the Beginning
The steps in the previous sections make up probably the simplest evaluation workflow you can set up, and your LLM application is very likely more complicated than what we've shown.
This section goes through how to get the most out of Confident AI, offers a few pointers, and hands you off to other parts of the documentation that answer questions you might have.
Other Things You Should Do
You should also:
- Create and version your prompts in Prompt Studio.
- Log hyperparameters during evaluation for parameter insights (a sketch follows this list).
- Unit-test in CI/CD pipelines to run evals pre-deployment (also sketched after this list).
- Set up notifications so you're alerted when evaluations complete.
- Set up LLM tracing to debug nested components, such as retrievers and tool calls, especially for agents.
- Enable online metrics to monitor real-time performance.
- Talk to us to validate whether the metrics you've chosen are appropriate for your use case.
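As a quick illustration of hyperparameter logging, here is a minimal Python sketch using deepeval's `evaluate()` function, assuming its `hyperparameters` argument is available in your version; the keys and values shown are placeholders, so substitute whatever parameters your own application actually uses.

```python
# Minimal sketch: log hyperparameters alongside an evaluation run so that
# Confident AI can compare results across parameter choices. The keys and
# values below are illustrative, not prescriptive.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What are your shipping times?",
    # In a real run, generate this with your LLM application
    actual_output="We ship within 3-5 business days.",
)

evaluate(
    test_cases=[test_case],
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={"model": "gpt-4o", "prompt version": "v2"},
)
```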
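For unit-testing in CI/CD, a minimal sketch using deepeval's pytest integration might look like the following; the file name and hardcoded output are placeholders, and in a pipeline you would execute it with `deepeval test run`.

```python
# test_llm_app.py -- minimal pre-deployment eval sketch, run in CI/CD with:
#   deepeval test run test_llm_app.py
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What are your shipping times?",
        # Placeholder: in a real test, call your LLM application here
        actual_output="We ship within 3-5 business days.",
    )
    # Fails the test (and therefore the pipeline) if the metric score
    # falls below the threshold
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.5)])
```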
Important Disclaimers To Note
There are a few important things in this quickstart to take note of:
- You can use any custom LLM judge as your evaluation model when running evals locally (see the sketch after this list).
- You can run evaluations using Confident AI's LLM judges when running evals on the cloud.
- You should DEFINITELY put more time into choosing the right metrics instead of blindly using the `AnswerRelevancyMetric`.
- You should analyze whether your dataset has good enough test coverage.
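To make the first point concrete, here is a minimal sketch of plugging a custom LLM judge into a metric by subclassing `DeepEvalBaseLLM`; the wrapped `client` and its `complete()` method are hypothetical stand-ins for whatever model or SDK you actually use as the judge.

```python
# Minimal sketch: use your own model as the evaluation judge for local evals.
# `client` and `client.complete()` are hypothetical placeholders.
from deepeval.models import DeepEvalBaseLLM
from deepeval.metrics import AnswerRelevancyMetric


class MyCustomJudge(DeepEvalBaseLLM):
    def __init__(self, client):
        self.client = client  # your own model client (hypothetical)

    def load_model(self):
        return self.client

    def generate(self, prompt: str) -> str:
        # Call your model and return its text response
        return self.client.complete(prompt)

    async def a_generate(self, prompt: str) -> str:
        return self.generate(prompt)

    def get_model_name(self) -> str:
        return "My Custom Judge"


# Pass the custom judge as the evaluation model for any metric
metric = AnswerRelevancyMetric(model=MyCustomJudge(client=...))
```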
Need More Help?
Come ask any questions by joining our Discord community of 2,000 members; we'll be thrilled to have you.