This is Just the Beginning
The steps in the previous sections make up about the simplest evaluation workflow you can set up, and your LLM application is very likely more complicated than what we've shown.
This section covers how to get the most out of Confident AI, offers a few pointers, and hands you off to other parts of the documentation that answer questions you may still have.
Important Disclaimers To Note
In this quickstart, there are a few important things to take note of:
- You can use any custom LLM judge as the evaluation model when running evals locally (a minimal sketch follows this list)
- You can run evaluations using Confident AI's LLM judges when running evals on the cloud
- You should include more than the "System Prompt" in Prompt Studio; doing so allows Confident AI to give you better insights
- You should DEFINITELY put more time into choosing the right metrics rather than blindly using the `AnswerRelevancyMetric`
- You should analyze whether your dataset has good enough test coverage
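For example, here is a minimal sketch of running an eval locally with a custom evaluation model, assuming you are using deepeval as in the quickstart; the test case contents and the `"gpt-4o"` model name are placeholder assumptions, and a custom `DeepEvalBaseLLM` subclass can be passed instead:

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Hypothetical test case; substitute inputs/outputs from your own app
test_case = LLMTestCase(
    input="What is your return policy?",
    actual_output="You can return items within 30 days of purchase.",
)

# `model` sets the LLM judge used for local evals; "gpt-4o" is only an
# assumed example here, and any custom judge model can be swapped in
metric = AnswerRelevancyMetric(threshold=0.7, model="gpt-4o")

evaluate(test_cases=[test_case], metrics=[metric])
```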
Other Things You Should Do
You should also:
- Unit-test in CI/CD pipelines to run evals pre-deployment (see the sketch after this list)
- Set up notifications to get notified when evaluations complete
- Use LLM tracing to debug nested components, like retrievers and tool calls, especially for agents
- Use online metrics to monitor real-time performance
- Talk to us to validate whether the metrics you've chosen are appropriate for your use case
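One way to unit-test in CI/CD is to wrap your test cases in pytest-style tests and run them with deepeval's CLI; here is a rough sketch under that assumption (the file name and test case contents are hypothetical):

```python
# test_llm_app.py, invoked in CI with: deepeval test run test_llm_app.py
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    # Hypothetical test case; in practice, pull these from your dataset
    test_case = LLMTestCase(
        input="What is your return policy?",
        actual_output="You can return items within 30 days of purchase.",
    )
    # Fails the CI job if the metric score falls below the threshold
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```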
This is what tracing looks like on Confident AI, which allows you to debug your LLM app rather than rely on surface-level insights:
[Video: LLM Tracing Quick Glance | Confident AI]
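If you want to instrument your own app, a minimal tracing sketch might look like the following, assuming deepeval's `@observe` decorator; the retriever and answer logic here are purely illustrative:

```python
from deepeval.tracing import observe

@observe()  # each decorated function shows up as a span in the trace
def retrieve_documents(query: str) -> list[str]:
    # Hypothetical retriever; replace with your vector store lookup
    return ["Our return policy allows returns within 30 days."]

@observe()
def answer_question(query: str) -> str:
    docs = retrieve_documents(query)
    # Hypothetical generation step; replace with your LLM call
    return f"Based on our policy: {docs[0]}"

answer_question("What is your return policy?")
```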
Need More Help?
Come ask any questions by joining our Discord community of 2,000 members; we'll be thrilled to have you.