This is Just the Beginning
The steps in the previous sections make up probably the simplest evaluation workflow you can set up, and your LLM application is very likely more complicated than what we've shown.
This section goes through how to get the most out of Confident AI, offers a few pointers, and hands you off to other parts of the documentation that answer questions you might have.
Other Things You Should Do
You should also:
- Create and version your prompts in Prompt Studio.
- Log hyperparameters during evaluation for parameter insights (a sketch follows this list).
- Unit-test in CI/CD pipelines to run evals pre-deployment (also sketched after this list).
- Set up notifications so you're alerted when evaluations complete.
- Set up LLM tracing to debug nested components, such as retrievers and tool calls, especially for agents.
- Enable online metrics to monitor real-time performance.
- Talk to us to validate whether the metrics you've chosen are appropriate for your use case.
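As a quick illustration of hyperparameter logging, here is a minimal Python sketch using deepeval's `evaluate()` function, assuming its `hyperparameters` argument is available in your version; the keys and values shown are placeholders, so substitute whatever parameters your own application actually uses.

```python
# Minimal sketch: log hyperparameters alongside an evaluation run so that
# Confident AI can compare results across parameter choices. The keys and
# values below are illustrative, not prescriptive.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What are your shipping times?",
    # In a real run, generate this with your LLM application
    actual_output="We ship within 3-5 business days.",
)

evaluate(
    test_cases=[test_case],
    metrics=[AnswerRelevancyMetric()],
    hyperparameters={"model": "gpt-4o", "prompt version": "v2"},
)
```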
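For unit-testing in CI/CD, a minimal sketch using deepeval's pytest integration might look like the following; the file name and hardcoded output are placeholders, and in a pipeline you would execute it with `deepeval test run`.

```python
# test_llm_app.py -- minimal pre-deployment eval sketch, run in CI/CD with:
#   deepeval test run test_llm_app.py
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What are your shipping times?",
        # Placeholder: in a real test, call your LLM application here
        actual_output="We ship within 3-5 business days.",
    )
    # Fails the test (and therefore the pipeline) if the metric score
    # falls below the threshold
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.5)])
```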
Important Disclaimers To Note
There are a few important things in this quickstart to take note of:
- You can use any custom LLM judge as your evaluation model when running evals locally (see the sketch after this list).
- You can run evaluations using Confident AI's LLM judges when running evals on the cloud.
- You should DEFINITELY put more time into choosing the right metrics instead of blindly using the `AnswerRelevancyMetric`.
- You should analyze whether your dataset has good enough test coverage.
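To make the first point concrete, here is a minimal sketch of plugging a custom LLM judge into a metric by subclassing `DeepEvalBaseLLM`; the wrapped `client` and its `complete()` method are hypothetical stand-ins for whatever model or SDK you actually use as the judge.

```python
# Minimal sketch: use your own model as the evaluation judge for local evals.
# `client` and `client.complete()` are hypothetical placeholders.
from deepeval.models import DeepEvalBaseLLM
from deepeval.metrics import AnswerRelevancyMetric


class MyCustomJudge(DeepEvalBaseLLM):
    def __init__(self, client):
        self.client = client  # your own model client (hypothetical)

    def load_model(self):
        return self.client

    def generate(self, prompt: str) -> str:
        # Call your model and return its text response
        return self.client.complete(prompt)

    async def a_generate(self, prompt: str) -> str:
        return self.generate(prompt)

    def get_model_name(self) -> str:
        return "My Custom Judge"


# Pass the custom judge as the evaluation model for any metric
metric = AnswerRelevancyMetric(model=MyCustomJudge(client=...))
```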
Need More Help?
Come ask any questions by joining our Discord community of 2,000 members; we'll be thrilled to have you.