Introduction to LLM Evaluation on Confident AI
Confident AI’s evaluation features are second-to-none and 100% integrated with DeepEval. All the features you’ve seen up to this point in the documentation lead up to the LLM evaluation suite.
What does Confident AI’s LLM Evaluation offer?
We offer evals that can assess your LLM app at the:
- End-to-end level
- Component-level
Including support for:
- Single-turn
- Multi-turn, and
- Multimodal (text and images)
And offer 50+ metrics that are:
- Default, battle-tested, open-source, and plug-and-play
- Custom, research-backed, and easy to create in natural language (see the sketch after this list)
- For any use case, LLM system architecture, or framework
- Powered by DeepEval
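For instance, a default metric and a custom one written in natural language might look like this minimal sketch (the metric name and criteria below are illustrative placeholders, not prescriptions):

```python
from deepeval.metrics import AnswerRelevancyMetric, GEval
from deepeval.test_case import LLMTestCaseParams

# A default, battle-tested, plug-and-play metric
relevancy = AnswerRelevancyMetric(threshold=0.7)

# A custom metric defined in natural language, powered by G-Eval
politeness = GEval(
    name="Politeness",
    criteria="Determine whether the actual output responds to the input politely.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
```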
You can run evaluations in CI/CD pipelines, as a standalone Python script, on traces in production, or through APIs.
You can also run evals at the individual component level, or end-to-end should you wish to treat your LLM app as a black-box, as shown in the sketch below.
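As a rough end-to-end sketch using DeepEval (the input and output strings are placeholders for your own app's data):

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# End-to-end: the LLM app is treated as a black box, so only the
# input and the final generated output are captured in the test case.
test_case = LLMTestCase(
    input="What is your refund policy?",
    actual_output="You can get a full refund within 30 days of purchase.",
)

# Scores the test case and, once you're logged in to Confident AI,
# generates a testing report on the platform.
evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.7)])
```

For CI/CD, the same test case can instead be passed to `assert_test` inside a pytest-style test file and executed with `deepeval test run`.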
Testing Reports
Testing reports are automatically generated each time you run an evaluation, whether through DeepEval or on the UI.
Single-turn
Single-turn evals test single, atomic LLM interactions at either an end-to-end or component level (the latter via LLM tracing).
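For the component-level case, a minimal sketch might attach a metric to a traced component, assuming the `@observe` decorator and `update_current_span` from `deepeval.tracing` (the `generate_answer` function and its hard-coded response are hypothetical stand-ins for your own component):

```python
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

@observe(metrics=[AnswerRelevancyMetric(threshold=0.7)])
def generate_answer(query: str) -> str:
    answer = "You have 30 days to request a full refund."  # replace with your LLM call
    # Attach a test case to the current span so the metric can score this component
    update_current_span(test_case=LLMTestCase(input=query, actual_output=answer))
    return answer
```

Invoking this component, directly or as part of a larger traced app, is what produces the span-level results shown in single-turn testing reports.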
Single-Turn Testing Reports
Get Started
Get LLM evals for your LLM app, powered by DeepEval.
Advanced Features
Confident AI's LLM evals go beyond single-turn, end-to-end evaluations:
FAQs
I already use DeepEval, how long will setup take?
If you’re already using DeepEval, setting up Confident AI will take less than one minute. All you have to do is create an account and log in via the CLI using the API key available to you on the platform.
```bash
deepeval login --confident-api-key YOUR_API_KEY
```
What about online evals in production?
You can run online evals to get performance-over-time graphs by setting up LLM tracing and enabling evaluations in code.
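As a rough sketch of the tracing side (assuming the `@observe` decorator from `deepeval.tracing`; the `llm_app` function and its return value are hypothetical placeholders):

```python
from deepeval.tracing import observe

@observe()
def llm_app(query: str) -> str:
    # Your real retrieval + generation logic goes here. Once you've logged in
    # via `deepeval login`, each call is traced and sent to Confident AI, where
    # any online evals you've enabled are scored to build performance-over-time graphs.
    return "Hypothetical answer to: " + query

llm_app("What is your refund policy?")
```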
Can I run evals on the cloud instead of locally?
Certainly, check out this section to learn how to start creating metrics on the cloud without needing to code.