Introduction to LLM Evaluation on Confident AI
Confident AI’s evaluation features are second-to-none and 100% integrated with DeepEval. All the features you’ve seen up to this point in the documentation lead up to the LLM evaluation suite.
What does Confident AI’s LLM Evaluation offer?
We offer evals that can evaluate your LLM app at the:
- End-to-end level
- Component-level
Including support for:
- Single-turn
- Multi-turn, and
- Multimodal (text and images)
And offer 50+ metrics that are:
- Default, battle-tested, open-source, and plug-and-play
- Custom, research-backed, and easy to create in natural language
- For any use case, LLM system architecture, or framework
- Powered by DeepEval
You can run evaluations in CI/CD environments, as a separate Python script, on traces in production, or through APIs.
You can also run evals on an individual component-level, or end-to-end should you wish to treat your LLM app as a black-box.
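For example, to run evals in CI/CD, one common pattern (a minimal sketch, assuming a pytest-style test file named test_llm_app.py and an illustrative metric threshold) is to wrap each test case in assert_test and execute the file with the deepeval CLI:

from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # Build a test case from your LLM app's output (placeholder input and output)
    test_case = LLMTestCase(
        input="What is Confident AI?",
        actual_output="Confident AI is the cloud platform for DeepEval.",
    )
    # Fails the CI job if the metric score falls below its threshold
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])

You would then run deepeval test run test_llm_app.py as a step in your pipeline.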
Get Started
Add Confident AI Evals to your LLM app.
Quickstart
Get started with LLM evaluation on Confident AI by following this 5 min guide.
Create Metric
Import metrics from DeepEval. In this example we’re using GEval
to create a custom answer relevancy metric:
from deepeval.test_case import LLMTestCaseParams
from deepeval.metrics import GEval

metric = GEval(
    name="Relevancy",
    criteria="Determine how relevant the 'actual output' is to the 'input'",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)
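If you’d like to sanity-check this metric before running a full evaluation, you can score a single test case with it locally (the input and output below are placeholders):

from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What's the weather like today?",
    actual_output="I can't check live weather, but I can explain how forecasts work.",
)

# Runs the LLM-as-a-judge evaluation for this one test case
metric.measure(test_case)
print(metric.score, metric.reason)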
How to customize your LLM judge?
99.99% of metrics on DeepEval use LLM-as-a-judge, which means you’ll have to set your evaluation model. For most users this will be OpenAI, and you’ll need to set your OpenAI API key:
export OPENAI_API_KEY="sk-..."
You can also use other model providers that deepeval has integrations with.
Lastly, you can wrap your own LLM API in deepeval’s DeepEvalBaseLLM class to use ANY model of your choice. Click here to learn how.
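As a rough sketch of what a custom judge looks like (MyCustomModel and its client are hypothetical, not a specific integration), you implement a few methods on DeepEvalBaseLLM:

from deepeval.models import DeepEvalBaseLLM

class MyCustomModel(DeepEvalBaseLLM):
    def __init__(self, client):
        # `client` is whatever SDK or HTTP wrapper your model exposes
        self.client = client

    def load_model(self):
        return self.client

    def generate(self, prompt: str) -> str:
        # Call your own LLM API and return the raw text response
        return self.client.complete(prompt)

    async def a_generate(self, prompt: str) -> str:
        # Async variant; falling back to the sync call works too
        return self.generate(prompt)

    def get_model_name(self) -> str:
        return "my-custom-model"

You can then pass an instance to any metric that accepts a model argument, for example AnswerRelevancyMetric(model=MyCustomModel(client)).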
This will run evaluations locally before sending results over to Confident AI. You can also run evals on the cloud by creating a metric collection.
Prepare for Evaluation
Decorate your LLM app (replace with your own):
from openai import OpenAI
from deepeval.tracing import observe

client = OpenAI()

@observe()
def your_llm_app(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content
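If you want component-level evals instead of treating your app as a black box, you can decorate the inner components too; here’s a minimal sketch (retrieve and generate are hypothetical stand-ins for your own components):

from deepeval.tracing import observe

@observe()
def retrieve(query: str) -> list[str]:
    # Your own retrieval logic goes here
    return ["some retrieved context"]

@observe()
def generate(query: str, context: list[str]) -> str:
    # Your own generation logic goes here
    return f"Answer to '{query}' using {len(context)} context chunk(s)"

@observe()
def your_llm_app(query: str) -> str:
    return generate(query, retrieve(query))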
Pull the dataset you’ve created (full guide here):
from deepeval.dataset import EvaluationDataset
# Pull your dataset from Confident AI
dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")
Run an LLM Eval
Putting everything together, run your first LLM evaluation:
from openai import OpenAI
from deepeval.tracing import observe
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric
from deepeval import evaluate

client = OpenAI()

@observe()
def your_llm_app(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")

# Process each golden in your dataset
for golden in dataset.goldens:
    test_case = LLMTestCase(input=golden.input, actual_output=your_llm_app(golden.input))
    dataset.add_test_case(test_case)

# Run an evaluation
evaluate(test_cases=dataset.test_cases, metrics=[AnswerRelevancyMetric()])
Congratulations 🎉! Your test run is now available on Confident AI automatically as a testing report.
Identify Failing Test Case(s)
Identify your failing test cases in the testing report on Confident AI.
Future Roadmap
- Editor table columns for custom metrics
- Better hyperparameters display
- Insights page
- Full text-search on test cases