Confident AI - The DeepEval Platform

Confident AI is the cloud platform for DeepEval, the most widely adopted open-source framework for evaluating LLM applications such as RAG pipelines, agents, chatbots, or even a standalone LLM.

How does Confident AI work?

Confident AI helps organizations iterate on LLM applications by letting you pick the best model and improve your prompts across versions. This helps you achieve the best performance for your particular LLM use case.

In a nutshell, we do this by running LLM evaluations for both:

In development, as you’re building your app before deploying to production.
In production, once your app is live and you need real-time reporting on its performance.

Because LLM applications can get complicated very quickly, Confident AI also offers a dozen more features that complement this evaluation workflow. These are covered in depth later in this documentation.


Confident AI created and owns DeepEval.

When you choose Confident AI, you can rest assured that:

The platform supports all LLM use cases and evaluation needs, no matter how complicated.
Our expertise from building DeepEval can help you simplify evaluation and get the most out of your results.

Even if you’re not using Confident AI, you can join our Discord community to talk about LLM evaluation anytime.

Features Overview

LLM Evaluation
LLM Tracing
Prompt Studio
Dataset Editor
Human-in-the-Loop
Red Teaming

How to Navigate the Docs

This documentation is designed to teach you how to use Confident AI to evaluate your LLM apps with as little fluff as possible. It is organized as follows:

The Concepts section covers essential LLM evaluation terminology and is the recommended starting point for beginners.
The Quickstart section provides the fundamentals needed to start using the platform effectively.
The Data Handling page explains data organization, privacy controls, and data residency options.
The Platform Features section details the complete suite of Confident AI capabilities.


Each platform feature page includes a code and/or video summary that you can follow along with.

Throughout this documentation, we frequently reference DeepEval’s documentation, especially for running evaluations and customizing metrics. This is intentional: everything in DeepEval’s documentation applies directly to Confident AI.

If you are interested in using only our open-source DeepEval LLM evaluation framework without Confident AI, please visit DeepEval’s documentation.

FAQs

How is this different from DeepEval?

DeepEval computes the metric results required for data-driven LLM app development, but on its own it does not provide the insights required to iterate on those results.

Click here for a more comprehensive comparison.

What LLM use cases are supported?

All types of LLM use cases are supported, including summarization, text-to-SQL, customer support chatbots, internal RAG-based Q&A, and conversational agents.

These use cases can be built on any type of system, including RAG pipelines, agentic workflows, conversational chatbots, or a combination of them (e.g. RAG chatbots, agentic RAG).

Confident AI has tailored metrics and platform capabilities for different types of LLM applications, and it is extremely important to adapt your evaluation strategy to your LLM use case. You can read more about the different types of use cases on this page.

What about complex agentic systems?

Yes, complex agentic systems are supported on Confident AI through LLM tracing. Note, however, that it is extremely important to decide carefully what to evaluate (and what not to) in a complex agentic workflow: trying to evaluate everything effectively means evaluating nothing.

The first step in evaluating agentic use cases is to set up tracing, and only then decide which LLM evaluation metrics to use and where. If in doubt, you can also join our Discord community or contact support@confident-ai.com to share what you’re building and get recommendations on how best to run evals.

Is Confident AI enterprise ready?

Yes. Confident AI offers SSO, data segregation for teams, user roles and permissions (with customizations available), as well as the ability to self-host in your own cloud environment.

What about HIPAA compliance?

We’re proudly HIPAA compliant and are willing to sign BAAs with customers on the Premium subscription plan or above.

Can I self-host Confident AI?

Yes. While most users use the SaaS offering, your organization can deploy Confident AI in your own cloud environment (e.g. AWS, Azure, GCP) using Docker, including integration with your existing identity provider of choice (e.g. Azure AD, Ping, Okta) for authentication into the Confident AI platform. In our experience, this process takes 1-2 weeks at most.

What is the pricing?

No upfront credit card is required, and we offer transparent pricing with four tiers, including a generous free tier. You can view the full pricing here.

We try to make sure that you only pay for something once you have had the chance to try it out. If you don’t think this is the case, please email support@confident-ai.com and we will make things more generous.