
Data Models for Metrics

A metric is responsible for computing evaluation scores, and a metric collection represents a group of related metrics that you want to evaluate together. Metric collections and metrics are connected indirectly via metric settings, which specify each metric's threshold, strictness, etc. within different collections.

Metric: A DeepEval metric - all of DeepEval’s metrics are available through the Evals API.

Metric Settings: Configuration options for how a metric within a metric collection should be evaluated, including the threshold, strictness, and whether to include reasoning.

Metric Collection: A group of metrics that you wish to evaluate together (either for a test run or online evaluation).

💡 Confident AI’s Evals API takes care of all of the complicated relationships under the hood.

A metric collection does not relate directly to metrics because a metric requires settings in order to run, and those settings should not be duplicated across different collections. The good news is that, as a user, you don't have to care about the underlying logic beyond the name of each metric you wish to include in your collection.
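To make this indirection concrete, here is a minimal sketch of how the three models nest (all names are illustrative; the full type definitions follow in the sections below):

// A collection holds metric settings; each setting wraps exactly one metric.
const collection = {
  name: "My Collection", // illustrative
  metricSettings: [
    { metric: { name: "My Metric", criteria: "..." }, threshold: 0.8 },
  ],
};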

Metric

A metric is uniquely identified within a project by its name and multiTurn value. For example, two metrics can share the same name only if one is single-turn and the other is multi-turn. You also cannot use any of the default names that are already reserved by Confident AI.

metric.d.ts
type Metric = {
  name: string;
  criteria?: string;
  evaluationSteps?: string[];
  evaluationParams?: LLMTestCaseParams[]; // only for single-turn metrics
  multiTurn?: boolean; // defaults to false
};

As you'll learn in the next section, although the criteria and evaluationSteps fields are optional, in reality you will be required to provide one of them when creating a metric. This is because all metrics you create are by definition custom metrics (powered by DeepEval's G-Eval), and therefore always require a criteria or evaluationSteps.

For a single-turn metric, you must provide at least one LLMTestCaseParams value in evaluationParams:

export enum LLMTestCaseParams {
  INPUT = "input",
  ACTUAL_OUTPUT = "actualOutput",
  EXPECTED_OUTPUT = "expectedOutput",
  CONTEXT = "context",
  EXPECTED_TOOLS = "expectedTools",
}

For a multi-turn metric, there’s no need to provide evaluationParams.
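Putting this together, here is a sketch of one single-turn and one multi-turn metric. The names, criteria, and evaluation steps are illustrative, not reserved Confident AI defaults:

// Single-turn: needs evaluationParams, plus criteria or evaluationSteps
const relevancyMetric: Metric = {
  name: "Answer Relevancy (Custom)", // illustrative name
  criteria: "Determine whether the actual output directly addresses the input.",
  evaluationParams: [LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
  multiTurn: false,
};

// Multi-turn: no evaluationParams; evaluationSteps instead of criteria
const helpfulnessMetric: Metric = {
  name: "Conversation Helpfulness", // illustrative name
  evaluationSteps: [
    "Check whether each assistant turn moves the conversation toward resolving the user's request.",
    "Penalize turns that ignore or contradict earlier context.",
  ],
  multiTurn: true,
};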

Metric Settings

A metric setting defines the configuration for how a metric should be run during evaluation:

metric-setting.d.ts
type MetricSetting = {
  metric: Metric;
  activated?: boolean; // defaults to true
  threshold?: number; // defaults to 0.5
  includeReason?: boolean; // defaults to true
  strictMode?: boolean; // defaults to false
};

The activated field allows you to skip running a metric without removing it from your collection.
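For example, here is a sketch of one setting that tightens the threshold for the single-turn metric above, and another that keeps the multi-turn metric around but deactivated (variable names are illustrative):

// Raise the passing bar for the relevancy metric
const strictRelevancy: MetricSetting = {
  metric: relevancyMetric,
  threshold: 0.7, // overrides the 0.5 default
  strictMode: true,
};

// Keep the helpfulness metric in the collection, but skip it at evaluation time
const dormantHelpfulness: MetricSetting = {
  metric: helpfulnessMetric,
  activated: false,
};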

Metric Collection

A metric collection groups related metrics (along with their settings) together for evaluation:

metric-collection.d.ts
type MetricCollection = {
  name: string;
  metricSettings?: MetricSetting[];
  multiTurn?: boolean; // defaults to false
};

Note that you can only add metric settings whose metrics have the same multiTurn value as the metric collection you're creating.
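Continuing the sketch above, the single-turn setting belongs in a single-turn collection, while the multi-turn setting needs its own collection with multiTurn set to true:

const singleTurnCollection: MetricCollection = {
  name: "RAG Regression Suite", // illustrative name
  metricSettings: [strictRelevancy], // single-turn settings only
  multiTurn: false,
};

const multiTurnCollection: MetricCollection = {
  name: "Chatbot Conversations", // illustrative name
  metricSettings: [dormantHelpfulness], // multi-turn settings only
  multiTurn: true, // matches the multiTurn value of every metric inside
};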
