Data Models for Metrics
A metric is responsible for computing evaluation scores, and a metric collection represents a group of related metrics that you want to evaluate together. Metric collections and metrics are connected indirectly through metric settings, which specify each metric's threshold, strictness, and other configuration within a given collection.
• Metric: A DeepEval metric - all of DeepEval’s metrics are available through the Evals API.
• Metric Settings: Configuration options for how a metric within a metric collection should be evaluated, including the threshold, strictness, and whether to include reasoning.
• Metric Collection: A group of metrics that you wish to evaluate together (either for a test run or online evaluation).
Confident AI’s Evals API takes care of all of the complicated relationships under the hood.
The reason a metric collection does not relate directly to metrics is that a metric requires settings in order to run, and those settings should not be duplicated across different collections. The good news is that, as a user, you don't need to care about the underlying logic beyond the name of the metric you wish to include in your collection.
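As a rough sketch, here's how the same metric might be reused across two collections with different settings, using the types defined in the sections below (all names and thresholds here are illustrative):

const correctness: Metric = {
  name: "Correctness",
  criteria: "Is the actual output factually correct?",
  evaluationParams: [LLMTestCaseParams.ACTUAL_OUTPUT],
};

// The same Metric, referenced by two collections through
// collection-specific MetricSettings:
const devCollection: MetricCollection = {
  name: "dev-evals",
  metricSettings: [{ metric: correctness, threshold: 0.5 }],
};

const prodCollection: MetricCollection = {
  name: "prod-evals",
  metricSettings: [{ metric: correctness, threshold: 0.8 }],
};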
Metric
A metric is uniquely identified within a project by its name and multiTurn value. For example, you can have two metrics with the same name only if one is single-turn and the other is multi-turn. You also cannot use any of the default names that are already reserved by Confident AI.
type Metric = {
  name: string;
  criteria?: string;
  evaluationSteps?: string[];
  evaluationParams?: LLMTestCaseParams[]; // only for single-turn metrics
  multiTurn?: boolean; // defaults to false
}
As you’ll learn in the next section, although the criteria and evaluationSteps fields are optional in the type, in reality you will be required to provide one of them when creating a metric. This is because all metrics you create are by definition custom metrics (powered by DeepEval’s G-Eval), and therefore will always require a criteria or evaluationSteps.
For a single-turn metric, you must provide at least one LLMTestCaseParams value in evaluationParams:
export enum LLMTestCaseParams {
INPUT = "input",
ACTUAL_OUTPUT = "actualOutput",
EXPECTED_OUTPUT = "expectedOutput",
CONTEXT = "context",
EXPECTED_TOOLS = "expectedTools",
}
For a multi-turn metric, there’s no need to provide evaluationParams.
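Putting these rules together, here are two illustrative metric definitions: a single-turn one (which needs evaluationParams) and a multi-turn one (which doesn't). Because their multiTurn values differ, they can even share the same name:

// Single-turn: needs criteria/evaluationSteps plus evaluationParams
const helpfulness: Metric = {
  name: "Helpfulness",
  criteria: "Does the actual output directly address the input?",
  evaluationParams: [LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
};

// Multi-turn: no evaluationParams required
const conversationHelpfulness: Metric = {
  name: "Helpfulness",
  evaluationSteps: [
    "Check whether each assistant turn moves the user closer to their goal",
  ],
  multiTurn: true,
};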
Metric Settings
A metric setting defines how a metric should be run during evaluation:
type MetricSetting = {
metric: Metric;
activated?: boolean; // defaults to true
threshold?: number; // defaults to 0.5
includeReason?: boolean; // defaults to true
strictMode?: boolean; // defaults to false
}
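For example, here is an illustrative setting that keeps a metric in a collection without running it, next to one that enforces strict scoring (the metric itself is also illustrative):

const tonality: Metric = {
  name: "Tonality",
  criteria: "Is the actual output's tone professional and courteous?",
  evaluationParams: [LLMTestCaseParams.ACTUAL_OUTPUT],
};

// Kept in the collection, but skipped at evaluation time:
const disabledSetting: MetricSetting = {
  metric: tonality,
  activated: false,
};

// strictMode enforces binary scoring in DeepEval: 1 if perfect, 0 otherwise
const strictSetting: MetricSetting = {
  metric: tonality,
  strictMode: true,
};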
The activated field lets you keep metrics in your collection without running them during evaluation.
Metric Collection
A metric collection groups related metrics (along with their settings) together for evaluation:
type MetricCollection = {
name: string;
metricSettings?: MetricSetting[];
multiTurn?: boolean; // defaults to false
}
Note that you can only add metric settings whose metrics have the same multiTurn value as the metric collection you’re creating.
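For example, a multi-turn collection can only contain settings for multi-turn metrics (all values here are illustrative):

const conversationQuality: Metric = {
  name: "Conversation Quality",
  evaluationSteps: [
    "Check whether the assistant stays on topic across turns",
    "Penalize contradictions with earlier turns",
  ],
  multiTurn: true,
};

// multiTurn on the collection must match multiTurn on every metric inside:
const chatbotCollection: MetricCollection = {
  name: "chatbot-evals",
  multiTurn: true,
  metricSettings: [{ metric: conversationQuality, threshold: 0.7 }],
};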