Create Metrics On The Cloud
Running metrics on the cloud produces exactly the same results as running metrics locally. However, you might want to run metrics on the cloud when:
- You need to run online evaluations in production
- You want to enable non-technical team members to run evals directly from the UI
- You are not using Python, in which case you can delegate the evals to Confident AI's hosted DeepEval servers instead
Otherwise, we would recommend running metrics locally, since it gives you more control over customization.
We highly recommend that you only continue with this section once you are happy with your selection of metrics and their performance when run locally.
Video Summary: Creating metric collections on Confident AI
Create a Metric Collection
Creating a collection of metrics on Confident AI allows you to specify which group of metrics you wish to evaluate your LLM application on.
To create a metric collection, in your project space go to Metrics > Collections, click on the Create Collection button, and enter a collection name. Your collection name must not already be taken in your project.
Add a Metric
Click on Add metric in your newly created collection, and select the metric you wish to add to it.
Configure Metric Settings
When you add a metric to a collection, you'll have the option to configure each individual metric's threshold, explainability, and strictness. There are three settings you can tune:
- Threshold: Determines the minimum evaluation score required for your metric to pass. If a metric fails, the test case also fails. Defaults to 0.5.
- Include reason: When turned on, a metric will generate a reason alongside the evaluation score for each metric run. Defaults to True.
- Strict mode: When turned on, a metric will pass only if the evaluation score is perfect (i.e. 1.0). Defaults to False.
Change any of these settings and click the Save button; otherwise, the default values will be used.
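For reference, these three settings mirror the arguments you would pass when configuring the same metric locally with DeepEval. Below is a minimal sketch using AnswerRelevancyMetric purely as an example; the same arguments apply to other DeepEval metrics.

```python
from deepeval.metrics import AnswerRelevancyMetric

# The same three settings configured locally, using their default values
metric = AnswerRelevancyMetric(
    threshold=0.5,        # minimum evaluation score required for the metric to pass
    include_reason=True,  # generate a reason alongside the evaluation score
    strict_mode=False,    # when True, only a perfect score of 1.0 passes
)
```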
Using Your Metric Collection
In development
There are two ways to run evals using the metric collection you’ve defined:
- Through TypeScript or an HTTPS POST request that sends over a list of test cases with the generated outputs from your LLM app, or
- On the platform directly, where evaluations are triggered through the click of a button without the need for code
Click on the respective links to learn how.
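As a rough illustration of the first option, the sketch below sends a list of test cases over HTTPS from Python. The endpoint URL, header, and payload field names shown here are illustrative assumptions, not the documented API contract; refer to the linked guide for the exact request format.

```python
import os
import requests

# Hypothetical payload shape: a metric collection name plus test cases
# containing the outputs generated by your LLM app.
payload = {
    "metricCollection": "My Collection",  # assumed field name
    "testCases": [
        {
            "input": "What is the refund window?",
            "actualOutput": "You can request a refund within 30 days.",
        }
    ],
}

response = requests.post(
    "https://api.confident-ai.com/v1/evaluate",  # placeholder URL, check the docs for the real endpoint
    headers={"CONFIDENT_API_KEY": os.environ["CONFIDENT_API_KEY"]},  # assumed auth header
    json=payload,
)
response.raise_for_status()
```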
In production
The only way to run online evaluations in production is by providing the name of your metric collection to the observe functions you've defined; you can learn how to do this here.
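The sketch below shows the general shape of this: an observed function that references the metric collection created above. The observe decorator comes from deepeval's tracing module, but the exact keyword argument name for the collection (shown here as metric_collection) is an assumption; check the linked guide for the precise parameter.

```python
from deepeval.tracing import observe

# metric_collection is an assumed parameter name; see the tracing guide for specifics
@observe(metric_collection="My Collection")
def generate_response(user_input: str) -> str:
    # ... call your LLM application here ...
    return "your LLM app's generated output"
```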