Create Metrics On The Cloud
Running metrics on the cloud produces exactly the same results as running metrics locally. However, you might want to run metrics on the cloud when:
- You need to run online evaluations in production
- You want to enable non-technical team members to run evals directly from the UI
- You are not using Python, in which case you can delegate the evals to Confident AI's hosted DeepEval servers instead
Otherwise, we would recommend running metrics locally, since it gives you more control over customization.
We highly recommend that you only continue with this section once you are happy with your selection of metrics and their performance when run locally.
Video Summary: Creating metric collections on Confident AI
Create a Metric Collection
Creating a collection of metrics on Confident AI allows you to specify which group of metrics you wish to evaluate your LLM application on.
To create a metric collection, in your project space go to Metrics > Collections, click on the Create Collection button, and enter a collection name. Your collection name must not already be taken in your project.
Add a Metric
Click on Add metric in your newly created collection, and select the metric you wish to add to it.
Configure Metric Settings
When you add a metric to a collection, you'll have the option to configure each individual metric's threshold, explainability, and strictness. There are three settings you can tune:
- Threshold: Determines the minimum evaluation score required for your metric to pass. If a metric fails, the test case also fails. Defaults to 0.5.
- Include reason: When turned on, a metric will generate a reason alongside the evaluation score for each metric run. Defaults to True.
- Strict mode: When turned on, a metric will pass only if the evaluation score is perfect (i.e. 1.0). Defaults to False.
Change any of these settings and click the Save button; otherwise, the default values will be used.
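For reference, these three settings mirror the arguments you would pass when configuring the same metric locally with DeepEval. Below is a minimal sketch using AnswerRelevancyMetric purely as an example; the same arguments apply to other DeepEval metrics.

```python
from deepeval.metrics import AnswerRelevancyMetric

# The same three settings configured locally, using their default values
metric = AnswerRelevancyMetric(
    threshold=0.5,        # minimum evaluation score required for the metric to pass
    include_reason=True,  # generate a reason alongside the evaluation score
    strict_mode=False,    # when True, only a perfect score of 1.0 passes
)
```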
Using Your Metric Collection
In development
There are two ways to run evals using the metric collection you’ve defined:
- Through TypeScript or an HTTPS POST request that sends over a list of test cases with the generated outputs from your LLM app, or
- On the platform directly, where evaluations are triggered through the click of a button without the need for code
Click on the respective links to learn how.
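As a rough illustration of the first option, the sketch below sends a list of test cases over HTTPS from Python. The endpoint URL, header, and payload field names shown here are illustrative assumptions, not the documented API contract; refer to the linked guide for the exact request format.

```python
import os
import requests

# Hypothetical payload shape: a metric collection name plus test cases
# containing the outputs generated by your LLM app.
payload = {
    "metricCollection": "My Collection",  # assumed field name
    "testCases": [
        {
            "input": "What is the refund window?",
            "actualOutput": "You can request a refund within 30 days.",
        }
    ],
}

response = requests.post(
    "https://api.confident-ai.com/v1/evaluate",  # placeholder URL, check the docs for the real endpoint
    headers={"CONFIDENT_API_KEY": os.environ["CONFIDENT_API_KEY"]},  # assumed auth header
    json=payload,
)
response.raise_for_status()
```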
In production
The only way to run online evaluations in production is by providing the name of your metric collection to the observe functions you've defined; you can learn how to do this here.
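The sketch below shows the general shape of this: an observed function that references the metric collection created above. The observe decorator comes from deepeval's tracing module, but the exact keyword argument name for the collection (shown here as metric_collection) is an assumption; check the linked guide for the precise parameter.

```python
from deepeval.tracing import observe

# metric_collection is an assumed parameter name; see the tracing guide for specifics
@observe(metric_collection="My Collection")
def generate_response(user_input: str) -> str:
    # ... call your LLM application here ...
    return "your LLM app's generated output"
```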