Evals API Quickstart
Follow this 5-minute guide to get started with the Evals API.
Create An Account
If you don’t already have a Confident AI account, visit app.confident-ai.com and create one. You can create an account using your email address or through Google authentication for a faster process.
If you wish to sign in using SSO, check out our enterprise offering.
After creating your account, you’ll be guided through the onboarding process. You’ll find your Project API key at the last step of the onboarding.
Create A Metric Collection
Create a metric collection by providing a custom name for it. Note that this name must be unique among metric collections in your project.
curl -X POST "https://api.confident-ai.com/v1/metrics" \
  -H "Content-Type: application/json" \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -d '{
    "name": "Collection Name",
    "metricSettings": [
      {
        "metric": {"name": "Answer Relevancy"},
        "threshold": 0.8
      }
    ],
    "multiturn": false
  }'
{"status": 200}
In this example, the "metricSettings" argument specifies which metrics belong to this collection. We're able to reference the "Answer Relevancy" metric by name because it is one of the many default metrics offered by Confident AI. For a metric that is not already defined, you will have to provide additional fields to create a custom metric such as G-Eval.
When you run an evaluation, you will provide a metric collection, instead of individual metrics.
You can learn more about the metric collection data model here.
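If you'd rather work from Python than shell, the curl call above can be mirrored with the third-party `requests` package. This is a minimal sketch, not an official SDK: the endpoint, header name, and payload shape come straight from the snippet above, while the helper function name and the `<PROJECT-API-KEY>` placeholder are illustrative.

```python
import json

API_KEY = "<PROJECT-API-KEY>"  # placeholder; substitute your real Project API key


def build_metric_collection_payload(name, metric_name, threshold, multiturn=False):
    """Build the JSON body for POST /v1/metrics, matching the curl example."""
    return {
        "name": name,
        "metricSettings": [
            {"metric": {"name": metric_name}, "threshold": threshold}
        ],
        "multiturn": multiturn,
    }


payload = build_metric_collection_payload("Collection Name", "Answer Relevancy", 0.8)
print(json.dumps(payload, indent=2))

# To actually send it (requires `pip install requests`):
# import requests
# resp = requests.post(
#     "https://api.confident-ai.com/v1/metrics",
#     headers={"Content-Type": "application/json", "CONFIDENT_API_KEY": API_KEY},
#     json=payload,
# )
# resp.raise_for_status()
```

Building the payload in a helper like this keeps the request body in one place, which makes it easy to reuse when you later run evaluations against the same collection name.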
Run Your First Evaluation
To run an evaluation, provide the name of the metric collection you just created and the list of test cases:
Single-Turn
curl -X POST "https://api.confident-ai.com/v1/evaluate" \
  -H "Content-Type: application/json" \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -d '{
    "metricCollection": "Collection Name",
    "testCases": [
      {
        "input": "How tall is mount everest?",
        "actualOutput": "I don'\''t know, pretty tall I guess?"
      }
    ]
  }'
The /v1/evaluate API endpoint will create a test run on Confident AI and return a structured response containing the test run ID.
{
  "success": true,
  "data": {"id": "TEST-RUN-ID"}
}
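The same request and response handling can be sketched in Python. This is a hypothetical mirror of the curl call and the response shown above, not official client code; the helper names (`build_evaluate_payload`, `extract_test_run_id`) are illustrative.

```python
def build_evaluate_payload(metric_collection, test_cases):
    """Build the JSON body for POST /v1/evaluate, matching the curl example."""
    return {"metricCollection": metric_collection, "testCases": test_cases}


def extract_test_run_id(response_body):
    """Pull the test run ID out of the structured response shown above."""
    if not response_body.get("success"):
        raise RuntimeError("evaluation did not succeed")
    return response_body["data"]["id"]


payload = build_evaluate_payload(
    "Collection Name",
    [
        {
            "input": "How tall is mount everest?",
            "actualOutput": "I don't know, pretty tall I guess?",
        }
    ],
)

# Parsing the example response body from the docs:
run_id = extract_test_run_id({"success": True, "data": {"id": "TEST-RUN-ID"}})
print(run_id)
```

Keeping the ID extraction in its own function means callers can hold on to the test run ID, for example to link to the test run on the platform later.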
🎉 Congratulations! You’ve run your first evaluation on Confident AI via the Evals API. To learn more about the /v1/evaluate endpoint, visit the evaluation section.
For one-off evaluations, it is recommended that you use the POST /v1/metric endpoint instead, which avoids polluting your dashboard with unnecessary test runs.
Verifying Evals on the Platform
Every time you do something via the Evals API, the changes are reflected on the platform as well. This is especially helpful for those who need more visibility into what a customer's evals look like.
For example, here is the testing report you'll get on the platform whenever you call the /v1/evaluate endpoint: