Evals API Quickstart
Follow this 5-minute guide to get started with the Evals API.
Create An Account
If you don’t already have a Confident AI account, visit app.confident-ai.com and create one. You can create an account using your email address or through Google authentication for a faster process.
If you wish to sign in using SSO, check out our enterprise offering.
After creating your account, you’ll be guided through the onboarding process. You’ll find your Project API key at the last step of the onboarding.
Create A Metric Collection
Create a metric collection by providing a custom name for it. Note that this name must be unique among metric collections in your project.
curl -X POST "https://api.confident-ai.com/v1/metrics" \
  -H "Content-Type: application/json" \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -d '{
    "name": "Collection Name",
    "metricSettings": [
      {
        "metric": {"name": "Answer Relevancy"},
        "threshold": 0.8
      }
    ],
    "multiturn": false
  }'
{"status": 200}
In this example, the "metricSettings" argument specifies which metrics belong to this collection. We're able to reference the "Answer Relevancy" metric by name because it is one of the many default metrics offered by Confident AI. For a metric that is not already defined, you will have to provide additional fields to create a custom metric such as G-Eval.
When you run an evaluation, you will provide a metric collection, instead of individual metrics.
You can learn more about the metric collection data model here.
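If you'd rather work from Python than shell, the curl call above can be mirrored with the third-party `requests` package. This is a minimal sketch, not an official SDK: the endpoint, header name, and payload shape come straight from the snippet above, while the helper function name and the `<PROJECT-API-KEY>` placeholder are illustrative.

```python
import json

API_KEY = "<PROJECT-API-KEY>"  # placeholder; substitute your real Project API key


def build_metric_collection_payload(name, metric_name, threshold, multiturn=False):
    """Build the JSON body for POST /v1/metrics, matching the curl example."""
    return {
        "name": name,
        "metricSettings": [
            {"metric": {"name": metric_name}, "threshold": threshold}
        ],
        "multiturn": multiturn,
    }


payload = build_metric_collection_payload("Collection Name", "Answer Relevancy", 0.8)
print(json.dumps(payload, indent=2))

# To actually send it (requires `pip install requests`):
# import requests
# resp = requests.post(
#     "https://api.confident-ai.com/v1/metrics",
#     headers={"Content-Type": "application/json", "CONFIDENT_API_KEY": API_KEY},
#     json=payload,
# )
# resp.raise_for_status()
```

Building the payload in a helper like this keeps the request body in one place, which makes it easy to reuse when you later run evaluations against the same collection name.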
Run Your First Evaluation
To run an evaluation, provide the name of the metric collection you just created and the list of test cases:
Single-Turn
curl -X POST "https://api.confident-ai.com/v1/evaluate" \
  -H "Content-Type: application/json" \
  -H "CONFIDENT_API_KEY: <PROJECT-API-KEY>" \
  -d '{
    "metricCollection": "Collection Name",
    "testCases": [
      {
        "input": "How tall is mount everest?",
        "actualOutput": "I don'\''t know, pretty tall I guess?"
      }
    ]
  }'
The /v1/evaluate API endpoint will create a test run on Confident AI and return a structured response containing the test run ID.
{
  "success": true,
  "data": {"id": "TEST-RUN-ID"}
}
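The same request and response handling can be sketched in Python. This is a hypothetical mirror of the curl call and the response shown above, not official client code; the helper names (`build_evaluate_payload`, `extract_test_run_id`) are illustrative.

```python
def build_evaluate_payload(metric_collection, test_cases):
    """Build the JSON body for POST /v1/evaluate, matching the curl example."""
    return {"metricCollection": metric_collection, "testCases": test_cases}


def extract_test_run_id(response_body):
    """Pull the test run ID out of the structured response shown above."""
    if not response_body.get("success"):
        raise RuntimeError("evaluation did not succeed")
    return response_body["data"]["id"]


payload = build_evaluate_payload(
    "Collection Name",
    [
        {
            "input": "How tall is mount everest?",
            "actualOutput": "I don't know, pretty tall I guess?",
        }
    ],
)

# Parsing the example response body from the docs:
run_id = extract_test_run_id({"success": True, "data": {"id": "TEST-RUN-ID"}})
print(run_id)
```

Keeping the ID extraction in its own function means callers can hold on to the test run ID, for example to link to the test run on the platform later.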
🎉 Congratulations! You’ve run your first evaluation on Confident AI via the Evals API. To learn more about the /v1/evaluate endpoint, visit the evaluation section.
For one-off evaluations, it is recommended that you use the POST /v1/metric endpoint instead, which avoids polluting your dashboard with unnecessary test runs.
Verifying Evals on the Platform
Every time you do something via the Evals API, the changes are reflected on the platform as well. This is especially helpful for those who need more visibility into what a customer's evals look like.
For example, here is the testing report you'll get on the platform whenever you call the /v1/evaluate endpoint: