Run Component-Level Evals
You can run evaluations at the component level by creating test cases at evaluation time. This requires you to set up tracing, which also brings additional benefits such as component-level debugging and visualization of latencies, model costs, etc. in testing reports on the UI.
Setting up tracing also automatically grants you access to all of Confident AI’s observability features for production monitoring.
Running component-level evals enables you to:
- Generate organization-wide sharable testing reports
- A/B experimentation for regression testing
- Hyperparameter experimentation
- Data-driven decision making
If you would rather treat your LLM app as a black box, you can run end-to-end evaluations instead.
Code Summary
Evals In CI/CD
import pytest
from openai import OpenAI
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.tracing import observe, update_current_span
from deepeval import assert_test

client = OpenAI()

@observe(metrics=[AnswerRelevancyMetric()])
def llm_app(query: str) -> str:
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content
    update_current_span(test_case=LLMTestCase(input=query, actual_output=res))
    return res

dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")

@pytest.mark.parametrize("golden", dataset.goldens)
def test_llm_app(golden: Golden):
    assert_test(golden=golden, observed_callback=llm_app)
Execute the file using DeepEval’s pytest wrapper:
deepeval test run test_llm_app.py
Define Metrics
You can define your metrics by importing them from DeepEval.
from deepeval.metrics import AnswerRelevancyMetric
metric = AnswerRelevancyMetric()
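Most DeepEval metrics also accept optional constructor arguments; the ones shown below (threshold, model, include_reason) are common options, but double-check them against the version of DeepEval you have installed:
from deepeval.metrics import AnswerRelevancyMetric

metric = AnswerRelevancyMetric(
    threshold=0.7,        # minimum score required to pass
    model="gpt-4o",       # LLM used as the evaluation judge
    include_reason=True,  # attach a reason to each score
)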
Setup Tracing and Create Test Cases
Set up tracing for your LLM application with the @observe decorator, which allows you to create test cases at runtime for each individual component.
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from openai import OpenAI
...

client = OpenAI()
# Decorate your LLM app and provide metrics
@observe(metrics=[metric])
def llm_app(query: str) -> str:
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content
    # This creates a test case for your component at evaluation time
    update_current_span(test_case=LLMTestCase(input=query, actual_output=res))
    return res
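Component-level evals shine when your app has multiple steps, since each @observe-decorated function becomes its own span with its own metrics. As a sketch, here is a hypothetical retrieve_documents component (its name, metric choice, and placeholder retrieval logic are illustrative assumptions, not part of the snippet above):
from deepeval.metrics import ContextualRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.tracing import observe, update_current_span

# Hypothetical retriever component with its own metric and test case
@observe(metrics=[ContextualRelevancyMetric()])
def retrieve_documents(query: str) -> list:
    docs = ["..."]  # replace with your actual retrieval logic
    update_current_span(
        test_case=LLMTestCase(
            input=query,
            actual_output="\n".join(docs),
            retrieval_context=docs,
        )
    )
    return docs

# Calling retrieve_documents() inside the @observe-decorated llm_app
# above makes it appear as a nested component in the same trace.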
Tracing also lets you visualize and debug the latency, model cost, etc. of each individual component in your testing report. You can read more about all of Confident AI’s tracing features and capabilities in the tracing section.
Run an Evaluation
At evaluation time, DeepEval will pass the input of each individual golden in your dataset to invoke your LLM app. Be patient while waiting for evaluation results, as the evaluation duration is often bound by the execution time of your LLM app.
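If you are running the evaluation locally rather than in CI/CD, recent DeepEval versions also expose an evaluate function that accepts your goldens together with the observed callback; treat the exact signature below as an assumption and verify it against your installed version:
from deepeval.dataset import EvaluationDataset
from deepeval import evaluate

dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")

# Invokes llm_app once per golden, then evaluates every traced component
evaluate(goldens=dataset.goldens, observed_callback=llm_app)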
In CI/CD
Unit-test your LLM app in CI/CD using DeepEval’s pytest integration via the assert_test function:
import pytest
from deepeval.dataset import EvaluationDataset, Golden
from deepeval import assert_test
...
dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")
# This loops through your goldens
@pytest.mark.parametrize("golden", dataset.goldens)
def test_llm_app(golden: Golden):
    assert_test(golden=golden, observed_callback=llm_app)
Execute deepeval test run in the CLI to test it out:
deepeval test run test_llm_app.py
Congratulations 🎉! Your test run should now be available on Confident AI as a testing report ✅. Click around the testing report and take your time to get familiarized with it.
NOTE: Don’t forget to add this command to your .yaml files to automate it in CI/CD pipelines such as GitHub Actions!
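Since testing reports also support hyperparameter experimentation (see the list above), you can log the hyperparameters used for each test run; the decorator below reflects the pattern in recent DeepEval versions, so confirm the exact arguments against your installed release:
import deepeval

# Logged hyperparameters appear on the testing report, letting you
# compare test runs across models and prompt templates
@deepeval.log_hyperparameters(model="gpt-4o", prompt_template="You are a helpful assistant.")
def hyperparameters():
    return {"temperature": 1, "top_k": 10}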
Setup Notifications (recommended)
You can also set up your project to receive notifications through email, Slack, Discord, or Teams each time an evaluation completes, whether run locally or on the cloud, by configuring your integrations in Project Settings > Integrations.
To learn how, visit the project integrations page.