Setup Tracing for LLM Evals and Observability
The last section showed end-to-end evaluation, and in this section we’ll show how to also evaluate individual components within your LLM app through tracing.
What is tracing, and why use it on Confident AI?
Tracing is the process of tracking how the different components of your LLM app interact with one another: retrievers (embedding models) calling generators (LLMs), for example, or LLMs invoking different tool calls.
When you do tracing on Confident AI, you immediately get access to:
- 40+ DeepEval metrics that can be applied to anywhere in your LLM app
- LLM observability and production monitoring, with all the important tracing features you'd need, such as metadata logging, PII masking, conversation tracking, and setting tags
Confident AI is also feature-complete for tracing; click here for more detail.
Decorate Your LLM App
Assuming this is `your_llm_app`, you will trace it using the `@observe` decorator:
```python
from openai import OpenAI
from deepeval.tracing import observe

client = OpenAI()

@observe()
def your_llm_app(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

@observe()
def redundant_llm_wrapper(query: str) -> str:
    return your_llm_app(query)

# Call app to send trace to Confident AI
redundant_llm_wrapper("Write me a poem.")
```
That's it! Sanity-check your setup by running this file and looking under Confident AI's Observatory > Traces to see your first trace.
The `redundant_llm_wrapper` is simply there to show you that this works perfectly fine even if `your_llm_app` is a nested component.
The `@observe` decorator tells Confident AI that `your_llm_app` is a component in its own right. A component is known as a span, and many spans make up a trace.
Technically, you can also think of end-to-end evaluations as running evals on a trace.
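To make the span/trace distinction concrete, here is a minimal sketch of a multi-component app (the retriever and generator below are hypothetical stand-ins, not part of the code example above): each `@observe`-decorated function becomes its own span, and a single call to `rag_app` produces one trace containing all three spans.

```python
from deepeval.tracing import observe

@observe()
def retrieve(query: str) -> list[str]:
    # Hypothetical retriever: swap in your embedding model / vector store lookup
    return ["Confident AI supports component-level evals via tracing."]

@observe()
def generate(query: str, context: list[str]) -> str:
    # Hypothetical generator: swap in your actual LLM call
    return f"Answer to '{query}' based on {len(context)} retrieved chunk(s)."

@observe()
def rag_app(query: str) -> str:
    # One call to rag_app = one trace, with retrieve() and generate() as child spans
    return generate(query, retrieve(query))

rag_app("What does Confident AI do?")
```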
Define Metrics and Create Test Case
The last step is to define your metrics and create test cases at runtime. These metrics and test cases work exactly the same way as they do in end-to-end evaluation:
```python
from openai import OpenAI
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.tracing import observe, update_current_span

client = OpenAI()

@observe(metrics=[AnswerRelevancyMetric()])
def your_llm_app(query: str) -> str:
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content
    update_current_span(test_case=LLMTestCase(input=query, actual_output=res))
    return res

@observe()
def redundant_llm_wrapper(query: str) -> str:
    return your_llm_app(query)
```
This also allows you to create test cases at runtime without rewriting your codebase.
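Because any span can carry metrics, you are not limited to evaluating the outermost LLM call. As a sketch (the retriever below is a hypothetical stand-in, and `ContextualRelevancyMetric` is just one example of a metric you could apply to a retrieval component):

```python
from deepeval.metrics import ContextualRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.tracing import observe, update_current_span

@observe(metrics=[ContextualRelevancyMetric()])
def retrieve(query: str) -> list[str]:
    # Hypothetical retriever: swap in your vector store lookup
    chunks = ["Confident AI supports component-level evals via tracing."]
    update_current_span(
        test_case=LLMTestCase(
            input=query,
            actual_output="(not generated at this span)",  # required by LLMTestCase
            retrieval_context=chunks,
        )
    )
    return chunks
```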
Finally, run an evaluation:
```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
...

dataset = EvaluationDataset()
dataset.pull(alias="your-dataset-alias")

evaluate(goldens=dataset.goldens, observed_callback=redundant_llm_wrapper)
```
Congratulations 🎉! Your component-level test run is now available on Confident AI, and you should be able to see the trace associated with it.
Tracing for Production Monitoring
When you have tracing set up, all invocations of your LLM app outside of an evaluation session are automatically traced on Confident AI's dashboard for debugging.
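For example, if you expose the decorated app behind a normal request handler (the FastAPI endpoint below is purely illustrative and reuses `redundant_llm_wrapper` from the snippet above; it is not part of Confident AI's API), every request will show up as a trace on the dashboard:

```python
# Illustrative only: any call site works once @observe is in place
from fastapi import FastAPI

app = FastAPI()

@app.get("/chat")
def chat(query: str) -> dict:
    # Called outside an evaluation session, so this invocation is
    # traced automatically for monitoring and debugging
    return {"answer": redundant_llm_wrapper(query)}
```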
Confident AI supports a ton of tracing features for the best observability experience. You can configure trace environments such as "staging" or "production", for example, or mask PII so that Confident AI does not store sensitive information.
The full docs for tracing can be found here.