Introduction to LLM Tracing on Confident AI
Confident AI offers an Observatory for teams to trace and monitor LLM applications. Think Datadog for LLM apps.
Tracing on Confident AI contains all the features you would expect but also enables DeepEval metrics to run on individual components.
Confident AI tracing is designed to be completely non-intrusive to your application. It:
- Requires no rewrite of your existing code - just add the @observe decorator
- Runs asynchronously in the background with zero impact on latency
- Fails silently if there are any issues, ensuring your app keeps running
- Works with any function signature - you can set input/output at runtime
- Allows you to run online evaluations for spans and traces
Feature Highlights
LLM Tracing for an Agentic RAG App
Quickstart
Two important terminologies to be aware of:
- Trace: The overall process of tracking and visualizing the execution flow of your LLM application
- Span: Individual units of work within your application (e.g., LLM calls, tool executions, retrievals)
Each observed function creates a span, and many spans make up a trace. Once you have tracing set up, you can run evaluations on both the trace and span level.
Installation
Install DeepEval and set up your tracing environment:
Python
pip install deepeval
Don’t forget to log in with your Confident AI API key in the CLI:
deepeval login --confident-api-key YOUR_API_KEY
You can also log in using the login_with_confident_api_key method without going through the CLI if you’re in a notebook environment:
import deepeval
deepeval.login_with_confident_api_key("YOUR_API_KEY")
Setup tracing
Python
The @observe decorator is the primary way to instrument your LLM application for tracing.
from openai import OpenAI
from deepeval.tracing import observe

client = OpenAI()

@observe()
def llm_app(query: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

# Call app to send trace to Confident AI
llm_app("Write me a poem.")
If your llm_app has more than one function, simply decorate those functions with @observe too.
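For example, here is a minimal sketch of a two-component, RAG-style app. The function names retrieve_docs and generate_answer are hypothetical, and the retrieval step is stubbed out for illustration; each decorated function becomes its own span nested inside the same trace:
Python
from openai import OpenAI
from deepeval.tracing import observe

client = OpenAI()

@observe()
def retrieve_docs(query: str) -> list[str]:
    # Hypothetical retrieval step - swap in your own vector store lookup
    return ["Confident AI traces LLM apps.", "Many spans make up a trace."]

@observe()
def generate_answer(query: str, docs: list[str]) -> str:
    # LLM call grounded on the retrieved documents
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"Answer using this context: {docs}\n\n{query}"}
        ]
    ).choices[0].message.content

@observe()
def llm_app(query: str) -> str:
    # Calling the decorated helpers creates child spans under this trace
    docs = retrieve_docs(query)
    return generate_answer(query, docs)

llm_app("What is a span?")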
✅ You can now go to the Observatory to see your traces there.
In a later section, you’ll learn how to create spans that are LLM specific, which would allow you to log things such as token cost and model name automatically.
Enable Online Evals
You can enable online evaluation to run both end-to-end (metrics on the trace) and component-level (metrics on the span) evaluations on Confident AI.
First, enable online metrics on Confident AI by creating a metric collection. Now in your code, add these lines to automatically run online evals in production:
Python
from openai import OpenAI
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase

client = OpenAI()

@observe(metrics=["Answer Relevancy"])
def llm_app(query: str) -> str:
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content
    update_current_span(test_case=LLMTestCase(input=query, actual_output=res))
    return res

# Call app to send trace to Confident AI
llm_app("Write me a poem.")
Congratulations 🎉! Now whenever you run your LLM app, all traces will be logged and evaluated on Confident AI.
More information on online evaluations can be found here.
Why Tracing On Confident AI?
Confident AI is the only platform where you can leverage DeepEval’s evaluations on traces. It is also feature complete for your LLM tracing needs: you can run evaluations on individual spans (component-level), traces (end-to-end), or threads (conversation evals), with access to unlimited evaluation use cases through DeepEval. Whatever you’re using for evals in development, bring it to prod.