LLM Observability Overview
Confident AI offers an Observatory for teams to trace and monitor LLM applications: think Datadog, but for LLM apps. The Observatory allows you to:
- Detect and debug issues in your LLM applications in real-time
- Search and analyze historical generation data with powerful filters
- Collect human feedback on model responses
- Run evaluations to measure and improve performance
- Track costs and latency to optimize resource usage
Simple Walkthrough
This walkthrough shows how to trace LLM applications on Confident AI. Not every feature is covered, but if you follow all the steps you’ll have LLM tracing set up.
Setup Tracing
Here’s a simple tracing implementation that monitors an OpenAI generation using the @observe decorator:
from openai import OpenAI

from deepeval.tracing import observe, update_current_span_attributes, LlmAttributes


@observe(type="llm", model="gpt-4")
def generate_response(prompt: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    output = response.choices[0].message.content

    # Attach this span's input/output so they appear in the trace
    update_current_span_attributes(
        LlmAttributes(input=prompt, output=output)
    )
    return output
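The same @observe decorator can be nested to trace multi-step applications (for example an agentic RAG pipeline) as a single trace: the outermost decorated function becomes the root span, and every decorated function it calls appears as a child span. The sketch below is illustrative only; the retrieve_context helper, the "retriever" and "agent" span types, and the prompt wiring are assumptions for demonstration, not steps from this walkthrough.

from openai import OpenAI

from deepeval.tracing import observe, update_current_span_attributes, LlmAttributes


# Hypothetical retrieval step, decorated so it appears as a child span
@observe(type="retriever")
def retrieve_context(query: str) -> list[str]:
    return ["An example document that might be relevant to: " + query]


@observe(type="llm", model="gpt-4")
def generate_with_context(prompt: str, context: list[str]) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using this context:\n" + "\n".join(context)},
            {"role": "user", "content": prompt},
        ],
    )
    output = response.choices[0].message.content
    update_current_span_attributes(LlmAttributes(input=prompt, output=output))
    return output


# Root span: calling this function produces one trace containing both child spans
@observe(type="agent")
def answer_question(query: str) -> str:
    context = retrieve_context(query)
    return generate_with_context(query, context)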
Enable Online Metrics (recommended)
This step is optional but recommended: you can enable online metrics to evaluate every trace logged on Confident AI. First, go to Metrics > Collections on Confident AI and create a new metric collection. Then add the “Answer Relevancy” metric to this newly created collection and make sure it is activated. Finally, select the metric collection and click Enable for monitoring.
Now in your code, add these lines to automatically run online evals in production:
from openai import OpenAI

from deepeval.tracing import (
    observe,
    update_current_span_attributes,
    update_current_span_test_case_parameters,
    LlmAttributes,
)


@observe(type="llm", model="gpt-4", metrics=["Answer Relevancy"])
def generate_response(prompt: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    output = response.choices[0].message.content

    # Attach span attributes for the trace UI
    update_current_span_attributes(LlmAttributes(input=prompt, output=output))
    # Set the test case parameters that the "Answer Relevancy" metric will evaluate
    update_current_span_test_case_parameters(input=prompt, actual_output=output)
    return output
Now whenever you run your generate_response() function, all traces will be logged and evaluated on Confident AI.
Trace a Generation
Once you have everything set up, all you have to do is run the generate_response() function:
...
generate_response("Hi!")
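Each call to the decorated function is logged as its own trace, so you can populate the Observatory by exercising it with a few prompts (the prompts below are arbitrary examples):

...
# Every call produces a separate trace on Confident AI
for prompt in ["Hi!", "What can you help me with?", "Summarize yesterday's weather."]:
    print(generate_response(prompt))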
View Traces
Here’s a quick overview of the Observatory page on Confident AI:
(Screenshot: LLM tracing for an agentic RAG app)
Future Roadmap
- Span-level metric definitions
- Hyperparameter logging
- Custom property logging
- Latency and cost tracking displays
- Self-serve alerting and notification configuration
- More integrations