LLM Observability Overview
Confident AI offers an Observatory for teams to trace and monitor LLM applications: think Datadog, but for LLM apps. The Observatory allows you to:
- Detect and debug issues in your LLM applications in real-time
- Search and analyze historical generation data with powerful filters
- Collect human feedback on model responses
- Run evaluations to measure and improve performance
- Track costs and latency to optimize resource usage
Simple Walkthrough
This walkthrough shows how to trace LLM applications on Confident AI. Not every feature is covered, but if you follow all the steps you’ll have LLM tracing set up.
Setup Tracing
Here’s a simple tracing implementation that monitors an OpenAI generation using the @observe decorator:
from openai import OpenAI

from deepeval.tracing import observe, update_current_span_attributes, LlmAttributes


@observe(type="llm", model="gpt-4")
def generate_response(prompt: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    output = response.choices[0].message.content

    # Attach this span's input/output so they appear in the trace
    update_current_span_attributes(
        LlmAttributes(input=prompt, output=output)
    )
    return output
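The same @observe decorator can be nested to trace multi-step applications (for example an agentic RAG pipeline) as a single trace: the outermost decorated function becomes the root span, and every decorated function it calls appears as a child span. The sketch below is illustrative only; the retrieve_context helper, the "retriever" and "agent" span types, and the prompt wiring are assumptions for demonstration, not steps from this walkthrough.

from openai import OpenAI

from deepeval.tracing import observe, update_current_span_attributes, LlmAttributes


# Hypothetical retrieval step, decorated so it appears as a child span
@observe(type="retriever")
def retrieve_context(query: str) -> list[str]:
    return ["An example document that might be relevant to: " + query]


@observe(type="llm", model="gpt-4")
def generate_with_context(prompt: str, context: list[str]) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using this context:\n" + "\n".join(context)},
            {"role": "user", "content": prompt},
        ],
    )
    output = response.choices[0].message.content
    update_current_span_attributes(LlmAttributes(input=prompt, output=output))
    return output


# Root span: calling this function produces one trace containing both child spans
@observe(type="agent")
def answer_question(query: str) -> str:
    context = retrieve_context(query)
    return generate_with_context(query, context)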
Enable Online Metrics (recommended)
This step is optional but recommended: you can enable online metrics to evaluate every trace logged on Confident AI. First, go to Metrics > Collections on Confident AI and create a new metric collection. Then add the “Answer Relevancy” metric to this newly created collection and make sure it is activated. Finally, select the metric collection and click Enable for monitoring.
Now in your code, add these lines to automatically run online evals in production:
from openai import OpenAI

from deepeval.tracing import (
    observe,
    update_current_span_attributes,
    update_current_span_test_case_parameters,
    LlmAttributes,
)


@observe(type="llm", model="gpt-4", metrics=["Answer Relevancy"])
def generate_response(prompt: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    output = response.choices[0].message.content

    # Attach span attributes for the trace UI
    update_current_span_attributes(LlmAttributes(input=prompt, output=output))
    # Set the test case parameters that the "Answer Relevancy" metric will evaluate
    update_current_span_test_case_parameters(input=prompt, actual_output=output)
    return output
Now whenever you run your generate_response() function, all traces will be logged and evaluated on Confident AI.
Trace a Generation
Once you have everything set up, all you have to do is run the generate_response() function:
...
generate_response("Hi!")
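Each call to the decorated function is logged as its own trace, so you can populate the Observatory by exercising it with a few prompts (the prompts below are arbitrary examples):

...
# Every call produces a separate trace on Confident AI
for prompt in ["Hi!", "What can you help me with?", "Summarize yesterday's weather."]:
    print(generate_response(prompt))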
View Traces
Here’s a quick overview of the Observatory page on Confident AI:
(Screenshot: LLM tracing for an agentic RAG app)
Future Roadmap
- Span-level metric definitions
- Hyperparameter logging
- Custom property logging
- Latency and cost tracking displays
- Self-serve alerting and notification configuration
- More integrations