LangGraph
LangGraph is a framework for building stateful, multi-agent LLM applications.
Quickstart
Confident AI provides a CallbackHandler that can be used to trace your LangGraph agent's execution.
Install the following packages:
pip install -U deepeval langgraph langchain langchain-openai
Log in with your API key and configure DeepEval's CallbackHandler as a callback for LangGraph:
import os
import time

from langgraph.prebuilt import create_react_agent

import deepeval
from deepeval.integrations.langchain import CallbackHandler

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
deepeval.login("<your-confident-api-key>")

def get_weather(city: str) -> str:
    """Returns the weather in a city"""
    return f"It's always sunny in {city}!"

agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[get_weather],
    prompt="You are a helpful assistant"
)

result = agent.invoke(
    input={"messages": [{"role": "user", "content": "what is the weather in sf"}]},
    config={"callbacks": [CallbackHandler()]}
)

time.sleep(5)  # Wait for the trace to be published
Run your agent by executing the script:
python main.py
You can view the traces directly in the Observatory by clicking the link printed in the console output.
DeepEval's CallbackHandler is an implementation of LangChain's BaseCallbackHandler.
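Because CallbackHandler subclasses BaseCallbackHandler, the same handler can in principle be attached to any LangChain runnable, not only a LangGraph agent. Below is a minimal sketch of that idea; the ChatOpenAI call is illustrative and not part of the original example:

from langchain_openai import ChatOpenAI
from deepeval.integrations.langchain import CallbackHandler

# Illustrative: attach the same handler to a plain chat model call
llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke(
    "Summarize the weather in SF in one sentence.",
    config={"callbacks": [CallbackHandler()]},
)
print(response.content)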
Advanced Usage
Trace attributes
You can set custom trace attributes by providing them to the CallbackHandler for each agent invocation, including the trace name, tags, metadata, thread_id, and user_id. Each attribute is optional and works exactly the same way as you would expect for the native tracing features on Confident AI:
...
agent.invoke(
    input=input,
    config={
        "callbacks": [
            CallbackHandler(
                name="Name of Trace",
                tags=["Tag 1", "Tag 2"],
                metadata={"Key": "Value"},
                thread_id="your-thread-id",
                user_id="your-user-id",
            )
        ]
    },
)
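As a usage note, reusing the same thread_id (and user_id) across invocations is what groups the resulting traces under a single conversation thread on Confident AI. A short sketch, with illustrative IDs and inputs:

# Sketch: reuse one thread_id so multiple turns land in the same thread.
# The IDs and questions below are illustrative.
for question in ["what is the weather in sf", "and in nyc?"]:
    agent.invoke(
        input={"messages": [{"role": "user", "content": question}]},
        config={"callbacks": [
            CallbackHandler(thread_id="weather-chat-123", user_id="user-456")
        ]},
    )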
Online evals
To run online evaluations on your LangGraph agent, simply provide metric_collection as an argument to the CallbackHandler. This allows you to run online evaluations in production.
The Task Completion metric is the only supported metric for the LangGraph integration, so your metric collection should only contain the task completion metric.
from deepeval.integrations.langchain import CallbackHandler
...

# Invoke your agent with the metric collection name
agent.invoke(
    input={"messages": [{"role": "user", "content": "what is the weather in sf"}]},
    config={"callbacks": [
        CallbackHandler(metric_collection="<metric-collection-name-with-task-completion>")
    ]}
)
This will evaluate the entire trace of the agent’s execution with the task completion metric in production.
End-to-end evals
To configure your LangGraph agent for end-to-end evaluations in development, simply supply metrics to the CallbackHandler. Then, use the dataset's evals_iterator() generator to invoke your LangGraph agent for each golden.
As with online evaluations, the Task Completion metric is the only supported metric for end-to-end evaluations.
Synchronous
from deepeval.metrics import TaskCompletionMetric
from deepeval.dataset import Golden, EvaluationDataset
...

# Create a metric
task_completion = TaskCompletionMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

# Create goldens
goldens = [
    Golden(input="What is the weather in Bogotá, Colombia?"),
    Golden(input="What is the weather in Paris, France?"),
]

dataset = EvaluationDataset(goldens=goldens)

# Run evaluation for each golden
for golden in dataset.evals_iterator():
    agent.invoke(
        input={"messages": [{"role": "user", "content": golden.input}]},
        config={"callbacks": [CallbackHandler(metrics=[task_completion])]}
    )
This will automatically generate a test run and use the Task Completion metric to evaluate your entire LangGraph trace for each golden.
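The snippet above is the synchronous variant. Here is a hedged sketch of an asynchronous equivalent, assuming you only swap agent.invoke for LangGraph's agent.ainvoke inside the same evals_iterator loop; DeepEval may offer dedicated async helpers that differ from this:

import asyncio

# Sketch: drive LangGraph's async ainvoke from the same evals_iterator loop.
# Only agent.ainvoke and the loop shown above are assumed here.
for golden in dataset.evals_iterator():
    asyncio.run(
        agent.ainvoke(
            input={"messages": [{"role": "user", "content": golden.input}]},
            config={"callbacks": [CallbackHandler(metrics=[task_completion])]},
        )
    )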