
LangGraph

LangGraph is a framework for building stateful, multi-agent applications with LLMs.

Quickstart

Confident AI provides a CallbackHandler that can be used to trace your LangGraph agent’s execution.

Install the following packages:

pip install -U deepeval langgraph langchain langchain-openai

Log in with your API key and configure DeepEval’s CallbackHandler as a callback for LangGraph:

main.py
import os
import time

from langgraph.prebuilt import create_react_agent

import deepeval
from deepeval.integrations.langchain import CallbackHandler

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
deepeval.login("<your-confident-api-key>")


def get_weather(city: str) -> str:
    """Returns the weather in a city"""
    return f"It's always sunny in {city}!"


agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[get_weather],
    prompt="You are a helpful assistant"
)

result = agent.invoke(
    input={"messages": [{"role": "user", "content": "what is the weather in sf"}]},
    config={"callbacks": [CallbackHandler()]}
)

time.sleep(5)  # Wait for the trace to be published

Run your agent by executing the script:

python main.py

You can view the traces directly in the Observatory by clicking the link printed in the console output.

💡

DeepEval’s CallbackHandler is an implementation of LangChain’s BaseCallbackHandler.
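Because it subclasses BaseCallbackHandler, the same handler can be attached to any LangChain runnable, not just a LangGraph agent. Below is a minimal sketch, assuming a plain langchain-openai chat model, that traces a single LLM call:

# Minimal sketch: the same CallbackHandler attached to a plain LangChain
# runnable (a langchain-openai chat model is assumed here for illustration).
from langchain_openai import ChatOpenAI
from deepeval.integrations.langchain import CallbackHandler

llm = ChatOpenAI(model="gpt-4o-mini")

# Any LangChain .invoke() call accepts callbacks via its config argument
response = llm.invoke(
    "Summarize the weather in SF in one sentence.",
    config={"callbacks": [CallbackHandler()]},
)
print(response.content)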

Advanced Usage

Trace attributes

You can set custom trace attributes for each agent invocation by passing them to the CallbackHandler, including name, tags, metadata, thread_id, and user_id.

Each attribute is optional and works exactly the same way as you would expect for the native tracing features on Confident AI:

main.py
...

agent.invoke(
    input=input,
    config={
        "callbacks": [
            CallbackHandler(
                name="Name of Trace",
                tags=["Tag 1", "Tag 2"],
                metadata={"Key": "Value"},
                thread_id="your-thread-id",
                user_id="your-user-id",
            )
        ]
    },
)

Online evals

To run online evaluations on your LangGraph agent, simply provide metric_collection as an argument to the CallbackHandler. This allows you to run online evaluations in production.

⚠️

The task completion metric is the only metric supported by the LangGraph integration, so your metric collection should contain only the task completion metric.

main.py
from deepeval.integrations.langchain import CallbackHandler

...

# Invoke your agent with the metric collection name
agent.invoke(
    input={"messages": [{"role": "user", "content": "what is the weather in sf"}]},
    config={"callbacks": [
        CallbackHandler(metric_collection="<metric-collection-name-with-task-completion>")
    ]}
)

This will evaluate the entire trace of the agent’s execution with the task completion metric in production.
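In production you will typically combine online evals with the trace attributes shown above, so each evaluated trace is tied to a conversation and a user. A sketch under that assumption (the thread and user IDs below are placeholders, not required by the integration):

# Sketch: combining online evals with trace attributes.
# thread_id and user_id are placeholder values for illustration.
agent.invoke(
    input={"messages": [{"role": "user", "content": "what is the weather in sf"}]},
    config={"callbacks": [
        CallbackHandler(
            metric_collection="<metric-collection-name-with-task-completion>",
            thread_id="your-thread-id",
            user_id="your-user-id",
        )
    ]}
)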

End-to-end evals

To configure your LangGraph agent for end-to-end evaluations in development, simply supply metrics to the CallbackHandler. Then, use the dataset’s evals_iterator to invoke your LangGraph agent for each golden.

As with online evaluations, the task completion metric is the only supported metric for end-to-end evaluations.

main.py
from deepeval.metrics import TaskCompletionMetric
from deepeval.dataset import Golden, EvaluationDataset

...

# Create a metric
task_completion = TaskCompletionMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

# Create goldens
goldens = [
    Golden(input="What is the weather in Bogotá, Colombia?"),
    Golden(input="What is the weather in Paris, France?"),
]

dataset = EvaluationDataset(goldens=goldens)

# Run evaluation for each golden
for golden in dataset.evals_iterator():
    agent.invoke(
        input={"messages": [{"role": "user", "content": golden.input}]},
        config={"callbacks": [CallbackHandler(metrics=[task_completion])]}
    )

This will automatically generate a test run and use the task completion metric to evaluate your entire LangGraph trace for each golden.
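If your goldens already live in a dataset on Confident AI, you can pull them instead of defining them inline. A minimal sketch, assuming a dataset with the alias "My Evals Dataset" exists in your project:

# Sketch: pulling goldens from an existing dataset on Confident AI.
# "My Evals Dataset" is a placeholder alias.
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.pull(alias="My Evals Dataset")

for golden in dataset.evals_iterator():
    agent.invoke(
        input={"messages": [{"role": "user", "content": golden.input}]},
        config={"callbacks": [CallbackHandler(metrics=[task_completion])]}
    )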
