Threads
A “thread” on Confident AI is a group of one or more traces. This is useful for those building AI chatrooms, conversational agents, etc., where you wish to view entire conversations on Confident AI.
It is traces that are grouped together, not spans.
Set Threads At Runtime
You can use the update_current_trace function to set the thread_id within traces, which Confident AI will use to group traces together:
Python
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query}
        ]
    ).choices[0].message.content

    update_current_trace(thread_id="your-thread-id", input=query, output=res)
    return res

llm_app("Write me a poem.")
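Because every trace that shares a thread_id is grouped into the same thread, you can reuse one thread_id across multiple calls to capture a whole conversation. Below is a minimal sketch of this pattern; the conversation_id parameter and uuid-based ID are illustrative choices, not part of the SDK:

import uuid
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str, conversation_id: str):
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    ).choices[0].message.content

    # Every trace tagged with the same thread_id joins the same thread
    update_current_trace(thread_id=conversation_id, input=query, output=res)
    return res

# One ID per conversation, reused across turns
conversation_id = str(uuid.uuid4())
llm_app("Write me a poem.", conversation_id)
llm_app("Now make it rhyme.", conversation_id)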
The thread_id can be any string, and the input and output are optional and simply give you more control over what is displayed on the UI.
If the I/O is not provided, it will be set to the default I/O values of the trace.
Inputs/Outputs
Note that although not strictly enforced, you should aim to make the input the raw user text coming into your multi-turn LLM app, and the output the generated text returned to the user. Essentially, your trace should represent the observable system inputs and outputs of your application.
from deepeval.tracing import observe, update_current_trace
from openai import OpenAI

client = OpenAI()

@observe()
def llm_app(query: str):
    messages = [{"role": "user", "content": query}]
    res = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    ).choices[0].message.content

    # ✅ Do this, query is the raw user input
    update_current_trace(thread_id="your-thread-id", input=query, output=res)

    # ❌ Don't do this, messages is not the raw user input
    # update_current_trace(thread_id="your-thread-id", input=messages, output=res)
    return res
Also note that you don't have to set inputs/outputs for a trace that does not contain a user input or LLM output. You can simply leave them blank, and Confident AI will format the turns accordingly on the UI and for evals.
# ✅ You can set inputs and not set outputs
update_current_trace(thread_id="your-thread-id", input=query)
# ✅ You can set outputs and not set inputs
update_current_trace(thread_id="your-thread-id", output=res)
# ✅ You can omit setting both, given that there is at least one trace with input/output set for a thread
update_current_trace(thread_id="your-thread-id")
Tools Called and Retrieval Context
You can also specify any tools that were called or retrieval context involved (for a RAG system) for any LLM-generated text in a conversation (which in this case is the output on a trace).
from deepeval.test_case import ToolCall
from deepeval.tracing import TurnContext

...

update_current_trace(
    thread_id="your-thread-id",
    output=res,
    turn_context=TurnContext(
        retrieval_context=["RAG context goes here."],
        tools_called=[ToolCall(name="Websearch")]
    )
)
Learn more about the data model of ToolCall here.
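For reference, a populated ToolCall might look roughly like the sketch below. Only the name field is shown in the example above; the other fields (description, input_parameters, output) are assumptions on our part, so verify them against the ToolCall data model:

from deepeval.test_case import ToolCall

# Illustrative only — field names other than `name` are assumptions,
# check the ToolCall data model for the exact schema
tool_call = ToolCall(
    name="Websearch",
    description="Searches the web for up-to-date information.",
    input_parameters={"query": "latest release notes"},
    output="Summary of search results...",
)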
The turn context is complementary to the output, and allows you to log any additional context involved in the generation of this turn.
Run Offline Evals on Threads
Use the evaluate_thread method to run offline evals on conversations once they've finished running:
from deepeval.tracing import evaluate_thread
evaluate_thread(thread_id="your-thread-id", metric_collection="Metric Collection")
If you haven't already, you'll need to create a multi-turn metric collection on Confident AI to specify which metrics to run for a particular thread.
Under the hood, Confident AI takes all the inputs, outputs, and any turn context you've supplied to build a list of turns for a ConversationalTestCase. Confident AI will then use the multi-turn metrics found in your metric collection to run evals on the specified thread.
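To make that mapping concrete, a thread containing one user/assistant exchange is roughly equivalent to the test case sketched below. This is an illustration assuming the Turn model in deepeval.test_case; the exact fields Confident AI populates may differ:

from deepeval.test_case import ConversationalTestCase, Turn, ToolCall

# Rough illustration of the turns Confident AI builds from a thread's traces
test_case = ConversationalTestCase(
    turns=[
        Turn(role="user", content="Write me a poem."),      # from a trace's input
        Turn(
            role="assistant",
            content="Roses are red...",                      # from a trace's output
            retrieval_context=["RAG context goes here."],    # from the turn context
            tools_called=[ToolCall(name="Websearch")],
        ),
    ]
)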