Dataset Editor Overview
Confident AI’s Dataset Editor allows you to create, manage, and annotate datasets for your LLM testing and evaluation needs. It lets you efficiently organize your goldens and maintain high-quality datasets for your LLM applications without having to rely on CSV files.
If you’re unsure what exactly a dataset or golden is, click here.
Think of it like a Google Sheets or Notion CSV editor, with two differences: each row is already structured as a golden, and, most importantly, you can use it directly in code when evaluating with DeepEval.
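For reference, here is a minimal sketch of what a single golden looks like when expressed in DeepEval code. The values are purely illustrative, and the exact fields available may vary slightly between deepeval versions:

from deepeval.dataset import Golden

# A golden is a test case "blueprint": the input plus any fields known
# ahead of time, but no generated outputs.
golden = Golden(
    input="What is your refund policy?",
    expected_output="Refunds are accepted within 30 days of purchase.",
)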
Simple Walkthrough
This walkthrough shows how to use datasets on Confident AI for evaluation. Not every feature is covered, but if you follow all the steps you’ll have the dataset features set up.
Create a Dataset
Go to Datasets in your project space and create a new dataset. Provide an alias, which is a unique name for your dataset; you cannot use the same alias twice within a project space.
Create a Golden
Once you’ve created your first dataset, click Create Golden in your Dataset Editor. The Golden Editor will pop out; provide an input for your golden, fill in any other fields you wish, and click Save.
You can, but should not, include the actual_output, retrieval_context, and tools_called parameters in goldens. Click here to learn why.
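If you prefer working in code, you can also build goldens programmatically and push them to your dataset on Confident AI. A minimal sketch, assuming you are logged in (e.g. via deepeval login or an API key) and that "My Evals Dataset" is your dataset’s alias:

from deepeval.dataset import EvaluationDataset, Golden

# Goldens only carry inputs and known-in-advance fields, never
# actual_output, retrieval_context, or tools_called.
goldens = [
    Golden(input="How do I reset my password?"),
    Golden(
        input="What is your refund policy?",
        expected_output="Refunds are accepted within 30 days of purchase.",
    ),
]

dataset = EvaluationDataset(goldens=goldens)

# Replace with your actual dataset alias
dataset.push(alias="My Evals Dataset")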
Pull Your Dataset For Evals
Once you have your dataset, it’s time to pull it and prepare the test cases for evaluation:
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

dataset = EvaluationDataset()

# Replace with your actual dataset alias
dataset.pull(alias="My Evals Dataset")

# Convert goldens to test cases
for golden in dataset.goldens:
    test_case = LLMTestCase(
        input=golden.input,
        # Replace your_llm_app() with your actual LLM application function
        actual_output=your_llm_app(golden.input),
    )
    dataset.test_cases.append(test_case)

# Run an evaluation
evaluate(test_cases=dataset.test_cases, metrics=[...])
Here we pulled the goldens in our dataset from Confident AI by providing an alias, before converting them into test cases dynamically based on the implementation of your LLM application. For more information, click here.
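As a concrete, illustrative example, the metrics list could contain DeepEval’s AnswerRelevancyMetric, which scores each test case’s actual_output against its input; swap in whichever metrics fit your use case:

from deepeval.metrics import AnswerRelevancyMetric

# Passes a test case if its relevancy score is at or above the threshold
relevancy = AnswerRelevancyMetric(threshold=0.7)

evaluate(test_cases=dataset.test_cases, metrics=[relevancy])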
Enable Backups (recommended)
Lastly, you should consider enabling backups and revision history in the Revision History tab of your dataset page. This enables recovery when required and keeps track of when and where your dataset was edited.
Future Roadmap
- Automated test coverage analysis
- Generate goldens from existing ones
- Add goldens from production traces