
Dataset Editor Overview

Confident AI’s Dataset Editor allows you to create, manage, and annotate datasets for your LLM testing and evaluation needs. You can efficiently organize your goldens and maintain high-quality datasets for your LLM applications without having to rely on CSV files.

💡

If you’re unsure what exactly a dataset or golden is, click here.

Think of it like Google Sheets or a Notion CSV editor, except that each row already follows the structure of a golden and, most importantly, you can use it directly in code when evaluating with DeepEval.

Simple Walkthrough

This walkthrough will show how to use datasets on Confident AI for evaluation. Not all features will be covered, but if you follow all the steps you’ll have the dataset features set up.

Create a Dataset

Go to Datasets in your project space, and create a new dataset. Provide an alias, which is a unique name for your dataset. You cannot use the same alias twice within a project space.
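If you prefer to work from code instead of the UI, DeepEval also lets you create goldens locally and push them to a dataset on Confident AI. A minimal sketch, assuming you are logged in (e.g. via deepeval login) and that "My Evals Dataset" is the alias you chose above:

from deepeval.dataset import EvaluationDataset, Golden

# Create a dataset locally with a single example golden
dataset = EvaluationDataset(goldens=[Golden(input="What is your refund policy?")])

# Push it to Confident AI under your chosen alias
dataset.push(alias="My Evals Dataset")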

Create a Golden

Once you’ve created your first dataset, click on Create Golden in your Dataset Editor. The Golden Editor will pop out; provide an input for your golden, fill in any other fields you wish, and click Save.
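For reference, each golden you create in the editor corresponds to a Golden object in DeepEval. A minimal sketch of the fields you would typically fill in (the values below are placeholders):

from deepeval.dataset import Golden

# Only input is required; expected_output and context are optional annotations
golden = Golden(
    input="How do I reset my password?",
    expected_output="Walk the user through the password reset flow.",
    context=["Passwords can be reset from the account settings page."],
)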

💡

You can, but should not, include the actual_output, retrieval_context, and tools_called parameters in goldens. Click here to learn why.

Pull Your Dataset For Evals

Once you have your dataset, it’s time to pull it and prepare the test cases for evaluation:

from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

dataset = EvaluationDataset()
# Replace with your actual dataset alias
dataset.pull(alias="My Evals Dataset")

# Convert goldens to test cases
for golden in dataset.goldens:
    test_case = LLMTestCase(
        input=golden.input,
        # Replace your_llm_app() with your actual LLM application function
        actual_output=your_llm_app(golden.input)
    )
    dataset.test_cases.append(test_case)

# Run an evaluation
evaluate(test_cases=dataset.test_cases, metrics=[...])

Here we pulled the goldens in our dataset from Confident AI by providing an alias, then converted them into test cases dynamically based on the implementation of your LLM application. For more information, click here.
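To make the snippet above concrete, here is a minimal sketch in which your_llm_app is a stand-in for your own application and AnswerRelevancyMetric is just one example DeepEval metric you could pass to evaluate (substitute whichever metrics you actually use):

from deepeval.metrics import AnswerRelevancyMetric
from deepeval import evaluate

def your_llm_app(user_input: str) -> str:
    # Stand-in for your real LLM application (RAG pipeline, agent, etc.)
    return f"Placeholder answer for: {user_input}"

# Run the evaluation on the test cases built from your pulled goldens
evaluate(
    test_cases=dataset.test_cases,
    metrics=[AnswerRelevancyMetric(threshold=0.7)],
)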

Lastly, you should consider enabling backups and revision history in the Revision History tab of your dataset page. This lets you recover previous versions when needed and keeps track of when and where your dataset was edited.

Future Roadmap

  • Automated test coverage analysis
  • Generate goldens from existing ones
  • Add goldens from production traces