Import Datasets
There are three ways to populate your dataset with goldens if you already have them prepared:
- From CSV files in the dataset editor
- From DeepEval through APIs
- From existing test cases in a test run
The last method means you can create a dataset out of test runs you have already run, making it easy to reuse the same test set for subsequent benchmarking of your LLM application.
Import Golden(s) From CSV
Click on the Upload Golden button, and you’ll have the opportunity to map CSV columns to golden fields when importing.
The golden fields include:
- input: a string representing the input to prompt your LLM application with during evaluation.
- [Optional] actual_output: a string representing the generated actual_output of your LLM application for the corresponding input.
- [Optional] expected_output: a string representing the ideal output for the corresponding input.
- [Optional] retrieval_context: a list of strings representing the retrieved text chunks of your LLM application for the corresponding input. This is only for RAG pipelines.
- [Optional] context: a list of strings representing the ground truth as supporting context.
- [Optional] comments: a string representing whatever comments your data annotators have for this particular golden (e.g. "Watch out for this expected output! It needs more work.").
- [Optional] additional_metadata: a free-form JSON object which you can use to include any additional data that you can later make use of in code during evaluation time.
A full explanation of what a golden is and its fields is available here.
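For illustration, a CSV that maps cleanly onto these fields might look like the following (the column names and rows here are hypothetical; you can map any column names to golden fields during import):

input,expected_output,comments
"What is your return policy?","You can return items within 30 days of purchase.","Verify the 30-day window is still accurate."
"How do I reset my password?","Click 'Forgot password' on the login page.",""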
Once you're done with your mapping, click Next to review your upload details, then press Save. If you run into any problems, contact support@confident-ai.com.
Upload Golden(s) Via DeepEval
Push
You can also push finalized goldens to Confident AI through DeepEval. Simply create an EvaluationDataset with a list of Goldens, then push it to Confident AI by supplying a dataset alias.
from deepeval.dataset import EvaluationDataset, Golden
# Define golden
golden = Golden(input="Input of my first golden!")
# Initialize dataset
dataset = EvaluationDataset(goldens=[golden])
# Provide an alias when pushing a dataset
dataset.push(alias="QA Dataset")
By default, the push method marks goldens as finalized, and finalized goldens are what the pull method retrieves by default. To push goldens as unfinalized, use the queue method instead.
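For example, a later evaluation script can retrieve the finalized goldens by their dataset alias. The sketch below assumes the "QA Dataset" alias pushed above:

from deepeval.dataset import EvaluationDataset

# Pull the finalized goldens previously pushed under this alias
dataset = EvaluationDataset()
dataset.pull(alias="QA Dataset")

# The pulled goldens are now available to build test cases from
print(dataset.goldens)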
You can also choose to overwrite or append to an existing dataset if a dataset with the same alias already exists.
...
# Overwrite existing datasets
dataset.push(alias="QA Dataset", overwrite=True)
deepeval will prompt you in the terminal if no value for overwrite is provided.
You can also load a dataset from CSV or JSON locally before uploading it to Confident AI through deepeval. For more information on deepeval's EvaluationDataset, visit the official DeepEval documentation.
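For example, if your goldens live in a local CSV file, one way is to build Golden objects yourself before pushing. This is a minimal sketch, assuming a goldens.csv file with input and expected_output columns (deepeval's own CSV/JSON loaders are covered in the documentation linked above):

import csv

from deepeval.dataset import EvaluationDataset, Golden

# Assumed local file with "input" and "expected_output" columns
goldens = []
with open("goldens.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        goldens.append(
            Golden(
                input=row["input"],
                expected_output=row.get("expected_output"),
            )
        )

dataset = EvaluationDataset(goldens=goldens)
dataset.push(alias="QA Dataset")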
Queue
If your goldens are not yet ready for evaluation and need human annotation, such as when you want a second pair of eyes to go through uploaded goldens before using them for evaluation, you can queue them to a dataset instead. Queued goldens are automatically marked as unfinalized.
from deepeval.dataset import EvaluationDataset, Golden
# Initialize dataset
dataset = EvaluationDataset()
# Provide a dataset alias when queueing goldens
dataset.queue(alias="QA Dataset", goldens=[Golden(input="Input of my first golden!")])
There are TWO mandatory and ONE optional parameters when using the queue method:
- alias: a string specifying the alias of the dataset.
- goldens: a list of Goldens, which must not be empty.
- [Optional] print_response: a boolean specifying whether to print a success message after queueing. Defaulted to True.
The queue method also creates a new dataset with the alias you provided if no such dataset exists.
The term “queue” is used to signify that unfinalized goldens need editing before they are ready for evaluation.
From Existing Test Runs
Sometimes you may have already run an evaluation and have existing test runs on Confident AI. It's not too late to create a dataset; in fact, this makes things easier, since you can create a dataset out of an existing test run and later reuse it to standardize the benchmarking of all future iterations of your LLM application.
To create a dataset from an existing test run, go to Evaluation > select a test run to open its Test Run page > Test Cases, then click the Save as new dataset button.
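Once saved, you can pull that dataset by its alias in future benchmarking runs. The sketch below is illustrative: the "QA Dataset" alias, the AnswerRelevancyMetric choice, and the my_llm_app stub are all assumptions standing in for your own setup:

from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def my_llm_app(prompt: str) -> str:
    # Hypothetical stand-in for your actual LLM application
    return "..."

# Pull the dataset created from the test run
dataset = EvaluationDataset()
dataset.pull(alias="QA Dataset")

# Generate outputs with the current iteration of your LLM app
test_cases = [
    LLMTestCase(input=golden.input, actual_output=my_llm_app(golden.input))
    for golden in dataset.goldens
]

# Benchmark against the same goldens as previous iterations
evaluate(test_cases=test_cases, metrics=[AnswerRelevancyMetric()])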