
Import Datasets

There are three ways to populate your dataset with goldens once you already have a dataset created:

  • Importing goldens from a CSV file in the Dataset Editor.
  • Uploading goldens via DeepEval’s push or queue methods.
  • Saving the test cases of an existing test run as a new dataset.

The last method means you can create a dataset out of test runs you have already run, making it easy to reuse the same test set for subsequent benchmarking of your LLM application.

Video Summary

[Video: Import goldens from CSV]

Import Golden(s) From CSV

Click on the Upload Golden button, and you’ll have the opportunity to map CSV columns to golden fields when importing.

The golden fields include:

  • input: a string representing the input to prompt your LLM application with during evaluation.
  • [Optional] actual_output: a string representing the generated actual output of your LLM application for the corresponding input.
  • [Optional] expected_output: a string representing the ideal output for the corresponding input.
  • [Optional] retrieval_context: a list of strings representing the text chunks retrieved by your LLM application for the corresponding input. This is only for RAG pipelines.
  • [Optional] context: a list of strings representing the ground truth as supporting context.
  • [Optional] comments: a string containing whatever comments your data annotators have for this particular golden (e.g. “Watch out for this expected output! It needs more work.”).
  • [Optional] additional_metadata: a free-form JSON object for any additional data you may want to make use of in code at evaluation time.
💡

A full explanation of what a golden is, along with its fields, is available here.
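For reference, here is how these fields map onto a Golden object when working with DeepEval in Python (a minimal sketch; all values are made up):

from deepeval.dataset import Golden

# A golden with the optional fields populated (example values only)
golden = Golden(
    input="What is your refund policy?",
    expected_output="Refunds are available within 30 days of purchase.",
    retrieval_context=["Refund policy: items may be returned within 30 days of purchase..."],
    context=["Section 2 of the refund policy document"],
    comments="Double-check the expected output against the latest policy.",
    additional_metadata={"source": "faq", "reviewed": False},
)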

Once you’re done with your mapping, click Next to review your upload details, then press Save. If you run into any problems, contact support@confident-ai.com.

Upload Golden(s) Via DeepEval

Push

You can also push finalized goldens to Confident AI through DeepEval. Simply create an EvaluationDataset with a list of Goldens, and push it to Confident AI by supplying the dataset alias.

from deepeval.dataset import EvaluationDataset, Golden

# Define golden
golden = Golden(input="Input of my first golden!")

# Initialize dataset
dataset = EvaluationDataset(goldens=[golden])

# Provide an alias when pushing a dataset
dataset.push(alias="QA Dataset")

By default, the push method marks goldens as finalized, and finalized goldens are what the pull method retrieves by default. To push goldens as unfinalized, use the queue method instead.
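For example, retrieving these finalized goldens later could look something like this (a minimal sketch, assuming the same "QA Dataset" alias as above):

from deepeval.dataset import EvaluationDataset

# Initialize an empty dataset
dataset = EvaluationDataset()

# Pull finalized goldens from the dataset with the given alias
dataset.pull(alias="QA Dataset")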

You can also choose to overwrite or append to an existing dataset if one with the same alias already exists.

...

# Overwrite existing datasets
dataset.push(alias="QA Dataset", overwrite=True)

deepeval will prompt you in the terminal if no value for overwrite is provided.

You can also load a dataset from CSV or JSON locally before uploading it to Confident AI through deepeval. For more information on deepeval’s EvaluationDataset, visit the official DeepEval documentation.
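As an illustration, loading a local CSV into an EvaluationDataset before pushing could look roughly like the sketch below. The add_goldens_from_csv_file method and its column-name parameters are assumptions about DeepEval’s dataset API, so verify the exact signature in the official documentation.

from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()

# Load goldens from a local CSV file (method name and column parameters are assumptions)
dataset.add_goldens_from_csv_file(
    file_path="goldens.csv",
    input_col_name="input",
    expected_output_col_name="expected_output",
)

# Push the loaded goldens to Confident AI
dataset.push(alias="QA Dataset")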

Queue

If your goldens are not yet ready for evaluation and need human annotation, such as when you want a second pair of eyes to go through uploaded goldens before using them for evaluation, you can queue them to your dataset instead. Queued goldens are automatically marked as unfinalized.

from deepeval.dataset import EvaluationDataset, Golden

# Initialize dataset
dataset = EvaluationDataset()

# Provide a dataset alias when queueing goldens
dataset.queue(alias="QA Dataset", goldens=[Golden(input="Input of my first golden!")])

There are TWO mandatory and ONE optional parameters when using the queue method:

  • alias: A string specifying the alias of the dataset.
  • goldens: A list of Goldens, which must not be empty.
  • [Optional] print_response: A boolean specifying whether to print a success message after queueing. Defaults to True.
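For instance, queueing goldens without printing the success message could look like this (a minimal sketch using the parameters listed above):

from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset()
dataset.queue(
    alias="QA Dataset",
    goldens=[Golden(input="Another golden that needs review")],
    print_response=False,  # suppress the success message
)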

The queue method also creates a new dataset with the alias you provided if no such dataset exists.

💡

The term “queue” is used to signify that unfinalized goldens need editing before they are ready for evaluation.

From Existing Test Runs

Sometimes you may have already run an evaluation and have existing test runs on Confident AI. This doesn’t mean it is too late to create a dataset; on the contrary, it makes life easier, since you can create a dataset out of an existing test run and later reuse it to standardize the benchmarking of all future iterations of your LLM application.

To create a dataset from an existing test run, go to Evaluation > Select a test run to go to an individual Test Run page > Test Cases, and click on the Save as new dataset button.
