Custom Data

To use pre-split data from external sources, use the custom_data method.

The custom_data method allows you to provide pre-split training and validation datasets.

Usage Examples

Click the below button to run this example in Colab:

Open In Colab

Splitter:
  external_split:
    method: custom_data
    filepath:
      ori: benchmark://adult-income_ori         # Training set
      control: benchmark://adult-income_control # Validation set
    schema:
      ori: benchmark://adult-income_schema
      control: benchmark://adult-income_schema

This example demonstrates custom_data usage with other modules:

  • Splitter: Uses custom_data to load pre-split datasets
  • Other modules: Loader, Synthesizer, and Evaluator are used together for complete evaluation workflow
ℹ️
The filepath parameter supports all Loader formats, including benchmark:// protocol and regular file paths.