Adapter

petsard.adapter

The Adapter module provides wrapper classes that standardize the execution interface for all PETsARD pipeline components. Each adapter encapsulates a specific module (Loader, Synthesizer, etc.) and provides consistent methods for configuration, execution, and result retrieval.

Design Overview

The Adapter system follows the adapter pattern, wrapping core modules with standardized interfaces for pipeline execution. This design ensures consistent behavior across all pipeline components while maintaining flexibility for module-specific functionality.

Key Principles

  1. Standardization: All adapters implement the same base interface for consistent pipeline execution
  2. Encapsulation: Each adapter wraps its corresponding module, handling configuration and execution details
  3. Error Handling: Comprehensive error logging and exception handling across all adapters
  4. Metadata Management: Consistent metadata handling using the Metadater system

Base Classes

BaseAdapter

BaseAdapter(config)

Abstract base class defining the standard interface for all adapters.

Parameters

  • config (dict): Configuration parameters for the adapter

Methods

  • run(input): Execute the adapter’s functionality
  • set_input(status): Configure input data from pipeline status
  • get_result(): Retrieve the adapter’s output data
  • get_metadata(): Retrieve metadata associated with the output
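
A minimal sketch of a subclass implementing this interface, assuming the constructor and method names listed above; the ExampleAdapter class, its method bodies, and the Status accessor it calls are illustrative, not part of PETsARD:

from petsard.adapter import BaseAdapter

class ExampleAdapter(BaseAdapter):
    """Hypothetical adapter that simply passes data through."""

    def __init__(self, config: dict):
        super().__init__(config)
        self._data = None
        self._metadata = None

    def set_input(self, status) -> dict:
        # Collect upstream output from the pipeline Status;
        # the accessor name used here is illustrative.
        return {"data": status.get_result("Loader")}

    def run(self, input: dict) -> None:
        # Module-specific work would happen here; this sketch is a no-op.
        self._data = input["data"]

    def get_result(self):
        return self._data

    def get_metadata(self):
        return self._metadata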

Adapter Classes

LoaderAdapter

LoaderAdapter(config)

Wraps the Loader module for data loading operations. The adapter automatically detects and handles the benchmark:// protocol, downloading benchmark datasets through the Benchmarker when needed.

Configuration Parameters

  • filepath (str): Path to the data file or benchmark URL (e.g., 'benchmark://adult-income')
  • column_types (dict, optional): DEPRECATED - Column type specifications (use schema instead)
  • header_names (list, optional): Custom header names for files without headers
  • na_values (str/list/dict, optional): DEPRECATED - Custom NA value definitions (use schema instead)
  • schema (Schema/dict/str, optional): Schema configuration for comprehensive data typing and metadata:
    • Schema object: Direct schema configuration
    • dict: Dictionary that will be converted to Schema
    • str: Path to YAML file containing schema configuration
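
A minimal configuration sketch covering the parameters above; the file paths are placeholders, and the schema is given as a YAML path (a Schema object or dict would work equally well):

from petsard.adapter import LoaderAdapter

config = {
    "filepath": "data/customers.csv",          # placeholder path
    "header_names": ["id", "age", "income"],   # only needed when the file has no header row
    "schema": "schemas/customers.yaml",        # may also be a Schema object or a dict
}
loader_adapter = LoaderAdapter(config)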

Special Handling for benchmark:// Protocol

When the filepath starts with 'benchmark://', LoaderAdapter:

  1. Automatically creates a BenchmarkerConfig for the specified dataset
  2. Downloads the dataset using the Benchmarker module if not already cached
  3. Loads the data from the downloaded location

Protocol matching is case-insensitive ('benchmark://' and 'Benchmark://' both work).

Key Methods

  • get_result(): Returns loaded DataFrame
  • get_metadata(): Returns SchemaMetadata for the loaded data
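
A sketch of the benchmark protocol in use, following the 'benchmark://adult-income' example above; the download and caching happen inside the adapter:

from petsard.adapter import LoaderAdapter

# The benchmark:// prefix triggers BenchmarkerConfig creation and a one-time
# download of the dataset; subsequent runs reuse the cached copy.
benchmark_adapter = LoaderAdapter({"filepath": "benchmark://adult-income"})

# After the Executor (or caller) runs the adapter, results are retrieved as usual:
# df = benchmark_adapter.get_result()
# metadata = benchmark_adapter.get_metadata()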

SplitterAdapter

SplitterAdapter(config)

Wraps the Splitter module for data splitting operations. When method='custom_data' is specified in the configuration, the adapter loads pre-split data directly through LoaderAdapter instead of invoking the Splitter module.

Configuration Parameters

  • train_split_ratio (float): Ratio for training data (default: 0.8)
  • num_samples (int): Number of split samples (default: 1)
  • random_state (int/float/str, optional): Random seed

For custom_data mode:

  • method (str): Set to 'custom_data' to load pre-split data
  • filepath (dict): Paths for 'ori' (training) and 'control' (validation) data
  • All LoaderAdapter parameters are supported for each file, including:
    • header_names (list): Custom column headers for files without headers
    • schema (Schema/dict/str): Schema configuration for data types and metadata
    • Additional data loading parameters as needed

Key Methods

  • get_result(): Returns dict with 'train' and 'validation' DataFrames
  • get_metadata(): Returns updated SchemaMetadata with split information
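
Two configuration sketches, one for a random split and one for custom_data mode; file paths and the seed are placeholders:

from petsard.adapter import SplitterAdapter

# Random split: 80% training, single sample, fixed seed.
split_adapter = SplitterAdapter({
    "train_split_ratio": 0.8,
    "num_samples": 1,
    "random_state": 42,
})

# custom_data mode: load pre-split files instead of splitting.
custom_split_adapter = SplitterAdapter({
    "method": "custom_data",
    "filepath": {
        "ori": "data/train.csv",        # training portion (placeholder path)
        "control": "data/holdout.csv",  # validation portion (placeholder path)
    },
})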

PreprocessorAdapter

PreprocessorAdapter(config)

Wraps the Processor module for data preprocessing operations.

Configuration Parameters

  • method (str): Processing method ('default' or 'custom')
  • sequence (list, optional): Custom processing sequence
  • config (dict, optional): Processor-specific configuration

Key Methods

  • get_result(): Returns preprocessed DataFrame
  • get_metadata(): Returns updated SchemaMetadata
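
A configuration sketch for both modes; the step names in the custom sequence are illustrative, so check the Processor documentation for the exact values your pipeline supports:

from petsard.adapter import PreprocessorAdapter

# Default preprocessing pipeline.
preproc_adapter = PreprocessorAdapter({"method": "default"})

# Custom processing sequence (step names are illustrative).
custom_preproc_adapter = PreprocessorAdapter({
    "method": "custom",
    "sequence": ["missing", "outlier", "encoder", "scaler"],
})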

SynthesizerAdapter

SynthesizerAdapter(config)

Wraps the Synthesizer module for synthetic data generation. When method='custom_data' is specified in the configuration, the adapter loads external synthetic data directly through LoaderAdapter instead of invoking the Synthesizer module.

Configuration Parameters

  • method (str): Synthesis method (e.g., 'sdv')
  • model (str): Model type (e.g., 'GaussianCopula')
  • Additional parameters specific to the chosen method

For custom_data mode:

  • method (str): Set to 'custom_data' to load external synthetic data
  • filepath (str): Path to the pre-synthesized data file
  • All LoaderAdapter parameters are supported, including:
    • header_names (list): Custom column headers for files without headers
    • schema (Schema/dict/str): Schema configuration for data types and metadata
    • Additional data loading parameters as needed

Key Methods

  • get_result(): Returns synthetic DataFrame
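
Configuration sketches for both paths, following the parameters listed above; file paths are placeholders:

from petsard.adapter import SynthesizerAdapter

# Standard synthesis with the method and model from the parameter examples above.
synth_adapter = SynthesizerAdapter({
    "method": "sdv",
    "model": "GaussianCopula",
})

# custom_data mode: load externally generated synthetic data instead of synthesizing.
external_synth_adapter = SynthesizerAdapter({
    "method": "custom_data",
    "filepath": "data/synthetic.csv",    # placeholder path
    "schema": "schemas/customers.yaml",  # optional LoaderAdapter parameters are accepted
})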

PostprocessorAdapter

PostprocessorAdapter(config)

Wraps the Processor module for data postprocessing operations.

Configuration Parameters

  • method (str): Processing method ('default' or 'custom')

Key Methods

  • get_result(): Returns postprocessed DataFrame
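
A minimal configuration sketch using the default method:

from petsard.adapter import PostprocessorAdapter

# Default postprocessing (typically the inverse of the preprocessing steps).
postproc_adapter = PostprocessorAdapter({"method": "default"})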

ConstrainerAdapter

ConstrainerAdapter(config)

Wraps the Constrainer module for applying data constraints.

Configuration Parameters

  • field_combinations (list): Field combination constraints
  • target_rows (int, optional): Target number of rows
  • sampling_ratio (float, optional): Sampling ratio for resampling
  • max_trials (int, optional): Maximum resampling attempts

Key Methods

  • get_result(): Returns constrained DataFrame
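
A hedged sketch of the resampling-related settings; the exact structure of field_combinations entries is defined by the Constrainer module and is left empty here, and the numeric values are placeholders:

from petsard.adapter import ConstrainerAdapter

constrainer_adapter = ConstrainerAdapter({
    "field_combinations": [],   # populate with Constrainer-defined combination rules
    "target_rows": 10000,       # resample until roughly this many rows satisfy the constraints
    "sampling_ratio": 1.5,      # oversampling factor per resampling round
    "max_trials": 300,          # give up after this many resampling attempts
})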

EvaluatorAdapter

EvaluatorAdapter(config)

Wraps the Evaluator module for data quality assessment.

Configuration Parameters

  • method (str): Evaluation method (e.g., 'sdmetrics')
  • Additional parameters specific to the chosen method

Key Methods

  • get_result(): Returns dict of evaluation results by metric type
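
A minimal configuration sketch using the sdmetrics method named above:

from petsard.adapter import EvaluatorAdapter

eval_adapter = EvaluatorAdapter({"method": "sdmetrics"})

# After execution, results are grouped by metric type:
# results = eval_adapter.get_result()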

DescriberAdapter

DescriberAdapter(config)

Wraps the Describer module for descriptive data analysis.

Configuration Parameters

  • method (str): Description method
  • Additional parameters specific to the chosen method

Key Methods

  • get_result(): Returns dict of descriptive analysis results

ReporterAdapter

ReporterAdapter(config)

Wraps the Reporter module for result export and reporting.

Configuration Parameters

  • method (str): Report method ('save_data' or 'save_report')
  • source (str/list): Source modules for data export
  • granularity (str): Report granularity ('global', 'columnwise', 'pairwise')
  • output (str, optional): Output filename prefix

Key Methods

  • get_result(): Returns generated report data
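
Configuration sketches for the two documented report methods; the source and output values are placeholders:

from petsard.adapter import ReporterAdapter

# Export synthesized data to file.
save_data_adapter = ReporterAdapter({
    "method": "save_data",
    "source": "Synthesizer",     # module(s) whose output should be exported
    "output": "petsard_output",  # filename prefix (placeholder)
})

# Export evaluation results at global granularity.
save_report_adapter = ReporterAdapter({
    "method": "save_report",
    "source": "Evaluator",       # module whose results are reported (placeholder)
    "granularity": "global",
    "output": "petsard_report",  # filename prefix (placeholder)
})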

Usage Examples

Basic Adapter Usage

from petsard.adapter import LoaderAdapter

# Create and configure adapter
config = {"filepath": "data.csv"}
loader_adapter = LoaderAdapter(config)

# Set input (typically done by the Executor; 'status' is the pipeline Status object)
input_data = loader_adapter.set_input(status)

# Execute operation
loader_adapter.run(input_data)

# Retrieve results
data = loader_adapter.get_result()
metadata = loader_adapter.get_metadata()

Pipeline Integration

from petsard.config import Config
from petsard.executor import Executor

# Adapters are typically used through Config and Executor
config_dict = {
    "Loader": {"load_data": {"filepath": "data.csv"}},
    "Synthesizer": {"synth": {"method": "sdv", "model": "GaussianCopula"}},
    "Evaluator": {"eval": {"method": "sdmetrics"}}
}

config = Config(config_dict)
executor = Executor(config)
executor.run()

Architecture Benefits

1. Consistent Interface

  • Standardized methods: All adapters implement the same base interface
  • Predictable behavior: Consistent execution patterns across all modules

2. Error Handling

  • Comprehensive logging: Detailed logging for debugging and monitoring
  • Exception management: Consistent error handling and reporting

3. Pipeline Integration

  • Status management: Seamless integration with the Status system
  • Data flow: Standardized data passing between pipeline stages

4. Modularity

  • Separation of concerns: Each adapter handles one specific functionality
  • Extensibility: Easy to add new adapters for new modules

The Adapter system provides the foundation for PETsARD’s modular pipeline architecture, ensuring consistent and reliable execution across all data processing stages.