Adapter
petsard.adapter
The Adapter module provides wrapper classes that standardize the execution interface for all PETsARD pipeline components. Each adapter encapsulates a specific module (Loader, Synthesizer, etc.) and provides consistent methods for configuration, execution, and result retrieval.
Design Overview
The Adapter system follows the adapter (wrapper) pattern: each core module is wrapped in a class that exposes a standardized interface for pipeline execution. This design ensures consistent behavior across all pipeline components while maintaining flexibility for module-specific functionality.
Key Principles
- Standardization: All adapters implement the same base interface for consistent pipeline execution
- Encapsulation: Each adapter wraps its corresponding module, handling configuration and execution details
- Error Handling: Comprehensive error logging and exception handling across all adapters
- Metadata Management: Consistent metadata handling using the Metadater system
Base Classes
BaseAdapter
BaseAdapter(config)
Abstract base class defining the standard interface for all adapters.
Parameters
- config (dict): Configuration parameters for the adapter
Methods
- run(input): Execute the adapter’s functionality
- set_input(status): Configure input data from pipeline status
- get_result(): Retrieve the adapter’s output data
- get_metadata(): Retrieve metadata associated with the output
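The interface above can be illustrated with a minimal, self-contained sketch. It does not import PETsARD and is not its actual implementation: `SketchBaseAdapter`, `UppercaseAdapter`, and the `_run` hook are hypothetical names, and the pipeline status is assumed to be a plain dict. `get_metadata()` is omitted for brevity, and the parameter name `input` mirrors the documented `run(input)` signature.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchBaseAdapter(ABC):
    """Hypothetical stand-in for BaseAdapter; not PETsARD's actual code."""

    def __init__(self, config: dict):
        self.config = config
        self._result: Any = None

    @abstractmethod
    def set_input(self, status: Any) -> dict:
        """Build the input for run() from the pipeline status."""

    @abstractmethod
    def _run(self, input: dict) -> Any:
        """Module-specific work, implemented by each subclass."""

    def run(self, input: dict) -> None:
        # Store the output so get_result() can return it later.
        self._result = self._run(input)

    def get_result(self) -> Any:
        return self._result


class UppercaseAdapter(SketchBaseAdapter):
    """Toy adapter: upper-cases a string looked up in a dict-like status."""

    def set_input(self, status: dict) -> dict:
        return {"text": status[self.config["key"]]}

    def _run(self, input: dict) -> str:
        return input["text"].upper()


adapter = UppercaseAdapter({"key": "greeting"})
adapter.run(adapter.set_input({"greeting": "hello"}))
result = adapter.get_result()
```

The caller only ever touches `set_input`, `run`, and `get_result`, which is what lets a pipeline treat every adapter the same way.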
Adapter Classes
LoaderAdapter
LoaderAdapter(config)
Wraps the Loader module for data loading operations. The adapter automatically detects and handles the benchmark:// protocol, downloading benchmark datasets through the Benchmarker when needed.
Configuration Parameters
- filepath (str): Path to the data file or benchmark URL (e.g., ‘benchmark://adult-income’)
- column_types (dict, optional): DEPRECATED - Column type specifications (use schema instead)
- header_names (list, optional): Custom header names for files without headers
- na_values (str/list/dict, optional): DEPRECATED - Custom NA value definitions (use schema instead)
- schema (Schema/dict/str, optional): Schema configuration for comprehensive data typing and metadata:
  - Schema object: Direct schema configuration
  - dict: Dictionary that will be converted to Schema
  - str: Path to YAML file containing schema configuration
Special Handling for benchmark:// Protocol
When the filepath starts with ‘benchmark://’, LoaderAdapter:
- Automatically creates a BenchmarkerConfig for the specified dataset
- Downloads the dataset using the Benchmarker module if not already cached
- Loads the data from the downloaded location
- Matches the protocol case-insensitively (‘benchmark://’ and ‘Benchmark://’ both work)
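The case-insensitive protocol check can be sketched as follows. This is an illustration only, not LoaderAdapter's code: `parse_benchmark_name` and `BENCHMARK_PREFIX` are hypothetical names, and the download and caching steps performed by the Benchmarker are left out.

```python
from typing import Optional

BENCHMARK_PREFIX = "benchmark://"


def parse_benchmark_name(filepath: str) -> Optional[str]:
    """Return the dataset name if filepath uses benchmark://, else None."""
    # Lower-case only for the comparison; the dataset name keeps its casing.
    if filepath.lower().startswith(BENCHMARK_PREFIX):
        return filepath[len(BENCHMARK_PREFIX):]
    return None
```

Under this sketch, ‘benchmark://adult-income’ and ‘Benchmark://adult-income’ both yield the name adult-income, while an ordinary path such as data.csv yields None and would be loaded directly.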
Key Methods
- get_result(): Returns loaded DataFrame
- get_metadata(): Returns SchemaMetadata for the loaded data
SplitterAdapter
SplitterAdapter(config)
Wraps the Splitter module for data splitting operations. When method='custom_data' is specified in configuration, the adapter handles loading pre-split data directly using LoaderAdapter instead of using the Splitter module.
Configuration Parameters
- train_split_ratio (float): Ratio for training data (default: 0.8)
- num_samples (int): Number of split samples (default: 1)
- random_state (int/float/str, optional): Random seed
For custom_data mode:
- method (str): Set to ‘custom_data’ to load pre-split data
- filepath (dict): Paths for ‘ori’ (training) and ‘control’ (validation) data
- All LoaderAdapter parameters are supported for each file, including:
  - header_names (list): Custom column headers for files without headers
  - schema (Schema/dict/str): Schema configuration for data types and metadata
  - Additional data loading parameters as needed
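The two configuration styles might look like the following plain dicts. The file names and the seed value are hypothetical, and `missing_split_paths` is an illustrative helper rather than part of PETsARD; it only checks that a custom_data config names both files.

```python
# Ratio-based split: the Splitter module divides the loaded data itself.
ratio_split_config = {
    "train_split_ratio": 0.8,
    "num_samples": 1,
    "random_state": 42,
}

# custom_data mode: both parts already exist on disk (hypothetical file names).
custom_split_config = {
    "method": "custom_data",
    "filepath": {
        "ori": "train.csv",        # training data
        "control": "holdout.csv",  # validation data
    },
}


def missing_split_paths(config: dict) -> set:
    """Return which of the 'ori'/'control' keys a custom_data config lacks."""
    if config.get("method") != "custom_data":
        return set()
    return {"ori", "control"} - set(config.get("filepath", {}))
```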
Key Methods
- get_result(): Returns dict with ‘train’ and ‘validation’ DataFrames
- get_metadata(): Returns updated SchemaMetadata with split information
PreprocessorAdapter
PreprocessorAdapter(config)
Wraps the Processor module for data preprocessing operations.
Configuration Parameters
- method (str): Processing method (‘default’ or ‘custom’)
- sequence (list, optional): Custom processing sequence
- config (dict, optional): Processor-specific configuration
Key Methods
- get_result(): Returns preprocessed DataFrame
- get_metadata(): Returns updated SchemaMetadata
SynthesizerAdapter
SynthesizerAdapter(config)
Wraps the Synthesizer module for synthetic data generation. When method='custom_data' is specified in configuration, the adapter handles loading external synthetic data directly using LoaderAdapter instead of using the Synthesizer module.
Configuration Parameters
- method (str): Synthesis method (e.g., ‘sdv’)
- model (str): Model type (e.g., ‘GaussianCopula’)
- Additional parameters specific to the chosen method
For custom_data mode:
- method (str): Set to ‘custom_data’ to load external synthetic data
- filepath (str): Path to the pre-synthesized data file
- All LoaderAdapter parameters are supported, including:
  - header_names (list): Custom column headers for files without headers
  - schema (Schema/dict/str): Schema configuration for data types and metadata
  - Additional data loading parameters as needed
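As an illustration, a Synthesizer section holding one experiment per mode could be written as below, following the module-name-to-experiment-name nesting used in the pipeline example on this page. The experiment names, file path, and column names are hypothetical, and `uses_external_data` is an illustrative helper, not a PETsARD function.

```python
synthesizer_section = {
    # Experiment name -> adapter config; both experiment names are hypothetical.
    "generate": {"method": "sdv", "model": "GaussianCopula"},
    "external": {
        "method": "custom_data",
        "filepath": "synthetic.csv",         # hypothetical path
        "header_names": ["age", "income"],   # hypothetical; for header-less files
    },
}


def uses_external_data(config: dict) -> bool:
    """True when the config asks to load pre-synthesized data from a file."""
    return config.get("method") == "custom_data"
```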
Key Methods
- get_result(): Returns synthetic DataFrame
PostprocessorAdapter
PostprocessorAdapter(config)
Wraps the Processor module for data postprocessing operations.
Configuration Parameters
- method (str): Processing method (‘default’ or ‘custom’)
Key Methods
- get_result(): Returns postprocessed DataFrame
ConstrainerAdapter
ConstrainerAdapter(config)
Wraps the Constrainer module for applying data constraints.
Configuration Parameters
- field_combinations (list): Field combination constraints
- target_rows (int, optional): Target number of rows
- sampling_ratio (float, optional): Sampling ratio for resampling
- max_trials (int, optional): Maximum resampling attempts
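One plausible reading of how target_rows, sampling_ratio, and max_trials interact is a resample-and-filter loop, sketched below. This is an assumption about the general technique, not the Constrainer's actual algorithm: `resample_until_satisfied`, `generate`, and `is_valid` are hypothetical names, and rows are plain Python values rather than DataFrame rows.

```python
from typing import Any, Callable, List


def resample_until_satisfied(
    generate: Callable[[int], List[Any]],
    is_valid: Callable[[Any], bool],
    target_rows: int,
    sampling_ratio: float,
    max_trials: int,
) -> List[Any]:
    """Collect rows that pass is_valid until target_rows is reached."""
    kept: List[Any] = []
    for _ in range(max_trials):
        if len(kept) >= target_rows:
            break
        # Oversample by sampling_ratio, since some rows will be rejected.
        batch = generate(int(target_rows * sampling_ratio))
        kept.extend(row for row in batch if is_valid(row))
    # May return fewer than target_rows if max_trials runs out first.
    return kept[:target_rows]


# Toy usage: keep even numbers from a deterministic generator.
rows = resample_until_satisfied(
    generate=lambda n: list(range(n)),
    is_valid=lambda x: x % 2 == 0,
    target_rows=5,
    sampling_ratio=2.0,
    max_trials=10,
)
```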
Key Methods
- get_result(): Returns constrained DataFrame
EvaluatorAdapter
EvaluatorAdapter(config)
Wraps the Evaluator module for data quality assessment.
Configuration Parameters
- method (str): Evaluation method (e.g., ‘sdmetrics’)
- Additional parameters specific to the chosen method
Key Methods
- get_result(): Returns dict of evaluation results by metric type
DescriberAdapter
DescriberAdapter(config)
Wraps the Describer module for descriptive data analysis.
Configuration Parameters
- method (str): Description method
- Additional parameters specific to the chosen method
Key Methods
- get_result(): Returns dict of descriptive analysis results
ReporterAdapter
ReporterAdapter(config)
Wraps the Reporter module for result export and reporting.
Configuration Parameters
- method (str): Report method (‘save_data’ or ‘save_report’)
- source (str/list): Source modules for data export
- granularity (str): Report granularity (‘global’, ‘columnwise’, ‘pairwise’)
- output (str, optional): Output filename prefix
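A sketch of two Reporter configurations, with a small check against the values listed above. Which parameters belong to which method is an assumption here (source paired with save_data, granularity paired with save_report), the source value and output prefix are hypothetical, and `check_reporter_config` is an illustrative helper rather than a PETsARD function.

```python
ALLOWED_METHODS = {"save_data", "save_report"}
ALLOWED_GRANULARITY = {"global", "columnwise", "pairwise"}

save_data_config = {
    "method": "save_data",
    "source": "Synthesizer",  # assumption: the name of a pipeline module
    "output": "run1",         # hypothetical filename prefix
}
save_report_config = {
    "method": "save_report",
    "granularity": "global",
    "output": "run1",
}


def check_reporter_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none."""
    problems = []
    if config.get("method") not in ALLOWED_METHODS:
        problems.append("method must be 'save_data' or 'save_report'")
    granularity = config.get("granularity")
    if granularity is not None and granularity not in ALLOWED_GRANULARITY:
        problems.append(f"unknown granularity: {granularity!r}")
    return problems
```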
Key Methods
- get_result(): Returns generated report data
Usage Examples
Basic Adapter Usage
from petsard.adapter import LoaderAdapter

# Create and configure adapter
config = {"filepath": "data.csv"}
loader_adapter = LoaderAdapter(config)

# Set input (typically done by Executor). `status` is the pipeline status
# object maintained by the Executor; it is not defined in this snippet.
input_data = loader_adapter.set_input(status)

# Execute operation
loader_adapter.run(input_data)

# Retrieve results
data = loader_adapter.get_result()
metadata = loader_adapter.get_metadata()
Pipeline Integration
from petsard.config import Config
from petsard.executor import Executor

# Adapters are typically used through Config and Executor
config_dict = {
    "Loader": {"load_data": {"filepath": "data.csv"}},
    "Synthesizer": {"synth": {"method": "sdv", "model": "GaussianCopula"}},
    "Evaluator": {"eval": {"method": "sdmetrics"}},
}
config = Config(config_dict)
executor = Executor(config)
executor.run()
Architecture Benefits
1. Consistent Interface
- Standardized methods: All adapters implement the same base interface
- Predictable behavior: Consistent execution patterns across all modules
2. Error Handling
- Comprehensive logging: Detailed logging for debugging and monitoring
- Exception management: Consistent error handling and reporting
3. Pipeline Integration
- Status management: Seamless integration with the Status system
- Data flow: Standardized data passing between pipeline stages
4. Modularity
- Separation of concerns: Each adapter handles one specific functionality
- Extensibility: Easy to add new adapters for new modules
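The error-handling benefit listed above can be pictured with a small log-and-re-raise wrapper. This is a generic sketch built on Python's standard logging module, not PETsARD's own code; `run_with_logging` and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("adapter_sketch")


def run_with_logging(name: str, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func, logging start, finish, and any failure, then re-raise."""
    logger.info("Starting %s", name)
    try:
        result = func(*args, **kwargs)
    except Exception:
        # logger.exception records the traceback at ERROR level.
        logger.exception("%s failed", name)
        raise
    logger.info("Finished %s", name)
    return result
```

Re-raising after logging keeps the failure visible to the caller, so a pipeline can stop at the failing stage while the log still records which adapter raised the error.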
The Adapter system provides the foundation for PETsARD’s modular pipeline architecture, ensuring consistent and reliable execution across all data processing stages.