update_config()
Update the processor’s configuration settings.
Syntax
def update_config(
    config: dict
) -> None
Parameters
- config : dict, required
  - New processor configuration
  - Structure: {processor_type: {field_name: processing_method}}
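To make the nested structure concrete, the sketch below checks that a dictionary has the expected two-level shape before it is passed to update_config(). The helper name check_config_shape is hypothetical and is not part of PETsARD; it inspects types only and does not validate processor names.

```python
def check_config_shape(config):
    """Return a list of problems found in a two-level config dict.

    Hypothetical helper, not part of PETsARD: it only checks the
    {processor_type: {field_name: processing_method}} nesting.
    """
    if not isinstance(config, dict):
        return ["config must be a dict"]
    problems = []
    for processor_type, fields in config.items():
        if not isinstance(fields, dict):
            problems.append(f"{processor_type!r}: expected a dict of field names")
            continue
        for field_name, method in fields.items():
            # A method may be a name (str), a parameter dict, or None.
            if method is not None and not isinstance(method, (str, dict)):
                problems.append(
                    f"{processor_type!r}.{field_name!r}: unsupported value {method!r}"
                )
    return problems


good = {"missing": {"age": "missing_median"}}
bad = {"missing": "missing_median"}  # field level is missing
print(check_config_shape(good))
print(check_config_shape(bad))
```

An empty list means the shape looks right; each string in a non-empty list names the entry that broke the nesting.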
Returns
None (method modifies instance state)
Description
The update_config() method updates the processor configuration. It can:
- Override default processing methods
- Set custom processors for specific columns
- Disable processing for specific columns (set to None or "none")
Basic Example
from petsard import Loader, Processor
# Load data
loader = Loader('data.csv', schema='schema.yaml')
data, schema = loader.load()
# Create processor
processor = Processor(metadata=schema)
# Update configuration
new_config = {
    'missing': {
        'age': 'missing_median',
        'income': 'missing_mean'
    },
    'encoder': {
        'gender': 'encoder_onehot',
        'education': 'encoder_label'
    }
}
processor.update_config(new_config)
# Use updated configuration
processor.fit(data)
processed_data = processor.transform(data)
Configuration Formats
1. Using Processor Name (String)
config = {
    'missing': {
        'age': 'missing_mean',
        'income': 'missing_median'
    },
    'outlier': {
        'age': 'outlier_zscore',
        'income': 'outlier_iqr'
    },
    'encoder': {
        'gender': 'encoder_onehot'
    },
    'scaler': {
        'age': 'scaler_minmax',
        'income': 'scaler_standard'
    }
}
processor.update_config(config)
2. Using Processor with Parameters (Dictionary)
config = {
    'missing': {
        'age': {
            'method': 'missing_simple',
            'value': 0.0
        }
    },
    'scaler': {
        'created_at': {
            'method': 'scaler_timeanchor',
            'reference': 'event_time',
            'unit': 'D'
        }
    },
    'encoder': {
        'doc_date': {
            'method': 'encoder_date',
            'input_format': '%MinguoY-%m-%d',
            'date_type': 'date'
        }
    }
}
processor.update_config(config)
3. Disable Specific Processing
config = {
    'outlier': {
        'age': None,  # Don't process age outliers
        'income': 'outlier_iqr'
    },
    'scaler': {
        'gender': 'none'  # String "none" also disables
    }
}
processor.update_config(config)
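The three formats above can be read as one rule: each per-column entry resolves to a method name plus optional parameters, or to "disabled". The sketch below expresses that reading in plain Python. The function normalize_spec is a hypothetical illustration, and PETsARD's own parsing may differ.

```python
def normalize_spec(spec):
    """Map one per-column entry to a (method, params) pair.

    Hypothetical helper for illustration; PETsARD's own parsing may differ.
    Returns (None, {}) when processing is disabled.
    """
    if spec is None or spec == "none":
        return None, {}
    if isinstance(spec, str):
        return spec, {}
    if isinstance(spec, dict):
        params = dict(spec)            # copy so the caller's dict is untouched
        method = params.pop("method")  # raises KeyError if 'method' is missing
        return method, params
    raise TypeError(f"unsupported spec: {spec!r}")


print(normalize_spec("outlier_iqr"))
print(normalize_spec({"method": "missing_simple", "value": 0.0}))
print(normalize_spec(None))
```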
Partial Update
# Update only some columns, others keep default
processor = Processor(metadata=schema)
# Update only missing value handling for age
processor.update_config({
    'missing': {
        'age': 'missing_median'
    }
})
# Other columns still use default configuration
Multiple Updates
processor = Processor(metadata=schema)
# First update
processor.update_config({
    'missing': {'age': 'missing_median'}
})
# Second update (will override or add)
processor.update_config({
    'missing': {'income': 'missing_mean'},
    'encoder': {'gender': 'encoder_onehot'}
})
# Final configuration includes all updates
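The cumulative behaviour described here can be modelled with plain dictionaries. The sketch below is not PETsARD's implementation: merge_config is a hypothetical function that assumes updates are merged per column, so entries for the same column are replaced and all other entries are kept.

```python
import copy


def merge_config(current, update):
    """Return a new config with `update` applied on top of `current`.

    Hypothetical sketch of cumulative, per-column merging; it assumes
    update_config() behaves this way, as the text describes.
    """
    merged = copy.deepcopy(current)
    for processor_type, fields in update.items():
        # Only the listed columns change; other columns are kept.
        merged.setdefault(processor_type, {}).update(copy.deepcopy(fields))
    return merged


state = {}
state = merge_config(state, {"missing": {"age": "missing_median"}})
state = merge_config(state, {
    "missing": {"income": "missing_mean"},
    "encoder": {"gender": "encoder_onehot"},
})
print(state)
```

After both calls, the model holds the age and income entries under "missing" and the gender entry under "encoder", which matches the outcome described for the two update_config() calls.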
Setting Configuration at Initialization
# Can also provide configuration when creating processor
custom_config = {
    'missing': {
        'age': 'missing_median',
        'income': 'missing_mean'
    },
    'encoder': {
        'gender': 'encoder_onehot'
    }
}
processor = Processor(metadata=schema, config=custom_config)
# No need to call update_config()
Verify Configuration Update
processor = Processor(metadata=schema)
# Update configuration
new_config = {
    'missing': {'age': 'missing_median'},
    'encoder': {'gender': 'encoder_onehot'}
}
processor.update_config(new_config)
# Verify update
config = processor.get_config(col=['age', 'gender'])
print("age missing:", type(config['missing']['age']).__name__)
print("gender encoder:", type(config['encoder']['gender']).__name__)
Available Processor Names
Missing Value Processors
- missing_mean: Fill with mean
- missing_median: Fill with median
- missing_mode: Fill with mode
- missing_simple: Fill with custom value (requires the value parameter)
- missing_drop: Drop rows with missing values
Outlier Processors
- outlier_zscore: Z-Score method
- outlier_iqr: Interquartile Range method
- outlier_isolationforest: Isolation Forest (global)
- outlier_lof: Local Outlier Factor (global)
Encoders
- encoder_uniform: Uniform encoding
- encoder_label: Label encoding
- encoder_onehot: One-Hot encoding
- encoder_date: Date format conversion (requires parameters)
Scalers
- scaler_standard: Standardization
- scaler_minmax: Min-Max scaling
- scaler_zerocenter: Zero centering
- scaler_log: Logarithmic transformation
- scaler_log1p: log(1+x) transformation
- scaler_timeanchor: Time anchor scaling (requires parameters)
Discretization
- discretizing_kbins: K-bins discretization (requires parameters)
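Because a misspelled name only surfaces when the configuration is applied, it can help to check a configuration against the names listed above first. The sketch below does so in plain Python. KNOWN_NAMES and NEEDS_PARAMS are copied by hand from this page, find_config_issues is a hypothetical helper, and none of them are part of PETsARD; the library remains the authority on which names are valid.

```python
# Names copied by hand from the lists on this page (not read from PETsARD).
KNOWN_NAMES = {
    "missing_mean", "missing_median", "missing_mode", "missing_simple",
    "missing_drop",
    "outlier_zscore", "outlier_iqr", "outlier_isolationforest", "outlier_lof",
    "encoder_uniform", "encoder_label", "encoder_onehot", "encoder_date",
    "scaler_standard", "scaler_minmax", "scaler_zerocenter", "scaler_log",
    "scaler_log1p", "scaler_timeanchor",
    "discretizing_kbins",
}
# Processors this page marks as requiring parameters.
NEEDS_PARAMS = {
    "missing_simple", "encoder_date", "scaler_timeanchor", "discretizing_kbins",
}


def find_config_issues(config):
    """Hypothetical pre-check: list unknown names and missing dict formats."""
    issues = []
    for processor_type, fields in config.items():
        for field_name, spec in fields.items():
            if spec is None or spec == "none":
                continue  # processing disabled for this column
            name = spec.get("method") if isinstance(spec, dict) else spec
            where = f"{processor_type}.{field_name}"
            if name not in KNOWN_NAMES:
                issues.append(f"{where}: unknown processor {name!r}")
            elif name in NEEDS_PARAMS and not isinstance(spec, dict):
                issues.append(f"{where}: {name!r} needs the dictionary format")
    return issues


print(find_config_issues({"missing": {"age": "missing_medain"}}))
print(find_config_issues({"missing": {"age": "missing_simple"}}))
```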
Notes
- Updating the configuration overwrites the default settings for the columns you specify
- Call this method before fit()
- If you update the configuration after fit(), call fit() again so the change takes effect
- Invalid processor names raise ConfigError
- Setting a column to None or "none" disables that processing step for it
- Processors that take parameters must use the dictionary format
- Updates are cumulative and do not clear the configuration of other columns