DescriberAdapter
DescriberAdapter handles data description and comparison. It supports single-dataset description and two-dataset comparative analysis.
Class Architecture
classDiagram
    class DescriberAdapter {
        +config: dict
        +source: dict
        +describer: Describer
        +__init__(config)
        +run(input)
        +set_input(status) dict
        +get_result() dict~str, DataFrame~
    }
    class Describer {
        +config: dict
        +mode: str
        +method: str
        +create()
        +eval(data) dict~str, DataFrame~
    }
    class BaseEvaluator {
        <<abstract>>
        +evaluate() dict
    }
    class DescriberDescribe {
        +evaluate() dict
    }
    class DescriberCompare {
        +evaluate() dict
    }

    DescriberAdapter ..> Describer : uses for description
    Describer --> BaseEvaluator : creates
    BaseEvaluator <|-- DescriberDescribe
    BaseEvaluator <|-- DescriberCompare

    %% Style definitions
    class DescriberAdapter {
        <<Main Class>>
    }
    style DescriberAdapter fill:#E6E6FA

    class Describer {
        <<Core Module>>
    }
    style Describer fill:#4169E1,color:#fff
    style BaseEvaluator fill:#9370DB,color:#fff
    style DescriberDescribe fill:#FFE4E1
    style DescriberCompare fill:#FFE4E1

    note for DescriberAdapter "1. Describe mode: Single dataset description\n2. Compare mode: Two datasets comparison\n3. Flexible source specification\n4. Auto-aligns data types before description"
Legend:
- Light purple box: DescriberAdapter main class
- Blue box: Core description modules
- Purple box: Abstract evaluator base class
- Light pink box: Concrete evaluator classes (describe and compare)
- ..> : Dependency relationship
- --> : Ownership relationship
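The diagram's relationships can be sketched in plain Python as follows. This is a minimal illustration and not PETsARD's code. The mode strings "describe" and "compare" come from this page. The factory function create_evaluator, the evaluate(data) signature, and the placeholder return values are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseEvaluator(ABC):
    """Abstract base: every evaluator exposes evaluate() returning a dict."""

    @abstractmethod
    def evaluate(self, data: dict) -> dict:
        ...


class DescriberDescribe(BaseEvaluator):
    def evaluate(self, data: dict) -> dict:
        # Placeholder: a real implementation would compute statistics
        return {"mode": "describe", "datasets": sorted(data)}


class DescriberCompare(BaseEvaluator):
    def evaluate(self, data: dict) -> dict:
        # Placeholder: a real implementation would compare base vs. target
        return {"mode": "compare", "datasets": sorted(data)}


def create_evaluator(mode: str) -> BaseEvaluator:
    """Hypothetical factory mirroring 'Describer --> BaseEvaluator : creates'."""
    evaluators = {"describe": DescriberDescribe, "compare": DescriberCompare}
    if mode not in evaluators:
        raise ValueError(f"Unknown mode: {mode!r}")
    return evaluators[mode]()
```

Because both concrete classes share the BaseEvaluator interface, the caller only needs the mode string to pick the right evaluator.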
Main Features
- Unified data description interface
- Flexible data source selection (via the source parameter)
- Two supported modes: describe (single dataset) and compare (two-dataset comparison)
- Automatic data type alignment (using Schema)
- Support for various statistical methods and Jensen-Shannon (JS) divergence calculation
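For reference, the Jensen-Shannon divergence between two discrete distributions P and Q is JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M = 0.5 * (P + Q). The sketch below computes it for two categorical samples from their value frequencies. It uses base-2 logarithms, so results fall in [0, 1]. This page does not state how PETsARD bins numeric columns or which log base it uses, so this is an independent illustration only.

```python
import math
from collections import Counter


def js_divergence(base: list, target: list) -> float:
    """Jensen-Shannon divergence (base 2) between two categorical samples."""
    p_counts, q_counts = Counter(base), Counter(target)
    categories = set(p_counts) | set(q_counts)
    n_p, n_q = len(base), len(target)

    # Empirical distributions over the union of observed categories
    p = {c: p_counts[c] / n_p for c in categories}
    q = {c: q_counts[c] / n_q for c in categories}
    m = {c: (p[c] + q[c]) / 2 for c in categories}

    def kl(a: dict) -> float:
        # KL(a || m); terms with a[c] == 0 contribute nothing by convention
        return sum(a[c] * math.log2(a[c] / m[c]) for c in categories if a[c] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

Identical samples give 0.0, for example js_divergence(["a", "b"], ["a", "b"]). Samples with no categories in common give 1.0, for example js_divergence(["a"], ["b"]).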
Method Reference
__init__(config: dict)
Initialize a DescriberAdapter instance.
Parameters:
config: dict, required
- Configuration parameters dictionary
- Must include the source key (data source)
- Optional method key: default, describe, or compare
- Optional mode key: determined automatically from the number of sources
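The constructor's configuration rules can be summarized by a small validation helper. This is a hypothetical sketch, and normalize_config is not a PETsARD function. It enforces the required source key and rejects method values outside the three listed above. It also assumes that an omitted method falls back to "default", which this page does not state explicitly.

```python
ALLOWED_METHODS = {"default", "describe", "compare"}


def normalize_config(config: dict) -> dict:
    """Hypothetical check of a DescriberAdapter-style config dict."""
    if "source" not in config:
        raise ValueError("config must include a 'source' key")
    normalized = dict(config)  # copy so the caller's dict is not mutated
    # Assumption: a missing method falls back to "default"
    normalized.setdefault("method", "default")
    if normalized["method"] not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported method: {normalized['method']!r}")
    return normalized
```

Mode is intentionally not handled here, because this page says it is derived from the number of sources.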
run(input: dict)
Execute data description or comparison, including automatic data type alignment.
Parameters:
input: dict, required
- Input parameters dictionary
- Contains the data dictionary (datasets)
- Optionally contains metadata for data type alignment
Returns:
No direct return value. Use get_result() to retrieve results.
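Aligning data types with metadata means casting each column to the type the schema declares before any statistics are computed. The sketch below makes two simplifying assumptions. Metadata is a dict mapping a column name to "int", "float", or "str", and a dataset is a dict of column lists. PETsARD's actual Schema objects and DataFrame handling are richer than this, and align_dtypes is a hypothetical name.

```python
from typing import Optional

# Assumed, simplified type vocabulary for the illustration
CASTERS = {"int": int, "float": float, "str": str}


def align_dtypes(data: dict, metadata: Optional[dict] = None) -> dict:
    """Cast each column named in metadata; return data untouched without metadata."""
    if not metadata:
        return data  # alignment is skipped when no schema is available
    aligned = {}
    for column, values in data.items():
        caster = CASTERS.get(metadata.get(column))
        # Columns without a known declared type are copied unchanged
        aligned[column] = [caster(v) for v in values] if caster else list(values)
    return aligned
```

Skipping alignment when metadata is missing matches the workflow on this page, where alignment happens only when a Schema is available.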
set_input(status)
Set input data for the describer.
Parameters:
status: Status, required
- System status object
- Extracts data based on the source configuration
Returns:
dict: Dictionary containing the data required for description
get_result()
Retrieve description results.
Returns:
dict[str, pd.DataFrame]: Dictionary of description results
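The split between run() and get_result() follows a common store-then-fetch pattern. run() computes results and caches them on the instance. get_result() returns the cached value until the next run() overwrites it. The class below is a hypothetical stand-in that shows the pattern with a trivial computation. ResultCachingAdapter and _compute are illustrative names only.

```python
from typing import Optional


class ResultCachingAdapter:
    """Illustrative store-then-fetch pattern; not PETsARD's DescriberAdapter."""

    def __init__(self) -> None:
        self._result: Optional[dict] = None

    def run(self, input_data: dict) -> None:
        # Compute and cache; deliberately returns nothing
        self._result = self._compute(input_data)

    def get_result(self) -> dict:
        if self._result is None:
            raise RuntimeError("run() must be called before get_result()")
        return self._result

    def _compute(self, input_data: dict) -> dict:
        # Placeholder computation: row count per dataset
        return {name: len(rows) for name, rows in input_data.get("data", {}).items()}
```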
Usage Examples
Single Dataset Description
from petsard.adapter import DescriberAdapter
# Describe single dataset
adapter = DescriberAdapter({
"source": "Loader", # or ["Loader"]
"method": "describe",
"describe_method": ["mean", "median", "std", "corr"]
})
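# Execute description
# (the input dict is normally the one prepared by set_input(status);
# it is left empty here for brevity)
adapter.run({})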
# Get results
results = adapter.get_result()
Dataset Comparison
from petsard.adapter import DescriberAdapter

# Compare two datasets
adapter = DescriberAdapter({
"source": {
"base": "Splitter.train",
"target": "Synthesizer"
},
"method": "compare",
"stats_method": ["mean", "std", "jsdivergence"],
"compare_method": "pct_change"
})
# Execute comparison
adapter.run({})
# Get results
comparison_results = adapter.get_result()
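A percentage-change comparison measures each target statistic against the matching base statistic. The sketch below assumes pct_change means (target - base) / base, which is the relative-change definition behind pandas' Series.pct_change. This page does not document PETsARD's exact formula, how it handles a zero base value, or its output layout, so treat those details as assumptions. The function name compare_pct_change is hypothetical.

```python
import math
import statistics


def compare_pct_change(base: list[float], target: list[float]) -> dict:
    """Compute mean/std for both datasets and the relative change base -> target."""
    stats = {"mean": statistics.mean, "std": statistics.stdev}
    result = {}
    for name, func in stats.items():
        b, t = func(base), func(target)
        # Assumed handling: the change is undefined (NaN) when the base value is 0
        pct = (t - b) / b if b != 0 else math.nan
        result[name] = {"base": b, "target": t, "pct_change": pct}
    return result
```

Doubling every value doubles both the mean and the standard deviation, so both pct_change values are 1.0. For example, compare base [1.0, 2.0, 3.0] against target [2.0, 4.0, 6.0].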
Workflow
- Source Parsing: Parse the source parameter to determine data sources
- Mode Determination:
  - 1 source: describe mode
  - 2 sources: compare mode
- Data Collection: Collect the specified data from Status
- Schema Retrieval: Attempt to get metadata for data alignment
- Data Type Alignment (when a Schema is available)
- Execute Description or Comparison
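The mode-determination step reduces to counting the configured sources. The helper below is a hypothetical sketch of that rule alone. A string or a one-element list selects describe mode, two entries select compare mode, and anything else is rejected.

```python
def determine_mode(source) -> str:
    """Hypothetical: derive describe/compare mode from the number of sources."""
    if isinstance(source, str):
        count = 1
    elif isinstance(source, (list, dict)):
        count = len(source)
    else:
        raise TypeError(f"Unsupported source type: {type(source).__name__}")
    if count == 1:
        return "describe"
    if count == 2:
        return "compare"
    raise ValueError(f"Expected 1 or 2 sources, got {count}")
```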
Source Parameter Format
Describe Mode (Single Data Source)
# String format
source: "Loader"
# List format
source: ["Synthesizer"]
Compare Mode (Two Data Sources)
# Dictionary format (recommended)
source:
  base: "Splitter.train"
  target: "Synthesizer"
# Backward compatibility format
source:
  ori: "Splitter.train"
  syn: "Synthesizer"
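The base/target names and the legacy ori/syn names describe the same two roles. A compare-mode source can therefore be normalized to the recommended names before use. This hypothetical helper assumes a one-to-one mapping of ori to base and syn to target, which is how the two examples above line up.

```python
LEGACY_KEYS = {"ori": "base", "syn": "target"}


def normalize_compare_source(source: dict) -> dict:
    """Map legacy ori/syn keys to base/target and check both roles are present."""
    normalized = {LEGACY_KEYS.get(key, key): value for key, value in source.items()}
    if set(normalized) != {"base", "target"}:
        raise ValueError(f"Expected base/target keys, got {sorted(source)}")
    return normalized
```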
Data Source Syntax
- Simple format: "ModuleName" takes the first available data from the module
- Precise format: "ModuleName.key" takes the specifically keyed data from the module
  - Examples: "Splitter.train", "Splitter.validation"
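Resolving a source string against the collected module outputs can be sketched as below. The status layout is an assumption made for illustration: a dict mapping each module name to an insertion-ordered dict of keyed outputs. PETsARD's Status object has its own interface, and resolve_source is a hypothetical name.

```python
def resolve_source(status: dict, spec: str):
    """Return data for "ModuleName" (first available) or "ModuleName.key" (exact)."""
    module, _, key = spec.partition(".")
    outputs = status.get(module)
    if not outputs:
        raise KeyError(f"No data available from module {module!r}")
    if key:
        # Precise format: the named output must exist
        if key not in outputs:
            raise KeyError(f"{module!r} has no output named {key!r}")
        return outputs[key]
    # Simple format: first available entry, relying on dict insertion order
    return next(iter(outputs.values()))
```

With status = {"Splitter": {"train": ..., "validation": ...}}, "Splitter.validation" selects the validation output. A bare "Splitter" falls back to train, the first entry.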
Notes
- This is an internal API; direct use is not recommended
- Use YAML configuration files and Executor instead
- Compare mode reuses DescriberDescribe’s statistical functionality
- Prefer the base/target parameter names over the legacy ori/syn names
- Results are cached until next run() call