create()
Process report data and return processed data ready for output.
Syntax
def create(data: dict) -> dict | pd.DataFrame | None
Parameters
- data : dict, required
  - Report data dictionary
  - Key-value structure:
    - Key: Experiment tuple (module_name, experiment_name, ...)
    - Value: Data to report (pd.DataFrame or None)
  - Special keys:
    - 'exist_report': For merging previous results (dict format)
    - 'timing_data': Timing data for save_timing mode
Returns
- dict | pd.DataFrame | None
  - Return type depends on reporter mode:
    - save_data mode: Returns processed DataFrame dictionary {expt_name: DataFrame}
    - save_report mode: Returns dictionary with granularity-specific results {'Reporter': {...}}
    - save_timing mode: Returns DataFrame with timing information
    - save_validation mode: Returns validation result dictionary
    - No data processed: Returns None or empty dictionary
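Because the return type varies by mode, calling code may want to branch on it before passing the result on. The helper below is a minimal sketch and not part of PETsARD: `describe_result` is a hypothetical name, and it assumes that any result which is neither a dict nor None is the DataFrame returned in save_timing mode.

```python
def describe_result(processed):
    """Classify the value returned by Reporter.create() (illustrative only)."""
    if processed is None:
        return "empty"
    if isinstance(processed, dict):
        if not processed:
            return "empty"
        if "Reporter" in processed:
            # save_report mode: {'Reporter': {granularity: {...}}}
            return "report"
        # save_data mode ({expt_name: DataFrame}) or another dict result
        return "dict"
    # Assumption: anything else is the DataFrame from save_timing mode
    return "table"


print(describe_result(None))                             # empty
print(describe_result({"Reporter": {"global": {}}}))     # report
print(describe_result({"Synthesizer[exp1]": object()}))  # dict
```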
Description
The create() method is the first step in Reporter’s functional design, used to process input data without storing it in instance variables. Based on the configuration during initialization, this method performs the following operations:
- Validate input data format (via _verify_create_input())
- Transform data based on reporter type
- Apply filter conditions (source, eval, granularity, etc.)
- Return processed data for use by the report() method
Data Validation Rules
Input data undergoes strict validation:
- Experiment tuples must have an even number of elements (module names and experiment names in pairs)
- Module names must be valid PETsARD modules
- Duplicate module names are not allowed
- DataFrame values must be pd.DataFrame or None
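The key-related rules above can be checked before calling create(). The following is a rough sketch under stated assumptions, not the logic of _verify_create_input(): `check_experiment_key` and `KNOWN_MODULES` are hypothetical names, and the module set only lists names that appear on this page. Special keys such as 'exist_report' and 'timing_data' are plain strings and should be skipped by the caller rather than passed to this check.

```python
# Assumption: an illustrative subset of module names taken from this page;
# the real list of valid PETsARD modules may be longer.
KNOWN_MODULES = {"Loader", "Synthesizer", "Evaluator"}


def check_experiment_key(key, known_modules=KNOWN_MODULES):
    """Return a list of rule violations for one dictionary key (empty if none)."""
    if not isinstance(key, tuple):
        return ["key is not a tuple"]
    problems = []
    if len(key) == 0 or len(key) % 2 != 0:
        problems.append("tuple length must be even")
    modules = key[0::2]  # module names sit at indices 0, 2, 4, ...
    unknown = [m for m in modules if m not in known_modules]
    if unknown:
        problems.append(f"unknown module names: {unknown}")
    if len(set(modules)) != len(modules):
        problems.append("duplicate module names")
    return problems


print(check_experiment_key(("Synthesizer", "exp1")))            # []
print(check_experiment_key(("Loader", "load1", "Synthesizer")))  # ['tuple length must be even']
print(check_experiment_key(("Loader", "a", "Loader", "b")))      # ['duplicate module names']
```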
Basic Examples
save_data Mode
from petsard import Reporter
# Initialize reporter
reporter = Reporter(method='save_data', source='Synthesizer')
# Prepare data (using tuple as key)
data_dict = {
('Synthesizer', 'exp1'): synthetic_df
}
# Process data
processed = reporter.create(data_dict)
# processed contains processed DataFrame dictionary
print(type(processed)) # <class 'dict'>
print(processed.keys()) # dict_keys(['Synthesizer[exp1]'])
save_report Mode
from petsard import Reporter
# Initialize reporter (single granularity)
reporter = Reporter(method='save_report', granularity='global')
# Prepare evaluation results (note experiment name includes granularity marker)
eval_data = {
('Evaluator', 'eval1_[global]'): global_results_df
}
# Process data
processed = reporter.create(eval_data)
# Generate report
reporter.report(processed)
save_timing Mode
from petsard import Reporter
import pandas as pd
# Initialize reporter
reporter = Reporter(
method='save_timing',
time_unit='minutes',
module=['Loader', 'Synthesizer']
)
# Prepare timing data (using special key 'timing_data')
timing_data = {
'timing_data': timing_df
}
# Process data
processed = reporter.create(timing_data)
# processed is a processed DataFrame
print(type(processed)) # <class 'pandas.core.frame.DataFrame'>
Advanced Examples
Multi-Granularity Reports
from petsard import Reporter
# Initialize multi-granularity reporter
reporter = Reporter(
method='save_report',
granularity=['global', 'columnwise', 'details']
)
# Prepare data for multiple granularities
eval_data = {
('Evaluator', 'eval1_[global]'): global_results,
('Evaluator', 'eval1_[columnwise]'): columnwise_results,
('Evaluator', 'eval1_[details]'): details_results
}
# Process data
processed = reporter.create(eval_data)
# processed contains processed results for all granularities
# Structure: {'Reporter': {'global': {...}, 'columnwise': {...}, 'details': {...}}}
Merge Previous Reports
from petsard import Reporter
import pandas as pd
# Read previous reports
previous_global = pd.read_csv('previous_global.csv')
previous_columnwise = pd.read_csv('previous_columnwise.csv')
# Initialize reporter
reporter = Reporter(
method='save_report',
granularity=['global', 'columnwise']
)
# Prepare data and merge previous results
eval_data = {
('Evaluator', 'eval2_[global]'): new_global_results,
('Evaluator', 'eval2_[columnwise]'): new_columnwise_results,
'exist_report': { # Special key for merging
'global': previous_global,
'columnwise': previous_columnwise
}
}
# Process data (merges new and old results)
processed = reporter.create(eval_data)
# Generate merged report
reporter.report(processed)
Using Compact Naming Strategy
from petsard import Reporter
# Use compact naming strategy
reporter = Reporter(
method='save_data',
source='Synthesizer',
naming_strategy='compact'
)
# Prepare data
data_dict = {
('Synthesizer', 'experiment_1'): df1,
('Synthesizer', 'experiment_2'): df2
}
# Process data
processed = reporter.create(data_dict)
# Generate report (filenames will use compact format)
reporter.report(processed)
# Output: petsard_Sy.experiment_1.csv
# petsard_Sy.experiment_2.csv
Filter Specific Evaluation Experiments
from petsard import Reporter
# Only process specific evaluation experiments
reporter = Reporter(
method='save_report',
granularity='global',
eval='eval1' # Only process eval1 results
)
# Prepare data for multiple experiments
eval_data = {
('Evaluator', 'eval1_[global]'): eval1_results,
('Evaluator', 'eval2_[global]'): eval2_results # This will be filtered out
}
# Process data (only eval1 will be processed)
processed = reporter.create(eval_data)
# Generate report
reporter.report(processed)
Multi-Step Pipeline Data
from petsard import Reporter
# Save data that has gone through multiple processing steps
reporter = Reporter(method='save_data', source='Synthesizer')
# Multi-step experiment tuple
data_dict = {
('Loader', 'load1', 'Synthesizer', 'syn1'): synthetic_data
}
# Process data
processed = reporter.create(data_dict)
# processed keys will contain complete experiment path
# e.g., 'Loader[load1]_Synthesizer[syn1]'
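
The key format shown in the examples ('Synthesizer[exp1]', 'Loader[load1]_Synthesizer[syn1]') can be reproduced with a few lines of Python. This is a sketch inferred from those two example outputs only; `format_experiment_key` is a hypothetical helper, and the library's own naming code may differ, especially under other naming strategies.

```python
def format_experiment_key(experiment):
    """Join (module, name, module, name, ...) into 'Module[name]_Module[name]'."""
    if len(experiment) % 2 != 0:
        raise ValueError("experiment tuple must have an even number of elements")
    # Pair each module name with the experiment name that follows it
    pairs = zip(experiment[0::2], experiment[1::2])
    return "_".join(f"{module}[{name}]" for module, name in pairs)


print(format_experiment_key(("Synthesizer", "exp1")))
# Synthesizer[exp1]
print(format_experiment_key(("Loader", "load1", "Synthesizer", "syn1")))
# Loader[load1]_Synthesizer[syn1]
```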
Data Format Requirements
Experiment Tuple Format
Experiment tuples must follow this format:
# Single step: (module_name, experiment_name)
('Synthesizer', 'exp1')
('Evaluator', 'eval1_[global]')
# Multi-step: (module1, experiment1, module2, experiment2, ...)
('Loader', 'data_load', 'Synthesizer', 'syn1')
('Loader', 'load1', 'Evaluator', 'eval1_[global]')
Important Rules:
- Tuple length must be even
- Odd positions (1st, 3rd, ...) are module names
- Even positions (2nd, 4th, ...) are experiment names
- Module names cannot be duplicated
Special Keys
exist_report
Used for merging previous report results:
{
('Evaluator', 'eval1_[global]'): new_data,
'exist_report': {
'global': previous_report_df,
'columnwise': previous_columnwise_df
}
}
timing_data
Used only in save_timing mode:
{
'timing_data': timing_dataframe
}
Error Handling
Invalid Data Format
from petsard import Reporter
reporter = Reporter(method='save_data', source='Synthesizer')
# Error: key is not a tuple
invalid_data = {
'Synthesizer_exp1': df # Should be ('Synthesizer', 'exp1')
}
# Error will be caught and logged in _verify_create_input()
processed = reporter.create(invalid_data)
# Invalid data will be removed, may return empty dictionary
Missing Required Data
from petsard import Reporter
reporter = Reporter(method='save_report', granularity='global')
# Empty data dictionary
empty_data = {}
processed = reporter.create(empty_data)
# Returns result with warnings
# {'Reporter': {'global': {'report': None, 'warnings': '...'}}}
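
Since an empty input yields a result that carries warnings rather than raising, it can be useful to inspect the result before calling report(). The sketch below assumes the result has exactly the nested shape shown in the comment above; `collect_warnings` is a hypothetical helper, not a PETsARD function.

```python
def collect_warnings(processed):
    """Map granularity -> warning text for entries that carry a warning."""
    if not isinstance(processed, dict):
        return {}
    # Assumption: save_report results are nested under the 'Reporter' key
    per_granularity = processed.get("Reporter") or {}
    return {
        granularity: entry["warnings"]
        for granularity, entry in per_granularity.items()
        if isinstance(entry, dict) and entry.get("warnings")
    }


result = {"Reporter": {"global": {"report": None, "warnings": "no data"}}}
print(collect_warnings(result))  # {'global': 'no data'}
```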
Granularity Mismatch
from petsard import Reporter
reporter = Reporter(method='save_report', granularity='global')
# Data contains wrong granularity marker
wrong_granularity = {
('Evaluator', 'eval1_[columnwise]'): data # Expected [global]
}
processed = reporter.create(wrong_granularity)
# Data will be ignored, returns warning
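
To see ahead of time which entries would be ignored, the granularity marker can be parsed out of the experiment name. This is a minimal sketch that assumes the marker is always a trailing "_[name]" suffix, as in the examples on this page; `split_granularity` and `matches_granularity` are hypothetical helpers.

```python
import re

# Assumption: the marker is a trailing "_[name]" suffix, as in 'eval1_[global]'
_MARKER = re.compile(r"^(?P<eval>.+)_\[(?P<granularity>[^\[\]]+)\]$")


def split_granularity(experiment_name):
    """Return (eval_name, granularity); granularity is None if no marker is found."""
    match = _MARKER.match(experiment_name)
    if match is None:
        return experiment_name, None
    return match.group("eval"), match.group("granularity")


def matches_granularity(experiment_name, granularity):
    """True if the name carries exactly the given granularity marker."""
    return split_granularity(experiment_name)[1] == granularity


print(split_granularity("eval1_[columnwise]"))              # ('eval1', 'columnwise')
print(matches_granularity("eval1_[columnwise]", "global"))  # False
print(matches_granularity("eval1_[global]", "global"))      # True
```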
DataFrame Value Handling
None Values
The create() method accepts None as DataFrame values:
data_dict = {
('Evaluator', 'eval1_[global]'): None # None is allowed
}
processed = reporter.create(data_dict)
# None values are handled appropriately without errors
Data Validation
Values are type-checked during validation; an entry whose value is neither a pd.DataFrame nor None is removed:
import pandas as pd
# Valid data types
valid_data = {
('Synthesizer', 'exp1'): pd.DataFrame(), # ✓
('Evaluator', 'eval1_[global]'): None, # ✓
}
# Invalid data types (will be removed)
invalid_data = {
('Synthesizer', 'exp1'): "string", # ✗
('Evaluator', 'eval1_[global]'): [1,2,3], # ✗
}
Notes
- Functional Design: create() does not store data in instance variables; the return value must be passed to the report() method
- Data Validation: Method validates input data format; invalid formats are logged and removed
- Memory Efficiency: When processing large amounts of data, batch processing is recommended to save memory
- Return Type: Return type varies based on reporter type
- Must Call report(): create() only processes data; must call report() to generate output files
- Granularity Matching: For save_report mode, granularity markers in data must match the granularity specified during initialization
- Naming Convention: Experiment tuple naming affects final filenames
- Module Name Validation: Only accepts valid PETsARD module names
- Stateless Operation: Each call to create() is independent and unaffected by previous calls
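
The batch-processing recommendation can be sketched with a small generator that splits a data dictionary into chunks. `iter_batches` is a hypothetical helper, not part of PETsARD, and it assumes the entries can be processed independently of one another, which follows from the stateless design but has not been checked for merged reports that use 'exist_report'.

```python
from itertools import islice


def iter_batches(data, batch_size):
    """Yield successive sub-dictionaries with at most batch_size entries each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = iter(data.items())
    while True:
        # Take the next batch_size (key, value) pairs, if any remain
        batch = dict(islice(items, batch_size))
        if not batch:
            return
        yield batch


# None stands in for real DataFrames to keep the example self-contained
data = {("Synthesizer", f"exp{i}"): None for i in range(5)}
sizes = [len(batch) for batch in iter_batches(data, 2)]
print(sizes)  # [2, 2, 1]
```

Each batch could then go through reporter.create(batch) followed by reporter.report(...), so that only one batch of processed data is held at a time.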