LoaderAdapter
LoaderAdapter handles data loading and automatically resolves the benchmark:// protocol, downloading benchmark datasets and schema files before they are loaded.
Class Architecture
```mermaid
classDiagram
    class LoaderAdapter {
        +config: dict
        +loader: Loader
        +benchmarker: Benchmarker
        +__init__(config)
        +run(input: dict)
        +get_result() DataFrame
        +get_metadata() Schema
        -_handle_benchmark_protocol()
        -_init_loader()
    }
    class Loader {
        +filepath: str
        +schema: Schema
        +load() tuple[DataFrame, Schema]
    }
    class Benchmarker {
        +config: BenchmarkerConfig
        +download()
        +get_filepath()
    }
    class BenchmarkerConfig {
        +benchmark_name: str
        +filepath_raw: str
    }
    class BenchmarkerRequests {
        +download()
    }
    LoaderAdapter ..> Loader : uses for data loading
    LoaderAdapter ..> Benchmarker : uses for benchmark protocol
    Benchmarker --> BenchmarkerConfig : has
    Benchmarker ..> BenchmarkerRequests : creates

    %% Style definitions
    class LoaderAdapter {
        <<Main Class>>
    }
    style LoaderAdapter fill:#E6E6FA
    class Loader {
        <<Core Module>>
    }
    style Loader fill:#4169E1,color:#fff
    class Benchmarker {
        <<Benchmark Handler>>
    }
    style Benchmarker fill:#9370DB,color:#fff
    style BenchmarkerConfig fill:#FFE4E1
    style BenchmarkerRequests fill:#FFE4E1

    note for LoaderAdapter "1. Detects benchmark:// protocol\n2. Uses Benchmarker to download\n3. Uses Loader with local path"
```
Legend:
- Light purple box: LoaderAdapter main class
- Blue box: Core loading module
- Purple box: Benchmark dataset handler module
- Light pink box: Supporting classes (configuration and download requests)
- ..>: Dependency relationship
- -->: Has relationship
Main Features
- Unified interface for data loading
- Automatic detection and handling of the benchmark:// protocol for both data and schema files
- Integration of Loader and Benchmarker functionality
- Returns data and Schema metadata
- Supports CSV data files and YAML schema files
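Protocol detection amounts to a prefix check on the configured path. The following is a minimal sketch. It assumes a benchmark URI is simply benchmark:// followed by a dataset name. The helper parse_benchmark_uri is hypothetical and is not part of PETsARD's API.

```python
from typing import Optional

BENCHMARK_PREFIX = "benchmark://"


def parse_benchmark_uri(path: Optional[str]) -> Optional[str]:
    """Return the benchmark name for a benchmark:// path, or None for ordinary paths.

    Hypothetical helper: assumes the URI is exactly the prefix followed by a name.
    """
    if path is None or not path.startswith(BENCHMARK_PREFIX):
        return None  # regular local file path (or no schema configured)
    name = path[len(BENCHMARK_PREFIX):]
    if not name:
        raise ValueError(f"Missing benchmark name in {path!r}")
    return name


if __name__ == "__main__":
    print(parse_benchmark_uri("benchmark://adult-income"))         # adult-income
    print(parse_benchmark_uri("benchmark://adult-income_schema"))  # adult-income_schema
    print(parse_benchmark_uri("data/users.csv"))                    # None
```

The same check applies to both filepath and schema, which is why a single helper can serve both keys.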
Method Reference
__init__(config: dict)
Initializes a LoaderAdapter instance with automatic benchmark:// protocol handling.
Parameters:
- config: dict, required
  - Configuration parameter dictionary
  - Must contain the filepath key, which supports the benchmark:// protocol
  - Optional parameters include:
    - schema: Schema file path (also supports benchmark://)
    - nrows: Load only the specified number of rows (for quick testing)
    - delimiter, encoding, header, etc. (pandas read parameters)
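It can help to think of this dictionary in two parts. One part holds the keys LoaderAdapter interprets itself. The other holds keyword arguments passed on to the pandas reader. The sketch below is hypothetical: split_config and ADAPTER_KEYS are illustrative names, and the exact set of keys PETsARD forwards may differ.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical: keys the adapter consumes itself; everything else is treated
# as a pandas read parameter (nrows, delimiter, encoding, header, ...).
ADAPTER_KEYS = {"filepath", "schema"}


def split_config(config: Dict[str, Any]) -> Tuple[str, Optional[str], Dict[str, Any]]:
    """Validate a LoaderAdapter-style config and split it into its parts."""
    if "filepath" not in config:
        raise KeyError("config must contain the 'filepath' key")
    read_kwargs = {k: v for k, v in config.items() if k not in ADAPTER_KEYS}
    return config["filepath"], config.get("schema"), read_kwargs


filepath, schema, read_kwargs = split_config({
    "filepath": "data/large_dataset.csv",
    "schema": "schemas/data.yaml",
    "nrows": 1000,
    "encoding": "utf-8",
})
print(filepath)     # data/large_dataset.csv
print(schema)       # schemas/data.yaml
print(read_kwargs)  # {'nrows': 1000, 'encoding': 'utf-8'}
```

With pandas installed, read_kwargs could then be passed on as pd.read_csv(filepath, **read_kwargs). In that call, nrows=1000 makes pandas read only the first 1000 rows.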
run(input: dict)
Executes data loading, including automatic benchmark dataset download.
Parameters:
- input: dict, required
  - Input parameter dictionary
  - LoaderAdapter typically receives an empty dictionary {}
Returns:
No direct return value. Use get_result() and get_metadata() to get results.
get_result()
Gets the loaded data.
Returns:
pd.DataFrame: Loaded data
get_metadata()
Gets the data’s Schema metadata.
Returns:
Schema: Data metadata
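The split between run() and the getters follows a store-then-read pattern. run() returns nothing and keeps the loaded data and metadata on the instance, where they remain until the next run() call. Below is a minimal sketch of that contract. A hypothetical load_fn stands in for Loader.load(), and plain Python objects stand in for the DataFrame and Schema. Raising an error when a getter is called before run() is an assumption of this sketch, not documented PETsARD behavior.

```python
from typing import Any, Callable, Tuple


class RunThenGetAdapter:
    """Hypothetical illustration of the run()/get_result()/get_metadata() contract."""

    def __init__(self, load_fn: Callable[[], Tuple[Any, Any]]):
        self._load_fn = load_fn  # stand-in for Loader.load()
        self._data: Any = None
        self._metadata: Any = None
        self._loaded = False

    def run(self, input: dict) -> None:
        # No return value: results are stored and replaced on each call.
        self._data, self._metadata = self._load_fn()
        self._loaded = True

    def _require_run(self) -> None:
        # Assumption of this sketch: reading results before run() is an error.
        if not self._loaded:
            raise RuntimeError("call run() before reading results")

    def get_result(self) -> Any:
        self._require_run()
        return self._data

    def get_metadata(self) -> Any:
        self._require_run()
        return self._metadata


adapter = RunThenGetAdapter(lambda: ([{"age": 39}], {"columns": ["age"]}))
assert adapter.run({}) is None
print(adapter.get_result())    # [{'age': 39}]
print(adapter.get_metadata())  # {'columns': ['age']}
```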
Usage Example
```python
from petsard.adapter import LoaderAdapter

# Regular file loading
adapter = LoaderAdapter({
    "filepath": "data/users.csv",
    "schema": "schemas/user.yaml",
})

# Using the nrows parameter for quick testing
adapter = LoaderAdapter({
    "filepath": "data/large_dataset.csv",
    "schema": "schemas/data.yaml",
    "nrows": 1000,  # Load only the first 1000 rows
})

# Or using the benchmark:// protocol
# adapter = LoaderAdapter({
#     "filepath": "benchmark://adult-income",
#     "schema": "benchmark://adult-income_schema",
# })

# Execute loading
adapter.run({})

# Get results
data = adapter.get_result()
metadata = adapter.get_metadata()
```
Workflow
1. Protocol Detection: Check whether filepath/schema uses the benchmark:// protocol
2. Benchmarker Processing (for the benchmark protocol):
   - Download files locally
   - Verify SHA-256 (warning on mismatch)
   - Convert paths to local paths
3. Data Loading: Load data and metadata
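The SHA-256 step compares the digest of the downloaded file with an expected value. On a mismatch it warns rather than fails. The sketch below shows one way to do this with hashlib and warnings. The function names file_sha256 and verify_sha256, and the chunked read, are assumptions and do not come from PETsARD's code.

```python
import hashlib
import warnings


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """Return True if the digest matches; otherwise emit a warning and return False."""
    actual = file_sha256(path)
    if actual.lower() != expected.lower():
        warnings.warn(
            f"SHA-256 mismatch for {path}: expected {expected}, got {actual}",
            UserWarning,
        )
        return False
    return True


if __name__ == "__main__":
    import os
    import tempfile

    payload = b"age,income\n39,<=50K\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    print(verify_sha256(tmp.name, hashlib.sha256(payload).hexdigest()))  # True
    print(verify_sha256(tmp.name, "0" * 64))  # False, plus a UserWarning
    os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large benchmark files. Returning a boolean lets the caller continue loading after a mismatch, which matches the "warning on mismatch" behavior described above.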
Notes
- This is an internal API, not recommended for direct use
- Prefer using YAML configuration files and Executor
- Benchmark files are cached after first download
- Results are cached until next run() call
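Caching after the first download can be pictured as an existence check before fetching. In this hypothetical sketch, fetch is a download function supplied by the caller, and the cache directory layout is an assumption. PETsARD's actual cache location and file naming may differ.

```python
import os
from typing import Callable


def ensure_cached(
    name: str,
    cache_dir: str,
    fetch: Callable[[str], bytes],
    suffix: str = ".csv",
) -> str:
    """Return the local path for a benchmark file, downloading it only if it is missing."""
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, name + suffix)
    if not os.path.exists(local_path):
        # Fetch first, then write, so a failed download leaves no empty cache file.
        data = fetch(name)
        with open(local_path, "wb") as fh:
            fh.write(data)
    return local_path


if __name__ == "__main__":
    import tempfile

    calls = []

    def fake_fetch(name: str) -> bytes:
        calls.append(name)
        return b"age,income\n39,<=50K\n"

    with tempfile.TemporaryDirectory() as cache:
        first = ensure_cached("adult-income", cache, fake_fetch)
        second = ensure_cached("adult-income", cache, fake_fetch)
        print(first == second, len(calls))  # True 1
```

The second call returns the same path without invoking fetch again, which is the behavior the note above describes.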