LoaderAdapter
LoaderAdapter handles data loading and automatically resolves the `benchmark://` protocol, downloading benchmark datasets and schema files before they are loaded.
Class Architecture
```mermaid
classDiagram
    class LoaderAdapter {
        +config: dict
        +loader: Loader
        +benchmarker: Benchmarker
        +__init__(config)
        +run(input: dict)
        +get_result() DataFrame
        +get_metadata() Schema
        -_handle_benchmark_protocol()
        -_init_loader()
    }
    class Loader {
        +filepath: str
        +schema: Schema
        +load() tuple[DataFrame, Schema]
    }
    class Benchmarker {
        +config: BenchmarkerConfig
        +download()
        +get_filepath()
    }
    class BenchmarkerConfig {
        +benchmark_name: str
        +filepath_raw: str
    }
    class BenchmarkerRequests {
        +download()
    }

    LoaderAdapter ..> Loader : uses for data loading
    LoaderAdapter ..> Benchmarker : uses for benchmark protocol
    Benchmarker --> BenchmarkerConfig : has
    Benchmarker ..> BenchmarkerRequests : creates

    %% Style definitions
    class LoaderAdapter {
        <<Main Class>>
    }
    style LoaderAdapter fill:#E6E6FA

    class Loader {
        <<Core Module>>
    }
    style Loader fill:#4169E1,color:#fff

    class Benchmarker {
        <<Benchmark Handler>>
    }
    style Benchmarker fill:#9370DB,color:#fff

    style BenchmarkerConfig fill:#FFE4E1
    style BenchmarkerRequests fill:#FFE4E1

    note for LoaderAdapter "1. Detects benchmark:// protocol\n2. Uses Benchmarker to download\n3. Uses Loader with local path"
```
Legend:
- Light purple box: LoaderAdapter main class
- Blue box: Core loading module
- Purple box: Benchmark dataset handler module
- Light pink boxes: Supporting classes (BenchmarkerConfig and BenchmarkerRequests)
- `..>`: Dependency relationship
- `-->`: Has relationship
Main Features
- Unified interface for data loading
- Automatic detection and handling of the `benchmark://` protocol for both data and schema files
- Integration of Loader and Benchmarker functionality
- Provides the loaded data together with its Schema metadata
- Supports CSV data files and YAML schema files
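Handling the protocol "for both data and schema" amounts to rewriting every `benchmark://` value in the configuration to a local path before loading. The sketch below illustrates that idea under stated assumptions: the function `resolve_benchmark_paths`, the `PATH_KEYS` tuple, and the `resolve` callable (a stand-in for Benchmarker's download-and-locate step) are hypothetical, not part of PETsARD's API.

```python
from typing import Callable

BENCHMARK_PREFIX = "benchmark://"
PATH_KEYS = ("filepath", "schema")  # assumption: the config keys that hold file paths


def resolve_benchmark_paths(config: dict, resolve: Callable[[str], str]) -> dict:
    """Return a copy of config with benchmark:// paths replaced by local paths.

    `resolve` is a hypothetical stand-in for Benchmarker: it receives the
    benchmark name (the part after benchmark://) and returns a local path.
    """
    resolved = dict(config)  # shallow copy, so the caller's dict is left unchanged
    for key in PATH_KEYS:
        value = resolved.get(key)
        if isinstance(value, str) and value.startswith(BENCHMARK_PREFIX):
            resolved[key] = resolve(value[len(BENCHMARK_PREFIX):])
    return resolved  # non-path keys such as nrows pass through untouched


config = {
    "filepath": "benchmark://adult-income",
    "schema": "benchmark://adult-income_schema",
    "nrows": 1000,
}
local = resolve_benchmark_paths(config, lambda name: f"benchmark/{name}")
print(local["filepath"])  # benchmark/adult-income
```

Ordinary paths such as `data/users.csv` are returned unchanged, so the same code path serves both regular and benchmark inputs.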
Method Reference
__init__(config: dict)
Initializes a LoaderAdapter instance with automatic `benchmark://` protocol handling.
Parameters:
- `config` (`dict`, required): Configuration parameter dictionary
  - Must contain the `filepath` key, which supports the `benchmark://` protocol
  - Optional parameters include:
    - `schema`: Schema file path
    - `nrows`: Load only the specified number of rows (for quick testing)
    - `delimiter`, `encoding`, `header`, etc. (pandas read parameters)
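The optional `nrows`, `delimiter`, `encoding` and `header` keys are described as pandas read parameters. Assuming they are forwarded to `pandas.read_csv` unchanged (the forwarding itself is not documented here), their effect can be seen by calling pandas directly on a small in-memory CSV:

```python
import io

import pandas as pd

# A small semicolon-delimited CSV held in memory instead of on disk.
raw = "age;income\n25;50000\n32;64000\n47;81000\n51;72000\n".encode("utf-8")

df = pd.read_csv(
    io.BytesIO(raw),
    delimiter=";",     # field separator
    encoding="utf-8",  # how the raw bytes are decoded
    header=0,          # first line holds the column names
    nrows=2,           # read only the first two data rows (header not counted)
)
print(df.shape)  # (2, 2)
```

Because `nrows` stops reading after the requested number of rows, it keeps quick tests on large files cheap.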
run(input: dict)
Executes data loading, including automatic benchmark dataset download.
Parameters:
- `input` (`dict`, required): Input parameter dictionary
  - LoaderAdapter typically receives an empty dictionary `{}`
Returns:
No direct return value. Use `get_result()` and `get_metadata()` to retrieve the results.
get_result()
Gets the loaded data.
Returns:
- `pd.DataFrame`: Loaded data
get_metadata()
Gets the data’s Schema metadata.
Returns:
- `Schema`: Data metadata
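The split between `run()` and the two getters is a store-then-read pattern: `run()` populates internal state and returns nothing, and the getters read that state back. The sketch below illustrates the contract only. `SketchAdapter` and the `load_fn` callable (a stand-in for PETsARD's `Loader`) are hypothetical, and raising an error when a getter is called before `run()` is an assumption about sensible behavior, not documented behavior.

```python
from typing import Any, Callable, Optional, Tuple

import pandas as pd


class SketchAdapter:
    """Hypothetical illustration of the run()/get_result()/get_metadata() contract."""

    def __init__(self, config: dict, load_fn: Callable[[dict], Tuple[pd.DataFrame, Any]]):
        if "filepath" not in config:
            raise ValueError("config must contain 'filepath'")
        self.config = config
        self._load_fn = load_fn  # stand-in for the real Loader
        self._data: Optional[pd.DataFrame] = None
        self._metadata: Any = None

    def run(self, input: dict) -> None:
        # No return value: results are stored and replaced on the next run() call.
        self._data, self._metadata = self._load_fn(self.config)

    def get_result(self) -> pd.DataFrame:
        if self._data is None:
            raise RuntimeError("call run() before get_result()")
        return self._data

    def get_metadata(self) -> Any:
        if self._data is None:
            raise RuntimeError("call run() before get_metadata()")
        return self._metadata


def fake_load(config: dict) -> Tuple[pd.DataFrame, Any]:
    # Hypothetical loader returning a tiny frame and a dict in place of a Schema.
    return pd.DataFrame({"x": [1, 2]}), {"source": config["filepath"]}


adapter = SketchAdapter({"filepath": "data/users.csv"}, fake_load)
adapter.run({})
print(adapter.get_result().shape)  # (2, 1)
```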
Usage Example
```python
from petsard.adapter import LoaderAdapter

# Regular file loading
adapter = LoaderAdapter({
    "filepath": "data/users.csv",
    "schema": "schemas/user.yaml"
})

# Alternatively, use the nrows parameter for quick testing
adapter = LoaderAdapter({
    "filepath": "data/large_dataset.csv",
    "schema": "schemas/data.yaml",
    "nrows": 1000  # Load only the first 1000 rows
})

# Or use the benchmark:// protocol
# adapter = LoaderAdapter({
#     "filepath": "benchmark://adult-income",
#     "schema": "benchmark://adult-income_schema"
# })

# Execute loading (uses the most recently assigned adapter)
adapter.run({})

# Get results
data = adapter.get_result()
metadata = adapter.get_metadata()
```
Workflow
1. Protocol Detection: Check whether `filepath` or `schema` uses the `benchmark://` protocol
2. Benchmarker Processing (for benchmark protocol paths):
   - Download files locally
   - Verify the SHA-256 checksum (a warning is issued on mismatch)
   - Convert paths to local paths
3. Data Loading: Load data and metadata
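The checksum step can be sketched with the standard library. The function names `sha256_of` and `verify_sha256`, and the use of `warnings.warn`, are assumptions chosen to mirror the documented behavior (warn rather than fail on mismatch); PETsARD may report the mismatch differently, for example through its logger.

```python
import hashlib
import warnings
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Return True if the file matches the expected digest; otherwise warn and return False."""
    actual = sha256_of(path)
    if actual != expected.lower():  # hexdigest() is lowercase, so normalize the input
        warnings.warn(
            f"SHA-256 mismatch for {path}: expected {expected}, got {actual}",
            stacklevel=2,
        )
        return False
    return True
```

Warning instead of raising lets loading proceed with a file whose upstream copy changed, while still surfacing the discrepancy.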
Notes
- This is an internal API, not recommended for direct use
- Prefer using YAML configuration files and Executor
- Benchmark files are cached after first download
- Results are cached until next run() call
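Caching benchmark files after the first download boils down to a download-if-missing check. A minimal sketch, assuming a local cache path and a caller-supplied `fetch` callable; both `ensure_cached` and `fetch` are hypothetical, and the real cache location and download code live inside Benchmarker:

```python
from pathlib import Path
from typing import Callable


def ensure_cached(local_path: Path, fetch: Callable[[Path], None]) -> Path:
    """Download to local_path only if it is not already present, then return it."""
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(local_path)  # hypothetical downloader that writes the file
    return local_path
```

Calling `ensure_cached` twice for the same path invokes `fetch` only once; the second call returns the existing file.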