Benchmarker API
Overview
The Benchmarker module downloads and manages benchmark datasets and their schema files. Predefined benchmark datasets and YAML schema files are accessed through the benchmark:// protocol.
Architecture
classDiagram
    class BenchmarkerConfig {
        +benchmark_name: str
        +filepath_raw: str
        +benchmark_filename: str
        +__init__(benchmark_name, filepath_raw)
        +get_benchmarker_config() dict
    }
    class BenchmarkerRequests {
        +config: dict
        +__init__(config: dict)
        +download() void
        -_get_download_url() str
        -_download_file(url, path) void
    }
    class BenchmarkDatasetsError {
        <<exception>>
        +message: str
    }
    BenchmarkerConfig ..> BenchmarkerRequests
    BenchmarkerRequests ..> BenchmarkDatasetsError
    note for BenchmarkerConfig "Manages benchmark dataset\nconfiguration and paths"
    note for BenchmarkerRequests "Downloads benchmark datasets\nfrom remote sources"
    note for BenchmarkDatasetsError "Raised when dataset\ndownload fails"
    %% Style markers
    style BenchmarkerConfig fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style BenchmarkerRequests fill:#e6f3ff,stroke:#4a90e2,stroke-width:3px
    style BenchmarkDatasetsError fill:#ffe6e6,stroke:#ff6666,stroke-width:2px
Legend:
- Purple box: Configuration classes (BenchmarkerConfig)
- Light blue box: Request handler classes (BenchmarkerRequests)
- Light red box: Exception classes (BenchmarkDatasetsError)
- Yellow box: Class annotation notes
- -->: Composition relationship
- ..>: Dependency relationship
Main Classes
BenchmarkerConfig
Handles configuration management for benchmark datasets and schema files.
from petsard.loader.benchmarker import BenchmarkerConfig

# For data files
config = BenchmarkerConfig(
    benchmark_name="adult-income",
    filepath_raw="benchmark://adult-income"
)

# For schema files
schema_config = BenchmarkerConfig(
    benchmark_name="adult-income_schema",
    filepath_raw="benchmark://adult-income_schema"
)
Attributes
- benchmark_name: Benchmark dataset or schema name
- filepath_raw: Raw file path (benchmark:// protocol)
- benchmark_filename: Local filename
BenchmarkerRequests
Responsible for downloading benchmark datasets from remote sources.
from petsard.loader.benchmarker import BenchmarkerRequests

# `config` is the BenchmarkerConfig instance created in the previous example
downloader = BenchmarkerRequests(config.get_benchmarker_config())
downloader.download()
Workflow
- Protocol Parsing: Parse benchmark:// protocol
- Configuration Creation: Create configuration based on dataset/schema name
- Data Download: Download dataset or schema from remote source
- SHA-256 Verification: Verify file integrity (logs warning on mismatch, doesn’t block)
- Local Storage: Save to benchmark/ directory
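The protocol-parsing and SHA-256 verification steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PETsARD's actual implementation: the helper names parse_benchmark_name, sha256_of, and verify_sha256 are hypothetical, and the sketch only assumes what the workflow states (a benchmark:// prefix, and a warning rather than an error on a hash mismatch).

```python
import hashlib
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

PROTOCOL_PREFIX = "benchmark://"  # protocol prefix described in this document


def parse_benchmark_name(filepath_raw: str) -> str:
    """Hypothetical helper: extract the dataset/schema name from a benchmark:// path."""
    if not filepath_raw.startswith(PROTOCOL_PREFIX):
        raise ValueError(f"not a benchmark path: {filepath_raw!r}")
    name = filepath_raw[len(PROTOCOL_PREFIX):]
    if not name:
        raise ValueError("benchmark path has no dataset name")
    return name


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large datasets are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Hypothetical helper: log a warning on mismatch instead of raising."""
    actual = sha256_of(path)
    if actual != expected.lower():
        logger.warning(
            "SHA-256 mismatch for %s: expected %s, got %s", path, expected, actual
        )
        return False
    return True
```

Returning a boolean instead of raising mirrors the documented behavior: a mismatch is reported but does not stop the caller from using the file.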
Error Handling
- BenchmarkDatasetsError
  - Raised when the download fails
  - Raised when the dataset doesn’t exist
  - Raised on network connection issues
- SHA-256 Verification
  - Logs a warning on mismatch (doesn’t block execution)
  - Allows modified local files to be used during development
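One way to picture the error handling described above is a download function that converts every network failure into a single exception type. The sketch below is an assumption-laden illustration: the BenchmarkDatasetsError class here is a local stand-in defined for the example (not imported from PETsARD), and fetch_bytes is a hypothetical helper built on the standard library's urllib.

```python
import urllib.error
import urllib.request


class BenchmarkDatasetsError(Exception):
    """Local stand-in for the documented exception; not the PETsARD class."""


def fetch_bytes(url: str, opener=urllib.request.urlopen, timeout: float = 30.0) -> bytes:
    """Hypothetical helper: fetch a URL, mapping failures to BenchmarkDatasetsError."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # HTTP-level failure, e.g. 404 when the requested dataset does not exist
        raise BenchmarkDatasetsError(f"download failed ({exc.code}): {url}") from exc
    except (urllib.error.URLError, OSError) as exc:
        # Connection-level failure: DNS error, refused connection, timeout
        raise BenchmarkDatasetsError(f"network error for {url}: {exc}") from exc
```

With this shape, callers need a single except BenchmarkDatasetsError clause, and the original cause stays available on the exception's __cause__ attribute for debugging.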
Important Notes
- Datasets and schema files are cached locally in benchmark/ directory after download
- First use requires network connection
- Access benchmark data through LoaderAdapter rather than calling these classes directly
- Using YAML configuration files is the recommended approach
- Supports both CSV data files and YAML schema files
- SHA-256 verification failures log warnings but don’t block execution (as of v2.0.0)
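The caching behavior noted above (download once, then reuse the local copy) can be sketched as follows. This is an illustrative sketch only: ensure_cached and its download callback are hypothetical names, and the only detail taken from this document is the benchmark/ cache directory.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(
    filename: str,
    download: Callable[[Path], None],
    cache_dir: Path = Path("benchmark"),
) -> Path:
    """Hypothetical helper: return the cached file, downloading it only if missing."""
    target = cache_dir / filename
    if target.exists():
        # Already cached: no network access is needed
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    # First use: the callback writes the file to `target` (requires network)
    download(target)
    return target
```

Passing the download step in as a callback keeps the cache logic testable without a network connection.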