Benchmarker API
Overview
The Benchmarker module downloads and manages benchmark datasets and their schema files. The benchmark:// protocol gives convenient access to predefined benchmark datasets and YAML schema files.
Architecture
classDiagram
class BenchmarkerConfig {
+benchmark_name: str
+filepath_raw: str
+benchmark_filename: str
+__init__(benchmark_name, filepath_raw)
+get_benchmarker_config() dict
}
class BenchmarkerRequests {
+config: dict
+__init__(config: dict)
+download() void
-_get_download_url() str
-_download_file(url, path) void
}
class BenchmarkDatasetsError {
<<exception>>
+message: str
}
BenchmarkerConfig ..> BenchmarkerRequests
BenchmarkerRequests ..> BenchmarkDatasetsError
note for BenchmarkerConfig "Manages benchmark dataset\nconfiguration and paths"
note for BenchmarkerRequests "Downloads benchmark datasets\nfrom remote sources"
note for BenchmarkDatasetsError "Raised when dataset\ndownload fails"
%% Styling
style BenchmarkerConfig fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
style BenchmarkerRequests fill:#e6f3ff,stroke:#4a90e2,stroke-width:3px
style BenchmarkDatasetsError fill:#ffe6e6,stroke:#ff6666,stroke-width:2px
Legend:
- Purple box: Configuration classes (BenchmarkerConfig)
- Light blue box: Request handler classes (BenchmarkerRequests)
- Light red box: Exception classes (BenchmarkDatasetsError)
- Yellow box: Class annotation notes
- Arrow (-->): Composition relationship
- Dotted arrow (..>): Dependency relationship
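The dependency arrows in the diagram describe a simple hand-off: the configuration object produces a plain dict, the request handler consumes it, and the handler raises the exception when it cannot deliver a file. The sketch below mirrors that flow with hypothetical stand-in classes (`SketchConfig`, `SketchRequests`, and a locally defined `BenchmarkDatasetsError`). The `available` collection replaces the remote source and is an assumption for illustration, not part of the module's API.

```python
from typing import Iterable


class BenchmarkDatasetsError(Exception):
    """Local stand-in for the module's exception (sketch only)."""


class SketchConfig:
    """Hypothetical stand-in for BenchmarkerConfig: stores names and paths."""

    def __init__(self, benchmark_name: str, filepath_raw: str):
        self.benchmark_name = benchmark_name
        self.filepath_raw = filepath_raw

    def get_benchmarker_config(self) -> dict:
        # Hand a plain dict to the request handler
        # (the BenchmarkerConfig ..> BenchmarkerRequests arrow).
        return {
            "benchmark_name": self.benchmark_name,
            "filepath_raw": self.filepath_raw,
        }


class SketchRequests:
    """Hypothetical stand-in for BenchmarkerRequests."""

    def __init__(self, config: dict, available: Iterable[str]):
        self.config = config
        # Assumption: a fixed set of known names replaces the remote source.
        self._available = set(available)

    def download(self) -> None:
        name = self.config["benchmark_name"]
        if name not in self._available:
            # The BenchmarkerRequests ..> BenchmarkDatasetsError arrow.
            raise BenchmarkDatasetsError(f"Unknown benchmark: {name}")
        # A real handler would fetch and save the file at this point.
```

Keeping the hand-off as a plain dict means the request handler does not depend on the configuration class itself, which matches the one-directional arrows in the diagram.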
Main Classes
BenchmarkerConfig
Handles configuration management for benchmark datasets and schema files.
from petsard.loader.benchmarker import BenchmarkerConfig
# For data files
config = BenchmarkerConfig(
benchmark_name="adult-income",
filepath_raw="benchmark://adult-income"
)
# For schema files
schema_config = BenchmarkerConfig(
benchmark_name="adult-income_schema",
filepath_raw="benchmark://adult-income_schema"
)
Attributes
- benchmark_name: Benchmark dataset or schema name
- filepath_raw: Raw file path (benchmark:// protocol)
- benchmark_filename: Local filename
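Deriving the benchmark name from filepath_raw is mostly string handling. A minimal sketch, assuming the name is everything after the benchmark:// prefix and that schema entries are recognized by the `_schema` suffix used in the examples above. `parse_benchmark_name` and `is_schema_name` are hypothetical helpers; the module's own parsing may differ.

```python
PREFIX = "benchmark://"


def parse_benchmark_name(filepath_raw: str) -> str:
    """Return the part after benchmark:// (hypothetical helper)."""
    # Assumption: the scheme is matched case-insensitively.
    if not filepath_raw.lower().startswith(PREFIX):
        raise ValueError(f"not a benchmark:// path: {filepath_raw!r}")
    name = filepath_raw[len(PREFIX):].strip("/")
    if not name:
        raise ValueError("benchmark name is empty")
    return name


def is_schema_name(benchmark_name: str) -> bool:
    """Assumption drawn from the examples: schema entries end in '_schema'."""
    return benchmark_name.endswith("_schema")
```

For example, `parse_benchmark_name("benchmark://adult-income_schema")` yields `"adult-income_schema"`, which `is_schema_name` then classifies as a schema entry.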
BenchmarkerRequests
Responsible for downloading benchmark datasets from remote sources.
from petsard.loader.benchmarker import BenchmarkerRequests
downloader = BenchmarkerRequests(config.get_benchmarker_config())
downloader.download()
Workflow
- Protocol Parsing: Parse the benchmark:// protocol
- Configuration Creation: Create configuration based on dataset/schema name
- Data Download: Download dataset or schema from remote source
- SHA-256 Verification: Verify file integrity (logs a warning on mismatch, does not block)
- Local Storage: Save to the benchmark/ directory
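The verification step can be pictured with the standard `hashlib` module. This sketch assumes the expected digest is already known (how the module obtains it is not described here). `verify_sha256` is a hypothetical helper that, like the behavior described above, logs a warning and returns `False` on mismatch instead of raising.

```python
import hashlib
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare against the expected hex digest; warn on mismatch instead of raising."""
    actual = sha256_of(path)
    if actual != expected.lower():
        logger.warning(
            "SHA-256 mismatch for %s: expected %s, got %s", path, expected, actual
        )
        return False
    return True
```

Returning a boolean instead of raising leaves the decision to the caller, which is what allows locally modified files to keep working during development.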
Error Handling
- BenchmarkDatasetsError
  - Raised when a download fails
  - Raised when the requested dataset does not exist
  - Raised on network connection issues
- SHA-256 Verification
  - Logs a warning on mismatch (does not block execution)
  - Allows using modified local files during development
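The three failure cases can be funneled into one exception type by catching the standard library's network errors. A sketch using `urllib.request`, with a local stand-in for `BenchmarkDatasetsError`. `fetch_to` is a hypothetical helper, and the actual module may use a different HTTP client.

```python
import urllib.error
import urllib.request
from pathlib import Path


class BenchmarkDatasetsError(Exception):
    """Local stand-in for the module's exception (sketch only)."""


def fetch_to(url: str, target: Path, timeout: float = 30.0) -> Path:
    """Download url to target; every fetch failure surfaces as BenchmarkDatasetsError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except urllib.error.HTTPError as exc:
        # Server answered with an error status, e.g. 404 for an unknown dataset.
        raise BenchmarkDatasetsError(f"HTTP {exc.code} for {url}") from exc
    except OSError as exc:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are all OSError subclasses.
        raise BenchmarkDatasetsError(f"download failed for {url}: {exc}") from exc
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

`HTTPError` is itself a subclass of `URLError` (and therefore of `OSError`), so it must be caught first to keep the more specific message.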
Important Notes
- Datasets and schema files are cached locally in the benchmark/ directory after download
- First use requires a network connection
- Use this module through LoaderAdapter rather than calling these classes directly
- Using YAML configuration files is the recommended approach
- Supports both CSV data files and YAML schema files
- SHA-256 verification failures log warnings but don’t block execution (as of v2.0.0)
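The caching note comes down to checking for the file before touching the network. A minimal sketch, assuming files are stored directly under a benchmark/ directory relative to the working directory. `ensure_cached` and its `fetch` callback are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(
    filename: str,
    fetch: Callable[[Path], None],
    cache_dir: Path = Path("benchmark"),
) -> Path:
    """Return the cached file, calling fetch(target) only on a cache miss."""
    target = cache_dir / filename
    if target.exists():
        # Cache hit: no network access needed.
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetch(target)  # hypothetical callback that writes the file to target
    return target
```

Only the first call for a given file invokes `fetch`; later calls return the local copy, which is why a network connection is needed only on first use.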