Loader API
Data loading module that supports various file formats.
Class Architecture
classDiagram
class Loader {
LoaderConfig config
__init__()
load() DataFrame~Schema
}
class LoaderBase {
dict config
__init__()
load() DataFrame
}
class LoaderPandasCsv {
load() DataFrame
}
class LoaderPandasExcel {
load() DataFrame
}
class LoaderConfig {
string filepath
dict column_types
list header_names
Schema schema
string schema_path
string dir_name
string base_name
string file_name
string file_ext
int file_ext_code
}
class LoaderFileExt {
int CSVTYPE
int EXCELTYPE
get() int
}
class Schema {
string id
string name
string description
dict attributes
}
class Attribute {
string name
string type
string logical_type
bool enable_null
list na_values
}
class SchemaMetadater {
from_dict() Schema
from_yaml() Schema
from_data() Schema
align() DataFrame
}
LoaderBase <|-- LoaderPandasCsv
LoaderBase <|-- LoaderPandasExcel
Loader *-- LoaderConfig
Loader ..> Schema
LoaderConfig *-- Schema
Schema *-- Attribute
Loader ..> LoaderPandasCsv
Loader ..> LoaderPandasExcel
Loader ..> LoaderFileExt
Loader ..> SchemaMetadater
LoaderConfig ..> LoaderFileExt
%% Style markers
style Loader fill:#e6f3ff,stroke:#4a90e2,stroke-width:3px
style LoaderBase fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
style LoaderPandasCsv fill:#fff2e6,stroke:#ff9800,stroke-width:2px
style LoaderPandasExcel fill:#fff2e6,stroke:#ff9800,stroke-width:2px
style LoaderConfig fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
style LoaderFileExt fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
style Schema fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
style Attribute fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
style SchemaMetadater fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
Legend:
- Blue box: Main class
- Orange box: Subclass implementations
- Light purple box: Configuration and data classes
- <|--: Inheritance relationship
- *--: Composition relationship
- ..>: Dependency relationship
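As a rough illustration of how the pieces in the diagram might fit together, the sketch below derives the path-related LoaderConfig fields from a file path and maps the extension to a LoaderFileExt-style code. It is a standalone approximation using only the standard library, not PETsARD's actual implementation. The numeric values of CSVTYPE and EXCELTYPE, the set of recognised extensions, the reading of base_name as "name with extension", and the helper names get_file_ext_code, describe_path, and pick_loader_name are all assumptions made for this example.

```python
from pathlib import Path

# Assumed codes; the real values used by LoaderFileExt are not documented here.
CSVTYPE = 1
EXCELTYPE = 2

# Assumed extension table, for illustration only.
_EXT_CODES = {".csv": CSVTYPE, ".xls": EXCELTYPE, ".xlsx": EXCELTYPE}


def get_file_ext_code(file_ext: str) -> int:
    """Map a file extension to a type code, mimicking LoaderFileExt.get()."""
    try:
        return _EXT_CODES[file_ext.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {file_ext!r}") from None


def describe_path(filepath: str) -> dict:
    """Derive the path-related LoaderConfig fields from a file path."""
    path = Path(filepath)
    return {
        "filepath": filepath,
        "dir_name": str(path.parent),
        "base_name": path.name,   # assumed: file name including extension
        "file_name": path.stem,   # assumed: file name without extension
        "file_ext": path.suffix,
        "file_ext_code": get_file_ext_code(path.suffix),
    }


def pick_loader_name(file_ext_code: int) -> str:
    """Choose the subclass named in the diagram for a given type code."""
    return {CSVTYPE: "LoaderPandasCsv", EXCELTYPE: "LoaderPandasExcel"}[file_ext_code]


config = describe_path("data/sales.CSV")
print(config["file_name"], config["file_ext"], pick_loader_name(config["file_ext_code"]))
```

Keeping the extension lookup in one table means a new format would only need a new code and a new subclass, which matches the dependency arrows from Loader to LoaderFileExt and the LoaderBase subclasses.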
Basic Usage
from petsard import Loader
# Load CSV file
loader = Loader('data.csv')
data, schema = loader.load()
# Use custom schema
loader = Loader('data.csv', schema='schema.yaml')
data, schema = loader.load()
Constructor (__init__)
Initialize a data loader instance.
Syntax
def __init__(
filepath: str = None,
column_types: dict = None,
header_names: list = None,
na_values: str | list | dict = None,
nrows: int = None,
schema: Schema | dict | str = None
)
Parameters
filepath : str, required
- Data file path
- Required parameter
- Supports both relative and absolute paths
column_types : dict, optional
- Deprecated - will be removed in v2.0.0
- Use schema parameter instead
header_names : list, optional
- Deprecated - will be removed in v2.0.0
- Column names for headerless data
- Default: None
na_values : str | list | dict, optional
- Deprecated - will be removed in v2.0.0
- Use schema parameter instead
nrows : int, optional
- Number of rows to read from the file
- Useful for quickly testing with a subset of data to reduce memory usage
- Similar to pandas.read_csv’s nrows parameter
- Default: None (reads all rows)
schema : Schema | dict | str, optional
- Data structure definition configuration
- Can be Schema object, dictionary, or YAML file path
- Default: None (auto-inferred)
- For detailed Schema configuration, refer to Metadater API documentation
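To make the nrows behaviour concrete, here is a minimal sketch that limits how many data rows are read from a CSV source. It uses the standard library's csv module rather than pandas or PETsARD, so it only mirrors the idea of the parameter. The helper name read_rows and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def read_rows(source, nrows=None):
    """Read up to nrows data rows (all rows when nrows is None)."""
    reader = csv.DictReader(source)
    # islice(reader, None) yields every row, matching the "read all" default.
    return list(islice(reader, nrows))


sample = io.StringIO("id,age\n1,34\n2,51\n3,29\n")
first_two = read_rows(sample, nrows=2)
print(first_two)

sample.seek(0)  # rewind so the second call starts from the header again
everything = read_rows(sample)
print(len(everything))
```

The header line is not counted as a data row here, which is also how pandas.read_csv treats nrows.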
Returns
- Loader
- Initialized loader instance
Examples
from petsard import Loader
# Basic usage - Load CSV file
loader = Loader('data.csv')
# Use schema YAML configuration file
loader = Loader('data.csv', schema='schema.yaml')
# Use schema dictionary
schema_dict = {
'id': 'my_schema',
'name': 'My Schema'
}
loader = Loader('data.csv', schema=schema_dict)
# Load data
data, schema = loader.load()
For detailed parameter configuration, please refer to the Loader YAML documentation.
Notes
- Deprecated parameters: The column_types, na_values, and header_names parameters are deprecated and will be removed in v2.0.0
- Recommendation: Use a YAML configuration file rather than the direct Python API
- Schema usage: Using a Schema to define the data structure is recommended; for detailed configuration, refer to the Metadater API documentation
- Loading process: Initialization only creates the configuration; actual data loading requires calling the load() method
- Excel support: The Excel format requires the openpyxl package
- Documentation note: This documentation is for internal development team reference only; backward compatibility is not guaranteed
- File formats: For supported file formats, refer to Loader YAML documentation
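Because Excel support depends on an optional package, it can help to check for that package before pointing a loader at an Excel file. The sketch below shows one way to do so with the standard library. The helper name has_excel_support is hypothetical and is not part of PETsARD.

```python
from importlib.util import find_spec


def has_excel_support() -> bool:
    """Return True when the openpyxl package can be found for import."""
    return find_spec("openpyxl") is not None


if not has_excel_support():
    print("openpyxl is missing; install it before loading Excel files.")
```

Checking with find_spec avoids importing the package just to test for its presence.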