Loader API

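The Loader module reads data files in several formats (the diagram below shows CSV and Excel implementations) and returns the loaded data together with its schema.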

Class Architecture

classDiagram
    class Loader {
        LoaderConfig config
        __init__()
        load() DataFrame~Schema
    }

    class LoaderBase {
        dict config
        __init__()
        load() DataFrame
    }

    class LoaderPandasCsv {
        load() DataFrame
    }

    class LoaderPandasExcel {
        load() DataFrame
    }

    class LoaderConfig {
        string filepath
        dict column_types
        list header_names
        Schema schema
        string schema_path
        string dir_name
        string base_name
        string file_name
        string file_ext
        int file_ext_code
    }

    class LoaderFileExt {
        int CSVTYPE
        int EXCELTYPE
        get() int
    }

    class Schema {
        string id
        string name
        string description
        dict attributes
    }

    class Attribute {
        string name
        string type
        string logical_type
        bool enable_null
        list na_values
    }

    class SchemaMetadater {
        from_dict() Schema
        from_yaml() Schema
        from_data() Schema
        align() DataFrame
    }

    LoaderBase <|-- LoaderPandasCsv
    LoaderBase <|-- LoaderPandasExcel

    Loader *-- LoaderConfig
    Loader ..> Schema
    LoaderConfig *-- Schema
    Schema *-- Attribute

    Loader ..> LoaderPandasCsv
    Loader ..> LoaderPandasExcel
    Loader ..> LoaderFileExt
    Loader ..> SchemaMetadater
    LoaderConfig ..> LoaderFileExt

    %% Style annotations
    style Loader fill:#e6f3ff,stroke:#4a90e2,stroke-width:3px
    style LoaderBase fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style LoaderPandasCsv fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style LoaderPandasExcel fill:#fff2e6,stroke:#ff9800,stroke-width:2px
    style LoaderConfig fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style LoaderFileExt fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style Schema fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style Attribute fill:#f3e6ff,stroke:#9966cc,stroke-width:2px
    style SchemaMetadater fill:#f3e6ff,stroke:#9966cc,stroke-width:2px

Legend:

  • Blue box: Main class
  • Orange box: Subclass implementations
  • Light purple box: Configuration and data classes
  • <|--: Inheritance relationship
  • *--: Composition relationship
  • ..>: Dependency relationship
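
The diagram suggests that LoaderConfig splits filepath into path components and that LoaderFileExt turns the extension into an integer code used to pick a concrete loader. The following is a minimal sketch of that idea, not petsard's own code. The constant values, the table of recognized extensions, and the meaning of base_name versus file_name are all assumptions, and split_filepath is a hypothetical helper introduced only for illustration.

```python
from pathlib import Path


class LoaderFileExt:
    """Illustrative stand-in: maps a file extension to an integer code."""

    CSVTYPE = 1    # assumed value
    EXCELTYPE = 2  # assumed value

    # Assumed extension table; the real list of supported formats may differ.
    _CODES = {".csv": CSVTYPE, ".xlsx": EXCELTYPE}

    @classmethod
    def get(cls, file_ext: str) -> int:
        """Return the code for an extension, ignoring letter case."""
        try:
            return cls._CODES[file_ext.lower()]
        except KeyError:
            raise ValueError(f"Unsupported file extension: {file_ext!r}") from None


def split_filepath(filepath: str) -> dict:
    """Hypothetical helper: derive LoaderConfig-style path fields."""
    path = Path(filepath)
    return {
        "dir_name": str(path.parent),
        "base_name": path.name,   # assumed: file name with extension
        "file_name": path.stem,   # assumed: file name without extension
        "file_ext": path.suffix.lower(),
        "file_ext_code": LoaderFileExt.get(path.suffix),
    }


fields = split_filepath("datasets/Adult.CSV")
print(fields["file_name"], fields["file_ext"], fields["file_ext_code"])
```

Keeping the extension lookup in one small class means a new format needs only a new table entry and a new LoaderBase subclass.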

Basic Usage

from petsard import Loader

# Load CSV file
loader = Loader('data.csv')
data, schema = loader.load()

# Use custom schema
loader = Loader('data.csv', schema='schema.yaml')
data, schema = loader.load()

Constructor (__init__)

Initialize a data loader instance.

Syntax

def __init__(
    filepath: str = None,
    column_types: dict = None,
    header_names: list = None,
    na_values: str | list | dict = None,
    schema: Schema | dict | str = None
)

Parameters

  • filepath : str, required

    • Path to the data file
    • Must be provided, even though the signature lists a default of None
    • Supports both relative and absolute paths
  • column_types : dict, optional

    • Deprecated - will be removed in v2.0.0
    • Use schema parameter instead
  • header_names : list, optional

    • Deprecated - will be removed in v2.0.0
    • Column names for headerless data
    • Default: None
  • na_values : str | list | dict, optional

    • Deprecated - will be removed in v2.0.0
    • Use schema parameter instead
  • schema : Schema | dict | str, optional

    • Definition of the data structure
    • Accepts a Schema object, a dictionary, or the path to a YAML file
    • Default: None (the schema is inferred from the data)
    • For Schema configuration details, see the Metadater API documentation
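
Because schema accepts three input types plus None, it can help to picture the dispatch explicitly. The sketch below is illustrative only: the Schema dataclass is a stand-in with the fields shown in the class diagram, describe_schema_source is a hypothetical helper, and the pairing of each branch with a SchemaMetadater method (from_data, from_dict, from_yaml) is an assumption based on the method names in the diagram rather than confirmed behavior.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Schema:
    """Illustrative stand-in for the Schema class in the diagram."""

    id: str
    name: str = ""
    description: str = ""
    attributes: dict = field(default_factory=dict)


def describe_schema_source(schema: Union[Schema, dict, str, None]) -> str:
    """Hypothetical helper: report how a schema argument would be handled."""
    if schema is None:
        return "infer from data"        # assumed: SchemaMetadater.from_data
    if isinstance(schema, Schema):
        return "use as-is"
    if isinstance(schema, dict):
        return "build from dict"        # assumed: SchemaMetadater.from_dict
    if isinstance(schema, str):
        return "load from YAML path"    # assumed: SchemaMetadater.from_yaml
    raise TypeError(f"Unsupported schema type: {type(schema).__name__}")


for candidate in (None, Schema(id="my_schema"), {"id": "my_schema"}, "schema.yaml"):
    print(describe_schema_source(candidate))
```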

Returns

  • Loader
    • Initialized loader instance

Examples

from petsard import Loader

# Basic usage - Load CSV file
loader = Loader('data.csv')

# Use schema YAML configuration file
loader = Loader('data.csv', schema='schema.yaml')

# Use schema dictionary
schema_dict = {
    'id': 'my_schema',
    'name': 'My Schema'
}
loader = Loader('data.csv', schema=schema_dict)

# Load data
data, schema = loader.load()

For detailed parameter configuration, please refer to the Loader YAML documentation.

Notes

  • Deprecated parameters: column_types, na_values, and header_names are deprecated and will be removed in v2.0.0
  • Recommendation: Prefer the YAML configuration file over calling the Python API directly
  • Schema usage: Define the data structure with a Schema; see the Metadater API documentation for configuration details
  • Loading process: Initialization only builds the configuration; data is read only when load() is called
  • Excel support: Reading Excel files requires the openpyxl package
  • Documentation note: This documentation is for internal development team reference only; backward compatibility is not guaranteed
  • File formats: For the supported file formats, see the Loader YAML documentation
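
Since Excel support depends on openpyxl, a caller may want to check for the package before constructing a loader for an Excel file. A minimal sketch using only the standard library follows. check_before_load and excel_support_available are hypothetical helpers, not part of petsard, and treating only ".xlsx" as an Excel extension is an assumption made for the example.

```python
import importlib.util
from pathlib import Path


def excel_support_available() -> bool:
    """Return True if the openpyxl package can be found for import."""
    return importlib.util.find_spec("openpyxl") is not None


def check_before_load(filepath: str) -> None:
    """Hypothetical pre-flight check: raise if an Excel file cannot be read."""
    is_excel = Path(filepath).suffix.lower() == ".xlsx"  # assumed extension
    if is_excel and not excel_support_available():
        raise ImportError(
            "Reading Excel files requires openpyxl (pip install openpyxl)"
        )


check_before_load("data.csv")  # CSV files never need openpyxl
print("openpyxl available:", excel_support_available())
```

Running the check up front reports a missing dependency at configuration time instead of partway through load().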