NaN Handling

Define how rows are handled when specific fields contain NaN (a missing value, also written NA below).

Usage Examples

nan_groups:
  # Required field: delete row if missing
  workclass: 'delete'

  # Related fields: if occupation is NA, income should also be NA
  occupation:
    erase: 'income'

  # Supplementary field: if age has a value and educational-num is NA, copy age into educational-num
  age:
    copy: 'educational-num'

  # Conditional NaN: if workclass is Never-worked, capital-gain should be NA
  capital-gain:
    nan_if_condition:
      workclass: 'Never-worked'

Supported Actions

delete - Delete Entire Row

Delete the entire row when the specified field is NA.

Syntax:

main_field_name: 'delete'
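
The effect of delete can be sketched in plain Python. This is only an illustration of the rule described above, not the library's implementation: the helper name `apply_delete` is hypothetical, rows are modeled as dictionaries, and `None` stands in for NA.

```python
# Illustrative sketch only: apply_delete is a hypothetical helper,
# and None stands in for NA.
def apply_delete(rows, main_field):
    """Return a new list without the rows whose main_field is NA."""
    return [row for row in rows if row.get(main_field) is not None]


rows = [
    {"workclass": "Private", "age": 39},
    {"workclass": None, "age": 50},  # workclass is NA, so the row is dropped
]

kept = apply_delete(rows, "workclass")
print(kept)
```

Only the first row survives. The sketch returns a new list and leaves `rows` intact, which makes it easy to keep a copy of the input if the removed rows might be needed later.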

erase - Clear Other Fields

When the main field is NA, set other specified fields to NA. Supports single or multiple target fields.

Syntax:

main_field_name:
  erase: 'target_field_name'

or

main_field_name:
  erase:
    - 'target_field_name1'
    - 'target_field_name2'
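
A minimal Python sketch of the erase rule follows, assuming rows are dictionaries and `None` stands in for NA. The helper `apply_erase` is hypothetical and the `hours-per-week` field is used only for illustration; neither comes from the configuration above.

```python
# Illustrative sketch only: apply_erase is a hypothetical helper,
# and None stands in for NA.
def apply_erase(rows, main_field, targets):
    """When main_field is NA, set every target field to NA."""
    # Accept one field name or a list of names, mirroring the two syntaxes.
    if isinstance(targets, str):
        targets = [targets]
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row.get(main_field) is None:
            for target in targets:
                new_row[target] = None
        result.append(new_row)
    return result


rows = [
    {"occupation": "Sales", "income": ">50K", "hours-per-week": 40},
    {"occupation": None, "income": "<=50K", "hours-per-week": 20},
]

erased = apply_erase(rows, "occupation", ["income", "hours-per-week"])
print(erased)
```

The first row is unchanged because `occupation` has a value. In the second row both target fields become NA.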

copy - Copy Values

When the main field has a value and the target field is NA, copy the main field’s value to the target field.

Syntax:

main_field_name:
  copy: 'target_field_name'
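
The direction of the copy is easy to get wrong, so a small Python sketch may help. It assumes rows are dictionaries and `None` stands in for NA; `apply_copy` is a hypothetical helper, not part of the library.

```python
# Illustrative sketch only: apply_copy is a hypothetical helper,
# and None stands in for NA.
def apply_copy(rows, main_field, target_field):
    """Copy main_field into target_field when main has a value and target is NA."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        main_value = new_row.get(main_field)
        if main_value is not None and new_row.get(target_field) is None:
            new_row[target_field] = main_value
        result.append(new_row)
    return result


rows = [
    {"age": 39, "educational-num": None},    # target is NA: filled from age
    {"age": 50, "educational-num": 13},      # target has a value: unchanged
    {"age": None, "educational-num": None},  # main is NA: nothing to copy
]

copied = apply_copy(rows, "age", "educational-num")
print(copied)
```

Only the first row changes. The value is copied as-is, so the two fields should share a compatible value domain.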

nan_if_condition - Conditional NaN Setting

Set the main field to NA when the condition field equals the specified value, or any one of the specified values.

Syntax:

main_field_name:
  nan_if_condition:
    condition_field_name: 'value'

or

main_field_name:
  nan_if_condition:
    condition_field_name:
      - 'value1'
      - 'value2'
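
The rule can be sketched in Python as follows, assuming rows are dictionaries, `None` stands in for NA, and values are compared with exact (case-sensitive) equality. The helper `apply_nan_if_condition` is hypothetical, and the second condition value `Without-pay` is chosen only for illustration.

```python
# Illustrative sketch only: apply_nan_if_condition is a hypothetical helper,
# and None stands in for NA.
def apply_nan_if_condition(rows, main_field, condition_field, values):
    """Set main_field to NA when condition_field equals one of the values."""
    # Accept one value or a list of values, mirroring the two syntaxes.
    if isinstance(values, str):
        values = [values]
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row.get(condition_field) in values:  # exact, case-sensitive match
            new_row[main_field] = None
        result.append(new_row)
    return result


rows = [
    {"workclass": "Never-worked", "capital-gain": 0},
    {"workclass": "Without-pay", "capital-gain": 0},
    {"workclass": "never-worked", "capital-gain": 100},  # different case: no match
    {"workclass": "Private", "capital-gain": 2174},
]

result = apply_nan_if_condition(
    rows, "capital-gain", "workclass", ["Never-worked", "Without-pay"]
)
print(result)
```

The first two rows have `capital-gain` set to NA. The third row keeps its value because `never-worked` differs in case from `Never-worked`.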

Important Notes

  • Irreversible: the delete operation permanently removes data rows
  • Use copy carefully: Ensure both fields have compatible value domains
  • Condition checking: nan_if_condition checks the condition field's values, not the main field's, and sets the main field to NA on a match
  • Case sensitive: Condition value matching is case-sensitive