fit()
Train the synthesis model.
Syntax
def fit(data: pd.DataFrame)
Parameters
- data : pd.DataFrame, required
- Dataset for training
- Must be a pandas DataFrame
- Cannot be None
Returns
None. Updates the synthesizer’s internal state.
Description
The fit() method trains the synthesis model. It uses the provided dataset to learn the statistical properties and patterns of the data, which are then used for synthetic data generation.
This method performs the following operations:
- Validates the input data
- Passes data to the underlying synthesizer implementation
- Executes the model training process
- Saves the trained model state
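The four operations above can be pictured with a small stand-in class. This is only a sketch: ToySynthesizer, its _state attribute, and is_fitted() are hypothetical names invented for illustration and are not part of the petsard API. The "training" step here merely records per-column means and standard deviations.

```python
import pandas as pd


class ToySynthesizer:
    """Hypothetical stand-in that mirrors the create() -> fit() call order."""

    def __init__(self):
        self._created = False
        self._state = None  # holds the trained state after fit()

    def create(self):
        self._created = True

    def fit(self, data):
        # 1. Validate the call order and the input.
        if not self._created:
            raise RuntimeError("call create() before fit()")
        if data is None:
            raise ValueError("data cannot be None")
        if not isinstance(data, pd.DataFrame):
            raise TypeError("data must be a pandas DataFrame")
        # 2-3. Hand the data to the "implementation" and train it.
        #      Here that is just summary statistics of the numeric columns.
        numeric = data.select_dtypes(include="number")
        # 4. Save the trained state on the instance; nothing is returned.
        self._state = {"mean": numeric.mean(), "std": numeric.std(ddof=0)}

    def is_fitted(self):
        return self._state is not None
```

As in the real method, fit() in this sketch returns None and the result of training is only visible through the object's internal state.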
The specific training process details depend on the selected synthesis method:
- GaussianCopula: Learns marginal distributions and correlation structure
- CTGAN/CopulaGAN: Trains generative adversarial networks
- TVAE: Trains a variational autoencoder
- Custom methods: Executes custom training logic
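To make "marginal distributions and correlation structure" concrete, here is a minimal sketch using only the Python standard library. It is a simplified illustration of the general Gaussian copula idea, not the implementation used by any synthesizer listed above. The helper names (normal_scores, pearson, fit_toy_copula) are hypothetical, and ties in the data are ranked arbitrarily.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map values to standard-normal scores through their empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        scores[idx] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_toy_copula(columns):
    """columns: dict of name -> list of numbers.

    Returns the empirical marginals (sorted values per column) and the
    pairwise correlations of the normal scores.
    """
    marginals = {name: sorted(vals) for name, vals in columns.items()}
    scores = {name: normal_scores(vals) for name, vals in columns.items()}
    names = list(columns)
    corr = {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}
    return marginals, corr
```

The two returned objects correspond to the two things the bullet says are learned: one description per column, plus one table describing how the columns move together.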
Example
from petsard import Synthesizer, Metadater
import pandas as pd
# Prepare training data
df = pd.read_csv('training_data.csv')
metadata = Metadater.from_data(df)
# Initialize and train synthesizer
synthesizer = Synthesizer(method='default')
synthesizer.create(metadata=metadata)
synthesizer.fit(data=df)
# After training, synthetic data can be generated
synthetic_data = synthesizer.sample()
Notes
- Must call create() before calling fit()
- Training time depends on data size, complexity, and chosen synthesis method
- Deep learning methods (CTGAN, TVAE) typically require longer training time
- Training process may consume significant memory, especially for large datasets
- Some synthesizers (like CTGAN) may benefit from GPU acceleration
- After training, model state is saved internally in the synthesizer
- Can call sample() multiple times without retraining
- To retrain, simply call fit() again
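The last three notes describe a fit-once, sample-many pattern. The sketch below illustrates that pattern with a hypothetical one-column ToyModel built on the standard library. Its names and its seed parameter are assumptions made for this example and do not reflect the petsard Synthesizer interface.

```python
import random
from statistics import mean, pstdev


class ToyModel:
    """Hypothetical one-column model: fit() stores state, sample() reuses it."""

    def __init__(self):
        self.state = None

    def fit(self, values):
        # Each call replaces whatever state an earlier fit() stored.
        self.state = (mean(values), pstdev(values))

    def sample(self, n, seed=None):
        if self.state is None:
            raise RuntimeError("fit() must be called before sample()")
        mu, sigma = self.state
        rng = random.Random(seed)
        # Sampling only reads the stored state; it never retrains.
        return [rng.gauss(mu, sigma) for _ in range(n)]


model = ToyModel()
model.fit([1.0, 2.0, 3.0])
first = model.sample(5, seed=0)   # draws from the stored state
second = model.sample(5, seed=1)  # no retraining needed
model.fit([10.0, 20.0, 30.0])     # retraining overwrites the state
```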