fit_sample()

Perform training and generation in sequence.

Syntax

def fit_sample(data: pd.DataFrame) -> pd.DataFrame

Parameters

data : pd.DataFrame, required
- Dataset for training
- Must be a pandas DataFrame
- Cannot be None

Returns

pd.DataFrame
- The generated synthetic data
- Same columns as original training data

Description

The fit_sample() method combines fit() and sample(), training the model and generating synthetic data in a single call. It is the most commonly used method and fits standard synthetic data generation workflows.

This method performs the following operations:

1. Trains the model using the provided data (equivalent to calling fit())
2. Generates synthetic data from the trained model (equivalent to calling sample())
3. Returns the generated synthetic data
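
The sequence above can be sketched with a toy stand-in class. `ToySynthesizer` is hypothetical and is not PETsARD's implementation: it "generates" data by resampling training rows with replacement, and it uses plain lists of dicts in place of DataFrames so the sketch runs without dependencies. It only illustrates that fit_sample() behaves like fit() followed by sample().

```python
import random


class ToySynthesizer:
    """Hypothetical stand-in for illustration; NOT the PETsARD Synthesizer."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._rows = None  # training state, replaced by every fit()

    def fit(self, data):
        if not data:
            raise ValueError("data must be a non-empty list of rows")
        self._rows = [dict(row) for row in data]

    def sample(self):
        if self._rows is None:
            raise RuntimeError("call fit() before sample()")
        # Toy "generation": draw as many rows as were seen in training.
        return [dict(self._rng.choice(self._rows)) for _ in self._rows]

    def fit_sample(self, data):
        # Train, generate, and return the result in one call.
        self.fit(data)
        return self.sample()


rows = [
    {"age": 31, "city": "A"},
    {"age": 45, "city": "B"},
    {"age": 27, "city": "C"},
]

# One-step call.
one_step = ToySynthesizer(seed=42).fit_sample(rows)

# Equivalent two-step call with the same seed.
two_step_model = ToySynthesizer(seed=42)
two_step_model.fit(rows)
two_step = two_step_model.sample()

print(one_step == two_step)
print(sorted(one_step[0].keys()))
```

With identical seeds, the one-step and two-step paths produce the same rows in this toy, and the output keeps the training columns.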

Example

from petsard import Synthesizer, Metadater
import pandas as pd

# Load data
df = pd.read_csv('data.csv')
metadata = Metadater.from_data(df)

# Train and generate in one step
synthesizer = Synthesizer(method='default')
synthesizer.create(metadata=metadata)
synthetic_data = synthesizer.fit_sample(data=df)

# Access synthetic data
print(f"Generated {len(synthetic_data)} synthetic rows")

# Save to file if needed
synthetic_data.to_csv('synthetic_output.csv', index=False)

Notes

- create() must be called before fit_sample()
- Each call overwrites any previous training state
- Each call retrains the model, even when the data is identical
- To generate multiple times with different quantities, use fit() and sample() separately instead
- Training time depends on data size and the chosen synthesis method
- Suitable for one-time training and generation
- The number of generated rows is set during create() or taken from the training data
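
The retraining behavior described in these notes can be illustrated with a small counter. `CountingSynthesizer` below is a hypothetical stand-in, not part of PETsARD; it does no real modeling and only records how often training runs, assuming fit_sample() simply delegates to fit() and then sample().

```python
class CountingSynthesizer:
    """Hypothetical stand-in that only counts training runs."""

    def __init__(self):
        self.fit_count = 0
        self._rows = None

    def fit(self, data):
        self.fit_count += 1
        self._rows = list(data)  # replaces any previous training state

    def sample(self):
        if self._rows is None:
            raise RuntimeError("call fit() before sample()")
        return list(self._rows)  # placeholder for real generation

    def fit_sample(self, data):
        self.fit(data)
        return self.sample()


rows = [{"x": 1}, {"x": 2}]

# Calling fit_sample() twice trains twice, even with identical data.
repeated = CountingSynthesizer()
repeated.fit_sample(rows)
repeated.fit_sample(rows)

# Calling fit() once and sample() twice trains only once.
reused = CountingSynthesizer()
reused.fit(rows)
reused.sample()
reused.sample()

# A later fit_sample() with other data overwrites the earlier state.
overwritten = repeated.fit_sample([{"x": 9}])

print(repeated.fit_count, reused.fit_count)
```

When training is expensive, the second pattern avoids paying for it more than once.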

sample()