sample()
Generate synthetic data.
Syntax
def sample() -> pd.DataFrame
Parameters
None
Returns
- pd.DataFrame
- The generated synthetic data
- Same columns as original training data
Description
The sample()
method is used to generate synthetic data from a trained model. This method must be called after training is completed with fit()
.
This method performs the following operations:
- Checks if model is trained
- Generates synthetic data rows (number determined by configuration)
- Returns the generated synthetic data as a DataFrame
Example
from petsard import Synthesizer, Metadater
import pandas as pd
# Load and prepare data
df = pd.read_csv('data.csv')
metadata = Metadater.from_data(df)
# Initialize, create and train synthesizer
synthesizer = Synthesizer(method='default')
synthesizer.create(metadata=metadata)
synthesizer.fit(data=df)
# Generate synthetic data
synthetic_data = synthesizer.sample()
print(f"Generated {len(synthetic_data)} synthetic rows")
# Save to file if needed
synthetic_data.to_csv('synthetic_data.csv', index=False)
Notes
- Must complete
fit()
training before callingsample()
- The number of rows generated is determined during
create()
orfit()
based on metadata or training data - Large data generation may require significant time and memory
- Some synthesizers may have generation quantity limits
- To specify the number of rows to generate, set
sample_num_rows
parameter when initializing the Synthesizer