API Reference
This section contains the complete API documentation for Tabula Rasa.
Core Modules
tabula_rasa
Tabula Rasa: Production Table Knowledge LLM.
Main Classes
TabulaRasa
The main model class for table question answering.
TableQADataset
Dataset class for table QA tasks.
class tabula_rasa.TableQADataset(df, sketch, n_samples=1000)
Bases: Dataset
Dataset for table QA with synthetic query generation.
__init__(df, sketch, n_samples=1000)
Initialize dataset with synthetic query generation.
- Parameters:
df (DataFrame) – Source DataFrame
sketch (dict) – Statistical sketch of the DataFrame
n_samples (int) – Number of training samples to generate
__len__()
Return number of samples.
- Return type:
int
__getitem__(idx)
Get a single sample.
- Return type:
dict
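The interface documented above (a constructor taking `df`, `sketch`, and `n_samples`, plus `__len__` and `__getitem__` returning a `dict`) can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: `ToyTableQADataset`, the list-of-dicts table, the `{column: {"mean": ...}}` sketch layout, the `seed` argument, and the `question`/`answer` keys are all hypothetical and chosen only to keep the example dependency-free.

```python
import random


class ToyTableQADataset:
    """Hypothetical stand-in mirroring the documented (df, sketch, n_samples) interface."""

    def __init__(self, rows, sketch, n_samples=1000, seed=0):
        # rows: list of dicts standing in for the source DataFrame (assumption).
        # sketch: assumed layout {column: {"mean": float}}; the real format may differ.
        self.rows = rows
        self.sketch = sketch
        rng = random.Random(seed)
        columns = sorted(sketch)
        # Generate all synthetic queries up front, one per requested sample.
        self.samples = []
        for _ in range(n_samples):
            col = rng.choice(columns)
            self.samples.append(
                {
                    "question": f"What is the mean of {col}?",
                    "answer": sketch[col]["mean"],
                }
            )

    def __len__(self):
        # Number of generated samples.
        return len(self.samples)

    def __getitem__(self, idx):
        # A single sample, returned as a dict.
        return self.samples[idx]


rows = [{"price": 10.0, "qty": 1}, {"price": 30.0, "qty": 3}]
# Build the assumed sketch: per-column mean over all rows.
sketch = {col: {"mean": sum(r[col] for r in rows) / len(rows)} for col in rows[0]}

dataset = ToyTableQADataset(rows, sketch, n_samples=5)
sample = dataset[0]
```

Because the stand-in implements `__len__` and `__getitem__`, it follows the same map-style protocol as the documented `Dataset` base class, so indexing and length queries behave as described above.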
Training
Classes and functions for training table QA models.
class tabula_rasa.training.TableQADataset(df, sketch, n_samples=1000)
Bases: Dataset
Dataset for table QA with synthetic query generation.
__getitem__(idx)
Get a single sample.
- Return type:
dict
__init__(df, sketch, n_samples=1000)
Initialize dataset with synthetic query generation.
- Parameters:
df (DataFrame) – Source DataFrame
sketch (dict) – Statistical sketch of the DataFrame
n_samples (int) – Number of training samples to generate
__len__()
Return number of samples.
- Return type:
int
class tabula_rasa.training.ProductionTrainer(model, df, sketch, lr=0.0001, batch_size=16, device='cpu')
Bases: object
Production training with best practices.
__init__(model, df, sketch, lr=0.0001, batch_size=16, device='cpu')
Initialize the trainer.
- Parameters:
model (ProductionTableQA) – Model to train
df (DataFrame) – Training DataFrame
sketch (dict) – Statistical sketch of the DataFrame
lr (float) – Learning rate
batch_size (int) – Batch size for training
device (str) – Device to train on ('cpu' or 'cuda')
train(n_epochs=10, n_train_samples=1000, n_val_samples=200)
Training loop with validation.
- Parameters:
n_epochs (int) – Number of training epochs
n_train_samples (int) – Number of training samples to generate
n_val_samples (int) – Number of validation samples to generate
- Return type:
tuple[float, dict]
- Returns:
Tuple of (best_val_loss, history_dict)
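The `(best_val_loss, history_dict)` return shape of `train` can be illustrated with a toy loop. The sketch below is not `ProductionTrainer`: `ToyTrainer` is a hypothetical stand-in that fits `y = w * x` by plain gradient descent, and the `train_loss` / `val_loss` history keys are assumptions, since the reference does not list the keys of the real history dict.

```python
class ToyTrainer:
    """Hypothetical stand-in: fits y = w * x with gradient descent on MSE."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.w = 0.0

    def _loss(self, data):
        # Mean squared error of the current weight on (x, y) pairs.
        return sum((self.w * x - y) ** 2 for x, y in data) / len(data)

    def train(self, train_data, val_data, n_epochs=10):
        best_val_loss = float("inf")
        history = {"train_loss": [], "val_loss": []}  # assumed key names
        for _ in range(n_epochs):
            # Gradient of the MSE with respect to w, averaged over the training set.
            grad = sum(2 * (self.w * x - y) * x for x, y in train_data) / len(train_data)
            self.w -= self.lr * grad
            # Record both losses after the update for this epoch.
            train_loss = self._loss(train_data)
            val_loss = self._loss(val_data)
            history["train_loss"].append(train_loss)
            history["val_loss"].append(val_loss)
            # Track the lowest validation loss seen so far.
            best_val_loss = min(best_val_loss, val_loss)
        return best_val_loss, history


train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
val_data = [(4.0, 8.0)]

trainer = ToyTrainer(lr=0.05)
best_val_loss, history = trainer.train(train_data, val_data, n_epochs=10)
```

Unpacking the tuple this way keeps the scalar used for model selection separate from the per-epoch curves, which is presumably the purpose of the documented return type.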
Evaluation
Classes and functions for model evaluation.
Data Processing
Utilities for data processing and augmentation.
Utilities
Helper functions and utilities.
CLI
Command-line interface documentation for tabula-rasa.