API Reference

This section contains the complete API documentation for Tabula Rasa.

Core Modules

tabula_rasa

Tabula Rasa: Production Table Knowledge LLM.

Main Classes

TabulaRasa

The main model class for table question answering.

TableQADataset

Dataset class for table QA tasks.

class tabula_rasa.TableQADataset(df, sketch, n_samples=1000)

Bases: Dataset

Dataset for table QA with synthetic query generation.

__init__(df, sketch, n_samples=1000)

Initialize dataset with synthetic query generation.

Parameters:
  • df (DataFrame) – Source DataFrame

  • sketch (dict) – Statistical sketch of the DataFrame

  • n_samples (int) – Number of training samples to generate
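The layout of the sketch dict is not specified in this reference. As a rough illustration only, a statistical sketch of a table might hold per-column summaries like the ones below. The helper name build_sketch, the key names, and the list-of-dicts table representation are all assumptions made for this example and are not part of the tabula_rasa API.

```python
import statistics


def build_sketch(rows):
    """Summarize a table given as a list of row dicts.

    Hypothetical helper: the real sketch format used by tabula_rasa
    is not documented here and may differ.
    """
    sketch = {"n_rows": len(rows), "columns": {}}
    if not rows:
        return sketch
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: keep range and mean.
            sketch["columns"][name] = {
                "type": "numeric",
                "min": min(values),
                "max": max(values),
                "mean": statistics.fmean(values),
            }
        else:
            # Anything else is treated as categorical.
            sketch["columns"][name] = {
                "type": "categorical",
                "n_unique": len(set(values)),
            }
    return sketch


rows = [
    {"city": "Oslo", "temp": 4.0},
    {"city": "Rome", "temp": 16.0},
    {"city": "Oslo", "temp": 6.0},
]
sketch = build_sketch(rows)
```

A compact summary of this kind lets query generation reason about column types and value ranges without rescanning the full table.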

__len__()

Return number of samples.

Return type:

int

__getitem__(idx)

Get a single sample.

Return type:

dict
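The three methods above follow the usual map-style dataset protocol: build the samples once, report their count, and return one sample dict per index. The sketch below mimics that shape in plain Python. ToyTableQADataset, its seed parameter, the question templates, and the "question"/"answer" keys are hypothetical stand-ins, not the library's implementation; the table is passed as a list of row dicts instead of a DataFrame to keep the example dependency-free.

```python
import random
import statistics


class ToyTableQADataset:
    """Hypothetical stand-in for a dataset with synthetic query generation."""

    def __init__(self, rows, sketch, n_samples=1000, seed=0):
        rng = random.Random(seed)
        # Only numeric columns can answer aggregate questions.
        numeric = [c for c, info in sketch.items() if info["type"] == "numeric"]
        aggregates = {"min": min, "max": max, "mean": statistics.fmean}
        self.samples = []
        for _ in range(n_samples):
            column = rng.choice(numeric)
            op = rng.choice(sorted(aggregates))
            values = [row[column] for row in rows]
            self.samples.append(
                {
                    "question": f"What is the {op} of {column}?",
                    "answer": float(aggregates[op](values)),
                }
            )

    def __len__(self):
        # Number of generated samples.
        return len(self.samples)

    def __getitem__(self, idx):
        # One sample as a dict.
        return self.samples[idx]


rows = [
    {"city": "Oslo", "temp": 4.0},
    {"city": "Rome", "temp": 16.0},
    {"city": "Oslo", "temp": 6.0},
]
toy_sketch = {"city": {"type": "categorical"}, "temp": {"type": "numeric"}}
dataset = ToyTableQADataset(rows, toy_sketch, n_samples=5)
```

Any object with this __len__/__getitem__ pair can be indexed and iterated by a batching loader, which is why the real class documents exactly these two methods.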

Training

Classes and functions for training table QA models.

class tabula_rasa.training.TableQADataset(df, sketch, n_samples=1000)

Bases: Dataset

Dataset for table QA with synthetic query generation.

__getitem__(idx)

Get a single sample.

Return type:

dict

__init__(df, sketch, n_samples=1000)

Initialize dataset with synthetic query generation.

Parameters:
  • df (DataFrame) – Source DataFrame

  • sketch (dict) – Statistical sketch of the DataFrame

  • n_samples (int) – Number of training samples to generate

__len__()

Return number of samples.

Return type:

int

class tabula_rasa.training.ProductionTrainer(model, df, sketch, lr=0.0001, batch_size=16, device='cpu')

Bases: object

Production training with best practices.

__init__(model, df, sketch, lr=0.0001, batch_size=16, device='cpu')

Initialize the trainer.

Parameters:
  • model (ProductionTableQA) – Model to train

  • df (DataFrame) – Training DataFrame

  • sketch (dict) – Statistical sketch of the DataFrame

  • lr (float) – Learning rate

  • batch_size (int) – Batch size for training

  • device (str) – Device to train on ('cpu' or 'cuda')

train(n_epochs=10, n_train_samples=1000, n_val_samples=200)

Training loop with validation.

Parameters:
  • n_epochs (int) – Number of training epochs

  • n_train_samples (int) – Number of training samples to generate

  • n_val_samples (int) – Number of validation samples to generate

Return type:

tuple[float, dict]

Returns:

Tuple of (best_val_loss, history_dict)
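To make the (best_val_loss, history_dict) return shape concrete, here is a minimal sketch of a training loop with validation, using a one-parameter toy model in place of a real network. The function name train_with_validation, the "train_loss" and "val_loss" history keys, and the toy data are assumptions for illustration; the keys ProductionTrainer actually records are not documented here.

```python
def train_with_validation(train_data, val_data, n_epochs=10, lr=0.01):
    """Fit y = w * x by gradient descent on mean squared error.

    Hypothetical example: returns (best_val_loss, history) to mirror
    the documented return shape, not the library's internals.
    """
    w = 0.0
    history = {"train_loss": [], "val_loss": []}
    best_val_loss = float("inf")

    def mse(data, weight):
        return sum((weight * x - y) ** 2 for x, y in data) / len(data)

    for _ in range(n_epochs):
        # One full-batch gradient step on the training data.
        grad = sum(2 * (w * x - y) * x for x, y in train_data) / len(train_data)
        w -= lr * grad

        # Record per-epoch losses and track the best validation loss.
        history["train_loss"].append(mse(train_data, w))
        val_loss = mse(val_data, w)
        history["val_loss"].append(val_loss)
        best_val_loss = min(best_val_loss, val_loss)

    return best_val_loss, history


train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
val_data = [(4.0, 8.0)]
best_val_loss, history = train_with_validation(
    train_data, val_data, n_epochs=20, lr=0.05
)
```

Returning the best validation loss separately from the per-epoch history lets a caller compare runs by a single number while still being able to plot the full curves.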

Trainer

TrainingArguments

Evaluation

Classes and functions for model evaluation.

Evaluator

Metrics

Data Processing

Utilities for data processing and augmentation.

Utilities

Helper functions and utilities.

CLI

Documentation for the tabula-rasa command-line interface.