Avala produces labeled datasets. Your training pipeline consumes them. This page covers how to connect the two — from exporting annotations in the right format to building automated training loops that re-import model predictions for active learning.

Export Formats

Avala supports exporting annotations in standard ML formats. Choose the format that matches your training framework.
| Format | Annotation Types | Frameworks |
| --- | --- | --- |
| Avala JSON | All types | Custom pipelines, Avala SDK |
| COCO | Bounding boxes, polygons, keypoints, segmentation | Detectron2, MMDetection, PyTorch |
| YOLO | Bounding boxes | Ultralytics, YOLOv5/v8 |
| Pascal VOC | Bounding boxes | TensorFlow, older pipelines |
| Segmentation masks | Semantic/instance segmentation | Any framework (PNG masks) |
| KITTI | 3D cuboids, 2D boxes | Autonomous driving pipelines |

Creating an Export

import time

from avala import Client

client = Client()

export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True
)

# Poll until the export finishes
while export.status not in ("completed", "failed"):
    time.sleep(10)
    export = client.exports.get(export.uid)

print(f"Status: {export.status}")
print(f"Download: {export.download_url}")

Export Filtering

Control exactly which annotations are included in your export:
| Filter | Description |
| --- | --- |
| include_approved_only | Only include annotations that passed QC review |
| dataset_uids | Limit the export to specific datasets within the project |
| slice_uids | Export only items in specific slices |
| label_filter | Include only specific object classes |
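These filters map to keyword arguments on the export call. A minimal sketch, where the filter values (dataset UID, slice UID, class names) are hypothetical placeholders:

```python
# Hypothetical filter values for illustration; the keys are the
# filters from the table above.
export_filters = {
    "include_approved_only": True,          # only QC-approved annotations
    "dataset_uids": ["ds_abc123"],          # limit to one dataset
    "slice_uids": ["slc_train_v1"],         # only items in this slice
    "label_filter": ["car", "pedestrian"],  # only these object classes
}

# Pass the filters straight through to the export call:
# export = client.exports.create(project="prj_abc123", format="coco", **export_filters)
```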

PyTorch Integration

Loading COCO Exports with torchvision

import torch
from torchvision.datasets import CocoDetection
from torchvision import transforms

# Download and extract your Avala COCO export
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

dataset = CocoDetection(
    root="exports/avala-coco/images",
    annFile="exports/avala-coco/annotations.json",
    transform=transform
)

dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    collate_fn=lambda x: tuple(zip(*x))
)

for images, targets in dataloader:
    # Train your model
    pass

Loading with Detectron2

from detectron2.data.datasets import register_coco_instances
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# Register your Avala export as a Detectron2 dataset
register_coco_instances(
    "avala_train",
    {},
    "exports/avala-coco/annotations.json",
    "exports/avala-coco/images"
)

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("avala_train",)
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # Match your Avala label count

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
For a complete PyTorch training example, see the PyTorch framework guide.

Hugging Face Integration

Loading with Hugging Face Datasets

from datasets import load_dataset

# Load the annotations array from a COCO-format export.
# A COCO file is a single JSON object, so use `field` to select a key.
dataset = load_dataset(
    "json",
    data_files="exports/avala-coco/annotations.json",
    field="annotations"
)

# Or load directly from Avala using a custom script
from avala import Client

client = Client()
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True
)

# Wait for export, then load
# See the Hugging Face guide for a full example
For detailed Hugging Face integration, see the Hugging Face framework guide.

Training Loop Automation

End-to-End Pipeline

Combine Avala exports with webhooks to trigger training automatically when new annotations are approved.
Annotators submit work
  → QC review approves annotations
  → Webhook fires: export.completed
  → Your pipeline downloads the export
  → Model trains on new data
  → Model predictions imported back to Avala
  → Annotators review and correct predictions
  → Repeat

Webhook-Triggered Training

# webhook_handler.py
import subprocess
import urllib.request

from flask import Flask, request
from avala import Client

app = Flask(__name__)
client = Client()

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    event = request.json

    if event["event_type"] == "export.completed":
        export_uid = event["data"]["export_uid"]
        export = client.exports.get(export_uid)

        # Download the export archive
        urllib.request.urlretrieve(export.download_url, "latest_export.zip")

        # Trigger training
        subprocess.run([
            "python", "train.py",
            "--data", "latest_export.zip"
        ])

    return {"status": "ok"}

Scheduling Periodic Exports

For pipelines that do not need real-time triggers, schedule periodic exports:
# scheduled_export.py
import time
import urllib.request

from avala import Client

client = Client()

def export_and_download(project_uid: str, output_path: str) -> str:
    """Create an export, wait for it to complete, and download it."""
    export = client.exports.create(
        project=project_uid,
        format="coco",
        include_approved_only=True
    )

    # Poll until the export completes
    while export.status != "completed":
        time.sleep(10)
        export = client.exports.get(export.uid)

        if export.status == "failed":
            raise RuntimeError(f"Export failed: {export.uid}")

    # Download the finished archive to the requested path
    urllib.request.urlretrieve(export.download_url, output_path)
    return output_path

Active Learning Loop

Use model predictions to prioritize which data gets annotated next, creating a feedback loop between your model and your annotation team.

How It Works

  1. Train an initial model on a small labeled dataset
  2. Run inference on unlabeled data
  3. Score uncertainty — identify items where the model is least confident
  4. Import predictions into Avala as pre-annotations
  5. Prioritize uncertain items for human annotation using work batches
  6. Annotators review and correct the model predictions (faster than labeling from scratch)
  7. Export the corrected annotations and retrain
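Step 3 can be sketched with a simple uncertainty score. This is a generic example, not an Avala API: it ranks items by the entropy of the model's predicted class distribution, so the least confident items are annotated first.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy of a predicted class distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Per-item softmax outputs from your model (illustrative values)
predictions = {
    "itm_001": [0.98, 0.01, 0.01],  # confident -> low annotation priority
    "itm_002": [0.40, 0.35, 0.25],  # uncertain -> high annotation priority
}

# Most uncertain items first; feed this ordering into a work batch
ranked = sorted(predictions, key=lambda uid: entropy(predictions[uid]), reverse=True)
# ranked[0] == "itm_002"
```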

Importing Model Predictions

Use batch auto-labeling to import model predictions as pre-annotations:
from avala import Client

client = Client()

# After running inference, import predictions
# See the Batch Auto-Labeling guide for format details

Measuring Improvement

Track these metrics across active learning iterations:
| Metric | Description | Goal |
| --- | --- | --- |
| Model mAP | Mean average precision on a held-out test set | Increasing each iteration |
| Annotation time per item | Average time annotators spend per item | Decreasing (pre-annotations save time) |
| Correction rate | % of pre-annotations that need human correction | Decreasing each iteration |
| Items labeled per iteration | Number of new items added to the training set | Depends on budget |
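As one concrete example, the correction rate is straightforward to compute from review counts. The numbers below are illustrative:

```python
def correction_rate(corrected: int, total_preannotations: int) -> float:
    """Fraction of pre-annotations that annotators had to modify."""
    if total_preannotations == 0:
        return 0.0
    return corrected / total_preannotations

# Review counts per active learning iteration (illustrative)
iterations = [
    {"corrected": 42, "total": 100},  # iteration 1
    {"corrected": 18, "total": 100},  # iteration 2
]
rates = [correction_rate(it["corrected"], it["total"]) for it in iterations]
# A healthy loop shows this rate decreasing across iterations
```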

Dataset Versioning

Keep track of which data was used to train which model.

Using Slices for Versioning

Slices let you create named subsets of a dataset without duplicating data:
from avala import Client

client = Client()

# Create a slice for training data v1
slice_v1 = client.slices.create(
    name="training-v1",
    dataset_uid="ds_abc123"
)

# Add items to the slice
client.slices.add_items(
    slice_uid=slice_v1.uid,
    item_uids=["itm_001", "itm_002", "itm_003"]
)

# Export only this slice for training
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    slice_uids=[slice_v1.uid]
)

Versioning Best Practices

| Practice | Benefit |
| --- | --- |
| Create a new slice for each training run | Reproducible experiments |
| Include the model version in the slice name | Easy cross-reference |
| Export with include_approved_only=True | Only train on reviewed data |
| Keep a held-out test slice | Consistent evaluation across versions |
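A small helper can enforce the naming convention from the table. `slice_name` is a hypothetical function, not part of the Avala SDK:

```python
from datetime import date

def slice_name(model_version: str, run_date: date) -> str:
    """Build a slice name that cross-references the model version and run date."""
    return f"training-{model_version}-{run_date.isoformat()}"

# e.g. client.slices.create(name=slice_name("v3", date.today()), dataset_uid="ds_abc123")
```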

Next Steps