Avala produces labeled datasets. Your training pipeline consumes them. This page covers how to connect the two — from exporting annotations in the right format to building automated training loops that re-import model predictions for active learning.

Export Formats

Avala supports exporting annotations in standard ML formats. Choose the format that matches your training framework.
| Format | Annotation Types | Frameworks |
| --- | --- | --- |
| Avala JSON | All types | Custom pipelines, Avala SDK |
| COCO | Bounding boxes, polygons, keypoints, segmentation | Detectron2, MMDetection, PyTorch |
| YOLO | Bounding boxes | Ultralytics, YOLOv5/v8 |
| Pascal VOC | Bounding boxes | TensorFlow, older pipelines |
| Segmentation masks | Semantic/instance segmentation | Any framework (PNG masks) |
| KITTI | 3D cuboids, 2D boxes | Autonomous driving pipelines |

Creating an Export

import time

from avala import Client

client = Client()

export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True
)

# Poll until the export finishes
while export.status not in ("completed", "failed"):
    time.sleep(10)
    export = client.exports.get(export.uid)

print(f"Status: {export.status}")
print(f"Download: {export.download_url}")

Export Filtering

Control exactly which annotations are included in your export:
| Filter | Description |
| --- | --- |
| include_approved_only | Only include annotations that passed QC review |
| dataset_uids | Limit the export to specific datasets within the project |
| slice_uids | Export only items in specific slices |
| label_filter | Include only specific object classes |
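These filters map to keyword arguments on the export call. A minimal sketch, where the filter values (dataset UID, slice UID, class names) are hypothetical placeholders:

```python
# Hypothetical filter values for illustration; the keys are the
# filters from the table above.
export_filters = {
    "include_approved_only": True,          # only QC-approved annotations
    "dataset_uids": ["ds_abc123"],          # limit to one dataset
    "slice_uids": ["slc_train_v1"],         # only items in this slice
    "label_filter": ["car", "pedestrian"],  # only these object classes
}

# Pass the filters straight through to the export call:
# export = client.exports.create(project="prj_abc123", format="coco", **export_filters)
```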

PyTorch Integration

Loading COCO Exports with torchvision

import torch
from torchvision.datasets import CocoDetection
from torchvision import transforms

# Download and extract your Avala COCO export
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

dataset = CocoDetection(
    root="exports/avala-coco/images",
    annFile="exports/avala-coco/annotations.json",
    transform=transform
)

dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    collate_fn=lambda x: tuple(zip(*x))
)

for images, targets in dataloader:
    # Train your model
    pass

Loading with Detectron2

from detectron2.data.datasets import register_coco_instances
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# Register your Avala export as a Detectron2 dataset
register_coco_instances(
    "avala_train",
    {},
    "exports/avala-coco/annotations.json",
    "exports/avala-coco/images"
)

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("avala_train",)
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # Match your Avala label count

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
For a complete PyTorch training example, see the PyTorch framework guide.

Hugging Face Integration

Loading with Hugging Face Datasets

from datasets import load_dataset

# Load the annotations array from a COCO-format export.
# A COCO file is a single JSON object, so use `field` to select a key.
dataset = load_dataset(
    "json",
    data_files="exports/avala-coco/annotations.json",
    field="annotations"
)

# Or load directly from Avala using a custom script
from avala import Client

client = Client()
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True
)

# Wait for export, then load
# See the Hugging Face guide for a full example
For detailed Hugging Face integration, see the Hugging Face framework guide.

Training Loop Automation

End-to-End Pipeline

Combine Avala exports with webhooks to trigger training automatically when new annotations are approved.
Annotators submit work
  → QC review approves annotations
  → Webhook fires: export.completed
  → Your pipeline downloads the export
  → Model trains on new data
  → Model predictions imported back to Avala
  → Annotators review and correct predictions
  → Repeat

Webhook-Triggered Training

# webhook_handler.py
import subprocess
import urllib.request

from flask import Flask, request
from avala import Client

app = Flask(__name__)
client = Client()

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    event = request.json

    if event["event_type"] == "export.completed":
        export_uid = event["data"]["export_uid"]
        export = client.exports.get(export_uid)

        # Download the export archive
        urllib.request.urlretrieve(export.download_url, "latest_export.zip")

        # Trigger training
        subprocess.run([
            "python", "train.py",
            "--data", "latest_export.zip"
        ])

    return {"status": "ok"}

Scheduling Periodic Exports

For pipelines that do not need real-time triggers, schedule periodic exports:
# scheduled_export.py
import time
import urllib.request

from avala import Client

client = Client()

def export_and_download(project_uid: str, output_path: str) -> str:
    """Create an export, wait for it to complete, and download it."""
    export = client.exports.create(
        project=project_uid,
        format="coco",
        include_approved_only=True
    )

    # Poll until the export completes
    while export.status != "completed":
        time.sleep(10)
        export = client.exports.get(export.uid)

        if export.status == "failed":
            raise RuntimeError(f"Export failed: {export.uid}")

    # Download the finished archive to the requested path
    urllib.request.urlretrieve(export.download_url, output_path)
    return output_path

Active Learning Loop

Use model predictions to prioritize which data gets annotated next, creating a feedback loop between your model and your annotation team.

How It Works

  1. Train an initial model on a small labeled dataset
  2. Run inference on unlabeled data
  3. Score uncertainty — identify items where the model is least confident
  4. Import predictions into Avala as pre-annotations
  5. Prioritize uncertain items for human annotation using work batches
  6. Annotators review and correct the model predictions (faster than labeling from scratch)
  7. Export the corrected annotations and retrain
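Step 3 can be sketched with a simple uncertainty score. This is a generic example, not an Avala API: it ranks items by the entropy of the model's predicted class distribution, so the least confident items are annotated first.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy of a predicted class distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Per-item softmax outputs from your model (illustrative values)
predictions = {
    "itm_001": [0.98, 0.01, 0.01],  # confident -> low annotation priority
    "itm_002": [0.40, 0.35, 0.25],  # uncertain -> high annotation priority
}

# Most uncertain items first; feed this ordering into a work batch
ranked = sorted(predictions, key=lambda uid: entropy(predictions[uid]), reverse=True)
# ranked[0] == "itm_002"
```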

Importing Model Predictions

Use batch auto-labeling to import model predictions as pre-annotations:
from avala import Client

client = Client()

# After running inference, import predictions
# See the Batch Auto-Labeling guide for format details

Measuring Improvement

Track these metrics across active learning iterations:
| Metric | Description | Goal |
| --- | --- | --- |
| Model mAP | Mean average precision on a held-out test set | Increasing each iteration |
| Annotation time per item | Average time annotators spend per item | Decreasing (pre-annotations save time) |
| Correction rate | % of pre-annotations that need human correction | Decreasing each iteration |
| Items labeled per iteration | Number of new items added to the training set | Depends on budget |
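As one concrete example, the correction rate is straightforward to compute from review counts. The numbers below are illustrative:

```python
def correction_rate(corrected: int, total_preannotations: int) -> float:
    """Fraction of pre-annotations that annotators had to modify."""
    if total_preannotations == 0:
        return 0.0
    return corrected / total_preannotations

# Review counts per active learning iteration (illustrative)
iterations = [
    {"corrected": 42, "total": 100},  # iteration 1
    {"corrected": 18, "total": 100},  # iteration 2
]
rates = [correction_rate(it["corrected"], it["total"]) for it in iterations]
# A healthy loop shows this rate decreasing across iterations
```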

Dataset Versioning

Keep track of which data was used to train which model.

Using Slices for Versioning

Slices let you create named subsets of a dataset without duplicating data:
from avala import Client

client = Client()

# Create a slice for training data v1
slice_v1 = client.slices.create(
    name="training-v1",
    dataset_uid="ds_abc123"
)

# Add items to the slice
client.slices.add_items(
    slice_uid=slice_v1.uid,
    item_uids=["itm_001", "itm_002", "itm_003"]
)

# Export only this slice for training
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    slice_uids=[slice_v1.uid]
)

Versioning Best Practices

| Practice | Benefit |
| --- | --- |
| Create a new slice for each training run | Reproducible experiments |
| Include the model version in the slice name | Easy cross-reference |
| Export with include_approved_only=True | Only train on reviewed data |
| Keep a held-out test slice | Consistent evaluation across versions |
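A small helper can enforce the naming convention from the table. `slice_name` is a hypothetical function, not part of the Avala SDK:

```python
from datetime import date

def slice_name(model_version: str, run_date: date) -> str:
    """Build a slice name that cross-references the model version and run date."""
    return f"training-{model_version}-{run_date.isoformat()}"

# e.g. client.slices.create(name=slice_name("v3", date.today()), dataset_uid="ds_abc123")
```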

Next Steps