Avala provides multiple ways to ingest data depending on your dataset size, infrastructure, and automation needs. This page covers each import method, when to use it, and how to build automated data pipelines.

Import Methods Overview

| Method | Best For | Max Size | Automation | Setup |
| --- | --- | --- | --- | --- |
| Mission Control upload | Small datasets, one-off imports | 5 GB | Manual | None |
| Presigned URL upload | Programmatic uploads from any language | 5 GB per file | Full | API key |
| Cloud storage (S3/GCS) | Large datasets, zero-copy access | Unlimited | Full | Bucket config |
| MCAP import | Multi-sensor robotics data | 10 GB per file | Full | API key |
| SDK bulk upload | Medium datasets with progress tracking | 5 GB per file | Full | SDK installed |

Mission Control Upload

The simplest way to get data into Avala. Drag and drop files directly in the web interface.

Steps

  1. Go to Mission Control > Datasets > Create Dataset
  2. Name your dataset and select the data type
  3. Drag files into the upload area or click Browse
  4. Wait for processing to complete

Limitations

  • Browser-based upload is limited by your connection speed and browser memory
  • Not suitable for datasets with more than 1,000 files
  • No resumable uploads — interrupted uploads must restart

For datasets larger than a few hundred files, use the SDK or presigned URL approach instead.

Presigned URL Upload

Presigned URLs let you upload files directly to Avala’s storage from any HTTP client. This is the most flexible programmatic upload method and works from any language or tool that can make HTTP requests.

How It Works

  1. Request a presigned upload URL from the Avala API
  2. Upload your file directly to the presigned URL using an HTTP PUT request
  3. Confirm the upload to register the item in the dataset

Example: Upload with cURL

# Step 1: Get a presigned upload URL
curl -X POST https://api.avala.ai/api/v1/datasets/{dataset_uid}/items/upload-url/ \
  -H "X-Avala-Api-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "frame_001.jpg",
    "content_type": "image/jpeg"
  }'

# Response:
# { "upload_url": "https://s3.amazonaws.com/...", "item_uid": "itm_abc123" }

# Step 2: Upload the file to the presigned URL
curl -X PUT "https://s3.amazonaws.com/..." \
  -H "Content-Type: image/jpeg" \
  --data-binary @frame_001.jpg

# Step 3: Confirm the upload
curl -X POST https://api.avala.ai/api/v1/datasets/{dataset_uid}/items/{item_uid}/confirm/ \
  -H "X-Avala-Api-Key: your-api-key"

Example: Upload with Python SDK

from avala import Client

client = Client()

dataset = client.datasets.list(name="my-dataset").items[0]

# Upload a single file
client.datasets.upload_items(
    dataset_uid=dataset.uid,
    files=["path/to/image.jpg"]
)

# Upload a directory of files
import glob
files = glob.glob("data/images/*.jpg")
client.datasets.upload_items(
    dataset_uid=dataset.uid,
    files=files
)

Cloud Storage Integration

For large-scale datasets, connect your own S3 or GCS bucket so Avala reads data directly from your storage — no file transfers, no copies.

When to Use Cloud Storage

| Scenario | Use Cloud Storage? |
| --- | --- |
| Dataset > 10,000 items | Yes |
| Dataset > 100 GB total | Yes |
| Data must stay in your infrastructure | Yes |
| Quick prototype with < 100 items | No — direct upload is faster |
| Data is spread across multiple buckets | Yes — connect multiple storage configs |

Setup

  1. Configure your bucket with the appropriate IAM policy (see Cloud Storage guide)
  2. Add the storage configuration in Mission Control > Settings > Storage
  3. Create a dataset and select your connected storage as the data source
  4. Reference items by their storage paths

Example: Create Dataset from S3

from avala import Client

client = Client()

# Create a dataset backed by cloud storage
dataset = client.datasets.create(
    name="driving-data-2026-02",
    data_type="image",
    storage_config_uid="stg_your_config_uid"
)

# Register items by their S3 paths
items = [
    {"path": "s3://your-bucket/captures/frame_001.jpg"},
    {"path": "s3://your-bucket/captures/frame_002.jpg"},
    {"path": "s3://your-bucket/captures/frame_003.jpg"},
]

for item in items:
    client.datasets.create_item(
        dataset_uid=dataset.uid,
        source_url=item["path"]
    )

Cloud storage datasets load faster in the annotation editor because images are served directly from your bucket’s region, avoiding cross-region transfers.
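When registering many items this way, a quick sanity check on the paths helps avoid failed registrations. A minimal sketch in plain Python (the helper name is an assumption for illustration, not an SDK function):

```python
def valid_s3_uri(path: str) -> bool:
    """True for URIs of the form s3://bucket/key with a non-empty bucket and key."""
    if not path.startswith("s3://"):
        return False
    bucket, _, key = path[len("s3://"):].partition("/")
    return bool(bucket) and bool(key)
```

Filtering with a check like this before calling `create_item` keeps malformed paths from producing partial registrations mid-loop.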

MCAP Import

MCAP files contain synchronized multi-sensor data (cameras, LiDAR, IMU). Avala parses MCAP files to extract and align sensor streams for annotation.

Supported Message Types

| Message Type | Description |
| --- | --- |
| sensor_msgs/Image | Camera images |
| sensor_msgs/CompressedImage | Compressed camera images |
| sensor_msgs/PointCloud2 | LiDAR point clouds |
| sensor_msgs/Imu | IMU readings |
| geometry_msgs/TransformStamped | Sensor transforms (TF) |
| sensor_msgs/NavSatFix | GPS coordinates |
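Before uploading, it can help to check which topics in a recording use supported message types. A minimal sketch that filters a topic-to-type mapping against the table above (in practice the mapping would come from an MCAP reader such as the `mcap` Python package; literal data is used here for illustration):

```python
SUPPORTED_TYPES = {
    "sensor_msgs/Image",
    "sensor_msgs/CompressedImage",
    "sensor_msgs/PointCloud2",
    "sensor_msgs/Imu",
    "geometry_msgs/TransformStamped",
    "sensor_msgs/NavSatFix",
}

def split_topics(topic_types: dict[str, str]) -> tuple[list[str], list[str]]:
    """Partition topic names into (supported, unsupported) by message type."""
    supported = [t for t, mtype in topic_types.items() if mtype in SUPPORTED_TYPES]
    unsupported = [t for t, mtype in topic_types.items() if mtype not in SUPPORTED_TYPES]
    return supported, unsupported
```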

Import Workflow

  1. Upload MCAP files via the SDK or presigned URLs
  2. Avala processes the file, extracting camera frames and point cloud scans
  3. Sensor streams are synchronized by timestamp
  4. Camera images and projected LiDAR data appear together in the annotation editor

For detailed MCAP setup, see the MCAP / ROS integration guide.
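The synchronization in step 3 can be sketched as nearest-timestamp matching: each LiDAR scan is paired with the camera frame closest in time, and pairs that are too far apart are dropped. A minimal sketch independent of the Avala SDK (the 50 ms tolerance is an illustrative assumption, not Avala's actual threshold):

```python
import bisect

def match_nearest(camera_ts: list[float], lidar_ts: list[float],
                  tolerance: float = 0.05) -> list[tuple[float, float]]:
    """Pair each LiDAR timestamp with the nearest camera timestamp,
    dropping pairs whose gap exceeds the tolerance (in seconds)."""
    camera_ts = sorted(camera_ts)
    pairs = []
    for t in lidar_ts:
        i = bisect.bisect_left(camera_ts, t)
        # Candidates: the camera frames immediately before and after t
        candidates = camera_ts[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda c: abs(c - t))
        if abs(nearest - t) <= tolerance:
            pairs.append((nearest, t))
    return pairs
```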

Building Import Pipelines

For production workflows, automate data ingestion so new data flows into Avala as it is collected.

Pipeline Architecture

Data Source                  Avala
┌──────────────┐             ┌──────────────────┐
│ Collection   │             │ Dataset          │
│ System       │──upload──→  │ (items created)  │
│ (cameras,    │             │                  │
│  sensors)    │             │ Project          │
└──────────────┘             │ (tasks assigned) │
                             └────────┬─────────┘
                                      │ webhook
                                      ▼
                             ┌──────────────────┐
                             │ Your Pipeline    │
                             │ (export, train)  │
                             └──────────────────┘

Example: Automated Ingestion with Webhooks

Combine the SDK upload with webhooks to build a fully automated pipeline:

# upload_pipeline.py
import glob
import os
from avala import Client

client = Client()

DATASET_UID = os.environ["AVALA_DATASET_UID"]

STATE_FILE = "/data/.uploaded"  # records files already sent to Avala

def ingest_new_data(data_directory: str) -> int:
    """Upload images from a directory that have not been uploaded yet."""
    uploaded = set()
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            uploaded = set(f.read().splitlines())

    files = [f for f in glob.glob(os.path.join(data_directory, "*.jpg"))
             if f not in uploaded]
    if not files:
        return 0

    client.datasets.upload_items(
        dataset_uid=DATASET_UID,
        files=files
    )

    # Remember these files so later runs skip them
    with open(STATE_FILE, "a") as f:
        f.writelines(path + "\n" for path in files)
    return len(files)

if __name__ == "__main__":
    count = ingest_new_data("/data/incoming")
    print(f"Uploaded {count} items")

Schedule this script with cron, Airflow, or any task scheduler to periodically ingest new data.
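For example, a crontab entry that runs the script every 15 minutes (the dataset UID, interpreter, and paths are illustrative):

```shell
*/15 * * * * AVALA_DATASET_UID=ds_abc123 /usr/bin/python3 /opt/avala/upload_pipeline.py >> /var/log/avala-ingest.log 2>&1
```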

Example: Watch Directory and Upload

#!/bin/bash
# watch_and_upload.sh - Upload new files as they appear

WATCH_DIR="/data/incoming"
DATASET_UID="ds_abc123"

# close_write fires only after a file is fully written, avoiding partial uploads
inotifywait -m -e close_write "$WATCH_DIR" --format '%f' | while read -r filename; do
    if [[ "$filename" == *.jpg || "$filename" == *.png ]]; then
        avala datasets upload-items "$DATASET_UID" "$WATCH_DIR/$filename"
        echo "Uploaded: $filename"
    fi
done

Choosing an Import Method

Use this decision tree to select the right approach:
| Question | If Yes | If No |
| --- | --- | --- |
| Fewer than 100 files? | Mission Control upload | Continue |
| Data already in S3/GCS? | Cloud storage integration | Continue |
| MCAP or ROS bag files? | MCAP import | Continue |
| Need automation? | SDK bulk upload or presigned URLs | Mission Control upload |
| Using Python or TypeScript? | SDK bulk upload | Presigned URL (any language) |

Next Steps