Best practices for getting the most out of the Avala platform — covering data management, API usage patterns, annotation workflow design, and cost optimization.

Data Management

Organize Datasets by Purpose

Structure your datasets around how they will be used, not just how the raw data is organized.
| Pattern | When to Use | Example |
|---|---|---|
| By task type | Different annotation workflows | pedestrian-detection, lane-segmentation |
| By data source | Multiple cameras or collection runs | front-camera-2026-02, lidar-top-2026-02 |
| By model version | Training successive model iterations | training-v1, training-v2, validation |
| By priority | Triage incoming data | urgent-review, standard-queue, backlog |
Use slices to create virtual subsets of a dataset without duplicating data. This is cheaper and more flexible than creating separate datasets.

Optimize File Sizes

Large files slow down uploads, viewer loading, and annotator productivity.
| Data Type | Recommended Max | Format Tips |
|---|---|---|
| Images | 20 MB | Use JPEG at 85-95% quality for photos; PNG only for diagrams or screenshots |
| Video | 2 GB | H.264 codec; 1080p resolution is sufficient for most annotation tasks |
| Point clouds | 500 MB per frame | Downsample to relevant density; remove ground points if not needed |
| MCAP bags | 5 GB | Split long recordings into shorter segments (2-5 minutes) |
| Gaussian splats | 500 MB | Use compressed PLY format |

Use Cloud Storage for Large Datasets

For datasets over 10,000 items or 100 GB total, use cloud storage integration instead of direct uploads. Benefits:
  • No data transfer: Avala reads directly from your S3 or GCS bucket
  • Your encryption: Data stays encrypted with your KMS keys
  • Your retention: Control lifecycle policies independently
  • Faster onboarding: No upload step — just point Avala to your bucket

API Usage

Paginate Large Result Sets

Never fetch all records in a single request. Use cursor-based pagination to iterate through results efficiently.
from avala import Client

client = Client()

# Iterate through all datasets automatically
page = client.datasets.list()
for dataset in page:
    print(dataset.name)

# The SDK handles pagination internally via CursorPage

Respect Rate Limits

Avala enforces per-endpoint rate limits. Build retry logic into your integration from the start — don’t wait for production traffic to hit limits.
import time
from avala import Client
from avala.errors import RateLimitError

client = Client()

def fetch_with_retry(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the rate-limit error
            # Honor the server's Retry-After if given, else back off exponentially
            wait = e.retry_after or (2 ** attempt)
            time.sleep(wait)

# Usage
dataset = fetch_with_retry(lambda: client.datasets.get("dataset-uid"))
See Rate Limits for per-endpoint limits and detailed retry examples.

Use Exports for Bulk Data Retrieval

Don’t loop through individual items to download annotations. Use the export API to generate a single export file containing all annotations for a dataset or project.
from avala import Client

client = Client()

# Create an export (much faster than fetching items individually)
export = client.exports.create(dataset="your-dataset-uid")

print(f"Export status: {export.status}")
print(f"Download URL: {export.download_url}")

Annotation Workflows

Design Projects with Clear Instructions

Well-defined annotation guidelines reduce rework and improve consistency. A checklist for effective project setup:
  • Label taxonomy: Define all labels before annotating. Adding labels mid-project creates inconsistency.
  • Examples: Provide 5-10 annotated examples for each label class, covering edge cases.
  • Edge case rules: Document what to do with partially occluded objects, truncated objects at image boundaries, and ambiguous cases.
  • Quality bar: Define what “good enough” looks like — perfect pixel-level accuracy is not always necessary.

Use Multi-Stage Review Pipelines

For production annotation workflows, use a multi-stage review pipeline:
Annotation → Spot Check (10-20%) → Targeted Review → Approved
  • Spot check: Randomly review 10-20% of submissions to identify systemic issues
  • Targeted review: Focus reviews on annotations flagged by AutoTag or low-confidence predictions
  • Full review: Reserve for high-value or safety-critical datasets
See Quality Control for detailed setup.
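The spot-check stage can be sketched as a seeded random sample over submitted item IDs. This is a generic illustration, not an Avala API; the 15% default is simply the midpoint of the 10-20% range above:

```python
import random

def spot_check_sample(item_ids, fraction=0.15, seed=None):
    """Randomly select a fraction of submissions for review (10-20% is typical)."""
    rng = random.Random(seed)
    k = max(1, round(len(item_ids) * fraction))
    return rng.sample(item_ids, k)
```

Pass a fixed seed when you want the sample to be reproducible across runs, e.g. for auditing which items were checked.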

Leverage Consensus for Validation

For critical datasets, have multiple annotators label the same items independently. Consensus scoring identifies:
  • Items where annotators disagree (review these first)
  • Annotators who consistently deviate from the group
  • Label classes that are ambiguously defined
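A minimal sketch of consensus scoring for classification labels, assuming you have already collected each annotator's label per item (the item and annotator IDs below are made up). Items are ranked by pairwise agreement so the most contested ones surface first:

```python
from itertools import combinations

def agreement_rate(labels_by_annotator):
    """Fraction of annotator pairs that assigned the same label to one item."""
    labels = list(labels_by_annotator.values())
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single annotator trivially agrees with itself
    return sum(1 for a, b in pairs if a == b) / len(pairs)

items = {
    "img-001": {"ann-a": "car", "ann-b": "car", "ann-c": "truck"},
    "img-002": {"ann-a": "car", "ann-b": "car", "ann-c": "car"},
}

# Lowest agreement first: review these items before the rest
review_order = sorted(items, key=lambda i: agreement_rate(items[i]))
```

For geometric annotations (boxes, polygons), you would substitute an IoU-based comparison for the exact-match test, but the ranking idea is the same.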

Batch Work Effectively

Group work into batches of 100-500 items for optimal throughput:
| Batch Size | Pros | Cons |
|---|---|---|
| < 50 items | Quick turnaround | High overhead per item |
| 100-500 items | Good balance of throughput and review cycles | |
| > 1000 items | Fewest batches to manage | Long wait for review; hard to catch errors early |
See Work Batches for batch management.

Performance Optimization

Optimize Upload Throughput

For large dataset uploads, parallelize your upload requests. See Performance Tuning for detailed concurrency recommendations and code examples.
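Client-side parallelism can be sketched with a thread pool. Here `upload_one` is a stand-in for whatever single-item upload call your integration uses (not a specific Avala API), and `max_workers=8` is just a starting point to tune:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_all(paths, upload_one, max_workers=8):
    """Run upload_one(path) concurrently; map each path to its result or error."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                results[path] = exc  # record the failure; retry separately
    return results
```

Collecting per-path errors instead of failing fast lets you retry only the items that failed rather than re-running the whole upload.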

Optimize Export Performance

  • Export by dataset, not by individual items
  • Use COCO format for the fastest export generation
  • For very large datasets (100K+ items), exports run asynchronously — poll the export status instead of waiting synchronously
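Asynchronous polling can be sketched as a backoff loop around a status callable. The terminal status names ("completed", "failed") and a call like `client.exports.get(uid).status` are assumptions to verify against the export API docs:

```python
import time

def poll_until_done(get_status, initial_delay=1.0, max_delay=30.0, max_wait=600.0):
    """Poll a status callable with exponential backoff instead of a tight loop."""
    delay, waited = initial_delay, 0.0
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if waited >= max_wait:
            raise TimeoutError("export did not finish within max_wait")
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # 1s, 2s, 4s, ... capped at max_delay
```

Usage would look like `poll_until_done(lambda: client.exports.get(export.uid).status)`.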

Monitor with the MCP Server

Use the MCP server to monitor your workflows from your IDE or AI assistant:
"List my datasets and their item counts"
"Show recent exports for project X"
"What's the task completion rate for project Y?"

Cost Management

Right-Size Your Data

Not all data needs annotation. Filter before annotating:
  1. Remove duplicates: Deduplicate images/frames before uploading
  2. Sample strategically: For video, annotate every Nth frame instead of every frame (common: every 5th or 10th frame)
  3. Use active learning: Prioritize items where the model is least confident, not random sampling
  4. Pre-filter with models: Use Batch Auto-Labeling to auto-label easy cases and focus human annotation on hard cases
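Steps 1 and 2 can be sketched with the standard library: content-hash deduplication before upload, and stride-based frame sampling. Both are generic pre-processing, independent of any Avala API:

```python
import hashlib

def sample_every_nth(frames, n=5):
    """Keep every Nth frame instead of every frame (common: n=5 or n=10)."""
    return frames[::n]

def dedup_by_content(files):
    """Drop exact-duplicate files before upload, keyed by SHA-256 of the bytes."""
    seen, unique = set(), []
    for name, data in files:  # iterable of (name, bytes) pairs
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique
```

Note that a content hash only catches exact duplicates; near-duplicate frames (consecutive video frames, for example) need perceptual hashing or embedding similarity instead.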

Minimize API Calls

| Instead of | Do this |
|---|---|
| Fetching items one at a time | Use list endpoints with pagination |
| Polling export status in a tight loop | Use exponential backoff (1s, 2s, 4s, 8s) |
| Re-fetching unchanged data | Cache responses with ETags or timestamps |
| Downloading all annotations | Use the export API for bulk retrieval |
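The ETag row can be sketched as a thin conditional-request wrapper. Here `fetch` is a placeholder for your HTTP call (e.g. via requests) returning a (status, etag, body) tuple; whether a given Avala endpoint returns ETags is an assumption to verify:

```python
def cached_get(url, fetch, cache):
    """Conditional GET: send If-None-Match; on 304 Not Modified, reuse the cache."""
    entry = cache.get(url)
    headers = {"If-None-Match": entry["etag"]} if entry else {}
    status, etag, body = fetch(url, headers)
    if status == 304 and entry:
        return entry["body"]  # unchanged on the server; no body transferred
    cache[url] = {"etag": etag, "body": body}
    return body
```

A 304 response still counts as a request for rate-limit purposes, but it avoids transferring and re-parsing an unchanged body.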

Security

Protect Your API Keys

  • Store keys in environment variables, never in source code
  • Rotate keys periodically (generate new key, update integrations, delete old key)
  • Use separate keys for development and production
  • Keys are only displayed once at creation — store them securely
See Authentication for key management details.
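The first bullet might look like this in practice; the variable name AVALA_API_KEY is an assumption (check the Authentication docs for the name the SDK actually reads):

```python
import os

def load_api_key(var="AVALA_API_KEY"):
    """Read the key from the environment; fail loudly rather than fall back."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} in your environment (never hard-code keys)")
    return key
```

Failing loudly on a missing key beats a silent fallback, which tends to surface later as a confusing 401 deep inside an integration.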

Use Cloud Storage with Least Privilege

When connecting S3 or GCS buckets, grant only the permissions Avala needs:
  • Read-only for datasets: s3:GetObject, s3:ListBucket
  • Read-write for exports: Add s3:PutObject
  • Never grant s3:DeleteObject unless absolutely necessary
See Cloud Storage for IAM policy templates.
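A read-only policy matching the first bullet might look like the following sketch; the bucket name is a placeholder, and the trust relationship (which principal Avala assumes) comes from the Cloud Storage templates, not from here:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-dataset-bucket",
        "arn:aws:s3:::your-dataset-bucket/*"
      ]
    }
  ]
}
```

If Avala writes exports back to the bucket, add s3:PutObject in a separate statement scoped to an exports prefix rather than widening this one.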

Next Steps

  • Quickstart — Get up and running in under 60 seconds
  • Examples — Code examples for common workflows
  • Rate Limits — Understand API limits and retry strategies
  • Cloud Storage — Connect your own S3 or GCS bucket