Technical guidance for optimizing performance across the Avala platform — uploads, exports, the annotation viewer, and API throughput.

Upload Optimization

Parallel Uploads

The upload API supports concurrent requests. Use parallelism to maximize throughput for large datasets.
| Concurrency | Use Case | Expected Throughput |
| --- | --- | --- |
| 1 (serial) | Small datasets (< 100 items) | Baseline |
| 5-10 | Medium datasets (100-10K items) | 5-8x baseline |
| 10-20 | Large datasets (10K+ items) | 8-15x baseline |
| > 20 | Not recommended | Diminishing returns, risk of rate limiting |
```python
import asyncio
from avala import AsyncClient

async def parallel_upload(dataset_uid: str, files: list[str], concurrency: int = 10):
    """Upload files in parallel using presigned URLs."""
    client = AsyncClient()
    semaphore = asyncio.Semaphore(concurrency)

    async def upload_one(file_path: str):
        async with semaphore:
            # 1. Request presigned URL from the API
            # 2. Upload file directly to storage
            # See REST API docs for the presigned URL flow
            pass

    tasks = [upload_one(f) for f in files]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    success = sum(1 for r in results if not isinstance(r, Exception))
    failed = len(results) - success
    print(f"Uploaded {success}/{len(files)} files ({failed} failed)")
```
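
With `return_exceptions=True`, failed items can be retried individually instead of re-running the whole batch. A minimal retry sketch with exponential backoff and jitter — `flaky_upload` below is a hypothetical stand-in for your real per-file upload coroutine:

```python
import asyncio
import random

async def with_retries(coro_fn, *args, attempts: int = 3, base_delay: float = 0.5):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return await coro_fn(*args)
        except Exception:
            if attempt == attempts:
                raise
            # Exponential backoff with jitter to avoid retrying in lockstep
            await asyncio.sleep(base_delay * (2 ** (attempt - 1)) * random.random())

# Hypothetical flaky upload used only to demonstrate the wrapper
calls = {"n": 0}

async def flaky_upload(path: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return f"uploaded {path}"

result = asyncio.run(with_retries(flaky_upload, "img.jpg"))
print(result)  # uploaded img.jpg
```

Wrap `upload_one` in `with_retries` inside the semaphore so retries still count against your concurrency budget.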

Optimize File Formats Before Upload

Pre-processing files before upload improves both upload speed and viewer performance.

Images:

```shell
# Convert to optimized JPEG (saves 30-60% file size with minimal quality loss)
convert input.png -quality 90 -strip output.jpg

# Resize oversized images (4K+ is rarely needed for annotation)
convert input.jpg -resize "1920x1080>" -quality 90 output.jpg
```
Video:

```shell
# Re-encode with H.264 for optimal playback
ffmpeg -i input.mov -c:v libx264 -crf 23 -preset medium -c:a aac output.mp4

# Extract frames at specific intervals (every 5th frame)
ffmpeg -i input.mp4 -vf "select=not(mod(n\,5))" -vsync vfr frame_%04d.jpg
```
Point clouds:

```python
# Downsample with Open3D (Python)
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense.pcd")
downsampled = pcd.voxel_down_sample(voxel_size=0.05)  # 5cm voxels
o3d.io.write_point_cloud("optimized.pcd", downsampled)
```

Export Optimization

Choose the Right Format

Export format affects generation speed and file size.
| Format | Speed | File Size | Best For |
| --- | --- | --- | --- |
| COCO JSON | Fast | Small | Object detection, segmentation, keypoints |
| Avala JSON | Fast | Medium | Full-fidelity export with all metadata |
| YOLO TXT | Fast | Smallest | Bounding box training pipelines |
| Pascal VOC | Medium | Large (XML) | Legacy training pipelines |

Handle Large Exports Asynchronously

Exports over 10,000 items run asynchronously. Poll the status with exponential backoff instead of busy-waiting.
```python
import time
from avala import Client

client = Client()

# Create the export
export = client.exports.create(
    dataset_uid="your-dataset-uid",
    format="coco"
)

# Poll with exponential backoff
wait = 1
while export.status == "processing":
    time.sleep(wait)
    export = client.exports.get(export.uid)
    wait = min(wait * 2, 30)  # Cap at 30 seconds

if export.status == "completed":
    print(f"Download: {export.download_url}")
else:
    print(f"Export failed: {export.status}")
```

Export by Slice for Faster Iteration

For iterative model training, export specific slices instead of entire datasets. Smaller exports generate faster.

Viewer Performance

Browser Recommendations

The Mission Control viewer performs best on:
| Browser | Version | Status |
| --- | --- | --- |
| Chrome | 113+ | Recommended (WebGPU enabled by default) |
| Edge | 113+ | Recommended (WebGPU enabled by default) |
| Firefox | Latest | Supported (WebGL fallback) |
| Safari | Latest | Supported (WebGL fallback) |
Chrome and Edge with WebGPU support provide significantly better performance for 3D point cloud and Gaussian Splat visualization.

Optimize Point Cloud Viewing

For large point cloud datasets:
  • Reduce point density: Downsample to 0.05-0.1m voxel size before upload
  • Split large scans: Break scans larger than 500 MB into smaller segments
  • Close unused panels: Hide camera views you are not actively using
  • Use a dedicated GPU: Integrated graphics struggle with point clouds over 1M points

Optimize Video Annotation

  • Use H.264 codec: Other codecs (H.265, VP9) may decode slower in browsers
  • Limit resolution: 1080p is sufficient for most annotation tasks; 4K adds load time with marginal benefit
  • Shorter sequences: Split long videos into 2-5 minute clips for faster loading and saving
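
The clip-splitting step can be planned programmatically. A sketch under stated assumptions — the helper below is illustrative, not part of the Avala SDK; feed each `(start, length)` pair to your encoder (e.g. `ffmpeg -ss START -t LENGTH`):

```python
def clip_boundaries(duration_s: float, clip_len_s: float = 180.0):
    """Yield (start, length) pairs that cover a video in fixed-size clips."""
    start = 0.0
    while start < duration_s:
        # Final clip may be shorter than clip_len_s
        yield (start, min(clip_len_s, duration_s - start))
        start += clip_len_s

# A 10-minute video split into 3-minute clips
print(list(clip_boundaries(600)))
# [(0.0, 180.0), (180.0, 180.0), (360.0, 180.0), (540.0, 60.0)]
```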

API Throughput

Batch Reads with List Endpoints

Avoid fetching individual resources in a loop. Use list endpoints with filters.
```python
# Bad: N+1 API calls
for uid in dataset_uids:
    dataset = client.datasets.get(uid)  # One round trip per dataset

# Good: single paginated list call, filtered client-side
wanted = set(dataset_uids)  # O(1) membership checks
for dataset in client.datasets.list():
    if dataset.uid in wanted:
        process(dataset)
```

Cache Responses

For data that changes infrequently (project configuration, label definitions), cache the response locally to avoid redundant API calls.
```python
import json
from pathlib import Path
from datetime import datetime, timedelta

CACHE_DIR = Path(".avala_cache")
CACHE_TTL = timedelta(hours=1)

def get_cached_or_fetch(key: str, fetch_fn):
    cache_file = CACHE_DIR / f"{key}.json"

    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if datetime.fromisoformat(cached["expires"]) > datetime.now():
            return cached["data"]

    data = fetch_fn()
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file.write_text(json.dumps({
        "data": data,
        "expires": (datetime.now() + CACHE_TTL).isoformat()
    }))
    return data
```

Rate Limit Awareness

Build rate limit handling into your client from day one. See Rate Limits for per-endpoint limits.
| Endpoint Category | Limit | Strategy |
| --- | --- | --- |
| Read endpoints | Higher limits | Safe for moderate parallelism |
| Write endpoints | Lower limits | Serialize or use low concurrency |
| Export creation | Lowest limits | Queue exports, poll for completion |
| Authentication | Strict limits | Cache tokens, don’t re-authenticate per request |
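
A minimal retry-on-rate-limit sketch — `RateLimited` and `fake_endpoint` below are hypothetical stand-ins for your client's request layer; real HTTP 429 responses typically carry a `Retry-After` header worth honoring:

```python
import time

class RateLimited(Exception):
    """Hypothetical error raised when the API returns 429."""
    def __init__(self, retry_after: float = 1.0):
        self.retry_after = retry_after

def call_with_rate_limit(fn, max_attempts: int = 5):
    """Retry a callable when it signals rate limiting, honoring the hinted delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise
            time.sleep(exc.retry_after)

# Simulated endpoint that rejects the first two calls
state = {"n": 0}

def fake_endpoint():
    state["n"] += 1
    if state["n"] <= 2:
        raise RateLimited(retry_after=0.01)
    return {"ok": True}

result = call_with_rate_limit(fake_endpoint)
print(result)  # {'ok': True}
```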

Benchmarking Your Integration

Measure Upload Throughput

```shell
# Time a batch upload of 100 images
# Note: presigned URLs are usually issued per object; substitute the URL
# returned for each file in your real upload flow
time find ./test-images -name "*.jpg" | head -100 | \
  xargs -P 10 -I {} curl -s -o /dev/null -w "%{http_code}\n" \
    -X PUT "$PRESIGNED_URL" -H "Content-Type: image/jpeg" --data-binary @{}
```

Measure API Latency

```shell
# Measure average response time for 10 list calls
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{time_total}\n" \
    "$BASE_URL/datasets/" -H "X-Avala-Api-Key: $AVALA_API_KEY"
done | awk '{sum+=$1} END {print "Avg:", sum/NR, "seconds"}'
```

Next Steps