Technical guidance for optimizing performance across the Avala platform — uploads, exports, the annotation viewer, and API throughput.
Upload Optimization
Parallel Uploads
The upload API supports concurrent requests. Use parallelism to maximize throughput for large datasets.
| Concurrency | Use Case | Expected Throughput |
|---|---|---|
| 1 (serial) | Small datasets (< 100 items) | Baseline |
| 5-10 | Medium datasets (100-10K items) | 5-8x baseline |
| 10-20 | Large datasets (10K+ items) | 8-15x baseline |
| > 20 | Not recommended | Diminishing returns, risk of rate limiting |
```python
import asyncio
from avala import AsyncClient

async def parallel_upload(dataset_uid: str, files: list[str], concurrency: int = 10):
    """Upload files in parallel using presigned URLs."""
    client = AsyncClient()
    semaphore = asyncio.Semaphore(concurrency)

    async def upload_one(file_path: str):
        async with semaphore:
            # 1. Request presigned URL from the API
            # 2. Upload file directly to storage
            # See REST API docs for the presigned URL flow
            pass

    tasks = [upload_one(f) for f in files]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    success = sum(1 for r in results if not isinstance(r, Exception))
    failed = len(results) - success
    print(f"Uploaded {success}/{len(files)} files ({failed} failed)")
```
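The `upload_one` stub above leaves the presigned flow to the REST API docs. As a rough sketch of the two steps, the HTTP calls are injected as callables here so no endpoint names or request shapes are assumed; the accepted status codes are also an assumption — check the REST API docs for the actual contract:

```python
def upload_via_presigned(file_path, get_presigned_url, put_bytes):
    """Two-step presigned upload with the HTTP calls injected.

    get_presigned_url(file_path) -> str   # step 1: ask the API for a URL
    put_bytes(url, data) -> int           # step 2: HTTP PUT, returns status code
    """
    url = get_presigned_url(file_path)
    with open(file_path, "rb") as f:
        data = f.read()
    status = put_bytes(url, data)
    if status not in (200, 201, 204):  # assumed success codes
        raise RuntimeError(f"upload of {file_path} failed with HTTP {status}")
    return status
```

Wiring real HTTP calls into the two parameters keeps the retry and error handling testable without network access.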
Pre-process Files Before Upload
Pre-processing files before upload improves both upload speed and viewer performance.
Images:
```bash
# Convert to optimized JPEG (saves 30-60% file size with minimal quality loss)
convert input.png -quality 90 -strip output.jpg

# Resize oversized images (4K+ is rarely needed for annotation)
convert input.jpg -resize "1920x1080>" -quality 90 output.jpg
```
Video:
```bash
# Re-encode with H.264 for optimal playback
ffmpeg -i input.mov -c:v libx264 -crf 23 -preset medium -c:a aac output.mp4

# Extract frames at specific intervals (every 5th frame)
ffmpeg -i input.mp4 -vf "select=not(mod(n\,5))" -vsync vfr frame_%04d.jpg
```
Point clouds:
```python
# Downsample with Open3D (Python)
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense.pcd")
downsampled = pcd.voxel_down_sample(voxel_size=0.05)  # 5 cm voxels
o3d.io.write_point_cloud("optimized.pcd", downsampled)
```
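If Open3D is unavailable, the voxel-downsampling idea can be sketched in pure Python — a minimal centroid-per-voxel implementation for illustration, not a replacement for Open3D's optimized version:

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points falling in each voxel cell with their centroid."""
    cells = defaultdict(list)
    for x, y, z in points:
        # Integer grid coordinates identify the voxel a point belongs to
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[key].append((x, y, z))
    return [
        tuple(sum(coord) / len(cell) for coord in zip(*cell))
        for cell in cells.values()
    ]
```

At a 0.05 m voxel size, points closer together than 5 cm collapse into a single representative point, which is what cuts both file size and viewer load.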
Export Optimization
Export format affects generation speed and file size.
| Format | Speed | File Size | Best For |
|---|---|---|---|
| COCO JSON | Fast | Small | Object detection, segmentation, keypoints |
| Avala JSON | Fast | Medium | Full-fidelity export with all metadata |
| YOLO TXT | Fast | Smallest | Bounding box training pipelines |
| Pascal VOC | Medium | Large (XML) | Legacy training pipelines |
Handle Large Exports Asynchronously
Exports over 10,000 items run asynchronously. Poll the status with exponential backoff instead of busy-waiting.
```python
import time
from avala import Client

client = Client()

# Create the export
export = client.exports.create(
    dataset_uid="your-dataset-uid",
    format="coco",
)

# Poll with exponential backoff
wait = 1
while export.status == "processing":
    time.sleep(wait)
    export = client.exports.get(export.uid)
    wait = min(wait * 2, 30)  # Cap at 30 seconds

if export.status == "completed":
    print(f"Download: {export.download_url}")
else:
    print(f"Export failed: {export.status}")
```
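The polling loop hard-codes its backoff schedule; the same schedule can be factored into a reusable generator. The `jitter` parameter is an addition beyond the docs, useful to desynchronize many clients polling at once:

```python
import random

def backoff_delays(base=1.0, cap=30.0, max_tries=10, jitter=0.0):
    """Yield capped exponential delays, plus up to `jitter` seconds of noise."""
    delay = base
    for _ in range(max_tries):
        yield min(delay, cap) + random.uniform(0, jitter)
        delay *= 2
```

Usage: `for wait in backoff_delays(): time.sleep(wait); export = client.exports.get(export.uid); ...` with a break once the status leaves `"processing"`.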
Export by Slice for Faster Iteration
For iterative model training, export specific slices instead of entire datasets. Smaller exports generate faster.
Browser Recommendations
The Mission Control viewer performs best on:
| Browser | Version | Status |
|---|---|---|
| Chrome | 113+ | Recommended (WebGPU enabled by default) |
| Edge | 113+ | Recommended (WebGPU enabled by default) |
| Firefox | Latest | Supported (WebGL fallback) |
| Safari | Latest | Supported (WebGL fallback) |
Chrome and Edge with WebGPU support provide significantly better performance for 3D point cloud and Gaussian Splat visualization.
Optimize Point Cloud Viewing
For large point cloud datasets:
- Reduce point density: Downsample to 0.05-0.1m voxel size before upload
- Split large scans: Break scans larger than 500 MB into smaller segments
- Close unused panels: Hide camera views you are not actively using
- Use a dedicated GPU: Integrated graphics struggle with point clouds over 1M points
Optimize Video Annotation
- Use H.264 codec: Other codecs (H.265, VP9) may decode slower in browsers
- Limit resolution: 1080p is sufficient for most annotation tasks; 4K adds load time with marginal benefit
- Shorter sequences: Split long videos into 2-5 minute clips for faster loading and saving
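The splitting rule in the last bullet can be sketched as a small helper that computes clip boundaries — a hypothetical utility, not part of the SDK:

```python
def clip_bounds(duration_s, clip_len_s=180):
    """Split a video duration into consecutive (start, end) second offsets."""
    bounds = []
    start = 0
    while start < duration_s:
        bounds.append((start, min(start + clip_len_s, duration_s)))
        start += clip_len_s
    return bounds
```

ffmpeg can also perform the cuts directly with its segment muxer, e.g. `ffmpeg -i input.mp4 -c copy -f segment -segment_time 180 -reset_timestamps 1 clip_%03d.mp4`.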
API Throughput
Batch Reads with List Endpoints
Avoid fetching individual resources in a loop. Use list endpoints with filters.
```python
# Bad: N+1 API calls
for uid in dataset_uids:
    dataset = client.datasets.get(uid)  # One call per dataset

# Good: Single call, then filter client-side
wanted = set(dataset_uids)  # O(1) membership checks
page = client.datasets.list()
for dataset in page:
    if dataset.uid in wanted:
        process(dataset)
```
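When a listing spans multiple pages, the loop generalizes to a cursor-driven generator. The `(items, next_cursor)` shape below is an assumption standing in for the SDK's actual pagination contract — adapt it to the real response fields:

```python
def paginate(fetch_page):
    """Yield items across pages. fetch_page(cursor) -> (items, next_cursor)."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:  # last page reached
            return
```

Wrapping the list call this way keeps calling code free of pagination bookkeeping: `for dataset in paginate(fetch): ...`.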
Cache Responses
For data that changes infrequently (project configuration, label definitions), cache the response locally to avoid redundant API calls.
```python
import json
from pathlib import Path
from datetime import datetime, timedelta

CACHE_DIR = Path(".avala_cache")
CACHE_TTL = timedelta(hours=1)

def get_cached_or_fetch(key: str, fetch_fn):
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if datetime.fromisoformat(cached["expires"]) > datetime.now():
            return cached["data"]
    data = fetch_fn()
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file.write_text(json.dumps({
        "data": data,
        "expires": (datetime.now() + CACHE_TTL).isoformat(),
    }))
    return data
```
Rate Limit Awareness
Build rate limit handling into your client from day one. See Rate Limits for per-endpoint limits.
| Endpoint Category | Limit | Strategy |
|---|---|---|
| Read endpoints | Higher limits | Safe for moderate parallelism |
| Write endpoints | Lower limits | Serialize or use low concurrency |
| Export creation | Lowest limits | Queue exports, poll for completion |
| Authentication | Strict limits | Cache tokens, don’t re-authenticate per request |
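A sketch of 429 handling that honors a server-supplied retry hint. The response dict shape and the `retry_after` field are assumptions standing in for whatever your HTTP client returns:

```python
import time

def request_with_retry(do_request, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call do_request until it stops returning 429, honoring Retry-After.

    do_request() -> dict with at least {"status": int}; a 429 response may
    carry "retry_after" in seconds (both shapes are assumptions).
    """
    delay = base_delay
    for _ in range(max_tries):
        resp = do_request()
        if resp["status"] != 429:
            return resp
        # Prefer the server's hint; fall back to our own exponential schedule
        sleep(float(resp.get("retry_after", delay)))
        delay = min(delay * 2, 30.0)
    raise RuntimeError("still rate limited after retries")
```

The injectable `sleep` makes the backoff schedule unit-testable without real waiting.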
Benchmarking Your Integration
Measure Upload Throughput
```bash
# Time a batch upload of 100 images
time find ./test-images -name "*.jpg" | head -100 | \
  xargs -P 10 -I {} curl -s -o /dev/null -w "%{http_code}\n" \
    -X PUT "$PRESIGNED_URL" -H "Content-Type: image/jpeg" --data-binary @{}
```
Measure API Latency
```bash
# Measure average response time for 10 list calls
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{time_total}\n" \
    "$BASE_URL/datasets/" -H "X-Avala-Api-Key: $AVALA_API_KEY"
done | awk '{sum+=$1} END {print "Avg:", sum/NR, "seconds"}'
```
Next Steps