Avala integrates ML models for AI-assisted annotation — generating pre-annotations that human annotators review, accept, or correct. This page documents all supported and planned models.
## Currently Available

### SAM (Segment Anything Model)
| Property | Value |
|---|---|
| Provider | Amazon SageMaker |
| Task | Interactive segmentation |
| Input | Image + point/box prompt |
| Output | Segmentation masks |
| Data Types | 2D images |
SAM generates pixel-level segmentation masks from point or bounding box prompts. In the annotation editor, click a point on an object or draw a bounding box, and SAM returns a precise mask.
Supported interactions:
- Point prompt — Click on an object to generate a mask
- Box prompt — Draw a bounding box to constrain the mask region
- Multi-point — Click multiple positive/negative points to refine
Workflow: annotator clicks a point → Avala sends the prompt to SageMaker → SAM returns a mask → the mask is rendered in the editor.
SAM runs on a managed SageMaker endpoint. Latency is typically 200–500ms per prompt. The model is warmed and shared across your organization — no cold starts after initial setup.
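For orientation, the prompt flow above can be sketched as a payload builder. This is a hypothetical sketch: the field names (`image_url`, `points`, `labels`, `box`) are illustrative assumptions, not Avala's documented request schema.

```python
import json

def build_sam_prompt(image_url, points, labels, box=None):
    """Build a hypothetical SAM prompt payload.

    points: list of (x, y) pixel coordinates
    labels: 1 (positive) or 0 (negative) per point, for multi-point refinement
    box:    optional (x_min, y_min, x_max, y_max) box prompt
    """
    payload = {
        "image_url": image_url,
        "points": [{"x": x, "y": y} for x, y in points],
        "labels": labels,
    }
    if box is not None:
        # Box prompt constrains the mask region
        payload["box"] = dict(zip(("x_min", "y_min", "x_max", "y_max"), box))
    return json.dumps(payload)
```

A multi-point refinement would pass several points with mixed labels, e.g. `points=[(10, 20), (50, 60)], labels=[1, 0]` to include the first region and exclude the second.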
### YOLO (You Only Look Once)
| Property | Value |
|---|---|
| Provider | Amazon SageMaker |
| Task | Object detection |
| Input | Image |
| Output | Bounding boxes + labels + confidence scores |
| Data Types | 2D images |
YOLO detects objects in images and returns bounding boxes with class labels and confidence scores. Use it to pre-annotate entire images with detections that annotators then review.
Capabilities:
- Multi-class object detection
- Confidence score per detection
- Configurable confidence threshold (default: 0.5)
- Filters low-confidence detections automatically
Label mapping: YOLO predictions are mapped to your project’s label set. Only predictions matching configured labels are returned.
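The thresholding and label mapping described above amount to a simple post-processing pass. A minimal sketch, assuming detections carry a `class_name` and `confidence` field (illustrative names, not Avala's schema):

```python
def postprocess_detections(detections, label_map, threshold=0.5):
    """Keep detections at or above the confidence threshold whose class
    maps to a configured project label; relabel the survivors."""
    kept = []
    for det in detections:
        if det["confidence"] < threshold:
            continue  # low-confidence detections are filtered out
        project_label = label_map.get(det["class_name"])
        if project_label is None:
            continue  # class not in the project's configured label set
        kept.append({**det, "label": project_label})
    return kept
```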
## Planned Models
The following models are on the roadmap but not yet available. Timelines are approximate. Contact support@avala.ai if you need early access.
### SAM 2
| Property | Value |
|---|---|
| Status | Planned |
| Task | Video segmentation + tracking |
| Input | Video frames + point/box prompt |
| Output | Segmentation masks propagated across frames |
| Data Types | Video sequences |
SAM 2 extends SAM to video — prompt on a single frame and propagate masks across the entire sequence. This dramatically reduces annotation time for video datasets.
Key improvements over SAM:
- Temporal consistency across frames
- Object tracking with mask propagation
- Memory-efficient streaming inference
- Support for occlusion handling
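The prompt-once, propagate-everywhere idea can be illustrated with a conceptual sketch. This is not SAM 2's actual algorithm, only the carry-the-mask-forward pattern; the per-frame segmentation call is a stand-in:

```python
def propagate_masks(frames, initial_prompt, segment_fn):
    """Conceptual propagation: segment the first frame from the user's
    prompt, then reuse each frame's mask as the prompt for the next.
    segment_fn(frame, prompt) stands in for a real segmentation call."""
    masks = []
    prompt = initial_prompt
    for frame in frames:
        mask = segment_fn(frame, prompt)
        masks.append(mask)
        prompt = mask  # carry the mask forward as the next prompt
    return masks
```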
### Florence-2
| Property | Value |
|---|---|
| Status | Planned |
| Task | Multi-task visual understanding |
| Input | Image + text prompt |
| Output | Bounding boxes, labels, captions, OCR |
| Data Types | 2D images |
Florence-2 is a multi-task vision-language model that can perform detection, captioning, OCR, and grounding from text prompts. Use it for:
- Open-vocabulary detection — Detect objects by describing them in natural language
- Image captioning — Auto-generate descriptions for images
- OCR — Extract text from images
- Visual grounding — Find objects matching a text description
### RADIO
| Property | Value |
|---|---|
| Status | Planned |
| Task | Feature extraction + similarity search |
| Input | Image |
| Output | Dense feature embeddings |
| Data Types | 2D images, 3D point clouds |
RADIO (Reduce All Domains Into One) generates dense feature embeddings for images and point clouds. Use it for:
- Similarity search — Find visually similar items across your dataset
- Clustering — Group similar images for batch annotation
- Active learning — Identify the most informative samples to annotate next
- Anomaly detection — Find outliers in your dataset
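Similarity search over dense embeddings typically reduces to cosine similarity. A minimal pure-Python sketch (the embeddings and item IDs in the usage example are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k=3):
    """Rank dataset items by cosine similarity to the query embedding.

    embeddings: dict mapping item_id -> embedding vector
    """
    scored = [(cosine_similarity(query, emb), item_id)
              for item_id, emb in embeddings.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# e.g. nearest_neighbors([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]}, k=1)
```

In production you would store embeddings in a vector index rather than scanning linearly, but the ranking criterion is the same.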
## Model Configuration
Models are configured per-organization. SageMaker endpoints are set up by the Avala team during onboarding. Custom HTTP endpoints can be configured self-service via the API.
### Current Setup Process
1. Contact Avala to enable inference for your organization
2. Avala deploys a SageMaker endpoint in your preferred AWS region
3. The endpoint is configured in Mission Control under Settings > Inference
4. Models are available in the annotation editor and via the auto-label API
## Using Models in the Annotation Editor
1. Open a sequence in the annotation editor
2. Select the AI Assist tool from the toolbar
3. Choose the model (SAM or YOLO)
4. For SAM: click a point or draw a box on the object
5. For YOLO: click Detect All to run on the entire image
6. Review and accept/modify the predictions
## Using Models via API
Trigger auto-labeling programmatically:
```bash
curl -X POST "https://api.avala.ai/api/v1/projects/{project_uid}/auto-label" \
  -H "X-Avala-Api-Key: $AVALA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_id": "inf_abc123",
    "confidence_threshold": 0.6
  }'
```
Current limitations:
- SageMaker endpoints cannot be created self-service; the Avala team configures them during onboarding.
- Custom HTTP endpoints can be configured self-service via the REST API (see Model Inference).
- Batch auto-labeling is limited to 5,000 items per job (see Batch Auto-Labeling).
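The same request can be built in Python with the standard library. This sketch constructs the request object without sending it; the endpoint, headers, and payload mirror the curl example, and `inf_abc123` is the example provider_id from this page:

```python
import json
import os
import urllib.request

def build_auto_label_request(project_uid, provider_id, confidence_threshold):
    """Build (but do not send) a POST request to the auto-label endpoint."""
    url = f"https://api.avala.ai/api/v1/projects/{project_uid}/auto-label"
    body = json.dumps({
        "provider_id": provider_id,
        "confidence_threshold": confidence_threshold,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Avala-Api-Key": os.environ.get("AVALA_API_KEY", ""),
            "Content-Type": "application/json",
        },
    )

# To send: urllib.request.urlopen(build_auto_label_request(...))
```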
All models return predictions in Avala’s standard annotation format:
```json
{
  "annotations": [
    {
      "type": "bounding_box",
      "label": "car",
      "confidence": 0.94,
      "coordinates": { "x": 120, "y": 340, "width": 200, "height": 150 }
    },
    {
      "type": "segmentation_mask",
      "label": "pedestrian",
      "confidence": 0.87,
      "mask_url": "https://..."
    }
  ]
}
```
See Supported Prediction Types for the full list.
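A response in this format can be split by prediction type in a few lines. This is a sketch; `split_predictions` is a hypothetical helper, not part of an Avala SDK:

```python
import json

def split_predictions(response_json, min_confidence=0.0):
    """Split a standard-format response into bounding boxes and masks,
    dropping any prediction below min_confidence."""
    boxes, masks = [], []
    for ann in json.loads(response_json)["annotations"]:
        if ann["confidence"] < min_confidence:
            continue
        if ann["type"] == "bounding_box":
            boxes.append(ann)
        else:
            masks.append(ann)
    return boxes, masks
```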
## Next Steps