Avala integrates ML models for AI-assisted annotation — generating pre-annotations that human annotators review, accept, or correct. This page documents all supported and planned models.

Currently Available

SAM (Segment Anything Model)

| Property | Value |
| --- | --- |
| Provider | Amazon SageMaker |
| Task | Interactive segmentation |
| Input | Image + point/box prompt |
| Output | Segmentation masks |
| Data Types | 2D images |
SAM generates pixel-level segmentation masks from point or bounding box prompts. In the annotation editor, click a point on an object or draw a bounding box, and SAM returns a precise mask. Supported interactions:
  • Point prompt — Click on an object to generate a mask
  • Box prompt — Draw a bounding box to constrain the mask region
  • Multi-point — Click multiple positive/negative points to refine
Annotator clicks point → Avala sends to SageMaker → SAM returns mask → Mask rendered in editor
SAM runs on a managed SageMaker endpoint. Latency is typically 200–500ms per prompt. The model is warmed and shared across your organization — no cold starts after initial setup.
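The prompt flow above can be sketched in Python. The payload shape here is an assumption for illustration (the actual SageMaker request schema is managed by Avala); the point-label convention (1 = foreground, 0 = background) follows SAM's published interface.

```python
# Hypothetical prompt-payload builder for an interactive segmentation call.
def build_sam_prompt(image_id, points, box=None):
    """Build a point/box prompt payload.

    `points` is a list of (x, y, label) tuples, where label 1 marks a
    positive (foreground) click and 0 a negative (background) click.
    `box` is an optional [x_min, y_min, x_max, y_max] constraint.
    """
    payload = {
        "image_id": image_id,
        "point_coords": [[x, y] for x, y, _ in points],
        "point_labels": [label for _, _, label in points],
    }
    if box is not None:
        payload["box"] = box
    return payload

# Multi-point refinement: one positive click on the object,
# one negative click to exclude a background region.
prompt = build_sam_prompt("img_001", [(250, 120, 1), (40, 40, 0)])
```

Each refinement click re-sends the full accumulated prompt, so the mask converges as positive and negative points are added.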

YOLO (You Only Look Once)

| Property | Value |
| --- | --- |
| Provider | Amazon SageMaker |
| Task | Object detection |
| Input | Image |
| Output | Bounding boxes + labels + confidence scores |
| Data Types | 2D images |
YOLO detects objects in images and returns bounding boxes with class labels and confidence scores. Use it to pre-annotate entire images with detections that annotators then review. Capabilities:
  • Multi-class object detection
  • Confidence score per detection
  • Configurable confidence threshold (default: 0.5)
  • Filters low-confidence detections automatically
Label mapping: YOLO predictions are mapped to your project’s label set. Predictions whose class has no configured mapping are discarded rather than returned.
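The filtering and mapping described above can be sketched as a small post-processing step. The detection and label-map structures here are illustrative, not the exact wire format:

```python
# Filter raw detections by confidence, then map model classes to project labels.
def filter_detections(detections, label_map, threshold=0.5):
    """Keep detections above `threshold` whose class is in `label_map`."""
    results = []
    for det in detections:
        if det["confidence"] < threshold:
            continue  # low-confidence detections are dropped automatically
        project_label = label_map.get(det["class"])
        if project_label is None:
            continue  # class not configured in the project's label set
        results.append({**det, "label": project_label})
    return results

detections = [
    {"class": "car", "confidence": 0.94},
    {"class": "car", "confidence": 0.31},   # below the 0.5 threshold
    {"class": "kite", "confidence": 0.88},  # not in the label map
]
kept = filter_detections(detections, {"car": "vehicle"})
# only the first detection survives, relabeled "vehicle"
```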

Planned Models

The following models are on the roadmap but not yet available. Timelines are approximate. Contact support@avala.ai if you need early access.

SAM 2

| Property | Value |
| --- | --- |
| Status | Planned |
| Task | Video segmentation + tracking |
| Input | Video frames + point/box prompt |
| Output | Segmentation masks propagated across frames |
| Data Types | Video sequences |
SAM 2 extends SAM to video — prompt on a single frame and propagate masks across the entire sequence. This dramatically reduces annotation time for video datasets. Key improvements over SAM:
  • Temporal consistency across frames
  • Object tracking with mask propagation
  • Memory-efficient streaming inference
  • Occlusion handling — objects are re-acquired after they reappear

Florence-2

| Property | Value |
| --- | --- |
| Status | Planned |
| Task | Multi-task visual understanding |
| Input | Image + text prompt |
| Output | Bounding boxes, labels, captions, OCR |
| Data Types | 2D images |
Florence-2 is a multi-task vision-language model that can perform detection, captioning, OCR, and grounding from text prompts. Use it for:
  • Open-vocabulary detection — Detect objects by describing them in natural language
  • Image captioning — Auto-generate descriptions for images
  • OCR — Extract text from images
  • Visual grounding — Find objects matching a text description

RADIO

| Property | Value |
| --- | --- |
| Status | Planned |
| Task | Feature extraction + similarity search |
| Input | Image |
| Output | Dense feature embeddings |
| Data Types | 2D images, 3D point clouds |
RADIO (Robust And Diverse Image-Output) generates dense feature embeddings for images and point clouds. Use it for:
  • Similarity search — Find visually similar items across your dataset
  • Clustering — Group similar images for batch annotation
  • Active learning — Identify the most informative samples to annotate next
  • Anomaly detection — Find outliers in your dataset
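Similarity search over embeddings like these is typically cosine similarity against an index. A minimal sketch with toy vectors (in practice the embeddings would come from the model endpoint, and a vector index would replace the linear scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query, index, top_k=2):
    """Rank items in `index` (id -> embedding) by similarity to `query`."""
    scored = [(item_id, cosine_similarity(query, emb))
              for item_id, emb in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

index = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 0.0, 1.0],  # an outlier relative to the query below
}
ranked = most_similar([1.0, 0.05, 0.0], index)
```

The same ranking inverted (lowest similarity first) gives a simple anomaly score, which is how the outlier-detection use case above falls out of the same embeddings.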

Model Configuration

Models are configured per-organization. SageMaker endpoints are set up by the Avala team during onboarding. Custom HTTP endpoints can be configured self-service via the API.

Current Setup Process

  1. Contact Avala to enable inference for your organization
  2. Avala deploys a SageMaker endpoint in your preferred AWS region
  3. The endpoint is configured in Mission Control under Settings > Inference
  4. Models are available in the annotation editor and via the auto-label API

Using Models in the Annotation Editor

  1. Open a sequence in the annotation editor
  2. Select the AI Assist tool from the toolbar
  3. Choose the model (SAM or YOLO)
  4. For SAM: click a point or draw a box on the object
  5. For YOLO: click Detect All to run on the entire image
  6. Review and accept/modify the predictions

Using Models via API

Trigger auto-labeling programmatically:
curl -X POST "https://api.avala.ai/api/v1/projects/{project_uid}/auto-label" \
  -H "X-Avala-Api-Key: $AVALA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_id": "inf_abc123",
    "confidence_threshold": 0.6
  }'
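The same call from Python, using only the standard library. The URL, header, and body fields mirror the curl example; the helper function name is illustrative:

```python
import json
import os
import urllib.request

def build_auto_label_request(project_uid, provider_id,
                             confidence_threshold=0.6, api_key=None):
    """Assemble the POST request for the auto-label endpoint."""
    url = f"https://api.avala.ai/api/v1/projects/{project_uid}/auto-label"
    body = json.dumps({
        "provider_id": provider_id,
        "confidence_threshold": confidence_threshold,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "X-Avala-Api-Key": api_key or os.environ.get("AVALA_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_auto_label_request("proj_123", "inf_abc123")
# urllib.request.urlopen(req) sends it; omitted here to keep the sketch offline.
```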
Current limitations:
  • SageMaker endpoints are configured by the Avala team during onboarding.
  • Custom HTTP endpoints can be configured via the REST API (see Model Inference).
  • Batch auto-labeling is limited to 5,000 items per job (see Batch Auto-Labeling).

Prediction Format

All models return predictions in Avala’s standard annotation format:
{
  "annotations": [
    {
      "type": "bounding_box",
      "label": "car",
      "confidence": 0.94,
      "coordinates": { "x": 120, "y": 340, "width": 200, "height": 150 }
    },
    {
      "type": "segmentation_mask",
      "label": "pedestrian",
      "confidence": 0.87,
      "mask_url": "https://..."
    }
  ]
}
See Supported Prediction Types for the full list.
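A consumer of this format typically groups annotations by type and applies its own confidence cutoff before review. A minimal sketch, using the example payload above:

```python
def split_predictions(payload, min_confidence=0.0):
    """Group annotations by `type`, skipping those below `min_confidence`."""
    grouped = {}
    for ann in payload.get("annotations", []):
        if ann.get("confidence", 1.0) < min_confidence:
            continue
        grouped.setdefault(ann["type"], []).append(ann)
    return grouped

payload = {
    "annotations": [
        {"type": "bounding_box", "label": "car", "confidence": 0.94,
         "coordinates": {"x": 120, "y": 340, "width": 200, "height": 150}},
        {"type": "segmentation_mask", "label": "pedestrian", "confidence": 0.87,
         "mask_url": "https://..."},
    ]
}
grouped = split_predictions(payload, min_confidence=0.9)
# only the bounding box clears the 0.9 cutoff
```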

Next Steps