Avala integrates ML models for AI-assisted annotation — generating pre-annotations that human annotators review, accept, or correct. This page documents all supported and planned models.
## Currently Available

### SAM (Segment Anything Model)
| Property | Value |
|---|---|
| Provider | Amazon SageMaker |
| Task | Interactive segmentation |
| Input | Image + point/box prompt |
| Output | Segmentation masks |
| Data Types | 2D images |
SAM generates pixel-level segmentation masks from point or bounding box prompts. In the annotation editor, click a point on an object or draw a bounding box, and SAM returns a precise mask.
Supported interactions:
- Point prompt — Click on an object to generate a mask
- Box prompt — Draw a bounding box to constrain the mask region
- Multi-point — Click multiple positive/negative points to refine
Workflow: annotator clicks a point → Avala sends the prompt to SageMaker → SAM returns a mask → the mask is rendered in the editor.
SAM runs on a managed SageMaker endpoint. Latency is typically 200–500ms per prompt. The model is warmed and shared across your organization — no cold starts after initial setup.
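For orientation, the prompt flow above can be sketched as a payload builder. This is a hypothetical sketch: the field names (`image_url`, `points`, `labels`, `box`) are illustrative assumptions, not Avala's documented request schema.

```python
import json

def build_sam_prompt(image_url, points, labels, box=None):
    """Build a hypothetical SAM prompt payload.

    points: list of (x, y) pixel coordinates
    labels: 1 (positive) or 0 (negative) per point, for multi-point refinement
    box:    optional (x_min, y_min, x_max, y_max) box prompt
    """
    payload = {
        "image_url": image_url,
        "points": [{"x": x, "y": y} for x, y in points],
        "labels": labels,
    }
    if box is not None:
        # Box prompt constrains the mask region
        payload["box"] = dict(zip(("x_min", "y_min", "x_max", "y_max"), box))
    return json.dumps(payload)
```

A multi-point refinement would pass several points with mixed labels, e.g. `points=[(10, 20), (50, 60)], labels=[1, 0]` to include the first region and exclude the second.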
### YOLO (You Only Look Once)
| Property | Value |
|---|---|
| Provider | Amazon SageMaker |
| Task | Object detection |
| Input | Image |
| Output | Bounding boxes + labels + confidence scores |
| Data Types | 2D images |
YOLO detects objects in images and returns bounding boxes with class labels and confidence scores. Use it to pre-annotate entire images with detections that annotators then review.
Capabilities:
- Multi-class object detection
- Confidence score per detection
- Configurable confidence threshold (default: 0.5)
- Filters low-confidence detections automatically
Label mapping: YOLO predictions are mapped to your project’s label set. Only predictions matching configured labels are returned.
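The thresholding and label mapping described above amount to a simple post-processing pass. A minimal sketch, assuming detections carry a `class_name` and `confidence` field (illustrative names, not Avala's schema):

```python
def postprocess_detections(detections, label_map, threshold=0.5):
    """Keep detections at or above the confidence threshold whose class
    maps to a configured project label; relabel the survivors."""
    kept = []
    for det in detections:
        if det["confidence"] < threshold:
            continue  # low-confidence detections are filtered out
        project_label = label_map.get(det["class_name"])
        if project_label is None:
            continue  # class not in the project's configured label set
        kept.append({**det, "label": project_label})
    return kept
```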
## Planned Models
The following models are on the roadmap but not yet available. Timelines are approximate. Contact support@avala.ai if you need early access.
### SAM 2
| Property | Value |
|---|---|
| Status | Planned |
| Task | Video segmentation + tracking |
| Input | Video frames + point/box prompt |
| Output | Segmentation masks propagated across frames |
| Data Types | Video sequences |
SAM 2 extends SAM to video — prompt on a single frame and propagate masks across the entire sequence. This dramatically reduces annotation time for video datasets.
Key improvements over SAM:
- Temporal consistency across frames
- Object tracking with mask propagation
- Memory-efficient streaming inference
- Support for occlusion handling
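The prompt-once, propagate-everywhere idea can be illustrated with a conceptual sketch. This is not SAM 2's actual algorithm, only the carry-the-mask-forward pattern; the per-frame segmentation call is a stand-in:

```python
def propagate_masks(frames, initial_prompt, segment_fn):
    """Conceptual propagation: segment the first frame from the user's
    prompt, then reuse each frame's mask as the prompt for the next.
    segment_fn(frame, prompt) stands in for a real segmentation call."""
    masks = []
    prompt = initial_prompt
    for frame in frames:
        mask = segment_fn(frame, prompt)
        masks.append(mask)
        prompt = mask  # carry the mask forward as the next prompt
    return masks
```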
### Florence-2
| Property | Value |
|---|---|
| Status | Planned |
| Task | Multi-task visual understanding |
| Input | Image + text prompt |
| Output | Bounding boxes, labels, captions, OCR |
| Data Types | 2D images |
Florence-2 is a multi-task vision-language model that can perform detection, captioning, OCR, and grounding from text prompts. Use it for:
- Open-vocabulary detection — Detect objects by describing them in natural language
- Image captioning — Auto-generate descriptions for images
- OCR — Extract text from images
- Visual grounding — Find objects matching a text description
### RADIO
| Property | Value |
|---|---|
| Status | Planned |
| Task | Feature extraction + similarity search |
| Input | Image |
| Output | Dense feature embeddings |
| Data Types | 2D images, 3D point clouds |
RADIO (Reduce All Domains Into One) generates dense feature embeddings for images and point clouds. Use it for:
- Similarity search — Find visually similar items across your dataset
- Clustering — Group similar images for batch annotation
- Active learning — Identify the most informative samples to annotate next
- Anomaly detection — Find outliers in your dataset
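Similarity search over dense embeddings typically reduces to cosine similarity. A minimal pure-Python sketch (the embeddings and item IDs in the usage example are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k=3):
    """Rank dataset items by cosine similarity to the query embedding.

    embeddings: dict mapping item_id -> embedding vector
    """
    scored = [(cosine_similarity(query, emb), item_id)
              for item_id, emb in embeddings.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# e.g. nearest_neighbors([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]}, k=1)
```

In production you would store embeddings in a vector index rather than scanning linearly, but the ranking criterion is the same.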
## Model Configuration
Models are configured per-organization. SageMaker endpoints are set up by the Avala team during onboarding. Custom HTTP endpoints can be configured self-service via the API.
### Current Setup Process
1. Contact Avala to enable inference for your organization
2. Avala deploys a SageMaker endpoint in your preferred AWS region
3. The endpoint is configured in Mission Control under Settings > Inference
4. Models are available in the annotation editor and via the auto-label API
## Using Models in the Annotation Editor
1. Open a sequence in the annotation editor
2. Select the AI Assist tool from the toolbar
3. Choose the model (SAM or YOLO)
4. For SAM: click a point or draw a box on the object
5. For YOLO: click Detect All to run on the entire image
6. Review and accept/modify the predictions
## Using Models via API
Trigger auto-labeling programmatically:
```bash
curl -X POST "https://api.avala.ai/api/v1/projects/{project_uid}/auto-label" \
  -H "X-Avala-Api-Key: $AVALA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_id": "inf_abc123",
    "confidence_threshold": 0.6
  }'
```
Current limitations:
- SageMaker endpoints cannot be created self-service; the Avala team configures them during onboarding.
- Custom HTTP endpoints can be configured self-service via the REST API (see Model Inference).
- Batch auto-labeling is limited to 5,000 items per job (see Batch Auto-Labeling).
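The same request can be built in Python with the standard library. This sketch constructs the request object without sending it; the endpoint, headers, and payload mirror the curl example, and `inf_abc123` is the example provider_id from this page:

```python
import json
import os
import urllib.request

def build_auto_label_request(project_uid, provider_id, confidence_threshold):
    """Build (but do not send) a POST request to the auto-label endpoint."""
    url = f"https://api.avala.ai/api/v1/projects/{project_uid}/auto-label"
    body = json.dumps({
        "provider_id": provider_id,
        "confidence_threshold": confidence_threshold,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Avala-Api-Key": os.environ.get("AVALA_API_KEY", ""),
            "Content-Type": "application/json",
        },
    )

# To send: urllib.request.urlopen(build_auto_label_request(...))
```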
All models return predictions in Avala’s standard annotation format:
```json
{
  "annotations": [
    {
      "type": "bounding_box",
      "label": "car",
      "confidence": 0.94,
      "coordinates": { "x": 120, "y": 340, "width": 200, "height": 150 }
    },
    {
      "type": "segmentation_mask",
      "label": "pedestrian",
      "confidence": 0.87,
      "mask_url": "https://..."
    }
  ]
}
```
See Supported Prediction Types for the full list.
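A response in this format can be split by prediction type in a few lines. This is a sketch; `split_predictions` is a hypothetical helper, not part of an Avala SDK:

```python
import json

def split_predictions(response_json, min_confidence=0.0):
    """Split a standard-format response into bounding boxes and masks,
    dropping any prediction below min_confidence."""
    boxes, masks = [], []
    for ann in json.loads(response_json)["annotations"]:
        if ann["confidence"] < min_confidence:
            continue
        if ann["type"] == "bounding_box":
            boxes.append(ann)
        else:
            masks.append(ann)
    return boxes, masks
```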
## Next Steps