Avala supports five data modalities, each with purpose-built annotation workflows and tooling. This page covers supported formats, capabilities, and upload requirements for each type.

Image

Single-frame images are the most common data type for object detection, classification, and segmentation tasks.

Supported formats: JPEG, PNG, WebP

Annotation workflow: Each image is annotated independently as a single frame. All 2D annotation tools are available.

Use cases: Object detection, instance segmentation, semantic segmentation, image classification, keypoint detection.

Video

Video files are automatically converted to frame sequences on upload, enabling frame-by-frame annotation with object tracking across frames.

Supported formats: MP4, MOV

Annotation workflow: Videos are split into individual frames grouped as a sequence. Annotators navigate frame-by-frame and can track objects across the timeline. Object IDs persist across frames for consistent tracking.

Use cases: Object tracking, action recognition, temporal event detection, driving scene labeling.
Video processing happens in the background after upload. Large videos may take several minutes to convert. You can monitor sequence status in Mission Control or via the API.
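Because each video becomes a frame sequence, you may need to map between frame indices and timestamps in the source footage, for example when cross-referencing annotations with the original recording. A small helper, assuming a constant frame rate (illustrative, not an Avala API):

```python
def frame_to_timestamp(frame_index: int, fps: float) -> float:
    """Timestamp in seconds of a frame in the source video, assuming constant fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_index / fps

def timestamp_to_frame(seconds: float, fps: float) -> int:
    """Nearest frame index for a source timestamp, assuming constant fps."""
    return round(seconds * fps)
```

Note that videos with variable frame rates would need the per-frame timestamps from the container instead of this constant-fps arithmetic.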

LiDAR / Point Cloud

3D point cloud data from LiDAR sensors, used for 3D object detection and scene understanding.

Supported formats: PCD, PLY

Annotation workflow: Point clouds are rendered in a 3D viewer with bird's-eye view, perspective view, and side views. Annotators place 3D cuboids with full position, dimension, and rotation control.

Use cases: 3D object detection, autonomous driving perception, robotics navigation, scene reconstruction.
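A cuboid parameterized by center position, dimensions, and rotation expands to eight corner points. The sketch below assumes rotation only about the vertical (z) axis, a common simplification in driving datasets; it illustrates the geometry, not Avala's internal representation:

```python
import math

def cuboid_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a cuboid centered at (cx, cy, cz) with the
    given dimensions, rotated by `yaw` radians about the vertical (z) axis."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (width / 2, -width / 2):
            for dz in (height / 2, -height / 2):
                # rotate the local offset about z, then translate to the center
                corners.append((
                    cx + dx * cos_y - dy * sin_y,
                    cy + dx * sin_y + dy * cos_y,
                    cz + dz,
                ))
    return corners
```

With `yaw = 0` the result is an axis-aligned box, which makes the corner positions easy to verify by hand.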

MCAP

MCAP is a multi-sensor container format commonly used in robotics and autonomous vehicle development. It packages camera images, LiDAR scans, IMU data, and other sensor streams into a single recording.

Supported formats: MCAP (with ROS message support)

Annotation workflow: Avala parses MCAP files to extract and synchronize sensor streams. Camera images are displayed alongside projected LiDAR data, enabling multi-camera annotation with 3D context. Annotators can work across camera views with consistent 3D cuboid projections.

Use cases: Multi-sensor fusion, surround-view perception, autonomous vehicle data labeling, robotics sensor calibration.
MCAP support includes automatic extraction of camera intrinsics and extrinsics for accurate LiDAR-to-camera projection. See the MCAP / ROS integration guide for setup details.
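Extrinsics (a rotation R and translation t from the LiDAR frame into the camera frame) and pinhole intrinsics (fx, fy, cx, cy) combine to project a LiDAR point onto the image in the standard way. A minimal sketch of that projection, with illustrative names rather than Avala API calls:

```python
def project_point(p_lidar, R, t, fx, fy, cx, cy):
    """Project a 3D LiDAR point into pixel coordinates.

    R (3x3 nested list) and t (length-3 list) map LiDAR coordinates into
    the camera frame; fx, fy, cx, cy are the pinhole intrinsics. Returns
    (u, v) in pixels, or None if the point is behind the camera."""
    x = sum(R[0][i] * p_lidar[i] for i in range(3)) + t[0]
    y = sum(R[1][i] * p_lidar[i] for i in range(3)) + t[1]
    z = sum(R[2][i] * p_lidar[i] for i in range(3)) + t[2]
    if z <= 0:
        return None  # behind the image plane, no valid projection
    return (fx * x / z + cx, fy * y / z + cy)
```

Real camera models usually also include lens distortion terms, which this sketch omits.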

Splat

3D Gaussian Splat data for annotating reconstructed 3D scenes.

Supported formats: Gaussian Splat

Annotation workflow: Splat scenes are rendered in a 3D viewer where annotators can navigate the reconstructed environment and place 3D annotations directly in the scene.

Use cases: 3D scene understanding, novel view synthesis annotation, spatial AI training data.

Capabilities Comparison

The following table shows which annotation tools are available for each data type:
| Annotation Tool | Image | Video | Point Cloud | MCAP | Splat |
| --------------- | ----- | ----- | ----------- | ---- | ----- |
| Bounding Box    | Yes   | Yes   |             |      |       |
| Polygon         | Yes   | Yes   |             |      |       |
| 3D Cuboid       |       |       | Yes         | Yes  | Yes   |
| Segmentation    | Yes   | Yes   |             |      |       |
| Polyline        | Yes   | Yes   |             |      |       |
| Keypoints       | Yes   | Yes   |             |      |       |
| Classification  | Yes   | Yes   | Yes         | Yes  | Yes   |
| Object Tracking |       | Yes   | Yes         | Yes  |       |

Upload Requirements

| Property                       | Limit            |
| ------------------------------ | ---------------- |
| Max file size (images)         | 20 MB per file   |
| Max file size (video)          | 2 GB per file    |
| Max file size (point cloud)    | 500 MB per file  |
| Max file size (MCAP)           | 5 GB per file    |
| Supported image formats        | JPEG, PNG, WebP  |
| Supported video formats        | MP4, MOV         |
| Supported point cloud formats  | PCD, PLY         |
| Supported multi-sensor formats | MCAP             |
Upload limits may vary depending on your plan. Contact support@avala.ai if you need to upload files that exceed these limits.

Next Steps