Image
Single-frame images are the most common data type for object detection, classification, and segmentation tasks.

Supported formats: JPEG, PNG, WebP

Annotation workflow: Each image is annotated independently as a single frame. All 2D annotation tools are available.

Use cases: Object detection, instance segmentation, semantic segmentation, image classification, keypoint detection.

Video
Video files are automatically converted to frame sequences on upload, enabling frame-by-frame annotation with object tracking across frames.

Supported formats: MP4, MOV

Annotation workflow: Videos are split into individual frames grouped as a sequence. Annotators navigate frame-by-frame and can track objects across the timeline. Object IDs persist across frames for consistent tracking.

Use cases: Object tracking, action recognition, temporal event detection, driving scene labeling.

Video processing happens in the background after upload. Large videos may take several minutes to convert. You can monitor sequence status in Mission Control or via the API.
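To make the frame-sequence model concrete, here is a minimal sketch of how a converted video maps frame indices to timestamps. The `FrameRef` shape, the `sequence_id` naming, and the 30 fps default are illustrative assumptions for this sketch, not Avala's actual data model.

```python
# Illustrative sketch only: the FrameRef fields and sequence naming
# are assumptions, not Avala's internal representation.
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameRef:
    sequence_id: str
    index: int          # 0-based frame number within the sequence
    timestamp_s: float  # seconds from the start of the video

def build_sequence(sequence_id: str, frame_count: int, fps: float = 30.0):
    """Expand a video into per-frame references, as upload conversion does."""
    return [FrameRef(sequence_id, i, i / fps) for i in range(frame_count)]

# A 3-second clip at 30 fps becomes 90 independently annotatable frames
# that still share one sequence_id for cross-frame object tracking.
frames = build_sequence("seq-001", 90, fps=30.0)
```

Because every frame keeps its sequence ID and timestamp, an object ID assigned on one frame can be carried forward to the same object on later frames.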
LiDAR / Point Cloud
3D point cloud data from LiDAR sensors, used for 3D object detection and scene understanding.

Supported formats: PCD, PLY

Annotation workflow: Point clouds are rendered in a 3D viewer with bird’s-eye view, perspective view, and side views. Annotators place 3D cuboids with full position, dimension, and rotation control.

Use cases: 3D object detection, autonomous driving perception, robotics navigation, scene reconstruction.
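The cuboid controls described above reduce to plain geometry: a centre position, length/width/height dimensions, and a yaw rotation about the vertical axis fully determine all eight corners. The sketch below is a generic illustration of that geometry, not Avala's internal annotation schema.

```python
# Generic 3D cuboid geometry sketch; field ordering (l, w, h) and the
# yaw-only rotation are illustrative assumptions, not Avala's schema.
import numpy as np

def cuboid_corners(center, dims, yaw):
    """Return the 8 corners of a cuboid rotated by yaw (radians) in the ground plane."""
    cx, cy, cz = center
    l, w, h = dims
    # Corner offsets before rotation, centred on the origin.
    xs = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * l / 2
    ys = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
    zs = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * h / 2
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotate in the ground plane, then translate to the centre.
    rx = c * xs - s * ys + cx
    ry = s * xs + c * ys + cy
    rz = zs + cz
    return np.stack([rx, ry, rz], axis=1)  # shape (8, 3)
```

The same parameterisation (position + dimensions + heading) is what lets a cuboid drawn in the bird's-eye view stay consistent with the perspective and side views.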
MCAP

MCAP is a multi-sensor container format commonly used in robotics and autonomous vehicle development. It packages camera images, LiDAR scans, IMU data, and other sensor streams into a single recording.

Supported formats: MCAP (with ROS message support)

Annotation workflow: Avala parses MCAP files to extract and synchronize sensor streams. Camera images are displayed alongside projected LiDAR data, enabling multi-camera annotation with 3D context. Annotators can work across camera views with consistent 3D cuboid projections.

Use cases: Multi-sensor fusion, surround-view perception, autonomous vehicle data labeling, robotics sensor calibration.

MCAP support includes automatic extraction of camera intrinsics and extrinsics for accurate LiDAR-to-camera projection. See the MCAP / ROS integration guide for setup details.
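The intrinsics and extrinsics mentioned above drive a standard pinhole-camera projection: a LiDAR point X is mapped to pixel coordinates via u ∝ K(RX + t), where K is the intrinsic matrix and [R|t] the LiDAR-to-camera extrinsics. A minimal NumPy sketch of that generic math (not Avala's implementation):

```python
# Standard pinhole projection of LiDAR points into an image; generic
# computer-vision math, not Avala's actual projection code.
import numpy as np

def project_lidar_to_image(points, K, R, t):
    """Project Nx3 LiDAR-frame points to Nx2 pixel coordinates.
    Points behind the camera are returned as NaN."""
    cam = points @ R.T + t           # LiDAR frame -> camera frame
    z = cam[:, 2:3]
    uv_h = cam @ K.T                 # pinhole projection (homogeneous)
    uv = uv_h[:, :2] / z             # perspective divide
    uv[z[:, 0] <= 0] = np.nan        # drop points behind the image plane
    return uv
```

This is why accurate calibration matters: errors in K or [R|t] shift every projected point, so cuboids drawn with 3D context would no longer line up across camera views.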
Splat
3D Gaussian Splat data for annotating reconstructed 3D scenes.

Supported formats: Gaussian Splat

Annotation workflow: Splat scenes are rendered in a 3D viewer where annotators can navigate the reconstructed environment and place 3D annotations directly in the scene.

Use cases: 3D scene understanding, novel view synthesis annotation, spatial AI training data.

Capabilities Comparison
The following table shows which annotation tools are available for each data type:

| Annotation Tool | Image | Video | Point Cloud | MCAP | Splat |
|---|---|---|---|---|---|
| Bounding Box | Yes | Yes | — | — | — |
| Polygon | Yes | Yes | — | — | — |
| 3D Cuboid | — | — | Yes | Yes | Yes |
| Segmentation | Yes | Yes | — | — | — |
| Polyline | Yes | Yes | — | — | — |
| Keypoints | Yes | Yes | — | — | — |
| Classification | Yes | Yes | Yes | Yes | Yes |
| Object Tracking | — | Yes | Yes | Yes | — |
Upload Requirements
| Property | Limit |
|---|---|
| Max file size (image) | 20 MB per file |
| Max file size (video) | 2 GB per file |
| Max file size (point cloud) | 500 MB per file |
| Max file size (MCAP) | 5 GB per file |
| Supported image formats | JPEG, PNG, WebP |
| Supported video formats | MP4, MOV |
| Supported point cloud formats | PCD, PLY |
| Supported multi-sensor formats | MCAP |