This guide shows how to combine Avala’s annotation APIs with LangChain to build LLM-powered data workflows — auto-tag datasets, generate label suggestions, or build QA pipelines over your annotations.

Prerequisites

```bash
pip install avala langchain langchain-openai
```

Query Your Datasets with an LLM

Use the Avala SDK to fetch project data, then let an LLM reason over it:
```python
from avala import Client
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

client = Client()  # reads AVALA_API_KEY from the environment
llm = ChatOpenAI(model="gpt-4o")

# Fetch dataset metadata
datasets = client.datasets.list()
dataset_info = [
    {"name": d.name, "uid": d.uid, "items": d.item_count}
    for d in datasets
]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a data management assistant. Help the user understand their annotation datasets."),
    ("user", "Here are my datasets:\n{datasets}\n\nWhich dataset should I prioritize for labeling and why?"),
])

chain = prompt | llm
response = chain.invoke({"datasets": dataset_info})
print(response.content)
```
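Passing the raw list of dicts into the prompt works, but rendering it as a compact plain-text table keeps the prompt readable and token-efficient. A minimal sketch, assuming the same `name`/`uid`/`items` fields used in `dataset_info` above (the sample values are illustrative):

```python
def render_dataset_table(dataset_info):
    """Render dataset summaries as a compact plain-text table for the prompt."""
    lines = ["name | uid | items"]
    for d in dataset_info:
        lines.append(f"{d['name']} | {d['uid']} | {d['items']}")
    return "\n".join(lines)

table = render_dataset_table([
    {"name": "traffic-signs", "uid": "ds_001", "items": 1200},
    {"name": "pedestrians", "uid": "ds_002", "items": 450},
])
print(table)
```

You can then pass `{"datasets": table}` to `chain.invoke` instead of the raw list.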

Auto-Tag Exports with LLM Classification

After exporting annotations, use an LLM to add metadata tags:
```python
import json
import time

import requests
from avala import Client
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

client = Client()
llm = ChatOpenAI(model="gpt-4o-mini")

# Export the project's annotations and poll until the export is ready
export = client.exports.create(project="proj_abc123")
while export.status != "completed":
    time.sleep(2)
    export = client.exports.get(export.uid)

annotations = requests.get(export.download_url).json()

# Classify each annotation's complexity
prompt = ChatPromptTemplate.from_messages([
    ("system", "Classify the annotation complexity as 'simple', 'moderate', or 'complex' based on the number and type of objects."),
    ("user", "Annotations: {annotations}\n\nReturn only the classification word."),
])

chain = prompt | llm

for ann in annotations[:10]:  # Sample the first 10
    result = chain.invoke({"annotations": json.dumps(ann.get("annotations", []))})
    ann["complexity"] = result.content.strip().lower()
    print(f"{ann['file_name']}: {ann['complexity']}")
```
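The polling loop above will spin forever if an export never reaches `"completed"`. One way to bound it is a small generic helper with a timeout; this is a sketch with an injected fetch function so it stays SDK-agnostic (the helper name and parameters are our own, not part of the Avala SDK):

```python
import time

def wait_for(fetch, is_done, timeout=300.0, interval=2.0):
    """Poll fetch() until is_done(result) is true, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        obj = fetch()
        if is_done(obj):
            return obj
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not complete in time")
        time.sleep(interval)

# Stub example: the "export" completes on the third poll
states = iter(["pending", "processing", "completed"])
result = wait_for(lambda: next(states), lambda s: s == "completed",
                  timeout=10, interval=0)
print(result)  # completed
```

With the real client this becomes `wait_for(lambda: client.exports.get(export.uid), lambda e: e.status == "completed")`.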

Build a QA Chain Over Annotation Data

Create a question-answering system that lets you query your annotation data naturally:
```python
from avala import Client
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

client = Client()
llm = ChatOpenAI(model="gpt-4o")

# Fetch project and task stats
projects = list(client.projects.list())
project_summaries = []

for project in projects:
    tasks = list(client.tasks.list(project=project.uid))
    status_counts = {}
    for task in tasks:
        status = task.status or "unknown"
        status_counts[status] = status_counts.get(status, 0) + 1

    project_summaries.append({
        "name": project.name,
        "status": project.status,
        "tasks": status_counts,
    })

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a project analytics assistant. Answer questions about annotation project progress."),
    ("user", "Project data:\n{projects}\n\nQuestion: {question}"),
])

chain = prompt | llm

# Ask questions
questions = [
    "Which project has the most pending tasks?",
    "What's the overall completion rate across all projects?",
    "Which projects might need more annotators?",
]

for q in questions:
    answer = chain.invoke({"projects": project_summaries, "question": q})
    print(f"Q: {q}\nA: {answer.content}\n")
```
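Since the LLM is summarizing numbers, it is worth computing ground-truth figures locally and cross-checking its answers. A minimal sketch over the `project_summaries` structure built above (`"completed"` as the done status is an assumption about your task status values):

```python
def completion_rate(project_summaries, done_status="completed"):
    """Fraction of tasks in `done_status` across all projects (0.0 if no tasks)."""
    done = total = 0
    for p in project_summaries:
        for status, count in p["tasks"].items():
            total += count
            if status == done_status:
                done += count
    return done / total if total else 0.0

summaries = [
    {"name": "vehicles", "status": "active", "tasks": {"completed": 80, "pending": 20}},
    {"name": "faces", "status": "active", "tasks": {"completed": 30, "pending": 70}},
]
print(f"{completion_rate(summaries):.0%}")  # 55%
```

If the model's stated completion rate diverges from this figure, trust the local computation.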

Use with the MCP Server

For the most natural AI integration, use the Avala MCP Server with LangChain’s MCP tool support. This lets LLMs call Avala tools directly, without writing SDK code.
With Claude Desktop, Cursor, or VS Code, the MCP server exposes tools such as `list_datasets`, `get_project`, and `create_export` that AI assistants can call directly.
See the MCP setup guide for configuration.
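For orientation, a Claude Desktop entry typically looks like the sketch below. The command and package name (`avala-mcp`) are placeholders, not confirmed by this guide — use the exact values from the MCP setup guide:

```json
{
  "mcpServers": {
    "avala": {
      "command": "npx",
      "args": ["-y", "avala-mcp"],
      "env": { "AVALA_API_KEY": "your-api-key" }
    }
  }
}
```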
The SDK is currently read-only for datasets. LangChain workflows can read and analyze data, but dataset creation and uploads require the REST API. See File Uploads.

Next Steps