The challenge
The customer was scaling a multimodal model into seven scientific domains and needed a partner who could keep pace not just on volume, but on consistency across text, image, and audio modalities. They had previously been burned by vendors whose pass rates dropped sharply once headcount grew.
Our approach
Pod-of-pods structure
Rather than one large pool of annotators, we ran the program as seven domain pods (one per scientific area) with a central review layer. Each pod reported to a senior expert from that field; the central layer enforced cross-pod consistency.
Calibrated growth
The team grew to roughly 600 expert makers over four months. Every new annotator worked through the same calibration set as the founding cohort, so quality held steady as headcount climbed.
LLM-assisted pre-labelling
For high-volume image and audio prompts, we used model-assisted pre-labelling with human-in-the-loop verification, so reviewer time went to edge cases rather than copy-paste work.
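A minimal sketch of what confidence-gated routing in a pipeline like this can look like. This is illustrative only, not the program's actual implementation: the `PreLabel` structure, the 0.9 threshold, and the `route` function are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class PreLabel:
    item_id: str
    label: str
    confidence: float  # model's self-reported confidence, 0.0-1.0

# Illustrative threshold: pre-labels below it are queued for a human reviewer.
REVIEW_THRESHOLD = 0.9

def route(prelabels):
    """Split model pre-labels into auto-accepted and human-review queues."""
    auto, review = [], []
    for p in prelabels:
        (auto if p.confidence >= REVIEW_THRESHOLD else review).append(p)
    return auto, review

batch = [
    PreLabel("img-001", "spectrogram", 0.97),
    PreLabel("img-002", "micrograph", 0.62),
]
auto, review = route(batch)
```

In a setup like this, reviewers only ever see the `review` queue, which is what keeps their time focused on genuinely ambiguous items.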
Outcome
- 48,000 high-quality visual prompts delivered across seven scientific domains.
- ~600 vetted expert makers active by month four.
- 90% sustained pass rate on the customer's hold-out evaluation.
- Full ramp from zero to delivery in four months.
What made it work
The pod-of-pods structure meant that scaling did not dilute domain expertise. The customer was able to hand us a new domain mid-program without losing speed in the existing six.