Why teams come to us
Modern humanoid and manipulation programs need data that does not exist on the public internet: egocentric video, motion-capture aligned with robot kinematics, hand pose under realistic clutter, and scene-aware captures that match the customer's deployment environment. Off-the-shelf datasets get teams to a demo, but rarely to a deployable model.
How we run a program
1. Scope per task and robot body
We start by mapping the customer's downstream task — pick-and-place, bimanual manipulation, household navigation — to the capture modalities and ground-truth signals the model actually consumes. The output is a one-page capture spec per task.
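To make the task-to-modality mapping concrete, a capture spec of the kind described above might be reduced to a data structure like the following. This is a minimal sketch; every field name and value here is hypothetical, not the actual spec format.

```python
# Hypothetical sketch of a one-page capture spec as a data structure.
# All field names and values are illustrative.
capture_spec = {
    "task": "bimanual pick-and-place",
    "robot_body": "reference-arm-v1",    # kinematic target for alignment
    "modalities": ["egocentric_video", "third_person_video",
                   "optical_mocap", "imu", "hand_pose"],
    "ground_truth": ["end_effector_pose", "object_6dof", "contact_events"],
    "environment": "household_kitchen",  # matches the deployment scene
    "clip_length_s": 30,
    "sessions_per_batch": 40,
}

def required_streams(spec):
    """All streams a session must record to satisfy the spec."""
    return sorted(set(spec["modalities"]) | set(spec["ground_truth"]))
```

The point of collapsing the spec to one structure is that both the capture operators and the model team read the same artifact, so a missing stream is caught at scoping time rather than after delivery.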
2. Multi-modal capture
Programs combine egocentric and third-person video, optical motion capture, IMU streams, hand pose, and scene metadata. Sessions are run in our partner studios in Asia with calibrated rigs and operator scripts written for the customer's task.
3. Reference-aligned delivery
Every clip is exported in a customer-specified schema (RLDS, Parquet, or custom) and aligned to the customer's reference robot body. QC is gated on per-frame consistency, not just clip-level review.
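Per-frame gating, as opposed to clip-level review, can be sketched as follows: a clip passes only if every frame's streams are time-aligned within a tolerance. This is an illustrative sketch, not the actual delivery pipeline; the function names, field names, and the 5 ms tolerance are assumptions.

```python
# Hypothetical sketch of per-frame QC gating. One misaligned frame
# fails the whole clip, rather than averaging quality over the clip.
TOLERANCE_S = 0.005  # assumed max skew between streams on one frame

def frame_consistent(frame, tol=TOLERANCE_S):
    """A frame is consistent if all its stream timestamps agree within tol."""
    ts = [s["t"] for s in frame["streams"]]
    return max(ts) - min(ts) <= tol

def clip_passes_qc(clip, tol=TOLERANCE_S):
    """Gate on per-frame consistency: every frame must pass."""
    return all(frame_consistent(f, tol) for f in clip["frames"])

good_clip = {"frames": [
    {"streams": [{"t": 0.000}, {"t": 0.001}]},  # 1 ms skew: aligned
    {"streams": [{"t": 0.033}, {"t": 0.034}]},  # 1 ms skew: aligned
]}
bad_clip = {"frames": [
    {"streams": [{"t": 0.000}, {"t": 0.020}]},  # 20 ms skew: rejected
]}
```

The design choice the gate encodes: a model training on multi-modal frames is only as good as its worst-aligned frame, so a clip with one bad frame is reshot or trimmed, not shipped.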
What gets delivered
- Capture sessions sized to the customer task — typically delivered in monthly batches.
- Aligned, labelled exports in the customer's preferred schema.
- Operator notes and edge-case logs so the model team can debug failure modes.
Engagement model
Programs run on rolling SOWs — capture cadence and modality mix are revisited every two weeks against the customer's evaluation results. Pricing is per-session for capture, per-clip for annotation.

