VLA Robot Arm

VLAs
Embedded Systems
Robotics
Computer Vision
Collaborators: Patrick Slade (Principal Investigator), Alex Ko (Co-Intern)

We conducted human-in-the-loop experiments to evaluate how vision-language-action (VLA) models perform on activities-of-daily-living tasks that involve human interaction. We adapted the open-source LeRobot SO-101 arm, running SmolVLA, to a wearable, human-mounted harness and added hard stops to ensure user safety. We also collected 10,000+ frames of egocentric training data, which we used to fine-tune SmolVLA for our specific tasks. We then designed experimental task protocols with varying levels of complexity and human motion to measure VLA performance under realistic conditions.

As a proof of concept, we began by building a custom 1-DOF robotic arm around SmolVLA, with the Meta Project Aria Gen 1 glasses as the vision system. I developed a custom camera handler to undistort, crop, and stream video into the SmolVLA recording and inference pipeline. I also wrote custom firmware for the Arduino Uno to control the arm, and implemented a teleoperation recorder that logs camera frames, servo angles, and task descriptions to curate a labeled dataset for model fine-tuning.
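The teleoperation recorder pairs each camera frame with the servo angle and task description at the moment of capture. The following is a minimal sketch of that idea. It assumes frames are saved separately and referenced by index, and it uses JSON Lines as the on-disk format. `Step`, `EpisodeRecorder`, and `parse_episode` are hypothetical names. This is not the project's actual recorder and not LeRobot's dataset API.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Step:
    """One synchronized sample (hypothetical schema)."""
    timestamp: float        # capture time in seconds
    frame_index: int        # index of the camera frame stored alongside the log (assumed)
    servo_angle_deg: float  # servo angle read at capture time
    task: str               # natural-language task description for the episode


@dataclass
class EpisodeRecorder:
    """Hypothetical recorder: buffers teleoperation steps for one episode."""
    task: str
    steps: List[Step] = field(default_factory=list)

    def log(self, frame_index: int, servo_angle_deg: float,
            timestamp: Optional[float] = None) -> Step:
        # Fall back to wall-clock time when the caller does not supply a capture timestamp
        ts = time.time() if timestamp is None else timestamp
        step = Step(ts, frame_index, servo_angle_deg, self.task)
        self.steps.append(step)
        return step

    def to_jsonl(self) -> str:
        # One JSON object per line, so episode files can be appended to and streamed
        return "".join(json.dumps(asdict(s)) + "\n" for s in self.steps)

    def save(self, path: Path) -> None:
        path.write_text(self.to_jsonl(), encoding="utf-8")


def parse_episode(text: str) -> List[Step]:
    """Rebuild Step objects from JSON Lines text, skipping blank lines."""
    return [Step(**json.loads(line)) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    rec = EpisodeRecorder(task="Point to the glasses")
    rec.log(frame_index=0, servo_angle_deg=90.0, timestamp=0.0)
    rec.log(frame_index=1, servo_angle_deg=92.5, timestamp=0.05)
    print(rec.to_jsonl(), end="")
```

Storing the task description on every step, rather than once per file, is one possible design choice. It makes it straightforward to filter or concatenate episodes from different tasks when assembling a fine-tuning set.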

Videos

Task: Pick up the glasses and hand them to me

(Proof of Concept) Task: Point to the glasses

Presentation