Capstone Project: The Autonomous Humanoid

Project Goal

The capstone project integrates all modules to create a simulated autonomous humanoid robot that can:

  1. Receive voice commands via natural language
  2. Plan a path and navigate around obstacles
  3. Identify objects in the environment using perception
  4. Manipulate objects (pick, place, interact) in simulation

This project demonstrates end-to-end Physical AI: from human communication to robot action execution.

System Architecture

High-Level Architecture

┌─────────────────┐
│   Voice Input   │  (Module 4: VLA)
│  (Whisper ASR)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Task Planning  │  (Module 4: VLA)
│   (LLM: GPT-4)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Action Executor │  (Module 1: ROS 2)
│ (ROS 2 Actions) │
└────────┬────────┘
         │
    ┌────┴─────┐
    │          │
    ▼          ▼
┌──────────┐  ┌──────────────┐
│Navigation│  │ Manipulation │
│  (Nav2)  │  │   (MoveIt)   │
└────┬─────┘  └──────┬───────┘
     │               │
     ▼               ▼
┌─────────────────────┐
│      Perception     │  (Module 3: Isaac)
│ (VSLAM, Detection)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│    Digital Twin     │  (Module 2: Gazebo)
│  (Simulated Robot)  │
└─────────────────────┘

Component Integration

Module 1 (ROS 2) provides:

  • Communication infrastructure (topics, services, actions)
  • Action execution framework
  • System integration and coordination

Module 2 (Gazebo) provides:

  • Physics simulation environment
  • Robot model and sensors
  • Realistic physics for manipulation

Module 3 (Isaac) provides:

  • Advanced perception (VSLAM, object detection)
  • High-fidelity sensor simulation
  • Sim-to-real ready models

Module 4 (VLA) provides:

  • Natural language understanding
  • Task planning from commands
  • Human-robot interaction interface

Workflow Example

Scenario: "Clean the room"

Step 1: Voice Input

  • User speaks: "Clean the room"
  • Whisper transcribes: "Clean the room"
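For reference, transcription with the open-source openai-whisper package takes only a few lines. This is a minimal sketch; the model size and audio filename are placeholders for your own setup.

    # Minimal transcription sketch (pip install openai-whisper).
    import whisper

    model = whisper.load_model("base")        # small, CPU-friendly model
    result = model.transcribe("command.wav")  # path to the recorded command
    print(result["text"])                     # e.g. "Clean the room"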

Step 2: Task Planning

  • LLM receives command
  • LLM generates plan:
    {
      "goal": "clean_room",
      "tasks": [
        {"id": 1, "action": "navigate", "target": "living_room"},
        {"id": 2, "action": "scan", "purpose": "find_objects"},
        {"id": 3, "action": "detect", "object_type": "trash"},
        {"id": 4, "action": "navigate", "target": "trash_location"},
        {"id": 5, "action": "pick", "object": "trash"},
        {"id": 6, "action": "navigate", "target": "trash_bin"},
        {"id": 7, "action": "place", "object": "trash", "location": "trash_bin"}
      ]
    }
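One way to get a plan in this shape is to constrain the LLM with a system prompt and ask for JSON only. The sketch below uses the official openai Python client; the model name, prompt wording, and action vocabulary are illustrative, not prescriptive.

    # Sketch: turn a transcribed command into a structured JSON plan.
    from openai import OpenAI

    SYSTEM_PROMPT = (
        "You are a robot task planner. Reply with ONLY a JSON object of the "
        'form {"goal": str, "tasks": [{"id": int, "action": str, ...}]}. '
        "Allowed actions: navigate, scan, detect, pick, place."
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Clean the room"},
        ],
    )
    print(response.choices[0].message.content)  # the JSON plan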

Step 3: Action Execution

  • Action executor processes plan sequentially
  • Each task triggers ROS 2 actions:
    • Navigation actions → Nav2 stack
    • Perception actions → Isaac ROS pipelines
    • Manipulation actions → MoveIt or custom controllers
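To make the dispatch concrete, here is a sketch of the executor's navigation branch using rclpy and Nav2's NavigateToPose action. It assumes Nav2 is already running, and the pose lookup table is a stand-in for however you resolve symbolic targets like "living_room" to map coordinates.

    # Sketch: execute a {"action": "navigate", ...} task via Nav2.
    import rclpy
    from rclpy.action import ActionClient
    from rclpy.node import Node
    from nav2_msgs.action import NavigateToPose

    POSES = {"living_room": (1.5, 2.0), "trash_bin": (0.0, -1.0)}  # placeholder map

    class NavExecutor(Node):
        def __init__(self):
            super().__init__("nav_executor")
            self._client = ActionClient(self, NavigateToPose, "navigate_to_pose")

        def execute(self, task):
            x, y = POSES[task["target"]]
            goal = NavigateToPose.Goal()
            goal.pose.header.frame_id = "map"
            goal.pose.pose.position.x = x
            goal.pose.pose.position.y = y
            goal.pose.pose.orientation.w = 1.0
            self._client.wait_for_server()
            future = self._client.send_goal_async(goal)
            rclpy.spin_until_future_complete(self, future)
            # future.result() is a goal handle; a full executor would also
            # await goal_handle.get_result_async() before moving on.
            return future.result()

    def main():
        rclpy.init()
        node = NavExecutor()
        node.execute({"id": 1, "action": "navigate", "target": "living_room"})
        rclpy.shutdown()

    if __name__ == "__main__":
        main()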

Step 4: Robot Behavior

  • Simulated humanoid executes behaviors in Gazebo
  • Real-time feedback to action executor
  • Status updates reported back to user
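Status reporting can start as simply as publishing strings on a topic that a UI or logger subscribes to. The topic name below is an assumption, and a fuller system would use ROS 2 action feedback instead.

    # Sketch: report task status to the user over a plain topic.
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String

    class StatusReporter(Node):
        def __init__(self):
            super().__init__("status_reporter")
            self._pub = self.create_publisher(String, "/capstone/status", 10)

        def report(self, task_id, state):
            msg = String()
            msg.data = f"task {task_id}: {state}"  # e.g. "task 3: succeeded"
            self._pub.publish(msg)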

Milestones

Milestone 1: Voice → Text → Plan

Goal: Convert voice commands to structured task plans

Tasks:

  • Set up Whisper ASR (local or API)
  • Integrate LLM (GPT-4, Claude, or local model)
  • Create prompt engineering for task planning
  • Parse LLM output into a structured JSON plan (see the sketch after this list)
  • Test with sample commands: "go to kitchen", "pick up cup"
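Because LLMs sometimes wrap their JSON in prose or code fences, the parsing step deserves some defensiveness. A minimal sketch, assuming the plan schema shown in the workflow example:

    # Sketch: extract and validate a JSON plan from raw LLM output.
    import json
    import re

    ALLOWED_ACTIONS = {"navigate", "scan", "detect", "pick", "place"}

    def parse_plan(raw: str) -> dict:
        match = re.search(r"\{.*\}", raw, re.DOTALL)  # outermost {...}
        if match is None:
            raise ValueError("no JSON object found in LLM output")
        plan = json.loads(match.group(0))
        for task in plan.get("tasks", []):
            if task.get("action") not in ALLOWED_ACTIONS:
                raise ValueError(f"unknown action: {task.get('action')}")
        return plan

    print(parse_plan('Sure! {"goal": "fetch", "tasks": '
                     '[{"id": 1, "action": "navigate", "target": "kitchen"}]}'))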

Success Criteria:

  • System accepts voice/text input
  • LLM generates reasonable task plans
  • Plans are in structured, parseable format

Milestone 2: Plan → ROS 2 Actions

Goal: Execute simple actions based on plans

Tasks:

  • Create a ROS 2 action server for basic behaviors (sketched after this list)
  • Implement action executor that processes plans
  • Test with simple mobile base or proxy robot
  • Execute navigation commands (move forward, turn)
  • Add error handling and status reporting
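A behavior server can start as small as the sketch below. The capstone_interfaces/RunBehavior action interface is hypothetical (you would define it yourself in a .action file with, say, a string command goal field and a bool success result), and the execute logic is a stub for real motion.

    # Sketch of a minimal ROS 2 action server for basic behaviors.
    import time
    import rclpy
    from rclpy.action import ActionServer
    from rclpy.node import Node
    from capstone_interfaces.action import RunBehavior  # hypothetical interface

    class BehaviorServer(Node):
        def __init__(self):
            super().__init__("behavior_server")
            self._server = ActionServer(
                self, RunBehavior, "run_behavior", self.execute_cb)

        def execute_cb(self, goal_handle):
            command = goal_handle.request.command  # e.g. "move_forward"
            self.get_logger().info(f"executing: {command}")
            time.sleep(1.0)                        # stand-in for real motion
            goal_handle.succeed()
            result = RunBehavior.Result()
            result.success = True
            return result

    def main():
        rclpy.init()
        rclpy.spin(BehaviorServer())

    if __name__ == "__main__":
        main()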

Success Criteria:

  • Action executor can parse and execute plans
  • At least 3 different action types work (e.g., navigate, wait, report)
  • Robot responds to commands in simulation

Milestone 3: Perception and Manipulation

Goal: Add perception and manipulation capabilities

Tasks:

  • Integrate Isaac ROS perception (VSLAM, object detection)
  • Add manipulation actions (pick, place, grasp)
  • Test object detection in simulation (a listener is sketched after this list)
  • Test manipulation with simple objects
  • Combine navigation + perception + manipulation
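On the perception side, Isaac ROS detection pipelines publish vision_msgs/Detection2DArray, so a listener can be as small as the sketch below. The /detections topic name is an assumption (check your pipeline's remappings), and the field layout matches vision_msgs in ROS 2 Humble and later.

    # Sketch: log detections coming out of a perception pipeline.
    import rclpy
    from rclpy.node import Node
    from vision_msgs.msg import Detection2DArray

    class DetectionListener(Node):
        def __init__(self):
            super().__init__("detection_listener")
            self.create_subscription(
                Detection2DArray, "/detections", self.on_detections, 10)

        def on_detections(self, msg):
            for det in msg.detections:
                if not det.results:
                    continue
                hyp = det.results[0].hypothesis  # first hypothesis for this box
                self.get_logger().info(
                    f"saw {hyp.class_id} (score {hyp.score:.2f})")

    def main():
        rclpy.init()
        rclpy.spin(DetectionListener())

    if __name__ == "__main__":
        main()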

Success Criteria:

  • Robot can detect objects in environment
  • Robot can navigate to detected objects
  • Robot can manipulate objects (pick/place)
  • End-to-end workflow works for simple scenarios

Milestone 4: Full Integration

Goal: Complete capstone with all modules integrated

Tasks:

  • Integrate all milestones into single system
  • Test complex scenarios (multi-step tasks)
  • Add error recovery and replanning (a retry/replan loop is sketched after this list)
  • Create demonstration video
  • Document architecture and workflow
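Error recovery can begin as a simple retry-then-replan loop wrapped around the executor. In this sketch, execute_task and replan are placeholders for your own executor and LLM planner.

    # Sketch: retry each task, then fall back to replanning.
    def run_plan(plan, execute_task, replan, max_retries=2):
        tasks = list(plan["tasks"])
        while tasks:
            task = tasks.pop(0)
            for attempt in range(max_retries + 1):
                if execute_task(task):  # placeholder: returns True on success
                    break
                print(f"task {task['id']} failed (attempt {attempt + 1})")
            else:
                # Retries exhausted: ask the planner for a fresh plan that
                # accounts for the failure (a real system should also cap
                # the number of replans to avoid looping forever).
                plan = replan(plan, failed_task=task)
                tasks = list(plan["tasks"])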

Success Criteria:

  • Complete voice-to-action pipeline works
  • System handles at least 3 different command types
  • Error cases are handled gracefully
  • Documentation is complete and clear

Assessment

Minimal Working Capstone

To pass the capstone, you must demonstrate:

  1. Voice/Text Input: System accepts natural language commands
  2. Task Planning: LLM generates structured plans from commands
  3. Action Execution: At least 2 different ROS 2 actions execute
  4. Simulation: Robot performs behaviors in Gazebo or Isaac Sim
  5. Documentation: Clear explanation of architecture and workflow

Stretch Goals

  • Advanced Perception: Object detection, semantic segmentation
  • Complex Manipulation: Multi-step manipulation sequences
  • Error Recovery: Handle failures and replan
  • Real Robot: Deploy to physical robot (if available)
  • Multi-Robot: Coordinate multiple robots

Implementation Tips

Start Simple

  • Begin with text input (skip voice initially)
  • Use simple commands ("go forward", "turn left")
  • Test with basic mobile base before humanoid
  • Add complexity incrementally

Modular Development

  • Develop each module independently
  • Use ROS 2 topics/services for integration
  • Test interfaces between modules
  • Document APIs clearly

Simulation First

  • Develop entirely in simulation
  • Test extensively before physical deployment
  • Use Gazebo for physics, Isaac Sim for perception
  • Validate sim-to-real assumptions

Iterative Refinement

  • Get minimal version working first
  • Add features one at a time
  • Test after each addition
  • Refactor for maintainability

Next Steps

  • Review all modules to understand integration points
  • Start with Milestone 1 (voice → plan)
  • Build incrementally toward full system
  • Document your progress and challenges

Good luck with your capstone project!