New to embodied AI? This glossary covers the most important terms you’ll encounter, explained without jargon.

A

Action Space — The set of all possible actions a robot can take. For a robot arm, this might be 6 joint angles. For a mobile robot, it might be forward speed and turning rate.
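In code, an action space often reduces to per-dimension bounds that raw policy outputs are clipped into. A minimal numpy sketch (the six joint limits here are illustrative, not from any real arm):

```python
import numpy as np

# Hypothetical 6-DOF arm: the action is one target angle per joint (radians).
ACTION_LOW = np.full(6, -3.14)
ACTION_HIGH = np.full(6, 3.14)

def clip_action(action):
    """Project a raw policy output onto the valid action space."""
    return np.clip(action, ACTION_LOW, ACTION_HIGH)

a = clip_action(np.array([0.5, 4.0, -9.0, 0.0, 1.0, -2.0]))
```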

ACT (Action Chunking with Transformers) — A popular imitation learning algorithm that predicts a chunk of future actions at once instead of one action at a time. This makes policies smoother and more robust.
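A toy sketch of the chunking idea, with a placeholder standing in for the learned transformer (the function names and the chunk size are invented for illustration):

```python
import numpy as np

CHUNK = 8  # actions predicted per inference call (an assumed horizon)

def predict_chunk(obs):
    """Placeholder for the learned model: returns CHUNK future actions."""
    return np.tile(obs, (CHUNK, 1))  # a real policy would run a transformer here

def run(obs_stream):
    """Query the model every CHUNK steps instead of every step."""
    executed = []
    for t, obs in enumerate(obs_stream):
        if t % CHUNK == 0:           # replan only at chunk boundaries
            chunk = predict_chunk(obs)
        executed.append(chunk[t % CHUNK])
    return executed
```

Predicting several steps at once reduces compounding errors from per-step replanning; the real ACT also blends overlapping chunks (temporal ensembling), which is omitted here.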

B

Behavior Cloning — The simplest form of imitation learning. Train a neural network to map from observations to actions using demonstration data. Simple but can fail when the robot encounters states not seen in demonstrations.
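A minimal behavior-cloning sketch, using a linear “network” fit by least squares on synthetic demonstrations (all data here is made up):

```python
import numpy as np

# Synthetic demonstrations: observations (500 x 3) and expert actions (500 x 2).
rng = np.random.default_rng(0)
obs = rng.normal(size=(500, 3))
W_expert = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, -1.0]])
actions = obs @ W_expert

# Behavior cloning: supervised fit from observations to expert actions.
W_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)

def policy(o):
    """The cloned policy: maps an observation to an action."""
    return o @ W_hat
```

A real system swaps the linear map for a deep network and minimizes the same imitation loss with gradient descent.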

Bipedal Locomotion — Walking on two legs. The defining challenge of humanoid robots. Much harder than wheeled or quadruped locomotion due to the inherent instability.

C

CALVIN — A popular benchmark for language-conditioned robot manipulation. Tests a robot’s ability to follow natural language instructions like “pick up the red block and put it in the drawer.”

Contact-Rich Manipulation — Tasks where the robot makes complex physical contact with objects — assembly, tool use, insertion. These are the hardest tasks in manipulation because the physics are hard to simulate.

Cobot (Collaborative Robot) — A robot designed to work safely alongside humans. Unlike traditional industrial robots, which operate behind cages, cobots include built-in force limiting and collision detection.

D

Dexterous Manipulation — Using multi-fingered robot hands to manipulate objects with human-like skill. Requires coordinating many degrees of freedom simultaneously.

Diffusion Policy — A policy architecture that uses diffusion models (the same technology behind image generation) to generate robot actions. Produces smooth, multi-modal action distributions.

DOF (Degrees of Freedom) — The number of independent ways a robot can move. A typical robot arm has 6 DOF. A humanoid robot has 20-40+ DOF.

Domain Randomization — Training a policy on many randomized simulation environments so it generalizes to the real world. Vary textures, lighting, physics parameters, etc.
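A sketch of per-episode randomization; the parameter names and ranges are invented for illustration:

```python
import random

def sample_sim_params(rng):
    """Draw one randomized environment configuration (names/ranges illustrative)."""
    return {
        "friction": rng.uniform(0.5, 1.5),
        "object_mass_kg": rng.uniform(0.05, 0.5),
        "light_intensity": rng.uniform(0.3, 1.0),
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
    }

# A fresh configuration per training episode: the policy never sees the
# same world twice, so it cannot overfit to any one simulator setting.
rng = random.Random(42)
episodes = [sample_sim_params(rng) for _ in range(1000)]
```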

E

Embodied AI — AI that interacts with the physical world through a body (robot, drone, autonomous vehicle). Distinct from disembodied AI (chatbots, image generators) that only processes information.

End-Effector — The tool at the end of a robot arm — a gripper, suction cup, welding torch, etc. End-effector control means controlling the position/orientation of this tool rather than individual joint angles.

F

FAST (Frequency-space Action Sequence Tokenization) — A method for converting continuous robot actions into discrete tokens using frequency-domain analysis (DCT). Enables VLA models to use the same token-prediction approach as language models.
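A rough numpy-only sketch of the underlying idea (this is not the actual FAST codec, which adds compression and byte-pair encoding): take a DCT of an action sequence, then quantize the coefficients to integer tokens.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, so the inverse transform is just the transpose."""
    k = np.arange(n)
    D = np.cos(np.pi * k[:, None] * (k[None, :] + 0.5) / n)
    D[0] *= np.sqrt(1.0 / n)
    D[1:] *= np.sqrt(2.0 / n)
    return D

def tokenize(seq, scale=100.0):
    """DCT the action sequence, then round coefficients to integer tokens."""
    return np.round(dct_matrix(len(seq)) @ seq * scale).astype(int)

def detokenize(tokens, scale=100.0):
    """Invert: rescale the tokens and apply the inverse DCT."""
    return dct_matrix(len(tokens)).T @ (tokens / scale)

seq = np.sin(np.linspace(0.0, np.pi, 32))   # a smooth 1-D action trajectory
recon = detokenize(tokenize(seq))           # close to seq, up to quantization
```

Because smooth trajectories concentrate energy in a few low-frequency coefficients, most tokens are near zero, which is what makes frequency-space tokenization efficient.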

Force/Torque Sensor — A sensor that measures forces and torques at a robot’s joints or end-effector. Critical for contact-rich manipulation and safe human-robot interaction.

G

Generalization — A robot’s ability to handle situations it wasn’t explicitly trained on. The holy grail of robot learning — and the biggest gap between simulation results and real-world deployment.

GR00T — NVIDIA’s Vision-Language-Action foundation model for humanoid robots. Part of the Isaac robotics platform.

Grasping — The fundamental robot manipulation skill — picking up objects. Seems simple, but robust grasping of diverse objects in cluttered environments remains an active research area.

H

Humanoid Robot — A robot with a human-like body plan — bipedal legs, two arms, a torso, and a head. The form factor is controversial — some argue it’s essential for human environments, others say it’s unnecessarily complex.

I

Imitation Learning — Training a robot by showing it what to do (demonstrations), rather than specifying a reward function. More practical than RL for many real-world tasks.

Isaac Sim — NVIDIA’s GPU-accelerated robot simulation platform, widely used for training robot policies in simulation at scale.

J

Joint Space — The space of all possible joint angles of a robot. Planning in joint space means computing trajectories for each individual joint.

K

Kinematics — The study of robot motion without considering forces. Forward kinematics: given joint angles, where is the end-effector? Inverse kinematics: given a desired end-effector position, what joint angles achieve it?
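Both directions can be written out in closed form for a toy 2-link planar arm (link lengths here are arbitrary):

```python
import numpy as np

L1, L2 = 1.0, 0.8  # link lengths of a toy 2-link planar arm

def forward(theta1, theta2):
    """Forward kinematics: joint angles -> end-effector position (x, y)."""
    x = L1 * np.cos(theta1) + L2 * np.cos(theta1 + theta2)
    y = L1 * np.sin(theta1) + L2 * np.sin(theta1 + theta2)
    return x, y

def inverse(x, y):
    """Analytical inverse kinematics (one of the two elbow solutions)."""
    c2 = (x**2 + y**2 - L1**2 - L2**2) / (2 * L1 * L2)
    theta2 = np.arccos(np.clip(c2, -1.0, 1.0))
    theta1 = np.arctan2(y, x) - np.arctan2(L2 * np.sin(theta2),
                                           L1 + L2 * np.cos(theta2))
    return theta1, theta2
```

Real 6-DOF arms generally need numerical solvers for inverse kinematics, and a target may have multiple solutions or none at all.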

L

LeRobot — Hugging Face’s open-source framework for robot learning. Standardizes data collection, policy training, and model sharing.

LIBERO — A benchmark for lifelong robot learning. Tests a robot’s ability to learn new tasks without forgetting old ones.

M

Manipulation — Using robot arms and hands to interact with objects — grasping, placing, assembly, tool use. The core skill that makes robots useful.

MuJoCo — A fast physics simulator widely used in robot learning research. Originally developed by Roboti LLC, later acquired and open-sourced by DeepMind.

N

Neurosymbolic — Combining neural networks with symbolic reasoning. In robotics, this typically means using a symbolic planner for high-level decisions and neural networks for perception and low-level control.

O

OXE (Open X-Embodiment) — A large-scale dataset of robot demonstrations across many different robot types. Used to train general-purpose VLA models.

P

PDDL (Planning Domain Definition Language) — A formal language for describing planning problems. Used in neurosymbolic robotics to specify task-level plans that are then executed by neural controllers.

Policy — A mapping from observations to actions. This is what the robot “runs” — it takes in what the robot sees and outputs what the robot should do.

PPO (Proximal Policy Optimization) — A popular reinforcement learning algorithm. Widely used for training robot locomotion and manipulation policies.

R

Reinforcement Learning (RL) — Learning through trial and error with a reward signal. The robot tries actions, receives rewards or penalties, and gradually learns to maximize reward.
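A minimal sketch of the trial-and-error loop: tabular Q-learning on a toy 5-state corridor with a reward at one end (everything here is illustrative, far simpler than robot RL):

```python
import random

# Toy corridor: states 0..4, reward only on reaching state 4. Actions: 0=left, 1=right.
GOAL = 4
Q = [[0.0, 0.0] for _ in range(GOAL + 1)]     # value estimate per (state, action)
alpha, gamma, eps = 0.5, 0.9, 0.3
rng = random.Random(0)

for _ in range(300):                          # episodes of trial and error
    s = 0
    while s != GOAL:
        if rng.random() < eps:                # explore: try a random action
            a = rng.randrange(2)
        else:                                 # exploit: follow current estimates
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0        # the reward signal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)]
```

After training, the greedy policy heads right from every state, toward the reward. Robot RL replaces the table with a neural network and the corridor with a physics simulator.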

ROS 2 (Robot Operating System 2) — The standard middleware for robot software. Handles communication between sensors, actuators, and algorithms.

S

Sim-to-Real Transfer — Deploying a policy trained in simulation on a real robot. The gap between simulation and reality is the central challenge.

State Estimation — Figuring out the current state of the robot and its environment from sensor data. Includes localization, object pose estimation, and contact state detection.

T

Tactile Sensing — Sensors that measure physical contact — pressure, texture, slip detection. Essential for dexterous manipulation.

Teleoperation — A human directly controlling a robot, usually for data collection. Common methods include VR controllers, leader-follower arms, and keyboard/gamepad.

V

VLA (Vision-Language-Action) — A type of foundation model that takes visual input and language instructions and outputs robot actions. The “foundation model for robots.”

W

Waypoint — A target position that a robot should move through. Trajectories are often specified as sequences of waypoints.
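A sketch of turning sparse waypoints into a dense trajectory by linear interpolation (real controllers typically use smoother splines):

```python
import numpy as np

def interpolate(waypoints, steps_per_segment=10):
    """Densify a sparse waypoint list into a trajectory by linear interpolation."""
    traj = []
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        for t in np.linspace(0.0, 1.0, steps_per_segment, endpoint=False):
            traj.append((1 - t) * np.asarray(a) + t * np.asarray(b))
    traj.append(np.asarray(waypoints[-1], dtype=float))
    return np.array(traj)

path = interpolate([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```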

Z

Zero-Shot Transfer — Deploying a pre-trained model on a new task or environment without any additional training. The ultimate test of generalization.


Missing a term? Let us know and we’ll add it.