Logbook

RL Arm Tracking

Learning RL by building a 3D trajectory-tracking policy for a simulated Franka arm. PPO via Stable-Baselines3 + MuJoCo.

10 entries

May 2026
May 24, 2026

Submission and future scope of work

Polished the GitHub repo for submission: library code moved into an `arm/` package, dead code removed, and a proper README pointing at this build log. Closed with some thoughts on GPU training, ripping out IK, learning physics directly, and RL's observation space.
- reflection
May 23, 2026

v9 has arrived

v9 trains with an action delta penalty (β=0.1) and a 5 s episode duration. Jerk dropped 25%, mean error dropped 13%, and time in the 5 cm band jumped from 54% to 73%. Shipping v9 as the submission model!
- experiment
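
For readers new to the idea, here is a minimal sketch of what an action delta penalty can look like. Only the penalty weight β=0.1 comes from the entry above; the reward form (negative tracking error minus β times the L2 norm of the change in action) and the name `shaped_reward` are assumptions for illustration, not necessarily what v9 uses.

```python
import numpy as np

def shaped_reward(tracking_error, action, prev_action, beta=0.1):
    """Hypothetical reward: penalise tracking error and step-to-step action change.

    tracking_error: distance between tracked point and target (metres, assumed).
    action, prev_action: current and previous policy outputs.
    beta: weight of the action delta penalty (0.1 in the v9 entry).
    """
    delta = np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)
    # Assumed form: larger jumps between consecutive actions cost more reward.
    return -float(tracking_error) - beta * float(np.linalg.norm(delta))
```

With identical consecutive actions the penalty term vanishes, so the policy is only pushed toward smoothness when it actually changes its command.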
May 22, 2026

New metrics for v5 and planning v9

Passed on MuJoCo Warp (too much JAX rewrite for the time left), added action and tracking-lag metrics, and used them to characterise v5 as "noisy but on-time" versus classical's "clean but late". Set the stage for v9 with an action delta penalty.
- research
May 21, 2026

Acceleration in obs and accidentally rewarding termination

Audited yesterday's collision penalty, removed it, and the eval mean error dropped by ~3 cm. Added target acceleration to the observation space and v5 beats classical by 4.55 cm with time-in-band more than doubled. Then tried letting RL learn collision avoidance with early termination, which caused a "model races to terminate" failure mode.
- experiment
May 20, 2026

Comparing classical vs RL, then mixing in new trajectories

Got the first side-by-side classical vs RL comparison out the door: RL is ~30% tighter on mean error and spends 15 more points of time in the 5 cm band, but jerkier. Moved the tracked point to a green tip site past the gripper, tried DLS for the IK folding issue (no effect), then added a figure-8 and a random-walk "fly" trajectory and kicked off a 5M-step retrain with a self-collision penalty.
- experiment
- decision
May 19, 2026

Wiring the RL pipeline end-to-end

Finished `trajectory.py` and wired `env.py`, `train.py`, `eval.py` end-to-end. Drilled the RL terminology along the way and watched a PPO policy learn to lead moving targets over a 1M-step training run. Self-collisions and EE visibility flagged for the next session.
- research
- experiment
May 17, 2026

Scoping RL and mapping the arm's reach

Settled on approach iv (RL on top of IK+PD) as the starting scope. Mapped the Panda arm's reachable workspace from 300,000 sampled joint configurations and laid out a plan to get training running.
- decision
- experiment
May 15, 2026

Turning physics on, and a nagging decision

Turned dynamics on for the first time with a classical IK + PD controller tracking a moving circle, and found the Panda's actuators are already PD-tuned. Now weighing how much RL to take on before the deadline.
- research
- experiment
May 14, 2026

Implementing inverse kinematics

Built the Jacobian pseudo-inverse IK solver. It drives the Panda's wrist to a hardcoded 3D target, and degrades as expected when the target is out of reach. This is the foundation for both the classical baseline and the RL Cartesian action wrapper.
- research
- experiment
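
As a rough illustration of the Jacobian pseudo-inverse update, the sketch below runs it on a toy 2-link planar arm rather than the Panda, so it needs only NumPy. The link lengths, step size, and function names are all hypothetical; the actual solver in the repo works on the MuJoCo model and may differ.

```python
import numpy as np

L1, L2 = 1.0, 1.0  # hypothetical link lengths for a toy planar arm

def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    """Analytic 2x2 Jacobian of fk with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s1 - L2 * s12, -L2 * s12],
        [ L1 * c1 + L2 * c12,  L2 * c12],
    ])

def solve_ik(target, q0, iters=200, step=0.5, tol=1e-6):
    """Iterate q <- q + step * pinv(J) @ (target - fk(q)) until close enough."""
    q = np.array(q0, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < tol:
            break
        q += step * np.linalg.pinv(jacobian(q)) @ err
    return q

q_sol = solve_ik([1.2, 0.8], q0=[0.3, 0.5])
```

The sketch only exercises a reachable target; it makes no attempt to reproduce the out-of-reach behaviour described in the entry.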
May 13, 2026

Challenge accepted.

Kickoff for an 11-day RL challenge: train a policy to track a 3D Cartesian trajectory on a simulated Franka arm. Scoped the problem, set up the repo with uv, and got Franka loaded in MuJoCo for first signs of life.
- research
- experiment

Amjad Yaghi
