Logbook

RL Arm Tracking

Learning RL by building a 3D trajectory-tracking policy for a simulated Franka arm. PPO via Stable-Baselines3 + MuJoCo.

10 entries

  1. May 2026
  2. Submission and future scope of work

    Polished the GitHub repo for submission (library code into an `arm/` package, dead code removed, proper README pointing at this build log). Then some thoughts on GPU training, ripping out IK, learning physics directly, and comments on RL's observation space.

    • reflection
  3. v9 has arrived

    v9 trains with an action delta penalty (β=0.1) and a 5 s episode duration. Jerk dropped 25%, mean error dropped 13%, and time in the 5 cm band jumped from 54% to 73%. Shipping v9 as the submission model!

    • experiment
  4. New metrics for v5 and planning v9

    Passed on MuJoCo Warp (too much JAX rewrite for the time left), added action and tracking-lag metrics, and used them to characterise v5 as "noisy but on-time" versus classical's "clean but late". Set the stage for v9 with an action delta penalty.

    • research
  5. Acceleration in obs and accidentally rewarding termination

    Audited yesterday's collision penalty, removed it, and the eval mean error dropped by ~3 cm. Added target acceleration to the observation space and v5 beats classical by 4.55 cm with time-in-band more than doubled. Then tried letting RL learn collision avoidance with early termination, which caused a "model races to terminate" failure mode.

    • experiment
  6. Comparing classical vs RL, then mixing in new trajectories

    Got the first side-by-side classical vs RL comparison out the door: RL is ~30% tighter on mean error and spends 15 percentage points more time in the 5 cm band, but it is jerkier. Moved the tracked point to a green tip site past the gripper, tried damped least squares (DLS) for the IK folding issue (no effect), then added a figure-8 and a random-walk "fly" trajectory and kicked off a 5M-step retrain with a self-collision penalty.

    • experiment
    • decision
  7. Wiring the RL pipeline end-to-end

    Finished `trajectory.py` and wired `env.py`, `train.py`, and `eval.py` end-to-end. Drilled the RL terminology along the way and watched a PPO policy learn to lead moving targets over a 1M-step training run. Self-collisions and end-effector (EE) visibility flagged for the next session.

    • research
    • experiment
  8. Scoping RL and mapping the arm's reach

    Settled on approach iv (RL on top of IK+PD) as the starting scope. Mapped the Panda arm's reachable workspace from 300,000 sampled joint configurations and laid out a plan to get training running.

    • decision
    • experiment
  9. Turning physics on, and a nagging decision

    Turned dynamics on for the first time with a classical IK + PD controller tracking a moving circle, and found the Panda's actuators are already PD-tuned. Now weighing how much RL to take on before the deadline.

    • research
    • experiment
  10. Implementing inverse kinematics

    Built the Jacobian pseudo-inverse IK solver. It drives the Panda's wrist to a hardcoded 3D target, and degrades as expected when the target is out of reach. This is the foundation for both the classical baseline and the RL Cartesian action wrapper.

    • research
    • experiment
  11. Challenge accepted.

    Kickoff for an 11-day RL challenge: train a policy to track a 3D Cartesian trajectory on a simulated Franka arm. Scoped the problem, set up the repo with uv, and got Franka loaded in MuJoCo for first signs of life.

    • research
    • experiment