1. Hardware
Our Trossen AI Stationary was purchased in May 2025. We are still running Trossen Arm Driver v1.7.8. Our local computer is a System76 desktop running Ubuntu 22.04 with an RTX 5090 GPU. The RTX 5090 is used for LoRA fine tuning of pi0; for full fine tuning we have been using H100 GPUs remotely on runpod.io.
2. Software
We have started by augmenting and tweaking the gym-aloha environment, as well as the (recently deprecated) Trossen lerobot framework, with the goal of providing seamless sim to sim, sim to real, and real to sim support for the Trossen AI Stationary robot. We have also been tweaking the lerobot software for smoother Trossen AI Stationary real robot dataset acquisition. In addition, we are in the process of adding real and simulated Trossen AI Stationary support to the openpi framework. Our forks are at github.com/anredlich. Highlights:
- gym-aloha: we added MuJoCo *.xml model files to the assets folder and augmented the sim.py and sim_end_effector.py simulator code to give gym-aloha the ability to simulate the Trossen AI Stationary robot (files and code adapted from trossen_arm_mujoco). This includes both joint-controlled and end-effector-controlled simulations for the transfer-cube task. For this task, we added environment options such as box size, box position, box orientation, and box color, as well as some control over lighting, robot joint reference angles, and robot base positions (a usage sketch follows this list).
- lerobot: we added control_sim_robot.py, which uses the augmented gym-aloha environment to create and replay simulated datasets for the Trossen AI Stationary robot. We also added scripted_policy.py, a heuristic waypoint policy adapted from trossen_arm_mujoco, for the simulated robot rollouts. In addition, we modified train.py and eval.py so that they can train and evaluate policies for the simulated Trossen AI Stationary robot. Together these additions allow full sim to sim, sim to real, and real to sim evaluations. Combining simulated and real robot replay can also be used to calibrate the simulated robot against the real one. We added better text-to-speech and additional voice prompts to improve the real robot dataset acquisition workflow, and we added four new evaluate_*.py and train_*.py example files for both the old Aloha and the new Trossen AI simulated robots.
- openpi: we have added hardware driver support to run pi0 policies on the Trossen AI Stationary robot within the openpi framework. This was done by adapting the trossen_ai example code, in particular the TrossenOpenPIBridge class in main.py. Currently, we use the v1.7.8 driver with older (deprecated) lerobot code, but we recommend using the Trossen Robotics fork of openpi, with a couple of very minor modifications to main.py and aloha_policy.py, outlined here. We have also added full simulated Trossen AI Stationary robot support in our aloha_sim_trossen_ai example, which calls our gym-aloha fork in place of the gym-aloha that ships with openpi.
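To illustrate the gym-aloha additions, here is a minimal usage sketch. The env id and the option names below (box_size, box_color, arms_ref) are illustrative assumptions; see the fork's README for the exact kwargs.

```python
import gymnasium as gym
import gym_aloha  # noqa: F401 -- importing registers the environments

env = gym.make(
    "gym_aloha/TrossenAIStationaryTransferCube-v0",  # assumed env id
    obs_type="pixels_agent_pos",
    box_size=0.04,              # hypothetical kwarg: 40 mm cube
    box_color=(1.0, 0.0, 0.0),  # hypothetical kwarg: red
    arms_ref=(-0.025, 0.025),   # hypothetical kwarg: joint reference-angle shifts
)
obs, info = env.reset()
```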
3. Optimizations
This is a non-exhaustive list of small optimizations and problem resolutions that may be helpful to other Trossen AI Stationary Robot users.
- robot: Do NOT let pets or small children near the leader arms: they can swing and swoop down violently, especially if you play with the arm joint_characteristics. We almost learned this the hard way.
- robot: The right arm gripper was a bit sticky (it felt like static friction) and would overshoot. We improved this by adjusting the embedded arm joint_characteristics variable friction_viscous_coef for the gripper (joint 6) from 202.61772... to 25.0. See the Trossen documentation for how to do this.
- lerobot: A dataset version error prevented lerobot simulation testing and dataset visualization for older aloha and pusht datasets. We converted this error to a warning.
- lerobot: Fixed a model-writing error in train.py: the checkpoint config.json file was missing the "type": "act" or "type": "diffusion" entry, so the model could not be read back, e.g. by eval.py. We solved this by adding a type: str = "act" field to configuration_act.py and type: str = "diffusion" to configuration_diffusion.py (see the first sketch after this list).
- lerobot: For real robot rollouts, we found that setting robot.max_relative_target to 0.05-0.1 radians makes a huge difference in whether a learned policy succeeds. This argument clips the maximum joint angle change in one step, thereby reducing the jerky motions which seem to take the robot out of the learning distribution and often lead to failure (see the second sketch after this list).
- openpi: Key optimizations include: a norm_stats.json specific to each dataset, the correct joint_flip_mask in aloha_policy.py for the Trossen AI Stationary robot, image resize with padding (see the third sketch after this list), and sim to real joint calibration and home pose. See the experimental details page for details.
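First, a sketch of the config.json fix, assuming lerobot's dataclass-based policy configs (the field placement is illustrative): adding the type field means it is serialized into the checkpoint's config.json, so eval.py can dispatch to the right policy class when loading.

```python
from dataclasses import dataclass

@dataclass
class ACTConfig:
    # ... existing ACT hyperparameters ...
    type: str = "act"  # written to config.json; the analogous line in
                       # configuration_diffusion.py is: type: str = "diffusion"
```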
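Second, a minimal sketch of what robot.max_relative_target does (not lerobot's exact implementation): each joint's commanded change is limited per control step, so a policy's occasional large action cannot jerk the arm.

```python
import numpy as np

def clip_relative_target(current, target, max_relative_target=0.05):
    """Limit each joint's commanded change to at most max_relative_target
    radians per control step (illustrative sketch)."""
    current = np.asarray(current, dtype=float)
    delta = np.clip(np.asarray(target, dtype=float) - current,
                    -max_relative_target, max_relative_target)
    return current + delta

print(clip_relative_target([0.0], [0.4]))  # a 0.4 rad jump is limited to [0.05]
```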
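Third, a generic sketch of the "image resize with padding" idea (cv2 here is just an assumption for illustration): scale to fit while preserving the aspect ratio, then zero-pad to the target size, so the camera geometry matches what the policy saw during training.

```python
import numpy as np
import cv2  # assumption: OpenCV is available

def resize_with_pad(img, target_h, target_w):
    """Resize preserving aspect ratio, then center with zero padding
    (illustrative sketch of the preprocessing named above)."""
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(img, (nw, nh))
    out = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    top, left = (target_h - nh) // 2, (target_w - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out
```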
4. Datasets
We have been acquiring and uploading -- to huggingface -- both real robot and simulated robot datasets. The real robot datasets were acquired using lerobot's control_robot.py with the record option. The simulated datasets were acquired using our control_sim_robot.py with the record option. These datasets can be visualized using lerobot's visualize_dataset.py or online at lerobot/visualize_dataset (a loading sketch follows the lists below). See the anredlich/lerobot readme for more details. Datasets have 50-100 episodes. Here are the dataset repo_ids:
Real robot:
- ANRedlich/trossen_ai_stationary_transfer_20mm_cube_01
see video on home page
- ANRedlich/trossen_ai_stationary_transfer_40mm_cube_02
- ANRedlich/trossen_ai_stationary_transfer_multi_cube_03
- ANRedlich/trossen_ai_stationary_place_lids_04
- ANRedlich/trossen_ai_stationary_pour_box_05
see video on home page
- ANRedlich/trossen_ai_stationary_pop_lid_06
see video on home page
Simulated robot:
- ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_07
cube_color=red, size=40mm, tabletop=black, background=none, lighting=bright
- ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_08
cube_color=dark red, size=40mm, tabletop=mine, background=mine, lighting=medium
- ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_10
cube_color=r,g,b, size=25,40mm, tabletop=mine, background=none, lighting=bright
see video below
- ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13
cube_color=red, tabletop=mine, background=mine, lighting=medium
(tabletop=mine is an image of my tabletop; background=mine is crude images of my office walls)
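As a quick sanity check, any of the repos above can also be loaded directly; a minimal sketch (the import path follows the older lerobot layout we use, and attribute names vary across lerobot versions):

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("ANRedlich/trossen_ai_stationary_transfer_20mm_cube_01")
print(ds.num_episodes)  # each dataset above has 50-100 episodes
frame = ds[0]           # dict of camera images, joint states, and actions
```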
5. Models
We have been training and uploading -- to huggingface -- learned models/policies for both the real and simulated robot datasets. So far, these are ACT models used as a baseline, with chunk_size=100, trained for 100K steps. Both the real and simulated models can be tested in the simulator using lerobot's eval.py or, for individual episodes, using our evaluate_trossen_ai_stationary_policy.py (a loading sketch follows the lists below). See our lerobot readme for more details. Here are the huggingface policy paths:
Real robot ACT models:
- ANRedlich/trossen_ai_stationary_real_act2_3
best real to sim, try in evaluate_pretrained_trossen_ai_policy.py, still only about 20% correct!
- ANRedlich/trossen_ai_stationary_real_act5
see video on home page
- ANRedlich/trossen_ai_stationary_real_act6
see video on home page
Simulated robot ACT models:
- ANRedlich/trossen_ai_stationary_sim_act7
- ANRedlich/trossen_ai_stationary_sim_act8
- ANRedlich/trossen_ai_stationary_sim_act10
see video below
- ANRedlich/trossen_ai_stationary_sim_act13
best sim to real policy, but still very sensitive to conditions
Real robot pi0 models:
- ANRedlich/trossen_ai_stationary_real_pi03
LoRA fine tuned from pi0_base, see Figs 8a,b.
- ANRedlich/trossen_ai_stationary_real_pi04
Full fine tuned from pi0_base, see Figs 9 and 10.
Simulated robot pi0 models:
- ANRedlich/trossen_ai_stationary_sim_pi013
LoRA fine tuned from pi0_base, see Figs 5-7.
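Loading any of the ACT policies above for evaluation is straightforward; a sketch (import path per the older lerobot layout we use):

```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("ANRedlich/trossen_ai_stationary_sim_act13")
policy.eval()
# policy.select_action(obs) then returns one action per control step, where obs
# is the batched dict of camera images and joint states built by eval.py.
```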
6. Experiments (ACT)
The following experiments use the baseline ACT algorithm. Note that in all real robot policy rollouts robot.max_relative_target=0.05, which clips the maximum one-step joint angle change and is critical to smoothing the rollouts and getting good results with ACT.
- Sim to real:
Conclusion: very sensitive to matching the simulated and real environments. Able to pick up the cube but not to complete the transfer.
Best model: ANRedlich/trossen_ai_stationary_sim_act13 with ~75% correct pickup, but no completed transfers. It was trained on the ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13 dataset which is the closest match to the real environment.
Robustness: multiple cube colors and sizes and tabletop, background, and lighting variations do not seem to improve performance for the ACT algorithm in this context.
Generalization: ANRedlich/trossen_ai_stationary_sim_act7 gives ~25% correct even though its environment is very different from the real environment, e.g. the tabletop is black. This shows there can be some generalization, perhaps due to ACT's ResNet backbone, but it has been unreliable across experiments.
Cube color: moderately sensitive: ANRedlich/trossen_ai_stationary_sim_act8 drops to ~33%, although it was trained on an environment identical to ..._cube_13 above except for a slightly darker red cube.
Cube size: very sensitive.
Tabletop: moderately sensitive.
Background: moderately sensitive.
Lighting: very sensitive to both the simulated environment lighting and the real robot lighting.
Joint angles and arm base positions: unlike for real to sim (see below), adjusting joint angles and base positions did not help sim to real performance. We are not sure why.
- Real to sim:
Best model: ANRedlich/trossen_ai_stationary_real_act2_3, which only gets ~20% correct in the simulated environment.
Environment: except for lighting, the best environment is the same as for sim to real; see ..._cube_13 above. This is the environment that best matches the real robot.
Lighting: the best simulated lighting differs from that used for sim to real: it is closer to the lighting in the real robot dataset. That lighting, however, is not the best choice for sim to real.
Joint angles: the arms_ref env option in anredlich/gym-aloha adds a +/- shift (via qpos0) to the simulated robot joint angles. In the calibration below, we discovered that this shift was necessary for joints 1 and 2 to get real and sim to match. We believe this is due to gravity weighing down the real arms.
Arm base position: the arms_pos env option was used to place the simulated arm bases where they should be, based on both the calibration and measurement of their actual positions on the real robot.
- Calibration:
Replay: using the replay option in control_sim_robot.py in our lerobot fork, any of the real robot datasets can be replayed in simulation. Likewise, any of the simulated datasets can be replayed by control_robot.py on the real robot. This allows precise alignment of sim and real for an actual task.
Joint angles: to get the real and sim replays to match, it was necessary to shift joints 1 and 2 by -0.025 and 0.025 radians, respectively, using the anredlich/gym-aloha arms_ref option, which is implemented in sim.py using physics.named.model.qpos0. We believe this compensates for slight sag in the real robot due to gravity.
Arm base position: using the arms_pos option in anredlich/gym-aloha, implemented with physics.model.body_pos in sim.py, the simulated robot bases were moved to the y=0.0 position, consistent with physical measurement on the real robot. A sketch of both calibration hooks follows.
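A minimal sketch of the two hooks, assuming dm_control-style physics as in our gym-aloha fork's sim.py; the joint and body names are illustrative assumptions (the real names come from the trossen_arm_mujoco *.xml files).

```python
def apply_calibration(physics):
    for side in ("left", "right"):
        # arms_ref: shift the joint reference angles (qpos0) to compensate for
        # gravity sag on the real robot (the -0.025/+0.025 rad values found above).
        physics.named.model.qpos0[f"{side}_joint_1"] += -0.025
        physics.named.model.qpos0[f"{side}_joint_2"] += 0.025
        # arms_pos: move the arm base body to the measured position
        # (y = 0.0, consistent with measurement on the real robot).
        physics.named.model.body_pos[f"{side}_base_link", "y"] = 0.0
```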
- Pretraining:
Training: In train.py, the pretrained model policy.path=ANRedlich/trossen_ai_stationary_sim_act7 was used as pretraining to then learn an ACT policy for the dataset ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13. Note that the simulated environment of ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_07, used to learn ..._act7, is very different from the ..._cube_13 environment.
Sim to sim: After only 10K steps, the learned model was correct 98% of the time, using eval.py on out-of-sample examples. This compares to only 90% correct when learning from scratch for 100K steps. Training for 10K steps from scratch did not work well.
Sim to real with sim pretraining: After only 10K steps, the pretrained sim model was approximately as good at sim to real as ..._act13 (see above), which was trained from scratch for 100K steps.
Real to real with sim pretraining: The sim model ..._act13 was used as the pretrained model to continue training on the real dataset ANRedlich/trossen_ai_stationary_transfer_40mm_cube_02 for 10K steps. This gave as good a result on real to real as training on ..._cube_02 from scratch for 100K steps. Training for 10K steps from scratch did not work well. (A training sketch follows.)
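A conceptual sketch of this recipe; lerobot's train.py does the real work (in particular it sets delta_timestamps so each batch carries a chunk_size-step action sequence, omitted here), and the learning rate and batch size below are assumptions:

```python
import torch
from itertools import cycle, islice
from torch.utils.data import DataLoader
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Warm-start from the sim-pretrained checkpoint, then fine-tune on the real data.
policy = ACTPolicy.from_pretrained("ANRedlich/trossen_ai_stationary_sim_act13")
policy.train()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)  # assumed lr

dataset = LeRobotDataset("ANRedlich/trossen_ai_stationary_transfer_40mm_cube_02")
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

for batch in islice(cycle(dataloader), 10_000):  # 10K steps vs 100K from scratch
    loss = policy.forward(batch)["loss"]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```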
- Real robot ACT successes:
Transfer cube: This task, for either a 20mm or 40mm cube, was easily learned by ACT from e.g. ANRedlich/trossen_ai_stationary_transfer_20mm_cube_01, see video on home page.
Pour cup to cup: This task was easily learned by ACT from ANRedlich/trossen_ai_stationary_pour_box_05. It works for the same range of cup placements, ~2-3 inches, as in the dataset. See video on home page.
Pop lid: This task, learned from ANRedlich/trossen_ai_stationary_pop_lid_06, works well, but only if the "takeout" container is positioned carefully on the tabletop! The dataset did not have much position variety, so further experiments are planned. Also, the lid was very snug, so some crushing of the container was necessary, even for a human using only two fingers; further experiments with better containers are planned. It does succeed, however! See video on home page!
- Real robot ACT failures:
Multiple cube colors, sizes, and orientations: The ACT algorithm did not learn the task in dataset ANRedlich/trossen_ai_stationary_transfer_multi_cube_03. It may be that the number of examples needs increasing, but we suspect that there is just too much task variety for ACT.
Place lids: The dataset ANRedlich/trossen_ai_stationary_place_lids_04 has many different pot and lid colors and shapes at many locations, probably too much variety for ACT, though the number of examples might also be too few.
Conclusion (tentative): ACT seems to work well for tasks with limited task and environmental variety. We are not sure if this is because our datasets are too small or if it is a fundamental limitation of ACT.
7. Experiments (openpi)
We are beginning to experiment with pi0, with plans to test pi0 on all of the above datasets, sim and real, and compare to ACT. The experiments use our fork of the openpi repository. For the real robot, we adapt the trossen_ai example (TrossenOpenPIBridge in main.py) from the Trossen fork of openpi, but for compatibility we continue to use the older v1.7.8 Trossen driver and (deprecated) lerobot control code. For the simulated robot, we added the aloha_sim_trossen_ai example to our fork of openpi, which connects to our fork of the gym-aloha environment in which we have incorporated a MuJoCo model of the Trossen AI Stationary robot.
Experimental details are given here. We also discuss some minor changes needed to use the Trossen fork of openpi -- recommended -- to perform these experiments.
Aloha sim: (aloha_sim example)
- action_horizon: Just running this example with the given pi0_aloha_sim model gave ~40% correct performance. However, increasing the default action_horizon: int = 10 in main.py to 50 -- the default during learning -- improved performance to ~85% (see the sketch after this list).
- LoRA fine tuning: starting with the base policy, pi0_base, we trained on the repo_id=lerobot/aloha_sim_transfer_cube_human dataset for the original Aloha robot simulator. On an Ubuntu computer with an RTX 5090 GPU, 100K steps of training took about 4 hours. The resulting policy performed better than the pre-trained example policy above, at about 95-100% correct.
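The action_horizon change itself is one line in the example's Args; a sketch, assuming the dataclass-style arguments used in the openpi examples (surrounding fields omitted):

```python
import dataclasses

@dataclasses.dataclass
class Args:
    # was 10; executing the full 50-step chunk the model was trained
    # to produce raised success from ~40% to ~85%
    action_horizon: int = 50
```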
Trossen AI Stationary sim: (aloha_sim_trossen_ai example)
- LoRA fine tuning: starting with the base policy, pi0_base, we trained on the repo_id=ANRedlich/trossen_ai_stationary_sim_transfer_40mm_cube_13 dataset from the Trossen AI Stationary robot simulator. On an Ubuntu computer with an RTX 5090 GPU, 20K steps of training took about 16 hours. When tested on new examples from the same simulated environment, performance is 95-100%, as long as success is defined as touching with either the left or right finger, not just the default left finger (see the sketch after this list). See Fig 5.
- Generalization: the above policy, learned from the trossen_ai_stationary_sim_transfer_40mm_cube_13 dataset, was then tested in a simulated environment with very different parameters: wood -> black tabletop, background -> no background, medium -> bright lights, and red -> blue (or other color) cube. Still, performance is ~75%! To compare, see Fig 5 vs Fig 6.
- Sim to real: works really well here! As seen by comparing Fig 5 to Fig 7, the real robot environment has lighting and cube color that differ from the sim environment, and yet the real robot picks up and transfers the cube successfully ~90% of the time! Compared to sim to real for ACT (Fig 2), this is much more robust: ACT is very sensitive to lighting, for example!
- Out of distribution robustness: in both sim to sim (Fig 5 -> Fig 6) and sim to real (Fig 5 -> Fig 7), there is evidence of out-of-distribution robustness: the training dataset paths are noise-free and waypoint-based, so very clean, while the grippers in Figs 6 and 7 deliberately make a number of small adjustments before picking up the cube. This behavior was not learned from the fine tuning dataset, so we believe it is prior knowledge in pi0 coming from its large scale pre-training.
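For reference, a sketch of the relaxed success check mentioned above (the geom names are illustrative assumptions; the actual logic lives in our gym-aloha fork's task code): success counts contact between the cube and a finger of either gripper, not only the default left gripper.

```python
def cube_touched_by_either_gripper(physics, cube_geom="box",
                                   finger_geoms=("left_finger", "right_finger")):
    # Scan MuJoCo's active contacts for a cube<->finger pair.
    for i in range(physics.data.ncon):
        contact = physics.data.contact[i]
        names = {physics.model.id2name(contact.geom1, "geom"),
                 physics.model.id2name(contact.geom2, "geom")}
        if cube_geom in names and names.intersection(finger_geoms):
            return True
    return False
```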
Trossen AI Stationary real robot: (trossen_ai example)
- Real robot pi0 successes:
Multiple cube colors, sizes, and orientations: We trained a pi0 policy for this small (50 examples, 12 min total) but moderately difficult dataset, ANRedlich/trossen_ai_stationary_transfer_multi_cube_03 (see Figs 4a,b), which had failed to be learned by ACT. LoRA training was used for 10K steps, but with batch_size=64, which took about 12 hours on a remote H100 PCIe GPU at runpod.io. (We believe a 20K step run on our local RTX 5090 with the default batch_size=32 would give a similar result.) The real robot picked up and transferred blue cubes correctly about 80% of the time (Fig 8a), while yellow cubes achieved ~50% success (Fig 8b), and green and red ~30-50% success. These results are very encouraging given the complexity of the problem and the small number of dataset examples. They are much, much better than we achieved with ACT on the same dataset!
Place lids: The dataset ANRedlich/trossen_ai_stationary_place_lids_04 (see Figs 4c,d) has 6 lids and 8 pots of multiple colors and shapes at many locations, but is small: 50 episodes, 12 min total. Our first attempt was LoRA training for 20K steps on our local RTX 5090 for 16 hours, with poor results, so we resumed training for another 20K steps and achieved good results for some of the lid/pot combos, including one small lid which requires high accuracy (see Fig 9a). We then retrained from scratch using full fine tuning for 20K steps on a remote H100 PCIe GPU, which took about 12 hours. The results were somewhat improved and overall very encouraging, again given the dataset size vs complexity. The robot is able to pick up 3 of the 6 lids, placing and dropping them crudely on the pots (see Figs 9b,c,d), and it comes very close to picking up the other 3 lids.
- Real robot pi0 failures (but close):
Multiple cube colors, sizes, and orientations: As mentioned above, the pi0 LoRA policy does well with this difficult dataset, getting 30-80% correct depending on cube color, and when it fails it gets pretty close, although it sometimes gets confused and rotates the wrist in the wrong direction. Most likely this dataset is too small!
Place lids: As mentioned above, about 3/6 lids are not picked up, but the policy can clearly see the lids and comes very close to picking them up, see Figs 10a,b. Most likely the dataset needs to be larger!
- pi0 vs ACT: The datasets where ACT fails seem to be those with a variety of object types and locations, as is evident from the multi-cube and lids-on-pots datasets. pi0 also seems to generalize much better, as seen in the sim generalization and sim to real experiments (Figs 6 and 7). Finally, pi0's pre-training seems to improve robustness, most evident in Fig 7.
- LoRA vs full finetune: Full fine tuning seems able to learn a greater variety of objects, as is evident from the lids-and-pots dataset. For example, although pi0-full does not pick up the lid in Fig 10b, it gets much closer than pi0-lora, which doesn't seem to "see" the metal lid at all and gets confused (not shown).
- Dataset size: Although we are seeing very encouraging results with the above datasets, the results are not perfect. We believe this is most likely due to the small size of the datasets relative to their complexity: each has 50 examples for a total of ~12 minutes of data. This compares to the 5-100 hours of data used for task fine tuning in pi0.