Monthly Updates

Tracking the progress, challenges, and milestones of my Final Year Project.

March 2026

March: RL Pipeline Optimization & Control Strategy Redesign

Work Carried Out

In March, I focused on improving the stability and efficiency of the reinforcement learning (RL) pipeline. During the initial phase, I debugged and tested the custom RL environment, then optimized training performance by reducing environment reset times. Several test training runs were conducted to evaluate system behavior.

I also refined the reward function through reward shaping and reviewed relevant RL literature to guide improvements. Initial training sessions were carried out with hyperparameter tuning to assess performance and learning stability.

Problems Encountered

A key limitation was identified in the control strategy. The residual torque-based approach was ineffective, as RL signals could not match the high update rate (1 kHz) of the PX4 Autopilot rate controller.

Solutions / How Issues Were Addressed

To resolve this, the control strategy was redesigned to inject residual angular rate commands at an earlier stage in the PX4 control stack. This improved compatibility between the RL controller and the flight control system, leading to more effective control behavior.
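The injection scheme can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the residual bound are assumptions, and in the real system the summed setpoint would be forwarded into the PX4 control stack rather than returned.

```python
# Sketch of residual angular-rate injection: the learned residual is clamped
# and added to the baseline rate setpoint before it reaches the rate controller.
# RESIDUAL_LIMIT and the function names are illustrative assumptions.
RESIDUAL_LIMIT = 0.5  # rad/s, assumed bound on the learned correction

def inject_residual(baseline_rates, residual_rates, limit=RESIDUAL_LIMIT):
    """Add a clamped RL residual to the baseline roll/pitch/yaw rate setpoints."""
    clamped = [max(-limit, min(limit, r)) for r in residual_rates]
    return [b + r for b, r in zip(baseline_rates, clamped)]
```

Clamping the residual keeps an early, untrained policy from overpowering the baseline controller, which is what makes this kind of injection safe during training.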

February 2026

February: RL Environment Development & Training Optimization

Work Carried Out

In February, I focused on developing the reinforcement learning training environment using the Gymnasium library. A custom RL environment was implemented, including the design of the state space representation and core environment functions such as step, reset, and reward calculation. The environment was structured to ensure stable interaction between the learning agent and the PX4 Autopilot-controlled hexacopter simulation.
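The environment structure described above can be sketched as a skeleton of the Gymnasium interface. Plain Python is used here for brevity: the actual environment subclasses gymnasium.Env and reads its state from the PX4-controlled Gazebo simulation, whereas the dynamics below are stubbed and the state size is illustrative.

```python
class HexacopterEnvSketch:
    """Skeleton of the custom RL environment's reset/step/reward structure.
    The real environment subclasses gymnasium.Env; dynamics are stubbed here
    and the state vector size is a placeholder."""
    def __init__(self, max_steps=500):
        self.max_steps = max_steps
        self.steps = 0
        self.state = [0.0] * 17  # position, velocity, attitude, rates, ... (size illustrative)

    def reset(self):
        self.steps = 0
        self.state = [0.0] * 17
        return self.state, {}  # observation, info

    def _reward(self, state, action):
        # placeholder: penalize distance from origin and large actions
        return -sum(s * s for s in state[:3]) - 0.01 * sum(a * a for a in action)

    def step(self, action):
        self.steps += 1
        # in the real env: publish the command, advance the simulation, read sensors
        reward = self._reward(self.state, action)
        terminated = False                    # e.g. crash detection in the real env
        truncated = self.steps >= self.max_steps
        return self.state, reward, terminated, truncated, {}
```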

Additionally, I initiated the development of a Curriculum Manager to support curriculum learning, enabling progressive adjustment of task difficulty during training. This framework is intended to allow staged learning, where complexity increases as the agent’s performance improves, enhancing policy convergence and stability.
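One hedged sketch of such a manager, assuming a success-rate trigger for stage advancement; the stage contents, threshold, and window size are illustrative, not the project's tuned values.

```python
class CurriculumManager:
    """Sketch of a curriculum manager: advance to the next difficulty stage
    once the recent success rate clears a threshold. All parameters are
    illustrative assumptions."""
    def __init__(self, stages, threshold=0.8, window=100):
        self.stages = stages        # e.g. progressively harder task settings
        self.threshold = threshold
        self.window = window
        self.level = 0
        self.results = []

    def record(self, success):
        """Log one episode outcome; advance the stage when warranted."""
        self.results.append(bool(success))
        self.results = self.results[-self.window:]
        full = len(self.results) == self.window
        if full and sum(self.results) / self.window >= self.threshold:
            if self.level < len(self.stages) - 1:
                self.level += 1
                self.results = []   # restart the success window at the new stage

    @property
    def task(self):
        return self.stages[self.level]
```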

Problems Encountered

A key challenge arose from the PX4 baseline controller running as a separate process that must remain active during training. This created issues during environment resets, particularly after early crash scenarios.

Solutions / How Issues Were Addressed

To address this, health checks were disabled to prevent simulation failsafes, and the reset mechanism was redesigned to respawn only the hexacopter without restarting PX4. This significantly reduced reset time and improved overall training efficiency.
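The reset flow can be sketched as below, with the actual Gazebo respawn and PX4 re-arm calls stubbed out behind callbacks; the function names are hypothetical, not taken from the project's code.

```python
def fast_reset(respawn_vehicle, rearm_px4, settle_time=1.0):
    """Reset sketch: respawn only the hexacopter model and re-arm PX4,
    instead of tearing down and restarting the whole PX4 SITL process.
    `respawn_vehicle` and `rearm_px4` are hypothetical callbacks that would
    wrap the Gazebo and PX4 interfaces in the real pipeline."""
    respawn_vehicle()   # move/respawn the model at its start pose in Gazebo
    rearm_px4()         # re-arm the (still running) PX4 instance
    return settle_time  # time to let the simulation settle before training resumes
```

Keeping PX4 alive across resets is what removes the dominant cost of a full restart; only the vehicle pose and arming state need to be re-established.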

January 2026

January: Residual RL Integration & Real-World System Validation

Work Carried Out

In January, I focused on refining the residual signal injection mechanism for the reinforcement learning controller, ensuring seamless integration with the baseline PX4 Autopilot while maintaining flight stability. This step was critical to allow learned control actions to augment the existing controller without destabilizing the system.

I also established a communication bridge between ROS 2 and PX4 using the px4_msgs package and Micro XRCE-DDS middleware. This enabled reliable real-time data exchange between PX4 and the ROS 2 node, forming a crucial component of the reinforcement learning training pipeline.

Problems Encountered

A key issue arose when attempting to reduce sensor noise as suggested by the supervisor. When noise levels were reduced too aggressively, PX4 refused to arm the hexacopter. The unrealistically clean sensor data triggered PX4’s safety checks, which interpreted it as faulty or disconnected sensors.

Solutions / How Issues Were Addressed

This issue was resolved by carefully tuning the sensor noise parameters. Noise levels were reduced to the minimum values that still satisfied PX4’s arming checks. Through iterative tuning, a balance was achieved that provided cleaner data for reinforcement learning while remaining realistic enough for PX4 pre-flight validation.
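The iterative tuning can be illustrated as a search for the lowest noise level that still passes the arming check. The check itself is stubbed here; in reality the criterion lives inside PX4's pre-flight validation and each candidate level requires a simulated arming attempt.

```python
def min_armable_noise(levels, passes_arming_check):
    """Return the smallest candidate noise level that still satisfies the
    arming check, or None if no level passes. `passes_arming_check` stands
    in for a full simulated arming attempt against PX4's pre-flight checks."""
    armable = [n for n in levels if passes_arming_check(n)]
    return min(armable) if armable else None
```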

December 2025

December: Aerial Manipulator Integration & Simulation Validation

Work Carried Out

In December, I focused on validating and extending the hexacopter and aerial manipulator simulation. The hexacopter SDF model was validated through motor tests in Gazebo, where stable hover was achieved at approximately 459 rad/s per motor with the current mass and configuration, confirming that the simulation closely matches the real platform.

I then integrated the 3-DoF manipulator with the hexacopter in simulation and tested the combined aerial manipulator system. These tests revealed significant dynamic disturbances on the hexacopter caused by manipulator motion, highlighting important coupling effects that must be addressed in future control design.

In parallel, I worked on the planning and design considerations for a residual reinforcement learning controller. This included defining the action space, observation space, and reward structure, and outlining how learned residual actions would augment the baseline PX4 controller while preserving stability and supporting future sim-to-real transfer. No RL training or implementation was carried out at this stage.

I also contributed to the mid-review presentation and report documentation, summarizing simulation results, system integration progress, and planned control developments.

Problems Encountered

A key issue arose from the model spawning mechanism. The hexacopter was spawned through PX4 SITL, while the manipulator was spawned separately by Gazebo, leading to difficulties when attaching the manipulator at the world level and resulting in incorrect model linkage and dynamics.

Solutions / How Issues Were Addressed

This was resolved by integrating the manipulator directly into the hexacopter SDF model, ensuring that both the hexacopter and manipulator were spawned together under PX4 SITL. This provided correct kinematic and dynamic coupling and enabled reliable combined-system simulation and testing.

November 2025

November: Residual RL Control Setup and SDF-to-URDF Conversion

Work Carried Out

In November, I worked on the control and simulation setup for the reinforcement learning–based aerial manipulator. I reviewed research on residual reinforcement learning and decided to use a residual RL approach that combines the existing PX4 controller with a learned RL component. This keeps the stability of the classical controller while allowing RL to fine-tune the behaviour, which supports smoother transfer from simulation to the real hexacopter.

I finalized the SDF model of the hexacopter to closely match the real platform and created an equivalent URDF model for use in ROS 2 and related tools. I also updated the PX4 SITL firmware to support residual control by adding the necessary residual control topics, such as residual thrust and residual torque, for the future RL agent.

In addition, I performed an open-loop motor test and verified hover at approximately 459 rad/s, which validated rotor parameters and supported simulation tuning.
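A consistency check of this kind follows from balancing weight against total rotor thrust: with six identical rotors obeying T = k_f·ω², hover requires 6·k_f·ω² = m·g, so ω = √(m·g / (6·k_f)). The mass and thrust-constant values in the test are illustrative placeholders, chosen only so the formula lands near the reported ~459 rad/s; they are not the platform's actual parameters.

```python
import math

def hover_speed(mass_kg, kf, n_rotors=6, g=9.81):
    """Per-rotor angular speed (rad/s) at which n_rotors identical rotors
    with thrust constant kf (N per (rad/s)^2) support the vehicle weight."""
    return math.sqrt(mass_kg * g / (n_rotors * kf))
```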

In parallel, I evaluated cost-effective compute options with enough resources for Gazebo simulation and RL training. This included comparing GPU-enabled VMs and workstation options based on capability and pricing.

Problems Encountered

The main difficulty was converting the detailed hexacopter SDF into a valid URDF, as existing SDF-to-URDF conversion tools often failed or produced incorrect models.

Solutions / How Issues Were Addressed

I used GitHub Copilot together with a step-by-step divide-and-conquer approach. I manually converted links, joints, and inertia parameters from SDF to URDF in small sections, checking and fixing errors incrementally until the URDF loaded correctly in the required tools.

October 2025

October: Hexacopter Model Refinement and PX4 SITL Integration

Work Carried Out

In October, I refined the hexacopter SDF model to closely represent the real platform. Each link and joint was positioned to match the physical frame, and I calculated and documented inertia matrices for all links to improve dynamic realism. The updated model was integrated with the PX4 SITL setup and verified through multiple test runs to confirm stability and responsiveness.

I also identified and corrected an issue in rotor modeling. Previously, maximum rotor speeds were estimated without considering motor current and power limits. After incorporating these constraints, I recalculated the thrust and torque constants (kf and mc) and updated the PX4 airframe configuration to keep the simulation consistent with the physical system.

Problems Encountered

The main challenge was determining accurate inertia matrices for the hexacopter links without access to a detailed CAD model.

Solutions / How Issues Were Addressed

I derived link inertias using analytical approximations of symmetric 3D shapes. I aligned coordinate frames with the principal inertia axes, which allowed the inertia matrices to be simplified to diagonal form while maintaining physical realism. This improved simulation stability and produced more realistic dynamics.
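For example, approximating a link as a solid cylinder whose symmetry axis is a principal axis gives a diagonal inertia matrix directly from standard formulas: Ixx = Iyy = m(3r² + h²)/12 and Izz = m·r²/2 about the centroid. The dimensions used in the test are placeholders, not the hexacopter's actual link measurements.

```python
def cylinder_inertia(mass, radius, height):
    """Diagonal inertia of a solid cylinder about its centroid, with the
    symmetry (z) axis as a principal axis:
    Ixx = Iyy = m*(3*r**2 + h**2)/12,  Izz = m*r**2/2."""
    ixx = iyy = mass * (3 * radius**2 + height**2) / 12.0
    izz = mass * radius**2 / 2.0
    return ixx, iyy, izz
```

Because the frames are aligned with the principal axes, all products of inertia vanish and only these three diagonal terms need to appear in the SDF `<inertia>` block.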

September 2025

September: Hexacopter Simulation Setup and Parameter Selection

Work Carried Out

In September, I followed a structured course on ROS 2 and Gazebo simulation to support the development of our aerial manipulator platform. I reviewed an open-source hexacopter SDF repository to understand best practices for structuring multicopter models, sensor plugins, and rotor dynamics configuration. Since our platform is a custom hexacopter with different geometry and parameters, I did not reuse the provided models directly. Instead, I used the repository mainly as a learning reference to guide how to build and organize our own hexacopter SDF.

I then developed our custom hexacopter SDF for Gazebo Harmonic and added sensor plugins for GPS, barometer, and IMU. Rotor dynamics were implemented using the MulticopterMotorModel plugins. I successfully spawned the hexacopter in Gazebo Harmonic and tested its basic stability and hover behavior.

In addition, I derived motor and rotor parameters based on SunnySky X2216 KV1250 motors with 11×5.7-inch propellers. I documented the estimated values and assumptions to support integration and future testing.
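One common way to estimate a thrust constant from such data is to fit the quadratic rotor model T = k_f·ω² at a measured static-thrust operating point. The thrust and speed numbers in the test are placeholders, not the measured X2216 figures.

```python
def thrust_constant(thrust_n, omega_rad_s):
    """Estimate kf (N per (rad/s)^2) from one static-thrust measurement,
    assuming the quadratic rotor model T = kf * omega**2."""
    return thrust_n / omega_rad_s**2
```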

Problems Encountered

The reference SDF model I studied was built for Ignition Gazebo, so I encountered multiple compatibility errors while developing our custom SDF for Gazebo Harmonic. Many plugin and model components required changes to work correctly in the Harmonic setup.

Solutions / How Issues Were Addressed

I referred to official documentation and resolved issues step by step by building and testing the model in smaller parts. By carefully adjusting each plugin and component and validating behavior incrementally, I was able to assemble a working custom SDF that runs in Gazebo Harmonic.

August 2025

August: Reward Shaping and Improved Waypoint Tracking

Work Carried Out

In August, I improved the initial RL implementation and refined the policy through reward shaping to achieve smoother waypoint following. By modifying the state space and reward function, I updated the setup so the quadrotor could follow waypoints continuously without stopping at intermediate points.

The state space was expanded to a 20-dimensional vector using scaled position, velocity, attitude, and body rates, together with the relative position to the current and next waypoint and a desired final yaw target. To improve generalization, waypoint trajectories were generated from line, curve, and helix paths.

The reward design was updated to encourage tighter tracking and better final behavior. This included a stronger position-error penalty while keeping velocity and rate penalties for smooth motion. A large reward was given at each waypoint. After reaching the final waypoint, additional rewards encouraged a smooth stop with low velocity, alignment to the target yaw, and maintaining a stable, level attitude for a short hold period. Crash and out-of-bounds penalties were kept unchanged.
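The structure of the shaped reward can be sketched as below. The weights, bonus, and penalty values are illustrative stand-ins, not the tuned coefficients, and the full implementation adds the final-waypoint stopping, yaw-alignment, and hold-period terms described above.

```python
def shaped_reward(pos_err, speed, body_rate, reached_waypoint, crashed,
                  w_pos=2.0, w_vel=0.1, w_rate=0.05,
                  waypoint_bonus=10.0, crash_penalty=-100.0):
    """Sketch of the shaped reward: penalties on position error, speed, and
    body rates; a bonus at each waypoint; a large negative reward on crash.
    All coefficients here are illustrative assumptions."""
    if crashed:
        return crash_penalty
    r = -w_pos * pos_err - w_vel * speed - w_rate * body_rate
    if reached_waypoint:
        r += waypoint_bonus
    return r
```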

I trained multiple policies and analyzed reward progression to identify the best-performing model. I also plotted key dynamic parameters such as position, angular velocity, roll, pitch, yaw, linear acceleration, forces, and torques to better understand system behavior.

3D flight path while tracking a single waypoint using the best-performing RL checkpoint (dashed: executed path history).

Time-series of quadrotor states and control outputs for the same single-waypoint flight path shown above, generated using the best performing RL checkpoint.

Problems Encountered

The main challenge was selecting the best trained policy among multiple training runs.

Solutions / How Issues Were Addressed

I trained up to 15 million time steps and saved checkpoints every 100,000 steps. Using TensorBoard, I compared cumulative reward trends and selected the policy with the highest overall performance and most consistent improvement.
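Programmatically, the selection amounts to ranking checkpoints by their evaluation returns; the minimal sketch below assumes returns are already collected per checkpoint, whereas in practice the comparison was done visually in TensorBoard.

```python
def best_checkpoint(eval_returns):
    """Pick the checkpoint (keyed by training step) with the highest mean
    evaluation return. `eval_returns` maps step -> list of episode returns."""
    means = {step: sum(r) / len(r) for step, r in eval_returns.items()}
    return max(means, key=means.get)
```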

July 2025

July: RL Foundations & Initial PPO Waypoint Tracking

Work Carried Out

In July, I focused on building a strong theoretical and practical foundation in Reinforcement Learning (RL). I followed a YouTube series on RL fundamentals and completed an Udemy course to strengthen core concepts and implementation details. I also reviewed four research papers on RL-based control for aerial manipulators to understand prior work, common challenges, and evaluation practices.

In parallel, I studied quadrotor dynamics and implemented custom RL environments using the Gymnasium library. I developed an initial quadcopter environment and trained a PPO (Proximal Policy Optimization) agent to perform a simple waypoint-following task in a Python-based simulation.

As part of this implementation, I built an RL-based controller for autonomous navigation with a focus on training stability. The environment uses scaled observation states to improve convergence during PPO training. The state space was defined as a 17-dimensional vector including position, velocity, attitude (quaternion), body rates, relative position to the current waypoint, and a final-waypoint flag.
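Observation scaling of this kind can be sketched as a per-component normalization into roughly [-1, 1]. The bounds are illustrative: the actual scaling constants depend on the workspace size and the vehicle's dynamic limits.

```python
def scale_observation(obs, bounds):
    """Scale each observation component into [-1, 1] by dividing by a
    per-component bound and clipping; bounds are illustrative tuning choices."""
    return [max(-1.0, min(1.0, x / b)) for x, b in zip(obs, bounds)]
```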

The reward function was designed to encourage stable and goal-directed motion. It included a distance-to-waypoint penalty, velocity and body-rate penalties to promote smooth flight, and a time-step penalty to encourage faster completion. Progress rewards and waypoint-arrival bonuses were added to guide the agent through the path, along with a final stopping bonus based on low linear and angular speed. Strong penalties were applied for crashes to discourage unsafe behavior.

Additionally, I researched different simulation tools and their requirements, including AirSim, Isaac Sim, and PyBullet, to evaluate potential alternatives to Gazebo for future integration and testing.

Problems Encountered

The main challenge was the absence of a complete simulation environment for realistic testing. This slowed down early experimentation and iteration.

Solutions / How Issues Were Addressed

As a temporary solution, I used a Python-based quadrotor simulator from GitHub to validate and debug the PPO training pipeline. This enabled faster iteration and helped confirm that the environment design and training setup were working correctly.