DiPPeST: Diffusion-based Path Planner for Synthesizing Trajectories Applied on Quadruped Robots

Maria Stamatopoulou*             Jianwei Liu*             Dimitrios Kanoulas

Robot Perception Lab (RPL), Computer Science @ UCL.




Code Release ETA: End of October 2024



Abstract

We present DiPPeST, a novel image- and goal-conditioned diffusion-based trajectory generator for quadrupedal robot path planning. DiPPeST is a zero-shot adaptation of our previously introduced diffusion-based 2D global trajectory generator (DiPPeR). The introduced system incorporates a novel strategy for local real-time path refinements that is reactive to camera input, without requiring any further training, image processing, or environment interpretation techniques. DiPPeST achieves a 92% success rate in obstacle avoidance for nominal environments and an average 88% success rate in environments that are up to 3.5 times more complex in pixel variation than those used for DiPPeR. A visual-servoing framework is developed for real-world execution and tested on a quadruped robot, achieving an 80% success rate across different environments and showcasing improved behavior over complex state-of-the-art local planners in narrow environments.

DiPPeST is a zero-shot transfer model trained purely on black-and-white mazes viewed top-down, presented below. In the following sections, we present DiPPeST's performance and generalization capabilities across different camera-input scenarios for real-time vision-based local planning.








Generalization Performance


We validate DiPPeST's generalization capabilities for variations in input image size, pixel intensity, and point of view. These tests assess the model's ability to generate trajectories in egocentric RGB scenarios where the obstacle and traversable regions have similar colors.



1. Variation of Image Pixel Intensity


We investigate cases where the obstacles are similar in pixel intensity (PI) to the traversable region, as well as cases where the traversable region contains pixels of multiple intensities, to showcase that DiPPeST can generate paths from RGB input beyond the white traversable and black non-traversable regions of the training dataset.




Example cases (figure): Obstacle-Floor PI Difference = 26%; Obstacle-Floor PI Difference = 28%; Image PI Variation = 82%.
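The paper does not spell out how these percentages are computed. A minimal sketch of one plausible interpretation, assuming PI is the mean grayscale intensity over (manually labelled) obstacle and floor masks and that both metrics are expressed as a percentage of the full 0-255 range, is given below; the function names and masks are illustrative, not part of the released code.

```python
import numpy as np


def pi_difference(image, obstacle_mask, floor_mask):
    """Relative difference in mean grayscale intensity between two regions.

    `image` is an HxWx3 uint8 RGB array; the masks are boolean HxW arrays
    marking obstacle and traversable (floor) pixels.
    """
    gray = image.mean(axis=-1)                # simple grayscale conversion
    obstacle_pi = gray[obstacle_mask].mean()  # mean intensity over obstacles
    floor_pi = gray[floor_mask].mean()        # mean intensity over the floor
    return abs(obstacle_pi - floor_pi) / 255.0 * 100.0


def pi_variation(image):
    """Spread of intensities across the whole image, as a percentage."""
    gray = image.mean(axis=-1)
    return (gray.max() - gray.min()) / 255.0 * 100.0
```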


2. Variation of Input Image Size

DiPPeST performs local refinements based on visual input and is therefore subject to variable camera resolution and field of view, depending on hardware specifications. The image sizes tested are listed below, followed by a sketch of a frame-loading interface.

Training Dataset: [100, 100, 3]
iPhone 11 Camera: [3264,2448,3]
RealSense D435i Camera: [720, 1280, 3]
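As a rough illustration of how frames from these different sources might be fed to the planner, the sketch below returns an HxWx3 RGB array either from a saved photo (e.g. an iPhone image on disk) or from a live color stream. `load_rgb_frame` is a hypothetical helper, and it assumes the RealSense color stream is exposed as a standard video device; the official pyrealsense2 SDK could equally be used.

```python
import cv2


def load_rgb_frame(source):
    """Return an HxWx3 uint8 RGB array from an image path or camera index."""
    if isinstance(source, str):
        bgr = cv2.imread(source)        # photo saved to disk (e.g. iPhone image)
        if bgr is None:
            raise FileNotFoundError(source)
    else:
        cap = cv2.VideoCapture(source)  # live camera exposed as a video device
        ok, bgr = cap.read()
        cap.release()
        if not ok:
            raise RuntimeError("failed to grab a frame")
    # OpenCV returns BGR; the planner is conditioned on RGB, whatever the size.
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
```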


3. Variation of Camera Point-of-View

DiPPeST should generalize to input images with varying points of view (PoV), reflecting variations in camera angle in real-world scenarios. The training dataset contains only top-down maps; demonstrating generalization to an egocentric perspective is therefore essential for performing local refinements.



Example views (figure): Top-Down, Human-View, Robot-View.




Real World Deployment

For real-world evaluation, DiPPeST is deployed on the Unitree Go1 robot, taking input images from an Intel RealSense D435i camera mounted on the front at a 10-degree downward tilt. For all experiments, the global plan is generated from the first frame while the robot remains stationary, and a local path is then generated at each frame to perform real-time refinements. We test DiPPeST's performance in a) static environments and b) dynamic environments. A sketch of this execution loop is shown below.
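The following is a minimal sketch of that loop, not the released implementation: `camera`, `dippest`, `robot`, `goal_px`, and the gains are hypothetical interfaces and parameters used only to make the global-plan / per-frame-refinement / visual-servoing structure concrete.

```python
import numpy as np


def run_dippest(camera, dippest, robot, goal_px, k_yaw=0.002, v_forward=0.2):
    """One global plan from the first frame, then per-frame local refinements.

    Hypothetical interfaces: `camera.get_rgb()` returns the latest HxWx3 RGB
    frame, `dippest.plan` / `dippest.refine` wrap the diffusion model, and
    `robot.send_velocity` forwards body-frame commands to the Go1 controller.
    """
    first_frame = camera.get_rgb()
    # Global plan: generated once from the first frame while the robot is stationary.
    global_path = dippest.plan(first_frame, goal_px)

    while not robot.goal_reached():
        frame = camera.get_rgb()
        # Local refinement: re-plan reactively from the current egocentric view,
        # conditioned on the camera input and the global plan.
        local_path = dippest.refine(frame, global_path)
        next_wp = np.asarray(local_path[1])  # first waypoint ahead of the robot

        # Visual servoing: steer so the next waypoint drifts toward the image
        # centre column (i.e. straight ahead), while walking at constant speed.
        err_x = next_wp[0] - frame.shape[1] / 2.0
        robot.send_velocity(forward=v_forward, yaw_rate=-k_yaw * err_x)

    robot.send_velocity(forward=0.0, yaw_rate=0.0)
```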


1. Static Environments








2. Dynamic Environments