DiPPeST: Diffusion-based Path Planner for Synthesizing Trajectories Applied on Quadruped Robots

Maria Stamatopoulou*             Jianwei Liu*             Dimitrios Kanoulas

Robot Perception Lab (RPL), Computer Science @ UCL.




Code Release ETA: End of June



Abstract

We present DiPPeST, a novel image- and goal-conditioned diffusion-based trajectory generator for quadrupedal robot path planning. DiPPeST is a zero-shot adaptation of our previously introduced diffusion-based 2D global trajectory generator (DiPPeR). The introduced system incorporates a novel strategy for local real-time path refinements that is reactive to camera input, without requiring any further training, image processing, or environment interpretation techniques. DiPPeST achieves a 92% success rate in obstacle avoidance for nominal environments and an average 88% success rate when tested in environments up to 3.5 times more complex in pixel variation than those of DiPPeR. A visual-servoing framework is developed to allow for real-world execution, tested on a quadruped robot, achieving an 80% success rate across different environments and showcasing improved behavior compared to complex state-of-the-art local planners in narrow environments.

DiPPeST is a zero-shot transfer model trained purely on black-and-white mazes with a top-down view, as presented below. In the following sections, we present DiPPeST's performance and generalization capabilities across different camera-input scenarios for real-time vision-based local planning.
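To make the image- and goal-conditioned generation process concrete, the following is a minimal sketch of how a diffusion planner of this kind can denoise a waypoint sequence. It is not the released DiPPeST implementation; the denoiser interface, noise schedule, and shapes are illustrative assumptions.

```python
# Hedged sketch: reverse-diffusion sampling of a 2D trajectory conditioned on an
# image and a goal. The `denoiser` network and its signature are assumptions.
import torch

def sample_trajectory(denoiser, image, start, goal, num_waypoints=64, steps=50):
    """Iteratively denoise a waypoint sequence conditioned on an image and goal.

    denoiser: trained network predicting the noise added to the trajectory,
              given the noisy trajectory, the timestep, and the conditioning.
    image:    (3, H, W) tensor of the camera frame or map.
    start, goal: (2,) tensors in image coordinates.
    """
    traj = torch.randn(1, num_waypoints, 2)       # start from pure noise
    betas = torch.linspace(1e-4, 0.02, steps)     # simple linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    cond = torch.cat([start, goal]).unsqueeze(0)  # goal-conditioning vector

    for t in reversed(range(steps)):
        eps = denoiser(traj, torch.tensor([t]), image.unsqueeze(0), cond)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        traj = (traj - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            traj = traj + torch.sqrt(betas[t]) * torch.randn_like(traj)
        # anchor the endpoints so the generated path always connects start and goal
        traj[:, 0], traj[:, -1] = start, goal
    return traj.squeeze(0)
```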








Generalization Performance


We validate DiPPeST's generalization capabilities under variations in input image size, pixel intensity, and point of view. These tests assess the model's ability to generate trajectories in egocentric RGB scenarios where obstacles and traversable regions have similar colors.



1. Variation of Image Pixel Intensity


We investigate cases where obstacles are similar in pixel intensity (PI) to the traversable region, as well as cases where the traversable region contains pixels of multiple intensities. This showcases that DiPPeST can generate paths from RGB input beyond the white traversable and black non-traversable regions of the training dataset.




Example cases: Obstacle-Floor PI Difference = 26%; Obstacle-Floor PI Difference = 28%; Image PI Variation = 82%.
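For reference, the following is one plausible way to compute the PI percentages quoted above, assuming 8-bit grayscale intensities and a known obstacle mask; the exact metric definitions used in the evaluation may differ.

```python
# Hedged sketch of the pixel-intensity (PI) metrics; definitions are assumptions.
import numpy as np

def obstacle_floor_pi_difference(image_gray, obstacle_mask):
    """Relative difference between mean obstacle and mean floor intensity (0-255)."""
    obstacle = float(image_gray[obstacle_mask].mean())
    floor = float(image_gray[~obstacle_mask].mean())
    return abs(obstacle - floor) / 255.0 * 100.0

def image_pi_variation(image_gray):
    """Spread of intensities across the whole image, as a percentage of the range."""
    return (float(image_gray.max()) - float(image_gray.min())) / 255.0 * 100.0
```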


2. Variation of Input Image Size

DiPPeST performs local refinements based on visual input and is therefore subject to varying camera resolution and field of view, depending on hardware specifications.

Training Dataset: [100, 100, 3]
iPhone 11 Camera: [3264, 2448, 3]
RealSense D435i Camera: [720, 1280, 3]
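A minimal sketch of how arbitrary camera resolutions can be adapted to the planner's expected input size is shown below; the input resolution, function names, and normalization are illustrative assumptions rather than the released preprocessing pipeline.

```python
# Hedged sketch: resize camera frames to the model input size and map predicted
# waypoints back to full-frame coordinates. Sizes and names are assumptions.
import cv2
import numpy as np

MODEL_INPUT_SIZE = (100, 100)  # (width, height), assumed from the training maps

def preprocess_frame(frame_bgr):
    """Resize a camera frame to the model input resolution and scale pixel values."""
    resized = cv2.resize(frame_bgr, MODEL_INPUT_SIZE, interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0  # normalize to [0, 1]

def rescale_path(path_model_coords, original_shape):
    """Map waypoints predicted in model-input coordinates back to the full frame."""
    h, w = original_shape[:2]
    scale = np.array([w / MODEL_INPUT_SIZE[0], h / MODEL_INPUT_SIZE[1]])
    return path_model_coords * scale
```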


3. Variation of Camera Point-of-View

DiPPeST should generalize to input images of varying PoV, reflecting variations in camera angle in real-world scenarios. Since the training dataset contains only top-down maps, demonstrating generalization from an egocentric perspective is essential for local refinements.



Example viewpoints: Top-Down; Human-View; Robot-View.




Real-World Deployment

For real-world evaluation, the Unitree Go1 robot is used with DiPPeST, taking input images from an Intel RealSense D435i camera mounted on the front and angled 10 degrees downward. For all experiments, the global plan is generated from the first frame while the robot remains stationary; at each subsequent frame a local path is generated to perform real-time refinements. We test DiPPeST's performance in a) static environments and b) dynamic environments.
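The deployment loop described above can be summarized by the following sketch: one global plan from the first frame, then per-frame local refinements tracked by a simple visual-servoing step. All class and method names here are assumptions for illustration, not the released DiPPeST interface, and the gains and speeds are placeholders.

```python
# Hedged sketch of the global-plan-then-local-refinement deployment loop.
# `camera`, `planner`, and `robot` are hypothetical objects; gains are placeholders.
def deployment_loop(camera, planner, robot, goal_px, rate_hz=10.0):
    first_frame = camera.get_frame()
    global_path = planner.plan(first_frame, goal_px)      # robot stays stationary here

    while not robot.reached(goal_px):
        frame = camera.get_frame()
        local_path = planner.refine(frame, global_path)   # real-time local refinement
        next_waypoint = local_path[1]                     # first waypoint ahead of the robot

        # Simple visual-servoing step: steer toward the waypoint's horizontal
        # offset in the image while moving forward at a fixed speed.
        img_center_x = frame.shape[1] / 2.0
        yaw_rate = -0.002 * (next_waypoint[0] - img_center_x)
        robot.send_velocity(forward=0.3, lateral=0.0, yaw=yaw_rate)
        robot.sleep(1.0 / rate_hz)
```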


1. Static Environments








2. Dynamic Environments