Path Planning API Reference#

This page documents ManipulaPy.path_planning, the module for optimized trajectory generation with adaptive GPU/CPU execution, CUDA acceleration, and collision avoidance.

Tip

For conceptual explanations, see Trajectory Planning User Guide.

Quick Navigation#

OptimizedTrajectoryPlanning Class#

class ManipulaPy.path_planning.OptimizedTrajectoryPlanning(serial_manipulator, urdf_path, dynamics, joint_limits, torque_limits=None, *, use_cuda=None, cuda_threshold=10, memory_pool_size_mb=None, enable_profiling=False, auto_optimize=True, kernel_type='auto', target_speedup=40.0)[source]#

Bases: object

Highly optimized trajectory planning class with adaptive GPU/CPU execution, memory pooling, and batch processing capabilities for 40x+ speedups.

Highly optimized trajectory planning class with adaptive GPU/CPU execution, memory pooling, and batch processing capabilities.

Constructor

Parameters:
  • use_cuda (bool | None)

  • cuda_threshold (int)

  • memory_pool_size_mb (int | None)

  • enable_profiling (bool)

  • auto_optimize (bool)

  • kernel_type (str)

  • target_speedup (float)

__init__(serial_manipulator, urdf_path, dynamics, joint_limits, torque_limits=None, *, use_cuda=None, cuda_threshold=10, memory_pool_size_mb=None, enable_profiling=False, auto_optimize=True, kernel_type='auto', target_speedup=40.0)[source]#

Enhanced trajectory planner with advanced CUDA optimizations.

Parameters:
  • serial_manipulator (SerialManipulator)

  • urdf_path (str)

  • dynamics (ManipulatorDynamics)

  • joint_limits (list[tuple[float,float]])

  • torque_limits (list[tuple[float,float]], optional)

  • use_cuda (None | bool) –

    • None β†’ auto-detect (default)

    • True β†’ force GPU (raise if CUDA absent)

    • False β†’ force CPU

  • cuda_threshold (int) – Min. (N Γ— joints) before we bother launching the GPU.

  • memory_pool_size_mb (int | None) – If set, resize the global CUDA memory pool (in MB).

  • enable_profiling (bool) – Enable CUDA profiling for performance analysis.

  • auto_optimize (bool) – Automatically setup CUDA environment for maximum performance.

  • kernel_type (str) – Kernel selection strategy: β€œauto”, β€œstandard”, β€œvectorized”, β€œmemory_optimized”, β€œwarp_optimized”, β€œcache_friendly”, β€œauto_tune”

  • target_speedup (float) – Target speedup over CPU (used for recommendations).

Return type:

None

Parameters:

  • serial_manipulator (SerialManipulator) – Robot kinematics object

  • urdf_path (str) – Path to URDF file for collision checking

  • dynamics (ManipulatorDynamics) – Robot dynamics object

  • joint_limits (list) – Joint limits as [(min, max), …] tuples

  • torque_limits (list, optional) – Torque limits as [(min, max), …] tuples

  • use_cuda (bool, optional) – Force GPU (True), CPU (False), or auto-detect (None)

  • cuda_threshold (int) – Minimum problem size before using GPU (default: 50)

  • memory_pool_size_mb (int, optional) – GPU memory pool size in MB

  • enable_profiling (bool) – Enable CUDA profiling for performance analysis

Attributes Created:

  • collision_checker (CollisionChecker) – URDF-based collision detection

  • potential_field (PotentialField) – Artificial potential field for obstacle avoidance

  • cuda_available (bool) – Whether CUDA acceleration is available

  • gpu_properties (dict) – GPU device properties for optimization

  • performance_stats (dict) – Performance tracking for GPU/CPU usage

Core Trajectory Generation

joint_trajectory(thetastart, thetaend, Tf, N, method, kernel_type=None, enable_monitoring=None)[source]#

Enhanced joint trajectory generation with advanced CUDA optimizations.

Parameters:
  • thetastart (numpy.ndarray) – The starting joint angles.

  • thetaend (numpy.ndarray) – The ending joint angles.

  • Tf (float) – The final time for the trajectory.

  • N (int) – The number of steps in the trajectory.

  • method (int) – The method to use (3=cubic, 5=quintic).

  • kernel_type (str, optional) – Override default kernel selection.

  • enable_monitoring (bool, optional) – Override default monitoring.

Returns:

A dictionary containing positions, velocities, and accelerations.

Return type:

dict

Parameters:

  • thetastart (array_like) – Starting joint angles in radians

  • thetaend (array_like) – Target joint angles in radians

  • Tf (float) – Total trajectory time in seconds

  • N (int) – Number of trajectory points

  • method (int) – Time scaling method (3=cubic, 5=quintic)

Returns:

  • trajectory (dict) – Dictionary containing:

    • positions (numpy.ndarray) – Joint positions (NΓ—n)

    • velocities (numpy.ndarray) – Joint velocities (NΓ—n)

    • accelerations (numpy.ndarray) – Joint accelerations (NΓ—n)

Adaptive Execution: Automatically chooses GPU or CPU based on problem size and hardware availability.

CUDA Acceleration: Uses optimized 2D parallelized kernels with shared memory optimization.

Collision Avoidance: Automatically applies potential field-based collision avoidance.

batch_joint_trajectory(thetastart_batch, thetaend_batch, Tf, N, method, kernel_type=None)[source]#

Enhanced batch trajectory generation with optimal kernel selection.

Parameters:
  • thetastart_batch (numpy.ndarray) – Starting angles (batch_size, num_joints)

  • thetaend_batch (numpy.ndarray) – Ending angles (batch_size, num_joints)

  • Tf (float) – Final time for all trajectories

  • N (int) – Number of trajectory points

  • method (int) – Time scaling method

  • kernel_type (str, optional) – Override kernel selection

Returns:

Batch trajectory data with shape (batch_size, N, num_joints)

Return type:

dict

Parameters:

  • thetastart_batch (numpy.ndarray) – Starting angles for multiple trajectories (batch_size, num_joints)

  • thetaend_batch (numpy.ndarray) – Ending angles for multiple trajectories (batch_size, num_joints)

  • Tf (float) – Total trajectory time in seconds

  • N (int) – Number of trajectory points

  • method (int) – Time scaling method (3=cubic, 5=quintic)

Returns:

  • trajectory (dict) – Batch trajectory data with shape (batch_size, N, num_joints)

3D Parallelization: Uses CUDA 3D grids for (batch, time, joint) parallel computation.

Memory Optimization: Efficient memory pooling for large batch processing.

cartesian_trajectory(Xstart, Xend, Tf, N, method)[source]#

Enhanced Cartesian trajectory generation with optimal kernel selection.

Parameters:
  • Xstart (np.ndarray) – Initial end-effector configuration (SE(3) matrix).

  • Xend (np.ndarray) – Final end-effector configuration (SE(3) matrix).

  • Tf (float) – Total time of motion.

  • N (int) – Number of trajectory points.

  • method (int) – Time-scaling method (3=cubic, 5=quintic).

Returns:

Dictionary with positions, velocities, accelerations, and orientations.

Return type:

dict

Parameters:

  • Xstart (numpy.ndarray) – Initial SE(3) transformation matrix (4Γ—4)

  • Xend (numpy.ndarray) – Target SE(3) transformation matrix (4Γ—4)

  • Tf (float) – Total trajectory time in seconds

  • N (int) – Number of trajectory points

  • method (int) – Time scaling method (3=cubic, 5=quintic)

Returns:

  • trajectory (dict) – Dictionary containing:

    • positions (numpy.ndarray) – Cartesian positions (NΓ—3)

    • velocities (numpy.ndarray) – Linear velocities (NΓ—3)

    • accelerations (numpy.ndarray) – Linear accelerations (NΓ—3)

    • orientations (numpy.ndarray) – Rotation matrices (NΓ—3Γ—3)

Hybrid Computation: Uses GPU for position derivatives, CPU for complex orientation interpolation.

SE(3) Interpolation: Matrix logarithms and exponentials for smooth orientation changes.

Dynamics Integration

inverse_dynamics_trajectory(thetalist_trajectory, dthetalist_trajectory, ddthetalist_trajectory, gravity_vector=None, Ftip=None)[source]#

Compute joint torques with enhanced CUDA acceleration.

Parameters:
  • thetalist_trajectory (np.ndarray) – Array of joint angles over the trajectory.

  • dthetalist_trajectory (np.ndarray) – Array of joint velocities over the trajectory.

  • ddthetalist_trajectory (np.ndarray) – Array of joint accelerations over the trajectory.

  • gravity_vector (np.ndarray, optional) – Gravity vector affecting the system.

  • Ftip (list, optional) – External forces applied at the end effector.

Returns:

Array of joint torques required to follow the trajectory.

Return type:

np.ndarray

Parameters:

  • thetalist_trajectory (numpy.ndarray) – Joint angle trajectory (NΓ—n)

  • dthetalist_trajectory (numpy.ndarray) – Joint velocity trajectory (NΓ—n)

  • ddthetalist_trajectory (numpy.ndarray) – Joint acceleration trajectory (NΓ—n)

  • gravity_vector (array_like, optional) – Gravity vector [gx, gy, gz] (default: [0, 0, -9.81])

  • Ftip (array_like, optional) – End-effector wrench [fx, fy, fz, mx, my, mz] (default: zeros)

Returns:

  • torques (numpy.ndarray) – Required joint torques (NΓ—n)

Adaptive Execution: GPU with 2D parallelization for large trajectories, CPU for small ones.

Memory Management: Automatic memory pooling and pinned memory transfers.

Robust Fallback: Automatic CPU fallback on GPU errors.

forward_dynamics_trajectory(thetalist, dthetalist, taumat, g, Ftipmat, dt, intRes)[source]#

Enhanced forward dynamics trajectory computation.

Parameters:
  • thetalist (np.ndarray) – Initial joint angles.

  • dthetalist (np.ndarray) – Initial joint velocities.

  • taumat (np.ndarray) – Array of joint torques over the trajectory.

  • g (np.ndarray) – Gravity vector.

  • Ftipmat (np.ndarray) – Array of external forces.

  • dt (float) – Time step.

  • intRes (int) – Integration resolution.

Returns:

Dictionary containing positions, velocities, and accelerations.

Return type:

dict

Parameters:

  • thetalist (numpy.ndarray) – Initial joint angles

  • dthetalist (numpy.ndarray) – Initial joint velocities

  • taumat (numpy.ndarray) – Applied torque trajectory (NΓ—n)

  • g (numpy.ndarray) – Gravity vector [gx, gy, gz]

  • Ftipmat (numpy.ndarray) – External force trajectory (NΓ—6)

  • dt (float) – Integration time step

  • intRes (int) – Integration resolution (sub-steps per dt)

Returns:

  • simulation (dict) – Dictionary containing:

    • positions (numpy.ndarray) – Simulated joint positions (NΓ—n)

    • velocities (numpy.ndarray) – Simulated joint velocities (NΓ—n)

    • accelerations (numpy.ndarray) – Simulated joint accelerations (NΓ—n)

Parallel Integration: GPU-accelerated numerical integration with automatic joint limit enforcement.

Private GPU/CPU Methods

_joint_trajectory_gpu(thetastart, thetaend, Tf, N, method, kernel_type, enable_monitoring)[source]#

GPU joint trajectory generation with optimal kernel selection.

Runs the monitored GPU trajectory generator, clips positions to the configured joint limits, optionally applies GPU collision avoidance, updates performance stats, and falls back to the CPU path on any error.

Parameters:
  • thetastart – (num_joints,) ndarray of starting joint angles (rad).

  • thetaend – (num_joints,) ndarray of ending joint angles (rad).

  • Tf (float) – Total trajectory duration, seconds.

  • N (int) – Number of trajectory points.

  • method (int) – Time-scaling method; 3 for cubic, 5 for quintic.

  • kernel_type (str) – Kernel selection strategy passed to the GPU generator (e.g. β€œauto”, β€œvectorized”).

  • enable_monitoring (bool) – If True, enable CUDA profiling/logging.

Returns:

Dict with keys "positions", "velocities" and "accelerations", each an (N, num_joints) array.

Return type:

Dict[str, np.ndarray]

GPU-accelerated joint trajectory generation with optimized memory management.

_joint_trajectory_cpu(thetastart, thetaend, Tf, N, method)[source]#

CPU joint trajectory generation with performance tracking.

Generates the trajectory via the Numba CPU kernel, clips positions to the configured joint limits, optionally applies CPU collision avoidance, and records the elapsed time as the CPU baseline used for speedup reporting.

Parameters:
  • thetastart – (num_joints,) ndarray of starting joint angles (rad).

  • thetaend – (num_joints,) ndarray of ending joint angles (rad).

  • Tf (float) – Total trajectory duration, seconds.

  • N (int) – Number of trajectory points.

  • method (int) – Time-scaling method; 3 for cubic, 5 for quintic.

Returns:

Dict with keys "positions", "velocities" and "accelerations", each an (N, num_joints) array.

Return type:

Dict[str, np.ndarray]

CPU-based joint trajectory generation with Numba optimization.

_apply_collision_avoidance_gpu(traj_pos, thetaend)[source]#

Apply potential-field collision avoidance on the CUDA-enabled path.

Collision avoidance is a joint-space computation and has no GPU kernel (the GPU potential field is 3-D Cartesian, not joint-space), so this delegates to the validated joint-space CPU routine. Kept as a distinct dispatch target so the GPU/CPU planning code paths stay symmetric.

Parameters:
  • traj_pos (ndarray) – (N, num_joints) ndarray of joint positions (rad); modified in place as colliding points are nudged out.

  • thetaend (ndarray) – (num_joints,) ndarray of goal joint angles (rad), used as the attractive target.

Returns:

The (possibly adjusted) (N, num_joints) trajectory.

Return type:

np.ndarray

Apply GPU-accelerated potential field-based collision avoidance.

_apply_collision_avoidance_cpu(traj_pos, thetaend)[source]#

Apply CPU-based potential field collision avoidance.

For each colliding trajectory point, iteratively descends the potential-field gradient toward thetaend (up to 100 iterations) until the configuration is collision-free.

Parameters:
  • traj_pos (ndarray) – (N, num_joints) ndarray of joint positions (rad); modified in place as colliding points are nudged out.

  • thetaend (ndarray) – (num_joints,) ndarray of goal joint angles (rad), used as the attractive target.

Returns:

The (possibly adjusted) (N, num_joints) trajectory.

Return type:

np.ndarray

Apply CPU-based potential field collision avoidance.

_inverse_dynamics_gpu(thetalist_trajectory, dthetalist_trajectory, ddthetalist_trajectory, gravity_vector, Ftip)[source]#

GPU-accelerated inverse dynamics over a trajectory.

Transfers the trajectory and the manipulator’s dynamics data (Glist, S_list, M_list) to the device, launches the 2D inverse-dynamics kernel to compute per-point joint torques, clips them to self.torque_limits and returns them. Falls back to the CPU implementation if dynamics-data conversion or kernel execution fails.

Parameters:
  • thetalist_trajectory – (num_points, num_joints) ndarray of joint angles, radians.

  • dthetalist_trajectory – (num_points, num_joints) ndarray of joint velocities, rad/s.

  • ddthetalist_trajectory – (num_points, num_joints) ndarray of joint accelerations, rad/s^2.

  • gravity_vector – (3,) ndarray of gravitational acceleration, m/s^2.

  • Ftip – length-6 sequence of the external wrench at the end effector.

Returns:

(num_points, num_joints) array of joint torques, clipped to the configured torque limits.

Return type:

np.ndarray

GPU-accelerated inverse dynamics computation with optimized memory management.

_inverse_dynamics_cpu(thetalist_trajectory, dthetalist_trajectory, ddthetalist_trajectory, gravity_vector, Ftip)[source]#

CPU-based inverse dynamics over a trajectory.

Computes per-point joint torques by calling self.dynamics.inverse_dynamics at each trajectory point (using zero torques for any point that raises), then clips the result to self.torque_limits.

Parameters:
  • thetalist_trajectory – (num_points, num_joints) ndarray of joint angles, radians.

  • dthetalist_trajectory – (num_points, num_joints) ndarray of joint velocities, rad/s.

  • ddthetalist_trajectory – (num_points, num_joints) ndarray of joint accelerations, rad/s^2.

  • gravity_vector – (3,) ndarray of gravitational acceleration, m/s^2.

  • Ftip – length-6 sequence of the external wrench at the end effector.

Returns:

(num_points, num_joints) array of joint torques, clipped to the configured torque limits.

Return type:

np.ndarray

CPU-based inverse dynamics computation.

_forward_dynamics_gpu(thetalist, dthetalist, taumat, g, Ftipmat, dt, intRes)[source]#

GPU-accelerated forward dynamics integration over a trajectory.

Seeds the first step with the initial state, transfers the torque profile, external forces and dynamics data to the device, and launches the forward-dynamics kernel to integrate joint motion with intRes sub-steps per outer step, enforcing joint limits on the device. Falls back to the CPU implementation on any error.

Parameters:
  • thetalist – (num_joints,) ndarray of initial joint angles, radians.

  • dthetalist – (num_joints,) ndarray of initial joint velocities, rad/s.

  • taumat – (num_steps, num_joints) ndarray of applied joint torques.

  • g – (3,) ndarray of gravitational acceleration, m/s^2.

  • Ftipmat – (num_steps, 6) ndarray of end-effector wrenches per step.

  • dt (float) – Outer time step between trajectory points, seconds.

  • intRes (int) – Number of integration sub-steps per outer step.

Returns:

Dict with keys "positions", "velocities" and "accelerations", each a (num_steps, num_joints) array.

Return type:

Dict[str, np.ndarray]

GPU-accelerated forward dynamics computation with optimized memory management.

_forward_dynamics_cpu(thetalist, dthetalist, taumat, g, Ftipmat, dt, intRes)[source]#

CPU-based forward dynamics integration over a trajectory.

Integrates joint motion step by step using self.dynamics.forward_dynamics with intRes Euler sub-steps per outer step (dt / intRes each), clamping joint angles to self.joint_limits and using zero acceleration for any step that raises.

Parameters:
  • thetalist – (num_joints,) ndarray of initial joint angles, radians.

  • dthetalist – (num_joints,) ndarray of initial joint velocities, rad/s.

  • taumat – (num_steps, num_joints) ndarray of applied joint torques.

  • g – (3,) ndarray of gravitational acceleration, m/s^2.

  • Ftipmat – (num_steps, 6) ndarray of end-effector wrenches per step.

  • dt (float) – Outer time step between trajectory points, seconds.

  • intRes (int) – Number of integration sub-steps per outer step.

Returns:

Dict with keys "positions", "velocities" and "accelerations", each a (num_steps, num_joints) array.

Return type:

Dict[str, np.ndarray]

CPU-based forward dynamics computation.

_cartesian_trajectory_gpu(pstart, pend, Tf, N, method)[source]#

GPU computation of Cartesian linear velocity and acceleration.

Launches the Cartesian trajectory kernel to compute the time-scaled translational velocity and acceleration of the straight-line path from pstart to pend. Falls back to the CPU implementation on error.

Parameters:
  • pstart – (3,) ndarray of the start position, metres.

  • pend – (3,) ndarray of the end position, metres.

  • Tf (float) – Total trajectory duration, seconds.

  • N (int) – Number of trajectory points.

  • method (int) – Time-scaling method; 3 for cubic, 5 for quintic.

Returns:

(traj_vel, traj_acc), each an (N, 3) array of linear velocity (m/s) and acceleration (m/s^2).

Return type:

Tuple[np.ndarray, np.ndarray]

GPU-accelerated Cartesian trajectory computation.

_cartesian_trajectory_cpu(pstart, pend, Tf, N, method)[source]#

CPU computation of Cartesian linear velocity and acceleration.

Evaluates the time-scaling derivatives at each point and applies them to the straight-line displacement pend - pstart to obtain the translational velocity and acceleration profiles.

Parameters:
  • pstart – (3,) ndarray of the start position, metres.

  • pend – (3,) ndarray of the end position, metres.

  • Tf (float) – Total trajectory duration, seconds.

  • N (int) – Number of trajectory points.

  • method (int) – Time-scaling method; 3 for cubic, 5 for quintic.

Returns:

(traj_vel, traj_acc), each an (N, 3) array of linear velocity (m/s) and acceleration (m/s^2).

Return type:

Tuple[np.ndarray, np.ndarray]

CPU-based Cartesian trajectory computation.

_batch_joint_trajectory_cpu(thetastart_batch, thetaend_batch, Tf, N, method)[source]#

CPU fallback for batch trajectory generation.

Generates each trajectory in the batch sequentially with the Numba CPU kernel and clips positions to the configured joint limits (matching the GPU batch path).

Parameters:
  • thetastart_batch – (batch_size, num_joints) ndarray of starting joint angles, radians.

  • thetaend_batch – (batch_size, num_joints) ndarray of ending joint angles, radians.

  • Tf (float) – Total trajectory duration, seconds.

  • N (int) – Number of trajectory points per trajectory.

  • method (int) – Time-scaling method; 3 for cubic, 5 for quintic.

Returns:

Dict with keys "positions", "velocities" and "accelerations", each a (batch_size, N, num_joints) array.

Return type:

Dict[str, np.ndarray]

CPU fallback for batch trajectory generation.

_should_use_gpu(N, num_joints)[source]#

Decide whether to dispatch a problem to the GPU.

Returns False when CUDA is unavailable or the total work (N * num_joints) is below self.cpu_threshold. Otherwise logs a debug hint when the per-SM element count is unlikely to reach the configured target speedup, then approves GPU execution.

Parameters:
  • N (int) – Number of trajectory points / timesteps.

  • num_joints (int) – Number of joints (work units per point).

Returns:

True if the GPU path should be used, False otherwise.

Return type:

bool

Determine if GPU should be used based on problem size and availability.

Parameters:

  • N (int) – Number of trajectory points

  • num_joints (int) – Number of joints

Returns:

  • bool – True if GPU should be used

_get_or_resize_gpu_array(array_name, shape, dtype=<class 'numpy.float32'>)[source]#

Return a pooled CUDA array with the requested shape / dtype.

Looks up a previously pooled device array by array_name and reuses it when its shape and dtype match; otherwise the old array (if any) is returned to the pool and a freshly sized one is acquired and cached.

Parameters:
  • array_name (str) – Cache key identifying the pooled array slot.

  • shape (Tuple[int, ...]) – Desired device array shape.

  • dtype (Any) – Desired element dtype. Defaults to np.float32.

Returns:

The pooled CUDA device array of the requested shape/dtype, or None if CUDA is unavailable.

Return type:

Any

Return a pooled CUDA array with the requested shape/dtype.

Parameters:

  • array_name (str) – Key for array cache

  • shape (tuple) – Desired array shape

  • dtype (np.dtype) – Desired data type

Returns:

  • cuda.device_array or None – GPU array or None if CUDA unavailable

Performance Monitoring

get_performance_stats()[source]#

Enhanced performance statistics with speedup analysis.

Returns:

Comprehensive performance statistics

Return type:

dict

Get comprehensive performance statistics for GPU vs CPU usage.

Returns:

  • dict – Performance statistics including:

    • gpu_calls (int) – Number of GPU function calls

    • cpu_calls (int) – Number of CPU function calls

    • total_gpu_time (float) – Total GPU execution time

    • total_cpu_time (float) – Total CPU execution time

    • avg_gpu_time (float) – Average GPU execution time

    • avg_cpu_time (float) – Average CPU execution time

    • gpu_usage_percent (float) – Percentage of operations using GPU

    • memory_transfers (int) – Number of memory transfers

    • kernel_launches (int) – Number of CUDA kernel launches

reset_performance_stats()[source]#

Reset performance statistics.

Reset all performance statistics to zero.

Return type:

None

benchmark_performance(test_cases=None, include_cpu_comparison=True)[source]#

Enhanced performance benchmarking with detailed analysis.

Parameters:
  • test_cases (list, optional) – List of test cases to benchmark.

  • include_cpu_comparison (bool) – Whether to include CPU vs GPU comparison.

Returns:

Comprehensive benchmark results

Return type:

dict

Benchmark the performance of GPU vs CPU implementations.

Parameters:

  • test_cases (list, optional) – List of test cases to benchmark

Returns:

  • dict – Benchmark results for each test case

Memory Management

cleanup_gpu_memory()[source]#

Enhanced GPU memory cleanup.

Clean up GPU memory pools and cached arrays.

Return type:

None

Visualization Methods

static plot_trajectory(trajectory_data, Tf, title='Joint Trajectory', labels=None, performance_stats=None)[source]#

Plot joint position, velocity and acceleration trajectories.

Draws a 3-by-num_joints grid of subplots (position, velocity, acceleration per joint) over a time axis spanning [0, Tf] and appends GPU speedup / kernel info to the title when performance_stats is provided. Displays the figure via plt.show().

Parameters:
  • trajectory_data (dict) – Trajectory dict with keys "positions", "velocities" and "accelerations", each an (num_steps, num_joints) array.

  • Tf (float) – Total trajectory duration, seconds, defining the time axis.

  • title (str) – Base figure title. Defaults to "Joint Trajectory".

  • labels (list, optional) – Per-joint labels; used only when its length equals num_joints, otherwise generic "Joint i" labels are generated.

  • performance_stats (dict, optional) – Optional stats with keys such as "speedup_achieved" and "best_kernel_used" used to annotate the title.

Return type:

None

Parameters:

  • trajectory_data (dict) – Trajectory data with positions, velocities, accelerations

  • Tf (float) – Total trajectory time

  • title (str, optional) – Plot title (default: β€œJoint Trajectory”)

  • labels (list, optional) – Joint labels for legend

plot_tcp_trajectory(trajectory, dt)[source]#

Enhanced TCP trajectory plotting with performance monitoring.

Parameters:
  • trajectory (list) – A list of joint angle configurations representing the trajectory.

  • dt (float) – The time step between consecutive points in the trajectory.

Returns:

None

Return type:

None

Parameters:

  • trajectory (list) – List of joint angle configurations

  • dt (float) – Time step between trajectory points

plot_cartesian_trajectory(trajectory_data, Tf, title='Cartesian Trajectory', performance_stats=None)[source]#

Enhanced Cartesian trajectory plotting with performance information.

Parameters:
  • trajectory_data (dict) – A dictionary containing trajectory data.

  • Tf (float) – The final time of the trajectory.

  • title (str, optional) – The title of the plot.

  • performance_stats (dict, optional) – Performance statistics to display.

Returns:

None

Return type:

None

Parameters:

  • trajectory_data (dict) – Cartesian trajectory data

  • Tf (float) – Total trajectory time

  • title (str, optional) – Plot title (default: β€œCartesian Trajectory”)

plot_ee_trajectory(trajectory_data, Tf, title='End-Effector Trajectory')[source]#

Enhanced end-effector trajectory plotting.

Parameters:
  • trajectory_data (dict) – A dictionary containing trajectory data.

  • Tf (float) – The final time of the trajectory.

  • title (str, optional) – The title of the plot.

Returns:

None

Return type:

None

Parameters:

  • trajectory_data (dict) – End-effector trajectory data

  • Tf (float) – Total trajectory time

  • title (str, optional) – Plot title (default: β€œEnd-Effector Trajectory”)

Utility Methods

calculate_derivatives(positions, dt)[source]#

Calculate the velocity, acceleration, and jerk of a trajectory.

Parameters:
  • positions (list or numpy.ndarray) – A list or array of positions.

  • dt (float) – The time step between each position.

Returns:

An array of velocities. acceleration (numpy.ndarray): An array of accelerations. jerk (numpy.ndarray): An array of jerks.

Return type:

velocity (numpy.ndarray)

Parameters:

  • positions (array_like) – Position trajectory

  • dt (float) – Time step between positions

Returns:

  • velocity (numpy.ndarray) – Finite difference velocities

  • acceleration (numpy.ndarray) – Finite difference accelerations

  • jerk (numpy.ndarray) – Finite difference jerk

plan_trajectory(start_position, target_position, obstacle_points)[source]#

Enhanced trajectory planning with collision avoidance.

Parameters:
  • start_position (list) – Initial joint configuration.

  • target_position (list) – Desired joint configuration.

  • obstacle_points (list) – List of obstacle points in the environment.

Returns:

Joint trajectory as a list of joint configurations.

Return type:

list

Parameters:

  • start_position (list) – Initial joint configuration

  • target_position (list) – Desired joint configuration

  • obstacle_points (list) – Environment obstacle points

Returns:

  • trajectory (list) – Planned joint trajectory waypoints

TrajectoryPlanning Class#

class ManipulaPy.path_planning.TrajectoryPlanning(*args, **kwargs)[source]#

Bases: OptimizedTrajectoryPlanning

Backward compatibility alias for OptimizedTrajectoryPlanning.

This ensures existing code continues to work while providing access to all optimizations.

Backward compatibility alias for OptimizedTrajectoryPlanning.

This ensures existing code continues to work while providing access to all optimizations.

Parameters:

Utility Functions#

ManipulaPy.path_planning.create_optimized_planner(serial_manipulator, urdf_path, dynamics, joint_limits, torque_limits=None, target_speedup=40.0, gpu_memory_mb=None, enable_profiling=False, kernel_type='auto')[source]#

Enhanced factory function to create an optimized trajectory planner.

Parameters:
  • serial_manipulator – SerialManipulator instance

  • urdf_path – Path to URDF file

  • dynamics – ManipulatorDynamics instance

  • joint_limits – Joint limits

  • torque_limits – Torque limits (optional)

  • target_speedup – Target speedup over CPU (default: 40x)

  • gpu_memory_mb – GPU memory pool size in MB (optional)

  • enable_profiling – Enable CUDA profiling (optional)

  • kernel_type – Kernel selection strategy (optional)

Returns:

Configured planner instance

Return type:

OptimizedTrajectoryPlanning

Factory function to create an optimized trajectory planner with recommended settings.

Parameters:

  • serial_manipulator (SerialManipulator) – SerialManipulator instance

  • urdf_path (str) – Path to URDF file

  • dynamics (ManipulatorDynamics) – ManipulatorDynamics instance

  • joint_limits (list) – Joint limits

  • torque_limits (list, optional) – Torque limits

  • gpu_memory_mb (int, optional) – GPU memory pool size in MB

  • enable_profiling (bool) – Enable CUDA profiling

Returns:

  • OptimizedTrajectoryPlanning – Configured planner instance

ManipulaPy.path_planning.compare_implementations(serial_manipulator, urdf_path, dynamics, joint_limits, test_params=None, detailed_analysis=True)[source]#

Enhanced implementation comparison with detailed kernel analysis.

Parameters:
  • serial_manipulator – SerialManipulator instance

  • urdf_path – Path to URDF file

  • dynamics – ManipulatorDynamics instance

  • joint_limits – Joint limits

  • test_params – Test parameters (optional)

  • detailed_analysis – Whether to perform detailed kernel comparison

Returns:

Comprehensive comparison results

Return type:

dict

Compare performance between CPU and GPU implementations.

Parameters:

  • serial_manipulator (SerialManipulator) – SerialManipulator instance

  • urdf_path (str) – Path to URDF file

  • dynamics (ManipulatorDynamics) – ManipulatorDynamics instance

  • joint_limits (list) – Joint limits

  • test_params (dict, optional) – Test parameters

Returns:

  • dict – Comparison results including timing and accuracy metrics

CPU Optimization Functions#

ManipulaPy.path_planning._trajectory_cpu_fallback(*args, **kwargs)#

MagicMock is a subclass of Mock with default implementations of most of the magic methods. You can use MagicMock without having to configure the magic methods yourself.

If you use the spec or spec_set arguments then only magic methods that exist in the spec will be created.

Attributes and the return value of a MagicMock will also be MagicMocks.

Numba-optimized CPU trajectory generation with parallel execution.

Parameters:

  • thetastart (numpy.ndarray) – Starting joint angles

  • thetaend (numpy.ndarray) – Ending joint angles

  • Tf (float) – Final time

  • N (int) – Number of trajectory points

  • method (int) – Time scaling method

Returns:

  • tuple – (positions, velocities, accelerations) arrays

Optimization: Uses Numba’s parallel prange for CPU parallelization.

ManipulaPy.path_planning._traj_cpu_njit(*args, **kwargs)#

MagicMock is a subclass of Mock with default implementations of most of the magic methods. You can use MagicMock without having to configure the magic methods yourself.

If you use the spec or spec_set arguments then only magic methods that exist in the spec will be created.

Attributes and the return value of a MagicMock will also be MagicMocks.

Thin wrapper for Numba-optimized trajectory computation.

Parameters:

  • thetastart (numpy.ndarray) – Starting joint angles

  • thetaend (numpy.ndarray) – Ending joint angles

  • Tf (float) – Final time

  • N (int) – Number of trajectory points

  • method (int) – Time scaling method

Returns:

  • tuple – (positions, velocities, accelerations) arrays

Usage Examples#

Optimized Trajectory Planning:

from ManipulaPy.path_planning import OptimizedTrajectoryPlanning
from ManipulaPy.urdf_processor import URDFToSerialManipulator

# Setup with optimization options
processor = URDFToSerialManipulator("robot.urdf")
planner = OptimizedTrajectoryPlanning(
    processor.serial_manipulator,
    "robot.urdf",
    processor.dynamics,
    joint_limits=[(-np.pi, np.pi)] * 6,
    use_cuda=None,  # Auto-detect
    cuda_threshold=100,  # Use GPU for problems > 100*6 = 600 elements
    memory_pool_size_mb=256,  # 256MB GPU memory pool
    enable_profiling=True
)

# Generate trajectory with automatic GPU/CPU selection
trajectory = planner.joint_trajectory(
    thetastart=[0, 0, 0, 0, 0, 0],
    thetaend=[0.5, -0.3, 0.8, 0.1, -0.2, 0.4],
    Tf=3.0,
    N=2000,  # Large enough to trigger GPU
    method=5
)

# Check performance stats
stats = planner.get_performance_stats()
print(f"GPU usage: {stats['gpu_usage_percent']:.1f}%")
print(f"Average GPU time: {stats['avg_gpu_time']*1000:.2f}ms")

Factory Function Usage:

from ManipulaPy.path_planning import create_optimized_planner

# Create planner with auto-optimized settings
planner = create_optimized_planner(
    serial_manipulator=processor.serial_manipulator,
    urdf_path="robot.urdf",
    dynamics=processor.dynamics,
    joint_limits=[(-np.pi, np.pi)] * 6,
    gpu_memory_mb=512,
    enable_profiling=False
)

Batch Trajectory Processing:

# Generate 10 trajectories simultaneously
batch_size = 10
num_joints = 6

starts = np.random.uniform(-np.pi, np.pi, (batch_size, num_joints))
ends = np.random.uniform(-np.pi, np.pi, (batch_size, num_joints))

# Process all trajectories in parallel on GPU
batch_trajectories = planner.batch_joint_trajectory(
    thetastart_batch=starts,
    thetaend_batch=ends,
    Tf=2.0,
    N=500,
    method=3
)

print(f"Batch shape: {batch_trajectories['positions'].shape}")  # (10, 500, 6)

Performance Comparison:

from ManipulaPy.path_planning import compare_implementations

# Compare CPU vs GPU performance
results = compare_implementations(
    serial_manipulator=processor.serial_manipulator,
    urdf_path="robot.urdf",
    dynamics=processor.dynamics,
    joint_limits=[(-np.pi, np.pi)] * 6,
    test_params={"N": 5000, "Tf": 3.0, "method": 5}
)

print(f"CPU time: {results['cpu']['time']:.4f}s")
if results['gpu']['available']:
    print(f"GPU time: {results['gpu']['time']:.4f}s")
    print(f"Speedup: {results['gpu']['speedup']:.2f}x")
    print(f"Max position difference: {results['accuracy']['max_pos_diff']:.2e}")

Advanced Dynamics Integration:

# Generate trajectory
trajectory = planner.joint_trajectory(
    thetastart=[0, 0, 0, 0, 0, 0],
    thetaend=[1.0, -0.5, 0.8, 0.2, -0.3, 0.6],
    Tf=4.0,
    N=1000,
    method=5
)

# Compute required torques with GPU acceleration
torques = planner.inverse_dynamics_trajectory(
    trajectory["positions"],
    trajectory["velocities"],
    trajectory["accelerations"],
    gravity_vector=[0, 0, -9.81],
    Ftip=[0, 0, -10, 0, 0, 0]  # 10N downward force
)

# Forward dynamics simulation
simulation = planner.forward_dynamics_trajectory(
    thetalist=[0, 0, 0, 0, 0, 0],
    dthetalist=[0, 0, 0, 0, 0, 0],
    taumat=torques,
    g=[0, 0, -9.81],
    Ftipmat=np.zeros((1000, 6)),
    dt=0.004,
    intRes=5
)

Performance Monitoring:

# Run multiple operations
for i in range(5):
    trajectory = planner.joint_trajectory(
        thetastart=np.random.uniform(-1, 1, 6),
        thetaend=np.random.uniform(-1, 1, 6),
        Tf=2.0,
        N=1000,
        method=3
    )

# Analyze performance
stats = planner.get_performance_stats()
print(f"Total operations: {stats['gpu_calls'] + stats['cpu_calls']}")
print(f"GPU efficiency: {stats['gpu_usage_percent']:.1f}%")
print(f"Memory transfers: {stats['memory_transfers']}")

# Benchmark different problem sizes
benchmark_results = planner.benchmark_performance([
    {"N": 100, "joints": 6, "name": "Small"},
    {"N": 2000, "joints": 6, "name": "Large"},
    {"N": 1000, "joints": 12, "name": "Many joints"},
])

for name, result in benchmark_results.items():
    print(f"{name}: {result['total_time']:.4f}s (GPU: {result['used_gpu']})")

Memory Management:

# Clean up GPU resources explicitly
planner.cleanup_gpu_memory()

# Reset performance stats
planner.reset_performance_stats()

Cartesian Space Planning with Hybrid Computation:

# Define start and end poses
T_start = np.eye(4)
T_start[:3, 3] = [0.5, 0.2, 0.3]

T_end = np.eye(4)
T_end[:3, 3] = [0.3, 0.4, 0.5]
# Add rotation
from ManipulaPy.utils import rotation_matrix_z
T_end[:3, :3] = rotation_matrix_z(np.pi/3)

# Generate Cartesian trajectory (GPU for derivatives, CPU for orientations)
cart_traj = planner.cartesian_trajectory(
    Xstart=T_start,
    Xend=T_end,
    Tf=3.0,
    N=1000,
    method=5
)

# Visualize
planner.plot_cartesian_trajectory(cart_traj, Tf=3.0)

Adaptive Threshold Tuning:

# The planner automatically adapts thresholds based on performance
initial_threshold = planner.cpu_threshold

# Run several operations
for _ in range(10):
    trajectory = planner.joint_trajectory(
        thetastart=np.random.uniform(-1, 1, 6),
        thetaend=np.random.uniform(-1, 1, 6),
        Tf=2.0,
        N=800,  # Around threshold
        method=3
    )

# Check if threshold adapted
final_threshold = planner.cpu_threshold
print(f"Threshold adapted: {initial_threshold} -> {final_threshold}")

Key Features#

Adaptive Execution:

  • Automatic GPU/CPU selection based on problem size

  • Intelligent threshold adaptation based on performance history

  • Graceful fallback on GPU errors or memory constraints

CUDA Acceleration:

  • 2D parallelized kernels for optimal GPU utilization

  • Shared memory optimization for time-scaling computations

  • Memory pooling to reduce allocation overhead

  • Pinned memory transfers for maximum PCIe bandwidth

Memory Management:

  • Per-instance GPU array caching

  • Global memory pool with automatic cleanup

  • Configurable memory pool sizes

  • Explicit memory management controls

Performance Monitoring:

  • Detailed timing statistics for GPU vs CPU usage

  • Automatic performance tracking and analysis

  • Built-in benchmarking capabilities

  • Adaptive threshold tuning based on empirical performance

Batch Processing:

  • 3D parallelized batch trajectory generation

  • Efficient memory management for large batches

  • Automatic load balancing across GPU cores

Robust Operation:

  • Comprehensive error handling with fallbacks

  • Automatic recovery from GPU memory issues

  • Extensive logging for debugging and optimization

Performance Characteristics#

GPU Acceleration Thresholds:

  • Small problems (N Γ— joints < 200): CPU (lower overhead)

  • Medium problems (200 ≀ N Γ— joints < 5000): Adaptive selection

  • Large problems (N Γ— joints β‰₯ 5000): GPU preferred

Memory Usage:

  • Joint trajectory: ~12 bytes per (time, joint) element

  • Cartesian trajectory: ~36 bytes per time step

  • Dynamics computation: ~20 bytes per (time, joint) element

Typical Speedups (on modern GPU):

  • Joint trajectories: 5-20Γ— for N > 1000

  • Batch processing: 10-50Γ— for large batches

  • Dynamics computation: 3-15Γ— for long trajectories

Configuration Guidelines#

GPU Memory Pool Sizing:

  • Small robots (≀6 DOF): 128-256 MB

  • Medium robots (7-12 DOF): 256-512 MB

  • Large robots (>12 DOF): 512+ MB

CUDA Threshold Tuning:

  • High-end GPUs: Lower threshold (50-100)

  • Mid-range GPUs: Medium threshold (100-200)

  • Integrated GPUs: Higher threshold (200-500)

Batch Size Recommendations:

  • Memory-limited: batch_size ≀ 50

  • Compute-limited: batch_size ≀ 200

  • High-memory systems: batch_size ≀ 1000

See Also#