CUDA Kernels API Referenceο
This module provides CUDA-accelerated functions for trajectory planning and dynamics computation with automatic CPU fallback.
Availability Checkο
- ManipulaPy.cuda_kernels.check_cuda_availability()[source]ο
Enhanced CUDA availability check with detailed diagnostics.
Check if CUDA is available and provide helpful diagnostic information.
Returns:
bool β True if CUDA is available, False otherwise
Core CUDA Kernelsο
Trajectory Kernelsο
- ManipulaPy.cuda_kernels.trajectory_kernel(*args, **kwargs)[source]ο
CUDA kernel for generating joint trajectory points with time-scaling.
Parameters:
thetastart (cuda.device_array) β Starting joint angles
thetaend (cuda.device_array) β Target joint angles
traj_pos (cuda.device_array) β Output trajectory positions
traj_vel (cuda.device_array) β Output trajectory velocities
traj_acc (cuda.device_array) β Output trajectory accelerations
Tf (float) β Total trajectory time
N (int) β Number of trajectory points
method (int) β Time scaling method (3=cubic, 5=quintic)
stream (int) β CUDA stream for kernel execution
- ManipulaPy.cuda_kernels.cartesian_trajectory_kernel(*args, **kwargs)[source]ο
CUDA kernel for generating Cartesian trajectory with time-scaling.
Parameters:
pstart (cuda.device_array) β Starting point coordinates [x, y, z]
pend (cuda.device_array) β Ending point coordinates [x, y, z]
traj_pos (cuda.device_array) β Output trajectory positions
traj_vel (cuda.device_array) β Output trajectory velocities
traj_acc (cuda.device_array) β Output trajectory accelerations
Tf (float) β Total trajectory duration
N (int) β Number of trajectory points
method (int) β Time-scaling method (3=cubic, 5=quintic)
stream (int) β CUDA stream for kernel execution
- ManipulaPy.cuda_kernels.batch_trajectory_kernel(*args, **kwargs)[source]ο
Optimized CUDA kernel for batch trajectory generation with time-scaling.
Parameters:
thetastart_batch (cuda.device_array) β Starting joint positions for each batch
thetaend_batch (cuda.device_array) β Ending joint positions for each batch
traj_pos_batch (cuda.device_array) β Output trajectory positions
traj_vel_batch (cuda.device_array) β Output trajectory velocities
traj_acc_batch (cuda.device_array) β Output trajectory accelerations
Tf (float) β Total trajectory duration
N (int) β Number of trajectory timesteps
method (int) β Time-scaling method (3=cubic, 5=quintic)
batch_size (int) β Number of trajectory batches
stream (int) β CUDA stream for kernel execution
Dynamics Kernelsο
- ManipulaPy.cuda_kernels.inverse_dynamics_kernel(*args, **kwargs)[source]ο
Optimized CUDA kernel for computing inverse dynamics using 2D parallelization.
Parameters:
thetalist_trajectory (cuda.device_array) β Joint position trajectory
dthetalist_trajectory (cuda.device_array) β Joint velocity trajectory
ddthetalist_trajectory (cuda.device_array) β Joint acceleration trajectory
gravity_vector (cuda.device_array) β Gravity vector
Ftip (cuda.device_array) β End-effector wrench
Glist (cuda.device_array) β Mass matrix diagonal elements
Slist (cuda.device_array) β Velocity quadratic force coefficients
M (cuda.device_array) β Full mass matrix
torques_trajectory (cuda.device_array) β Output joint torque trajectory
torque_limits (cuda.device_array) β Joint torque limits
stream (int) β CUDA stream for kernel execution
- ManipulaPy.cuda_kernels.forward_dynamics_kernel(*args, **kwargs)[source]ο
Compute forward dynamics for a robotic system using a CUDA kernel.
Parameters:
thetalist (cuda.device_array) β Initial joint positions
dthetalist (cuda.device_array) β Initial joint velocities
taumat (cuda.device_array) β Applied joint torques trajectory
g (cuda.device_array) β Gravity vector
Ftipmat (cuda.device_array) β End-effector wrenches
dt (float) β Total time step
intRes (int) β Integration resolution/substeps
Glist (cuda.device_array) β Mass matrix diagonal elements
Slist (cuda.device_array) β Velocity quadratic force coefficients
M (cuda.device_array) β Full mass matrix
thetamat (cuda.device_array) β Output joint position trajectory
dthetamat (cuda.device_array) β Output joint velocity trajectory
ddthetamat (cuda.device_array) β Output joint acceleration trajectory
joint_limits (cuda.device_array) β Joint position limits
stream (int) β CUDA stream for kernel execution
Potential Field Kernelsο
- ManipulaPy.cuda_kernels.fused_potential_gradient_kernel(*args, **kwargs)[source]ο
CUDA kernel for computing potential and gradient for path planning.
Parameters:
positions (cuda.device_array) β Input positions to evaluate
goal (cuda.device_array) β Target goal point coordinates
obstacles (cuda.device_array) β Array of obstacle point coordinates
potential (cuda.device_array) β Output array for computed potential values
gradient (cuda.device_array) β Output array for computed gradient vectors
influence_distance (float) β Distance threshold for obstacle influence
stream (int) β CUDA stream for kernel execution
- ManipulaPy.cuda_kernels.attractive_potential_kernel(*args, **kwargs)[source]ο
Legacy function - use fused_potential_gradient_kernel instead.
Legacy CUDA kernel for attractive potential field computation.
Parameters:
positions (cuda.device_array) β Query positions (N, 3)
goal (cuda.device_array) β Goal position [x, y, z]
potential (cuda.device_array) β Output potential values (N,)
- ManipulaPy.cuda_kernels.repulsive_potential_kernel(*args, **kwargs)[source]ο
Legacy function - use fused_potential_gradient_kernel instead.
Legacy CUDA kernel for repulsive potential field computation.
Parameters:
positions (cuda.device_array) β Query positions (N, 3)
obstacles (cuda.device_array) β Obstacle positions (M, 3)
potential (cuda.device_array) β Output potential values (N,)
influence_distance (float) β Maximum influence distance
- ManipulaPy.cuda_kernels.gradient_kernel(*args, **kwargs)[source]ο
Legacy function - use fused_potential_gradient_kernel instead.
Legacy CUDA kernel for numerical gradient computation.
Parameters:
potential (cuda.device_array) β Potential field values (N,)
gradient (cuda.device_array) β Output gradient (N-1,)
High-Level Wrappersο
- ManipulaPy.cuda_kernels.optimized_trajectory_generation(thetastart, thetaend, Tf, N, method, use_pinned=True, kernel_type='auto')[source]ο
Main entry point for optimized trajectory generation.
This function automatically selects the best kernel and configuration for maximum performance and 40x+ speedups.
- Parameters:
thetastart β Start and end joint angles
thetaend β Start and end joint angles
Tf β Final time
N β Number of trajectory points
method β Time scaling method (3=cubic, 5=quintic)
use_pinned β Use pinned memory for faster transfers
kernel_type β Kernel selection (βautoβ, βstandardβ, βvectorizedβ, etc.)
Generates an optimized trajectory using CUDA acceleration with automatic memory management.
Parameters:
thetastart (np.ndarray) β Initial joint configuration
thetaend (np.ndarray) β Final joint configuration
Tf (float) β Total trajectory duration
N (int) β Number of trajectory timesteps
method (int) β Trajectory generation method
use_pinned (bool) β Use pinned memory for faster GPU transfers
Returns:
tuple β (trajectory positions, trajectory velocities, trajectory accelerations)
- ManipulaPy.cuda_kernels.optimized_potential_field(positions, goal, obstacles, influence_distance, use_pinned=True)[source]ο
Optimized potential field computation with CUDA acceleration.
Compute potential field and gradient for a set of positions using a CUDA-accelerated kernel.
Parameters:
positions (np.ndarray) β Input positions to compute potential field for
goal (np.ndarray) β Target goal position
obstacles (np.ndarray) β Array of obstacle positions
influence_distance (float) β Distance within which obstacles influence the potential field
use_pinned (bool) β Use pinned memory for faster GPU transfers
Returns:
tuple β (potential values, gradient vectors) for each input position
- ManipulaPy.cuda_kernels.optimized_batch_trajectory_generation(thetastart_batch, thetaend_batch, Tf, N, method, use_pinned=True)[source]ο
Optimized batch trajectory generation for multiple trajectories.
Efficiently generate batch trajectories using CUDA acceleration.
Parameters:
thetastart_batch (np.ndarray) β Batch of initial joint configurations
thetaend_batch (np.ndarray) β Batch of final joint configurations
Tf (float) β Total trajectory duration
N (int) β Number of trajectory timesteps
method (int) β Trajectory generation method identifier
use_pinned (bool) β Use pinned memory for faster GPU transfers
Returns:
tuple β Batch of trajectory positions, velocities, and accelerations
CPU Fallback Functionsο
- ManipulaPy.cuda_kernels.trajectory_cpu_fallback(thetastart, thetaend, Tf, N, method)[source]ο
Optimized CPU fallback using NumPy vectorization.
Compute trajectory positions, velocities, and accelerations on the CPU when CUDA is unavailable.
Parameters:
thetastart (np.ndarray) β Initial joint configurations
thetaend (np.ndarray) β Target joint configurations
Tf (float) β Total trajectory duration
N (int) β Number of trajectory points to generate
method (int) β Time scaling method (3=cubic, 5=quintic)
Returns:
tuple β (positions, velocities, accelerations) arrays
Memory Managementο
- ManipulaPy.cuda_kernels.get_cuda_array(*args, **kwargs)[source]ο
Get a CUDA array from the memory pool.
Parameters:
shape (tuple) β Array dimensions
dtype (np.dtype) β Data type
Returns:
cuda.device_array β GPU array from memory pool
Grid Configurationο
- ManipulaPy.cuda_kernels.make_1d_grid(size, threads=256)[source]ο
Create optimal 1D grid for maximum GPU utilization.
Create a 1D grid configuration for CUDA kernel launch with optimal thread and block sizing.
Parameters:
size (int) β Total number of elements or work items to process
threads (int) β Desired number of threads per block
Returns:
tuple β ((blocks,), (threads,)) for kernel launch configuration
- ManipulaPy.cuda_kernels.make_2d_grid(N, num_joints, block_size=(128, 8))[source]ο
Create 2D grid configuration for CUDA kernel launch (backward compatibility).
This is the original function maintained for compatibility. For optimal performance, use make_2d_grid_optimized().
Compute optimal 2D grid configuration for CUDA kernel launch.
Parameters:
N (int) β First dimension of problem space
num_joints (int) β Second dimension of problem space
block_size (tuple) β Initial suggested block dimensions
Returns:
tuple β ((blocks_x, blocks_y), (threads_x, threads_y))
Performance Toolsο
- ManipulaPy.cuda_kernels.benchmark_kernel_performance(*args, **kwargs)[source]ο
Benchmark the performance of a specific CUDA kernel by executing it multiple times.
Parameters:
kernel_name (str) β Name of the kernel to benchmark
*args β Arguments to pass to the kernel function
num_runs (int) β Number of times to run the kernel
Returns:
dict or None β Performance metrics including average, std, min/max times
- ManipulaPy.cuda_kernels._best_2d_config(*args, **kwargs)[source]ο
Auto-tune 2D CUDA kernel launch configuration for optimal performance.
Parameters:
N (int) β Number of time steps or trajectory points
J (int) β Number of joints or degrees of freedom
Returns:
tuple β ((grid_x, grid_y), (block_x, block_y))
Module Constantsο
- ManipulaPy.cuda_kernels.CUDA_AVAILABLE = Falseο
bool(x) -> bool
Returns True when the argument x is true, False otherwise. The builtins True and False are the only two instances of the class bool. The class bool is a subclass of the class int, and cannot be subclassed.
bool β True if CUDA is available, False otherwise
- ManipulaPy.cuda_kernels.CUPY_AVAILABLE = Trueο
bool(x) -> bool
Returns True when the argument x is true, False otherwise. The builtins True and False are the only two instances of the class bool. The class bool is a subclass of the class int, and cannot be subclassed.
bool β True if CuPy is available, False otherwise
- ManipulaPy.cuda_kernels.FAST_MATH = Trueο
bool(x) -> bool
Returns True when the argument x is true, False otherwise. The builtins True and False are the only two instances of the class bool. The class bool is a subclass of the class int, and cannot be subclassed.
bool β Whether fast math optimizations are enabled
- ManipulaPy.cuda_kernels.float_t = <class 'numpy.float32'>ο
Single-precision floating-point number type, compatible with C
float.- Character code:
'f'- Canonical name:
numpy.single
- Alias on this platform (Linux x86_64):
numpy.float32: 32-bit-precision floating-point number type: sign bit, 8 bits exponent, 23 bits mantissa.
type β Float precision type (float32 or float16)
Environment Variablesο
- MANIPULAPY_FASTMATH
Set to β1β to enable fast math optimizations (~2x speedup with relaxed IEEE 754 compliance)
- MANIPULAPY_USE_FP16
Set to β1β to use 16-bit floating point precision for memory-bound kernels