Vision API Referenceο
This page documents ManipulaPy.vision, the module for computer vision capabilities including stereo vision, object detection, camera calibration, and PyBullet integration with optional YOLO object detection.
Tip
For conceptual explanations, see Vision User Guide.
β
Vision Classο
- class ManipulaPy.vision.Vision(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]ο
Bases:
objectA unified vision class for monocular/stereo inputs and PyBullet debugging.
Features:ο
Monocular camera(s) with user-specified intrinsics/extrinsics or OpenCV capture devices.
Optional PyBullet debug sliders for a βvirtualβ camera.
Stereo pipeline: rectification, disparity, 3D point cloud generation (if stereo configs provided).
Basic obstacle detection via depth thresholding + intrinsics-based unprojection.
- param camera_configs:
- Each dict describes one camera (monocular or part of a stereo pair):
βnameβ (str)
βtranslationβ ([x, y, z]) in world or local coords
βrotationβ ([roll_deg, pitch_deg, yaw_deg]) in degrees
βfovβ (float)
βnearβ (float)
βfarβ (float)
βintrinsic_matrixβ (3x3 np.array)
βdistortion_coeffsβ (1D np.array of length=5)
βuse_opencvβ (bool)
βdevice_indexβ (int) => for OpenCV
- type camera_configs:
list of dict, optional
- param stereo_configs:
(left_cam_cfg, right_cam_cfg) for stereo. Must have βintrinsic_matrixβ, βdistortion_coeffsβ, βtranslationβ, βrotationβ, etc.
- type stereo_configs:
tuple(dict, dict), optional
- param use_pybullet_debug:
If True, create debug sliders in PyBullet for a single βvirtualβ camera.
- type use_pybullet_debug:
bool
- param show_plot:
If True (and use_pybullet_debug=True), display the debug camera feed in a Matplotlib window.
- type show_plot:
bool
- param logger_name:
Logger name for this class.
- type logger_name:
str
- param physics_client:
PyBullet client ID if controlling a simulation environment.
- type physics_client:
int or None
Unified vision system for monocular/stereo cameras with PyBullet integration, stereo processing pipeline, and YOLO-based object detection capabilities.
Constructor
- __init__(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]ο
Initializes the Vision system with optional monocular/stereo cameras, PyBullet debug tools, and YOLO object detection.
- Parameters:
camera_configs (-) β List of dictionaries for monocular cameras or stereo pairs.
stereo_configs (-) β Tuple(left_cam_config, right_cam_config) for stereo processing.
use_pybullet_debug (-) β Enables PyBullet sliders to modify virtual camera parameters.
show_plot (-) β Displays PyBullet debug images using Matplotlib.
logger_name (-) β Name for logging information.
physics_client (-) β PyBullet client ID for interacting with a simulation.
Parameters:
camera_configs (list of dict, optional) β Camera configuration dictionaries
stereo_configs (tuple(dict, dict), optional) β Left and right camera configurations for stereo
use_pybullet_debug (bool, optional) β Enable PyBullet debug sliders (default: False)
show_plot (bool, optional) β Display debug camera feed in matplotlib (default: True)
logger_name (str, optional) β Logger identifier (default: βVisionSystemLoggerβ)
physics_client (int, optional) β PyBullet client ID for simulation integration
Camera Configuration Structure:
Each camera_config dictionary contains:
name (str) β Camera identifier
translation (list) β [x, y, z] position in world coordinates
rotation (list) β [roll_deg, pitch_deg, yaw_deg] orientation angles
fov (float) β Field of view in degrees
near (float) β Near clipping plane distance
far (float) β Far clipping plane distance
intrinsic_matrix (np.ndarray) β 3Γ3 camera intrinsic matrix
distortion_coeffs (np.ndarray) β 5-element distortion coefficient array
use_opencv (bool) β Enable OpenCV device capture
device_index (int) β OpenCV device identifier
- detect_obstacles(depth_image, rgb_image, depth_threshold=0.0, camera_index=0, step=5)[source]ο
Detect obstacles with YOLO (if available) and estimate 3D positions via depth.
- rectify_stereo_images(left_img, right_img)[source]ο
Remap left and right images to their rectified forms.
- compute_disparity(left_rect, right_rect)[source]ο
Compute a disparity map from rectified stereo images using StereoSGBM (or BM).
- disparity_to_pointcloud(disparity)[source]ο
Reproject a disparity map to 3D points using the Q matrix from stereoRectify.
β
Camera Managementο
Configuration Methodsο
- Vision._configure_camera(idx, cfg)[source]ο
Store a camera configuration and open an OpenCV capture device if requested.
Internal method for storing camera configurations and initializing OpenCV capture devices.
Parameters:
idx (int) β Camera index for internal storage
cfg (dict) β Camera configuration dictionary
Processing:
Extracts configuration parameters with defaults
Builds extrinsic transformation matrix
Stores camera parameters in self.cameras[idx]
Initializes OpenCV VideoCapture if use_opencv=True
Validates device accessibility and logs status
- Vision._validate_stereo_config(left_cfg, right_cfg)[source]ο
Basic validation that each stereo cam config has required keys.
Validates stereo camera configuration completeness.
Parameters:
left_cfg (dict) β Left camera configuration
right_cfg (dict) β Right camera configuration
Required Keys:
intrinsic_matrix, distortion_coeffs, translation, rotation
Extrinsic Matrix Computationο
- Vision._make_extrinsic_matrix(translation, rotation_deg)[source]ο
Create a 4x4 extrinsic matrix from translation and Euler angles in degrees. Uses cv2.Rodrigues for improved numerical stability.
Constructs 4Γ4 homogeneous transformation matrix from translation and Euler angles.
Parameters:
translation (list) β [x, y, z] translation vector
rotation_deg (list) β [roll, pitch, yaw] in degrees
Returns:
np.ndarray β 4Γ4 extrinsic transformation matrix
- Vision._euler_to_rotation_matrix(euler_deg)[source]ο
Convert [roll_deg, pitch_deg, yaw_deg] to a rotation matrix via manual multiplication, or you can chain cv2.Rodrigues calls. Weβll do a direct approach here for clarity.
Converts Euler angles to 3Γ3 rotation matrix using ZYX convention.
Parameters:
euler_deg (list) β [roll_deg, pitch_deg, yaw_deg]
Returns:
np.ndarray β 3Γ3 rotation matrix (float32)
Mathematical Implementation:
\[R = R_z(\text{yaw}) \cdot R_y(\text{pitch}) \cdot R_x(\text{roll})\]
β
Image Acquisitionο
PyBullet Integrationο
- Vision.capture_image(camera_index=0)[source]ο
Captures an RGB and depth image from PyBullet cameras.
Captures RGB and depth images from PyBullet cameras with automatic depth scaling.
Parameters:
camera_index (int, optional) β Camera identifier (default: 0)
Returns:
rgb (np.ndarray) β RGB image array (H, W, 3) uint8
depth (np.ndarray) β Depth image array (H, W) float32
Processing Pipeline:
Validates camera_index existence in self.cameras
Constructs view matrix from camera translation and fixed target
Builds projection matrix from camera parameters
Calls pb.getCameraImage() with computed matrices
Processes RGBA to RGB conversion
Applies depth scaling: depth = near + (far - near) Γ normalized_depth
β
Object Detectionο
YOLO Integrationο
- Vision.detect_obstacles(depth_image, rgb_image, depth_threshold=0.0, camera_index=0, step=5)[source]ο
Detect obstacles with YOLO (if available) and estimate 3D positions via depth.
- Returns:
positions ((N, 3) float32 array)
orientations ((N,) float32 array (yaw angles in degrees in XY plane))
- Parameters:
YOLO-based object detection with 3D position estimation using depth information.
Parameters:
depth_image (np.ndarray) β Depth image for 3D positioning
rgb_image (np.ndarray) β RGB image for YOLO detection
depth_threshold (float, optional) β Maximum depth consideration (default: 0.0)
camera_index (int, optional) β Camera for intrinsic parameters (default: 0)
step (int, optional) β Depth downsampling stride (default: 5)
Returns:
positions (np.ndarray) β Shape (N, 3) 3D object positions
orientations (np.ndarray) β Shape (N,) XY-plane orientation angles
Detection Pipeline:
YOLO Inference: self.yolo_model(rgb_image, conf=0.3)
Bounding Box Extraction: box.xyxy[0].tolist() coordinates
Depth Analysis: Median depth within bounding box region
3D Reprojection: Camera intrinsics-based coordinate transformation
Orientation Estimation: np.arctan2(y_cam, x_cam) in XY plane
3D Transformation:
\[\begin{split}\begin{align} x_{cam} &= (c_x - c_{x,intrinsic}) \cdot \frac{d}{f_x} \\ y_{cam} &= (c_y - c_{y,intrinsic}) \cdot \frac{d}{f_y} \\ z_{cam} &= d \end{align}\end{split}\]Where (c_x, c_y) is bounding box center, d is median depth, and f_x, f_y are focal lengths.
β
Stereo Vision Pipelineο
Rectification Setupο
- Vision.compute_stereo_rectification_maps(image_size=(640, 480))[source]ο
Computes stereo rectification maps using OpenCV stereoRectify algorithm.
Parameters:
image_size (tuple, optional) β (width, height) for rectification (default: (640, 480))
Implementation Details:
Parameter Extraction: Intrinsics K1, K2 and distortions D1, D2
Relative Geometry: R_lr = R_right @ R_left.T, t_lr = t_right - R_lr @ t_left
Type Unification: Convert all inputs to float64 for cv2.stereoRectify
Rectification Computation: cv2.stereoRectify with CALIB_ZERO_DISPARITY
Map Generation: cv2.initUndistortRectifyMap for both cameras
Q Matrix Storage: Disparity-to-depth transformation matrix
Image Processingο
- Vision.rectify_stereo_images(left_img, right_img)[source]ο
Remap left and right images to their rectified forms.
Applies rectification maps to stereo image pairs.
Parameters:
left_img (np.ndarray) β Left camera image
right_img (np.ndarray) β Right camera image
Returns:
left_rect (np.ndarray) β Rectified left image
right_rect (np.ndarray) β Rectified right image
Requirements:
stereo_enabled = True
Rectification maps computed via compute_stereo_rectification_maps()
- Vision.compute_disparity(left_rect, right_rect)[source]ο
Compute a disparity map from rectified stereo images using StereoSGBM (or BM).
Computes disparity map using StereoSGBM algorithm.
Parameters:
left_rect (np.ndarray) β Rectified left image
right_rect (np.ndarray) β Rectified right image
Returns:
disparity (np.ndarray) β Float32 disparity map
StereoSGBM Configuration:
minDisparity: 0
numDisparities: 64
blockSize: 7
P1: 8 Γ 3 Γ 7Β²
P2: 32 Γ 3 Γ 7Β²
Fixed-point scaling: division by 16.0
3D Reconstructionο
- Vision.disparity_to_pointcloud(disparity)[source]ο
Reproject a disparity map to 3D points using the Q matrix from stereoRectify.
Reprojects disparity map to 3D point cloud using Q matrix.
Parameters:
disparity (np.ndarray) β Disparity map from compute_disparity()
Returns:
cloud_filtered (np.ndarray) β Shape (N, 3) valid 3D points
Filtering Criteria:
Positive disparity values (disp_flat > 0)
Finite coordinates (np.isfinite(cloud[:, 0]))
Reasonable depth (cloud[:, 2] < 10.0)
- Vision.get_stereo_point_cloud(left_img, right_img)[source]ο
High-level pipeline: rectify, compute disparity, reproject to 3D.
High-level stereo processing pipeline combining rectification, disparity, and reprojection.
Parameters:
left_img (np.ndarray) β Left camera image
right_img (np.ndarray) β Right camera image
Returns:
point_cloud (np.ndarray) β Shape (N, 3) 3D point coordinates
Pipeline Sequence:
rectify_stereo_images(left_img, right_img)
compute_disparity(left_rect, right_rect)
disparity_to_pointcloud(disparity)
β
PyBullet Debug Interfaceο
Debug Slider Setupο
- Vision._setup_pybullet_debug_sliders()[source]ο
Creates PyBullet debug sliders for a single βvirtualβ camera.
Creates PyBullet debug sliders for interactive virtual camera control.
Slider Categories:
View Matrix Parameters:
target_x, target_y, target_z: Camera target position
distance: Camera distance from target
yaw, pitch, roll: Camera orientation angles
upAxisIndex: Up axis selection (0 or 1)
Projection Matrix Parameters:
width, height: Image resolution
fov: Field of view angle
near_val, far_val: Clipping plane distances
print: Debug output trigger
- Vision._get_pybullet_view_proj()[source]ο
Reads debug sliders, returns (view_mtx, proj_mtx, width, height).
Reads debug slider values and constructs view/projection matrices.
Returns:
view_mtx (list) β PyBullet view matrix
proj_mtx (list) β PyBullet projection matrix
width (int) β Image width
height (int) β Image height
Matrix Construction:
View: pb.computeViewMatrixFromYawPitchRoll()
Projection: pb.computeProjectionMatrixFOV()
Debug logging on print button toggle
β
Resource Managementο
Cleanup Methodsο
- Vision.release()[source]ο
Release resources (e.g., OpenCV capture devices).
Releases OpenCV capture device resources.
Operations:
Iterates through self.capture_devices
Calls cap.release() for each VideoCapture object
Clears capture_devices dictionary
Logs cleanup status for each device
- Vision.__del__()[source]ο
Destructor: ensure we release resources gracefully.
Destructor with robust error handling for graceful resource cleanup.
Safety Features:
Validates logger existence and handler availability
Checks stream closure status before logging
Silent exception handling to prevent destructor failures
Guaranteed resource cleanup regardless of logging success
β
Utility Functionsο
- ManipulaPy.vision.read_debug_parameters(dbg_params)[source]ο
Utility to read current slider values from PyBullet debug interface.
Utility function for reading PyBullet debug parameter values.
Parameters:
dbg_params (dict) β Dictionary mapping parameter names to PyBullet IDs
Returns:
values (dict) β Dictionary mapping parameter names to current values
Implementation:
Iterates through dbg_params and calls pb.readUserDebugParameter() for each ID.
β
Data Structures and Configurationο
Internal Storage Formatο
Camera Storage (self.cameras):
self.cameras[index] = {
"name": str,
"translation": [x, y, z],
"rotation": [roll_deg, pitch_deg, yaw_deg],
"fov": float,
"near": float,
"far": float,
"intrinsic_matrix": np.ndarray(3, 3),
"distortion_coeffs": np.ndarray(5,),
"use_opencv": bool,
"device_index": int,
"extrinsic_matrix": np.ndarray(4, 4)
}
Stereo Configuration Attributes:
# Stereo processing state
self.stereo_enabled: bool
self.left_cam_cfg: dict
self.right_cam_cfg: dict
# Rectification maps
self.left_map_x: np.ndarray
self.left_map_y: np.ndarray
self.right_map_x: np.ndarray
self.right_map_y: np.ndarray
# 3D reconstruction
self.Q: np.ndarray # 4x4 disparity-to-depth matrix
self.stereo_matcher: cv2.StereoSGBM
Default Camera Configurationο
When no camera_configs provided, uses:
default_config = {
"name": "default_camera",
"translation": [0, 0, 0],
"rotation": [0, 0, 0],
"fov": 60,
"near": 0.1,
"far": 5.0,
"intrinsic_matrix": [[500, 0, 320],
[0, 500, 240],
[0, 0, 1]],
"distortion_coeffs": [0, 0, 0, 0, 0],
"use_opencv": False,
"device_index": 0
}
β
Error Handling and Validationο
YOLO Model Managementο
Loading Failure: Sets yolo_model = None, continues operation
Detection Fallback: Returns empty arrays when YOLO unavailable
Input Validation: Checks rgb_image and depth_image for None/invalid
Stereo Processing Errorsο
Configuration Validation: Checks required keys in stereo configs
Runtime State Checking: Validates stereo_enabled before operations
Map Initialization: Ensures rectification maps computed before use
OpenCV Device Handlingο
Device Access Failure: Raises RuntimeError with descriptive message
Capture Validation: Uses cap.isOpened() to verify device accessibility
Resource Cleanup: Automatic release in destructor and explicit method
PyBullet Integration Safetyο
Parameter Reading: Safe handling of missing debug parameters
Matrix Validation: Debug logging for view/projection matrix inspection
Client State: No assumptions about PyBullet connection status
β
Performance Considerationsο
Memory Managementο
Image Arrays: Contiguous memory layout for OpenCV operations
Rectification Maps: Persistent storage for repeated stereo processing
Point Clouds: Filtered arrays to reduce memory footprint
Computational Efficiencyο
YOLO Inference: Single forward pass per image
Stereo Matching: StereoSGBM optimized for quality/speed balance
Depth Scaling: Vectorized operations for range conversion
Threading Considerationsο
OpenCV Capture: Single-threaded device access
YOLO Processing: GPU acceleration when available
PyBullet Integration: Main thread simulation access required
β
See Alsoο
Perception API Reference β Higher-level perception capabilities using Vision
Utils Module β Mathematical utilities for transformations
Simulation API Reference β PyBullet simulation integration
Potential Field API Reference β Obstacle avoidance using vision data
External Dependenciesο
OpenCV β Computer vision algorithms and stereo processing
PyBullet β Physics simulation and camera rendering
Ultralytics YOLO β Object detection framework
NumPy β Numerical array operations
Matplotlib β Debug visualization