Vision API Reference#
This page documents ManipulaPy.vision, the module for computer vision capabilities including stereo vision, object detection, camera calibration, and PyBullet integration with optional YOLO object detection.
Tip
For conceptual explanations, see Vision User Guide.
—
Vision Class#
- class ManipulaPy.vision.Vision(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]#
Bases:
objectA unified vision class for monocular/stereo inputs and PyBullet debugging.
Features:#
Monocular camera(s) with user-specified intrinsics/extrinsics or OpenCV capture devices.
Optional PyBullet debug sliders for a “virtual” camera.
Stereo pipeline: rectification, disparity, 3D point cloud generation (if stereo configs provided).
Basic obstacle detection via depth thresholding + intrinsics-based unprojection.
- param camera_configs:
- Each dict describes one camera (monocular or part of a stereo pair):
“name” (str)
“translation” ([x, y, z]) in world or local coords
“rotation” ([roll_deg, pitch_deg, yaw_deg]) in degrees
“fov” (float)
“near” (float)
“far” (float)
“intrinsic_matrix” (3x3 np.array)
“distortion_coeffs” (1D np.array of length=5)
“use_opencv” (bool)
“device_index” (int) => for OpenCV
- type camera_configs:
list of dict, optional
- param stereo_configs:
(left_cam_cfg, right_cam_cfg) for stereo. Must have ‘intrinsic_matrix’, ‘distortion_coeffs’, ‘translation’, ‘rotation’, etc.
- type stereo_configs:
tuple(dict, dict), optional
- param use_pybullet_debug:
If True, create debug sliders in PyBullet for a single “virtual” camera.
- type use_pybullet_debug:
bool
- param show_plot:
If True (and use_pybullet_debug=True), display the debug camera feed in a Matplotlib window.
- type show_plot:
bool
- param logger_name:
Logger name for this class.
- type logger_name:
str
- param physics_client:
PyBullet client ID if controlling a simulation environment.
- type physics_client:
int or None
Unified vision system for monocular/stereo cameras with PyBullet integration, stereo processing pipeline, and YOLO-based object detection capabilities.
Constructor
- __init__(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]#
Initializes the Vision system with optional cameras, PyBullet debug tools, and YOLO object detection.
- Parameters:
camera_configs (-) – List of dictionaries for monocular cameras or stereo pairs.
stereo_configs (-) – Tuple(left_cam_config, right_cam_config) for stereo processing.
use_pybullet_debug (-) – Enables PyBullet sliders to modify virtual camera parameters.
show_plot (-) – Displays PyBullet debug images using Matplotlib.
logger_name (-) – Name for logging information.
physics_client (-) – PyBullet client ID for interacting with a simulation.
- Return type:
None
Parameters:
camera_configs (list of dict, optional) – Camera configuration dictionaries
stereo_configs (tuple(dict, dict), optional) – Left and right camera configurations for stereo
use_pybullet_debug (bool, optional) – Enable PyBullet debug sliders (default: False)
show_plot (bool, optional) – Display debug camera feed in matplotlib (default: True)
logger_name (str, optional) – Logger identifier (default: “VisionSystemLogger”)
physics_client (int, optional) – PyBullet client ID for simulation integration
Camera Configuration Structure:
Each camera_config dictionary contains:
name (str) – Camera identifier
translation (list) – [x, y, z] position in world coordinates
rotation (list) – [roll_deg, pitch_deg, yaw_deg] orientation angles
fov (float) – Field of view in degrees
near (float) – Near clipping plane distance
far (float) – Far clipping plane distance
intrinsic_matrix (np.ndarray) – 3×3 camera intrinsic matrix
distortion_coeffs (np.ndarray) – 5-element distortion coefficient array
use_opencv (bool) – Enable OpenCV device capture
device_index (int) – OpenCV device identifier
—
Camera Management#
Configuration Methods#
- Vision._configure_camera(idx, cfg)[source]#
Store a camera configuration and open an OpenCV capture device if requested.
Internal method for storing camera configurations and initializing OpenCV capture devices.
Parameters:
idx (int) – Camera index for internal storage
cfg (dict) – Camera configuration dictionary
Processing:
Extracts configuration parameters with defaults
Builds extrinsic transformation matrix
Stores camera parameters in self.cameras[idx]
Initializes OpenCV VideoCapture if use_opencv=True
Validates device accessibility and logs status
- Vision._validate_stereo_config(left_cfg, right_cfg)[source]#
Basic validation that each stereo cam config has required keys.
Validates stereo camera configuration completeness.
Parameters:
left_cfg (dict) – Left camera configuration
right_cfg (dict) – Right camera configuration
Required Keys:
intrinsic_matrix, distortion_coeffs, translation, rotation
Extrinsic Matrix Computation#
- Vision._make_extrinsic_matrix(translation, rotation_deg)[source]#
Create a 4x4 extrinsic matrix from translation and Euler angles in degrees. Uses cv2.Rodrigues for improved numerical stability.
Constructs 4×4 homogeneous transformation matrix from translation and Euler angles.
Parameters:
translation (list) – [x, y, z] translation vector
rotation_deg (list) – [roll, pitch, yaw] in degrees
Returns:
np.ndarray – 4×4 extrinsic transformation matrix
- Vision._euler_to_rotation_matrix(euler_deg)[source]#
Convert [roll_deg, pitch_deg, yaw_deg] to a rotation matrix via manual multiplication, or you can chain cv2.Rodrigues calls. We’ll do a direct approach here for clarity.
Converts Euler angles to 3×3 rotation matrix using ZYX convention.
Parameters:
euler_deg (list) – [roll_deg, pitch_deg, yaw_deg]
Returns:
np.ndarray – 3×3 rotation matrix (float32)
Mathematical Implementation:
\[R = R_z(\text{yaw}) \cdot R_y(\text{pitch}) \cdot R_x(\text{roll})\]
—
Image Acquisition#
PyBullet Integration#
- Vision.capture_image(camera_index=0)[source]#
Captures an RGB and depth image from PyBullet cameras.
Captures RGB and depth images from PyBullet cameras with automatic depth scaling.
Parameters:
camera_index (int, optional) – Camera identifier (default: 0)
Returns:
rgb (np.ndarray) – RGB image array (H, W, 3) uint8
depth (np.ndarray) – Depth image array (H, W) float32
Processing Pipeline:
Validates camera_index existence in self.cameras
Constructs view matrix from camera translation and fixed target
Builds projection matrix from camera parameters
Calls pb.getCameraImage() with computed matrices
Processes RGBA to RGB conversion
Applies depth scaling: depth = near + (far - near) × normalized_depth
—
Object Detection#
YOLO Integration#
- Vision.detect_obstacles(depth_image, rgb_image, depth_threshold=5.0, camera_index=0, step=5)[source]#
Detect obstacles with YOLO (if available) and estimate 3D positions via depth.
- Returns:
positions ((N, 3) float32 array)
orientations ((N,) float32 array (yaw angles in degrees in XY plane))
- Parameters:
- Return type:
YOLO-based object detection with 3D position estimation using depth information.
Parameters:
depth_image (np.ndarray) – Depth image for 3D positioning
rgb_image (np.ndarray) – RGB image for YOLO detection
depth_threshold (float, optional) – Maximum depth (m) to consider for obstacles. Default: 5.0.
Changed in version 1.3.2: Default raised from 0.0 → 5.0 — the prior default silently filtered every detection.
camera_index (int, optional) – Camera for intrinsic parameters (default: 0)
step (int, optional) – Depth downsampling stride (default: 5)
Returns:
positions (np.ndarray) – Shape (N, 3) 3D object positions
orientations (np.ndarray) – Shape (N,) XY-plane orientation angles
Detection Pipeline:
YOLO Inference: self.yolo_model(rgb_image, conf=0.3)
Bounding Box Extraction: box.xyxy[0].tolist() coordinates
Depth Analysis: Median depth within bounding box region
3D Reprojection: Camera intrinsics-based coordinate transformation
Orientation Estimation: np.arctan2(y_cam, x_cam) in XY plane
3D Transformation:
\[\begin{split}\begin{align} x_{cam} &= (c_x - c_{x,intrinsic}) \cdot \frac{d}{f_x} \\ y_{cam} &= (c_y - c_{y,intrinsic}) \cdot \frac{d}{f_y} \\ z_{cam} &= d \end{align}\end{split}\]Where (c_x, c_y) is bounding box center, d is median depth, and f_x, f_y are focal lengths.
—
Stereo Vision Pipeline#
Rectification Setup#
- Vision.compute_stereo_rectification_maps(image_size=(640, 480))[source]#
Compute and cache stereo rectification maps for configured cameras.
Computes stereo rectification maps using OpenCV stereoRectify algorithm.
Parameters:
image_size (tuple, optional) – (width, height) for rectification (default: (640, 480))
Implementation Details:
Parameter Extraction: Intrinsics K1, K2 and distortions D1, D2
Relative Geometry: R_lr = R_right @ R_left.T, t_lr = t_right - R_lr @ t_left
Type Unification: Convert all inputs to float64 for cv2.stereoRectify
Rectification Computation: cv2.stereoRectify with CALIB_ZERO_DISPARITY
Map Generation: cv2.initUndistortRectifyMap for both cameras
Q Matrix Storage: Disparity-to-depth transformation matrix
Image Processing#
- Vision.rectify_stereo_images(left_img, right_img)[source]#
Remap left and right images to their rectified forms.
Applies rectification maps to stereo image pairs.
Parameters:
left_img (np.ndarray) – Left camera image
right_img (np.ndarray) – Right camera image
Returns:
left_rect (np.ndarray) – Rectified left image
right_rect (np.ndarray) – Rectified right image
Requirements:
stereo_enabled = True
Rectification maps computed via compute_stereo_rectification_maps()
- Vision.compute_disparity(left_rect, right_rect)[source]#
Compute a disparity map from rectified stereo images using StereoSGBM (or BM).
Computes disparity map using StereoSGBM algorithm.
Parameters:
left_rect (np.ndarray) – Rectified left image
right_rect (np.ndarray) – Rectified right image
Returns:
disparity (np.ndarray) – Float32 disparity map
StereoSGBM Configuration:
minDisparity: 0
numDisparities: 64
blockSize: 7
P1: 8 × 3 × 7²
P2: 32 × 3 × 7²
Fixed-point scaling: division by 16.0
3D Reconstruction#
- Vision.disparity_to_pointcloud(disparity)[source]#
Reproject a disparity map to 3D points using the Q matrix from stereoRectify.
Reprojects disparity map to 3D point cloud using Q matrix.
Parameters:
disparity (np.ndarray) – Disparity map from compute_disparity()
Returns:
cloud_filtered (np.ndarray) – Shape (N, 3) valid 3D points
Filtering Criteria:
Positive disparity values (disp_flat > 0)
Finite coordinates (np.isfinite(cloud[:, 0]))
Reasonable depth (cloud[:, 2] < 10.0)
- Vision.get_stereo_point_cloud(left_img, right_img)[source]#
High-level pipeline: rectify, compute disparity, reproject to 3D.
High-level stereo processing pipeline combining rectification, disparity, and reprojection.
Parameters:
left_img (np.ndarray) – Left camera image
right_img (np.ndarray) – Right camera image
Returns:
point_cloud (np.ndarray) – Shape (N, 3) 3D point coordinates
Pipeline Sequence:
rectify_stereo_images(left_img, right_img)
compute_disparity(left_rect, right_rect)
disparity_to_pointcloud(disparity)
—
PyBullet Debug Interface#
Debug Slider Setup#
- Vision._setup_pybullet_debug_sliders()[source]#
Creates PyBullet debug sliders for a single ‘virtual’ camera.
Creates PyBullet debug sliders for interactive virtual camera control.
Slider Categories:
View Matrix Parameters:
target_x, target_y, target_z: Camera target position
distance: Camera distance from target
yaw, pitch, roll: Camera orientation angles
upAxisIndex: Up axis selection (0 or 1)
Projection Matrix Parameters:
width, height: Image resolution
fov: Field of view angle
near_val, far_val: Clipping plane distances
print: Debug output trigger
- Return type:
None
- Vision._get_pybullet_view_proj()[source]#
Reads debug sliders, returns (view_mtx, proj_mtx, width, height).
Reads debug slider values and constructs view/projection matrices.
Returns:
view_mtx (list) – PyBullet view matrix
proj_mtx (list) – PyBullet projection matrix
width (int) – Image width
height (int) – Image height
Matrix Construction:
View: pb.computeViewMatrixFromYawPitchRoll()
Projection: pb.computeProjectionMatrixFOV()
Debug logging on print button toggle
—
Resource Management#
Cleanup Methods#
- Vision.release()[source]#
Release resources (e.g., OpenCV capture devices).
Releases OpenCV capture device resources.
Operations:
Iterates through self.capture_devices
Calls cap.release() for each VideoCapture object
Clears capture_devices dictionary
Logs cleanup status for each device
- Return type:
None
- Vision.__del__()[source]#
Destructor: ensure we release resources gracefully.
Destructor with robust error handling for graceful resource cleanup.
Safety Features:
Validates logger existence and handler availability
Checks stream closure status before logging
Silent exception handling to prevent destructor failures
Guaranteed resource cleanup regardless of logging success
- Return type:
None
—
Utility Functions#
- ManipulaPy.vision.read_debug_parameters(dbg_params)[source]#
Utility to read current slider values from PyBullet debug interface.
Utility function for reading PyBullet debug parameter values.
Parameters:
dbg_params (dict) – Dictionary mapping parameter names to PyBullet IDs
Returns:
values (dict) – Dictionary mapping parameter names to current values
Implementation:
Iterates through dbg_params and calls pb.readUserDebugParameter() for each ID.
—
Data Structures and Configuration#
Internal Storage Format#
Camera Storage (self.cameras):
self.cameras[index] = {
"name": str,
"translation": [x, y, z],
"rotation": [roll_deg, pitch_deg, yaw_deg],
"fov": float,
"near": float,
"far": float,
"intrinsic_matrix": np.ndarray(3, 3),
"distortion_coeffs": np.ndarray(5,),
"use_opencv": bool,
"device_index": int,
"extrinsic_matrix": np.ndarray(4, 4)
}
Stereo Configuration Attributes:
# Stereo processing state
self.stereo_enabled: bool
self.left_cam_cfg: dict
self.right_cam_cfg: dict
# Rectification maps
self.left_map_x: np.ndarray
self.left_map_y: np.ndarray
self.right_map_x: np.ndarray
self.right_map_y: np.ndarray
# 3D reconstruction
self.Q: np.ndarray # 4x4 disparity-to-depth matrix
self.stereo_matcher: cv2.StereoSGBM
Default Camera Configuration#
When no camera_configs provided, uses:
default_config = {
"name": "default_camera",
"translation": [0, 0, 0],
"rotation": [0, 0, 0],
"fov": 60,
"near": 0.1,
"far": 5.0,
"intrinsic_matrix": [[500, 0, 320],
[0, 500, 240],
[0, 0, 1]],
"distortion_coeffs": [0, 0, 0, 0, 0],
"use_opencv": False,
"device_index": 0
}
—
Error Handling and Validation#
YOLO Model Management#
Loading Failure: Sets yolo_model = None, continues operation
Detection Fallback: Returns empty arrays when YOLO unavailable
Input Validation: Checks rgb_image and depth_image for None/invalid
Stereo Processing Errors#
Configuration Validation: Checks required keys in stereo configs
Runtime State Checking: Validates stereo_enabled before operations
Map Initialization: Ensures rectification maps computed before use
OpenCV Device Handling#
Device Access Failure: Raises RuntimeError with descriptive message
Capture Validation: Uses cap.isOpened() to verify device accessibility
Resource Cleanup: Automatic release in destructor and explicit method
PyBullet Integration Safety#
Parameter Reading: Safe handling of missing debug parameters
Matrix Validation: Debug logging for view/projection matrix inspection
Client State: No assumptions about PyBullet connection status
—
Performance Considerations#
Memory Management#
Image Arrays: Contiguous memory layout for OpenCV operations
Rectification Maps: Persistent storage for repeated stereo processing
Point Clouds: Filtered arrays to reduce memory footprint
Computational Efficiency#
YOLO Inference: Single forward pass per image
Stereo Matching: StereoSGBM optimized for quality/speed balance
Depth Scaling: Vectorized operations for range conversion
Threading Considerations#
OpenCV Capture: Single-threaded device access
YOLO Processing: GPU acceleration when available
PyBullet Integration: Main thread simulation access required
—
See Also#
Perception API Reference – Higher-level perception capabilities using Vision
Utils Module – Mathematical utilities for transformations
Simulation API Reference – PyBullet simulation integration
Potential Field API Reference – Obstacle avoidance using vision data
External Dependencies#
OpenCV – Computer vision algorithms and stereo processing
PyBullet – Physics simulation and camera rendering
Ultralytics YOLO – Object detection framework
NumPy – Numerical array operations
Matplotlib – Debug visualization