Vision API Reference#

This page documents ManipulaPy.vision, the module for computer vision capabilities including stereo vision, object detection, camera calibration, and PyBullet integration with optional YOLO object detection.

Tip

For conceptual explanations, see Vision User Guide.

—

Vision Class#

class ManipulaPy.vision.Vision(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]#

Bases: object

A unified vision class for monocular/stereo inputs and PyBullet debugging.

Features:#

Monocular camera(s) with user-specified intrinsics/extrinsics or OpenCV capture devices.

Optional PyBullet debug sliders for a “virtual” camera.

Stereo pipeline: rectification, disparity, 3D point cloud generation (if stereo configs provided).

Basic obstacle detection via depth thresholding + intrinsics-based unprojection.

param camera_configs:

Each dict describes one camera (monocular or part of a stereo pair):

“name” (str)
“translation” ([x, y, z]) in world or local coords
“rotation” ([roll_deg, pitch_deg, yaw_deg]) in degrees
“fov” (float)
“near” (float)
“far” (float)
“intrinsic_matrix” (3x3 np.array)
“distortion_coeffs” (1D np.array of length=5)
“use_opencv” (bool)
“device_index” (int) => for OpenCV

type camera_configs:

list of dict, optional

param stereo_configs:

(left_cam_cfg, right_cam_cfg) for stereo. Must have ‘intrinsic_matrix’, ‘distortion_coeffs’, ‘translation’, ‘rotation’, etc.

type stereo_configs:

tuple(dict, dict), optional

param use_pybullet_debug:

If True, create debug sliders in PyBullet for a single “virtual” camera.

type use_pybullet_debug:

bool

param show_plot:

If True (and use_pybullet_debug=True), display the debug camera feed in a Matplotlib window.

type show_plot:

bool

param logger_name:

Logger name for this class.

type logger_name:

str

param physics_client:

PyBullet client ID if controlling a simulation environment.

type physics_client:

int or None

Unified vision system for monocular/stereo cameras with PyBullet integration, stereo processing pipeline, and YOLO-based object detection capabilities.

Constructor

__init__(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]#

Initializes the Vision system with optional cameras, PyBullet debug tools, and YOLO object detection.

Parameters:

camera_configs (-) – List of dictionaries for monocular cameras or stereo pairs.
stereo_configs (-) – Tuple(left_cam_config, right_cam_config) for stereo processing.
use_pybullet_debug (-) – Enables PyBullet sliders to modify virtual camera parameters.
show_plot (-) – Displays PyBullet debug images using Matplotlib.
logger_name (-) – Name for logging information.
physics_client (-) – PyBullet client ID for interacting with a simulation.

Return type:

None

Parameters:

camera_configs (list of dict, optional) – Camera configuration dictionaries
stereo_configs (tuple(dict, dict), optional) – Left and right camera configurations for stereo
use_pybullet_debug (bool, optional) – Enable PyBullet debug sliders (default: False)
show_plot (bool, optional) – Display debug camera feed in matplotlib (default: True)
logger_name (str, optional) – Logger identifier (default: “VisionSystemLogger”)
physics_client (int, optional) – PyBullet client ID for simulation integration

Camera Configuration Structure:

Each camera_config dictionary contains:

name (str) – Camera identifier
translation (list) – [x, y, z] position in world coordinates
rotation (list) – [roll_deg, pitch_deg, yaw_deg] orientation angles
fov (float) – Field of view in degrees
near (float) – Near clipping plane distance
far (float) – Far clipping plane distance
intrinsic_matrix (np.ndarray) – 3×3 camera intrinsic matrix
distortion_coeffs (np.ndarray) – 5-element distortion coefficient array
use_opencv (bool) – Enable OpenCV device capture
device_index (int) – OpenCV device identifier

Parameters:

camera_configs (List[Dict[str, Any]] | None)
stereo_configs (Tuple[Dict[str, Any], Dict[str, Any]] | None)
use_pybullet_debug (bool)
show_plot (bool)
logger_name (str)
physics_client (int | None)

—

Camera Management#

Configuration Methods#

Vision._configure_camera(idx, cfg)[source]#

Store a camera configuration and open an OpenCV capture device if requested.

Internal method for storing camera configurations and initializing OpenCV capture devices.

Parameters:

idx (int) – Camera index for internal storage
cfg (dict) – Camera configuration dictionary

Processing:

Extracts configuration parameters with defaults
Builds extrinsic transformation matrix
Stores camera parameters in self.cameras[idx]
Initializes OpenCV VideoCapture if use_opencv=True
Validates device accessibility and logs status

Parameters:

idx (int)
cfg (Dict[str, Any])

Return type:

None

Vision._validate_stereo_config(left_cfg, right_cfg)[source]#

Basic validation that each stereo cam config has required keys.

Validates stereo camera configuration completeness.

Parameters:

left_cfg (dict) – Left camera configuration
right_cfg (dict) – Right camera configuration

Required Keys:

intrinsic_matrix, distortion_coeffs, translation, rotation

Parameters:

left_cfg (Dict[str, Any])
right_cfg (Dict[str, Any])

Return type:

None

Extrinsic Matrix Computation#

Vision._make_extrinsic_matrix(translation, rotation_deg)[source]#

Create a 4x4 extrinsic matrix from translation and Euler angles in degrees. Uses cv2.Rodrigues for improved numerical stability.

Constructs 4×4 homogeneous transformation matrix from translation and Euler angles.

Parameters:

translation (list) – [x, y, z] translation vector
rotation_deg (list) – [roll, pitch, yaw] in degrees

Returns:

np.ndarray – 4×4 extrinsic transformation matrix

Parameters:

translation (List[float] | ndarray)
rotation_deg (List[float] | ndarray)

Return type:

ndarray

Vision._euler_to_rotation_matrix(euler_deg)[source]#

Convert [roll_deg, pitch_deg, yaw_deg] to a rotation matrix via manual multiplication, or you can chain cv2.Rodrigues calls. We’ll do a direct approach here for clarity.

Converts Euler angles to 3×3 rotation matrix using ZYX convention.

Parameters:

euler_deg (list) – [roll_deg, pitch_deg, yaw_deg]

Returns:

np.ndarray – 3×3 rotation matrix (float32)

Mathematical Implementation:

\[R = R_z(\text{yaw}) \cdot R_y(\text{pitch}) \cdot R_x(\text{roll})\]

Parameters:: euler_deg (List[float] | ndarray)
Return type:: ndarray

—

Image Acquisition#

PyBullet Integration#

Vision.capture_image(camera_index=0)[source]#

Captures an RGB and depth image from PyBullet cameras.

Captures RGB and depth images from PyBullet cameras with automatic depth scaling.

Parameters:

camera_index (int, optional) – Camera identifier (default: 0)

Returns:

rgb (np.ndarray) – RGB image array (H, W, 3) uint8
depth (np.ndarray) – Depth image array (H, W) float32

Processing Pipeline:

Validates camera_index existence in self.cameras
Constructs view matrix from camera translation and fixed target
Builds projection matrix from camera parameters
Calls pb.getCameraImage() with computed matrices
Processes RGBA to RGB conversion
Applies depth scaling: depth = near + (far - near) × normalized_depth

Parameters:: camera_index (int)
Return type:: Tuple[ndarray | None, ndarray | None]

—

Object Detection#

YOLO Integration#

Vision.detect_obstacles(depth_image, rgb_image, depth_threshold=5.0, camera_index=0, step=5)[source]#

Detect obstacles with YOLO (if available) and estimate 3D positions via depth.

Returns:

positions ((N, 3) float32 array)
orientations ((N,) float32 array (yaw angles in degrees in XY plane))

Parameters:

depth_image (ndarray)
rgb_image (ndarray)
depth_threshold (float)
camera_index (int)
step (int)

Return type:

Tuple[ndarray, ndarray]

YOLO-based object detection with 3D position estimation using depth information.

Parameters:

depth_image (np.ndarray) – Depth image for 3D positioning
rgb_image (np.ndarray) – RGB image for YOLO detection
depth_threshold (float, optional) – Maximum depth (m) to consider for obstacles. Default: 5.0.

Changed in version 1.3.2: Default raised from 0.0 → 5.0 — the prior default silently filtered every detection.

camera_index (int, optional) – Camera for intrinsic parameters (default: 0)
step (int, optional) – Depth downsampling stride (default: 5)

Returns:

positions (np.ndarray) – Shape (N, 3) 3D object positions
orientations (np.ndarray) – Shape (N,) XY-plane orientation angles

Detection Pipeline:

YOLO Inference: self.yolo_model(rgb_image, conf=0.3)
Bounding Box Extraction: box.xyxy[0].tolist() coordinates
Depth Analysis: Median depth within bounding box region
3D Reprojection: Camera intrinsics-based coordinate transformation
Orientation Estimation: np.arctan2(y_cam, x_cam) in XY plane

3D Transformation:

\[\begin{split}\begin{align} x_{cam} &= (c_x - c_{x,intrinsic}) \cdot \frac{d}{f_x} \\ y_{cam} &= (c_y - c_{y,intrinsic}) \cdot \frac{d}{f_y} \\ z_{cam} &= d \end{align}\end{split}\]

Where (c_x, c_y) is bounding box center, d is median depth, and f_x, f_y are focal lengths.

—

Stereo Vision Pipeline#

Rectification Setup#

Vision.compute_stereo_rectification_maps(image_size=(640, 480))[source]#

Compute and cache stereo rectification maps for configured cameras.

Computes stereo rectification maps using OpenCV stereoRectify algorithm.

Parameters:

image_size (tuple, optional) – (width, height) for rectification (default: (640, 480))

Implementation Details:

Parameter Extraction: Intrinsics K1, K2 and distortions D1, D2
Relative Geometry: R_lr = R_right @ R_left.T, t_lr = t_right - R_lr @ t_left
Type Unification: Convert all inputs to float64 for cv2.stereoRectify
Rectification Computation: cv2.stereoRectify with CALIB_ZERO_DISPARITY
Map Generation: cv2.initUndistortRectifyMap for both cameras
Q Matrix Storage: Disparity-to-depth transformation matrix

Parameters:: image_size (Tuple[int, int])
Return type:: None

Image Processing#

Vision.rectify_stereo_images(left_img, right_img)[source]#

Remap left and right images to their rectified forms.

Applies rectification maps to stereo image pairs.

Parameters:

left_img (np.ndarray) – Left camera image
right_img (np.ndarray) – Right camera image

Returns:

left_rect (np.ndarray) – Rectified left image
right_rect (np.ndarray) – Rectified right image

Requirements:

stereo_enabled = True
Rectification maps computed via compute_stereo_rectification_maps()

Parameters:

left_img (ndarray)
right_img (ndarray)

Return type:

Tuple[ndarray, ndarray]

Vision.compute_disparity(left_rect, right_rect)[source]#

Compute a disparity map from rectified stereo images using StereoSGBM (or BM).

Computes disparity map using StereoSGBM algorithm.

Parameters:

left_rect (np.ndarray) – Rectified left image
right_rect (np.ndarray) – Rectified right image

Returns:

disparity (np.ndarray) – Float32 disparity map

StereoSGBM Configuration:

minDisparity: 0
numDisparities: 64
blockSize: 7
P1: 8 × 3 × 7²
P2: 32 × 3 × 7²
Fixed-point scaling: division by 16.0

Parameters:

left_rect (ndarray)
right_rect (ndarray)

Return type:

ndarray

3D Reconstruction#

Vision.disparity_to_pointcloud(disparity)[source]#

Reproject a disparity map to 3D points using the Q matrix from stereoRectify.

Reprojects disparity map to 3D point cloud using Q matrix.

Parameters:

disparity (np.ndarray) – Disparity map from compute_disparity()

Returns:

cloud_filtered (np.ndarray) – Shape (N, 3) valid 3D points

Filtering Criteria:

Positive disparity values (disp_flat > 0)
Finite coordinates (np.isfinite(cloud[:, 0]))
Reasonable depth (cloud[:, 2] < 10.0)

Parameters:: disparity (ndarray)
Return type:: ndarray

Vision.get_stereo_point_cloud(left_img, right_img)[source]#

High-level pipeline: rectify, compute disparity, reproject to 3D.

High-level stereo processing pipeline combining rectification, disparity, and reprojection.

Parameters:

left_img (np.ndarray) – Left camera image
right_img (np.ndarray) – Right camera image

Returns:

point_cloud (np.ndarray) – Shape (N, 3) 3D point coordinates

Pipeline Sequence:

rectify_stereo_images(left_img, right_img)
compute_disparity(left_rect, right_rect)
disparity_to_pointcloud(disparity)

Parameters:

left_img (ndarray)
right_img (ndarray)

Return type:

ndarray

—

PyBullet Debug Interface#

Debug Slider Setup#

Vision._setup_pybullet_debug_sliders()[source]#

Creates PyBullet debug sliders for a single ‘virtual’ camera.

Creates PyBullet debug sliders for interactive virtual camera control.

Slider Categories:

View Matrix Parameters:

target_x, target_y, target_z: Camera target position
distance: Camera distance from target
yaw, pitch, roll: Camera orientation angles
upAxisIndex: Up axis selection (0 or 1)

Projection Matrix Parameters:

width, height: Image resolution
fov: Field of view angle
near_val, far_val: Clipping plane distances
print: Debug output trigger

Return type:: None

Vision._get_pybullet_view_proj()[source]#

Reads debug sliders, returns (view_mtx, proj_mtx, width, height).

Reads debug slider values and constructs view/projection matrices.

Returns:

view_mtx (list) – PyBullet view matrix
proj_mtx (list) – PyBullet projection matrix
width (int) – Image width
height (int) – Image height

Matrix Construction:

View: pb.computeViewMatrixFromYawPitchRoll()
Projection: pb.computeProjectionMatrixFOV()
Debug logging on print button toggle

Return type:: Tuple[Any, Any, int, int]

—

Resource Management#

Cleanup Methods#

Vision.release()[source]#

Release resources (e.g., OpenCV capture devices).

Releases OpenCV capture device resources.

Operations:

Iterates through self.capture_devices
Calls cap.release() for each VideoCapture object
Clears capture_devices dictionary
Logs cleanup status for each device

Return type:: None

Vision.__del__()[source]#

Destructor: ensure we release resources gracefully.

Destructor with robust error handling for graceful resource cleanup.

Safety Features:

Validates logger existence and handler availability
Checks stream closure status before logging
Silent exception handling to prevent destructor failures
Guaranteed resource cleanup regardless of logging success

Return type:: None

—

Utility Functions#

ManipulaPy.vision.read_debug_parameters(dbg_params)[source]#

Utility to read current slider values from PyBullet debug interface.

Utility function for reading PyBullet debug parameter values.

Parameters:

dbg_params (dict) – Dictionary mapping parameter names to PyBullet IDs

Returns:

values (dict) – Dictionary mapping parameter names to current values

Implementation:

Iterates through dbg_params and calls pb.readUserDebugParameter() for each ID.

Parameters:: dbg_params (Dict[str, int])
Return type:: Dict[str, float]

—

Data Structures and Configuration#

Internal Storage Format#

Camera Storage (self.cameras):

self.cameras[index] = {
    "name": str,
    "translation": [x, y, z],
    "rotation": [roll_deg, pitch_deg, yaw_deg],
    "fov": float,
    "near": float,
    "far": float,
    "intrinsic_matrix": np.ndarray(3, 3),
    "distortion_coeffs": np.ndarray(5,),
    "use_opencv": bool,
    "device_index": int,
    "extrinsic_matrix": np.ndarray(4, 4)
}

Stereo Configuration Attributes:

# Stereo processing state
self.stereo_enabled: bool
self.left_cam_cfg: dict
self.right_cam_cfg: dict

# Rectification maps
self.left_map_x: np.ndarray
self.left_map_y: np.ndarray
self.right_map_x: np.ndarray
self.right_map_y: np.ndarray

# 3D reconstruction
self.Q: np.ndarray  # 4x4 disparity-to-depth matrix
self.stereo_matcher: cv2.StereoSGBM

Default Camera Configuration#

When no camera_configs provided, uses:

default_config = {
    "name": "default_camera",
    "translation": [0, 0, 0],
    "rotation": [0, 0, 0],
    "fov": 60,
    "near": 0.1,
    "far": 5.0,
    "intrinsic_matrix": [[500, 0, 320],
                        [0, 500, 240],
                        [0, 0, 1]],
    "distortion_coeffs": [0, 0, 0, 0, 0],
    "use_opencv": False,
    "device_index": 0
}

—

Error Handling and Validation#

YOLO Model Management#

Loading Failure: Sets yolo_model = None, continues operation
Detection Fallback: Returns empty arrays when YOLO unavailable
Input Validation: Checks rgb_image and depth_image for None/invalid

Stereo Processing Errors#

Configuration Validation: Checks required keys in stereo configs
Runtime State Checking: Validates stereo_enabled before operations
Map Initialization: Ensures rectification maps computed before use

OpenCV Device Handling#

Device Access Failure: Raises RuntimeError with descriptive message
Capture Validation: Uses cap.isOpened() to verify device accessibility
Resource Cleanup: Automatic release in destructor and explicit method

PyBullet Integration Safety#

Parameter Reading: Safe handling of missing debug parameters
Matrix Validation: Debug logging for view/projection matrix inspection
Client State: No assumptions about PyBullet connection status

—

Performance Considerations#

Memory Management#

Image Arrays: Contiguous memory layout for OpenCV operations
Rectification Maps: Persistent storage for repeated stereo processing
Point Clouds: Filtered arrays to reduce memory footprint

Computational Efficiency#

YOLO Inference: Single forward pass per image
Stereo Matching: StereoSGBM optimized for quality/speed balance
Depth Scaling: Vectorized operations for range conversion

Threading Considerations#

OpenCV Capture: Single-threaded device access
YOLO Processing: GPU acceleration when available
PyBullet Integration: Main thread simulation access required

—

External Dependencies#

OpenCV – Computer vision algorithms and stereo processing
PyBullet – Physics simulation and camera rendering
Ultralytics YOLO – Object detection framework
NumPy – Numerical array operations
Matplotlib – Debug visualization

Vision API Reference#

Quick Navigation#

Vision Class#

Features:#

Camera Management#

Configuration Methods#

Extrinsic Matrix Computation#

Image Acquisition#

PyBullet Integration#

Object Detection#

YOLO Integration#

Stereo Vision Pipeline#

Rectification Setup#

Image Processing#

3D Reconstruction#

PyBullet Debug Interface#

Debug Slider Setup#

Resource Management#

Cleanup Methods#

Utility Functions#

Data Structures and Configuration#

Internal Storage Format#

Default Camera Configuration#

Error Handling and Validation#

YOLO Model Management#

Stereo Processing Errors#

OpenCV Device Handling#

PyBullet Integration Safety#

Performance Considerations#

Memory Management#

Computational Efficiency#

Threading Considerations#

See Also#

External Dependencies#