Vision API Reference#

This page documents ManipulaPy.vision, the module for computer vision capabilities including stereo vision, object detection, camera calibration, and PyBullet integration with optional YOLO object detection.

Tip

For conceptual explanations, see Vision User Guide.

Quick Navigation#

Vision Class#

class ManipulaPy.vision.Vision(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]#

Bases: object

A unified vision class for monocular/stereo inputs and PyBullet debugging.

Features:#

  • Monocular camera(s) with user-specified intrinsics/extrinsics or OpenCV capture devices.

  • Optional PyBullet debug sliders for a “virtual” camera.

  • Stereo pipeline: rectification, disparity, 3D point cloud generation (if stereo configs provided).

  • Basic obstacle detection via depth thresholding + intrinsics-based unprojection.

param camera_configs:
Each dict describes one camera (monocular or part of a stereo pair):
  • “name” (str)

  • “translation” ([x, y, z]) in world or local coords

  • “rotation” ([roll_deg, pitch_deg, yaw_deg]) in degrees

  • “fov” (float)

  • “near” (float)

  • “far” (float)

  • “intrinsic_matrix” (3x3 np.array)

  • “distortion_coeffs” (1D np.array of length=5)

  • “use_opencv” (bool)

  • “device_index” (int) => for OpenCV

type camera_configs:

list of dict, optional

param stereo_configs:

(left_cam_cfg, right_cam_cfg) for stereo. Must have ‘intrinsic_matrix’, ‘distortion_coeffs’, ‘translation’, ‘rotation’, etc.

type stereo_configs:

tuple(dict, dict), optional

param use_pybullet_debug:

If True, create debug sliders in PyBullet for a single “virtual” camera.

type use_pybullet_debug:

bool

param show_plot:

If True (and use_pybullet_debug=True), display the debug camera feed in a Matplotlib window.

type show_plot:

bool

param logger_name:

Logger name for this class.

type logger_name:

str

param physics_client:

PyBullet client ID if controlling a simulation environment.

type physics_client:

int or None

Unified vision system for monocular/stereo cameras with PyBullet integration, stereo processing pipeline, and YOLO-based object detection capabilities.

Constructor

__init__(camera_configs=None, stereo_configs=None, use_pybullet_debug=False, show_plot=True, logger_name='VisionSystemLogger', physics_client=None)[source]#

Initializes the Vision system with optional cameras, PyBullet debug tools, and YOLO object detection.

Parameters:
  • camera_configs (-) – List of dictionaries for monocular cameras or stereo pairs.

  • stereo_configs (-) – Tuple(left_cam_config, right_cam_config) for stereo processing.

  • use_pybullet_debug (-) – Enables PyBullet sliders to modify virtual camera parameters.

  • show_plot (-) – Displays PyBullet debug images using Matplotlib.

  • logger_name (-) – Name for logging information.

  • physics_client (-) – PyBullet client ID for interacting with a simulation.

Return type:

None

Parameters:

  • camera_configs (list of dict, optional) – Camera configuration dictionaries

  • stereo_configs (tuple(dict, dict), optional) – Left and right camera configurations for stereo

  • use_pybullet_debug (bool, optional) – Enable PyBullet debug sliders (default: False)

  • show_plot (bool, optional) – Display debug camera feed in matplotlib (default: True)

  • logger_name (str, optional) – Logger identifier (default: “VisionSystemLogger”)

  • physics_client (int, optional) – PyBullet client ID for simulation integration

Camera Configuration Structure:

Each camera_config dictionary contains:

  • name (str) – Camera identifier

  • translation (list) – [x, y, z] position in world coordinates

  • rotation (list) – [roll_deg, pitch_deg, yaw_deg] orientation angles

  • fov (float) – Field of view in degrees

  • near (float) – Near clipping plane distance

  • far (float) – Far clipping plane distance

  • intrinsic_matrix (np.ndarray) – 3×3 camera intrinsic matrix

  • distortion_coeffs (np.ndarray) – 5-element distortion coefficient array

  • use_opencv (bool) – Enable OpenCV device capture

  • device_index (int) – OpenCV device identifier

Parameters:

Camera Management#

Configuration Methods#

Vision._configure_camera(idx, cfg)[source]#

Store a camera configuration and open an OpenCV capture device if requested.

Internal method for storing camera configurations and initializing OpenCV capture devices.

Parameters:

  • idx (int) – Camera index for internal storage

  • cfg (dict) – Camera configuration dictionary

Processing:

  1. Extracts configuration parameters with defaults

  2. Builds extrinsic transformation matrix

  3. Stores camera parameters in self.cameras[idx]

  4. Initializes OpenCV VideoCapture if use_opencv=True

  5. Validates device accessibility and logs status

Parameters:
Return type:

None

Vision._validate_stereo_config(left_cfg, right_cfg)[source]#

Basic validation that each stereo cam config has required keys.

Validates stereo camera configuration completeness.

Parameters:

  • left_cfg (dict) – Left camera configuration

  • right_cfg (dict) – Right camera configuration

Required Keys:

  • intrinsic_matrix, distortion_coeffs, translation, rotation

Parameters:
Return type:

None

Extrinsic Matrix Computation#

Vision._make_extrinsic_matrix(translation, rotation_deg)[source]#

Create a 4x4 extrinsic matrix from translation and Euler angles in degrees. Uses cv2.Rodrigues for improved numerical stability.

Constructs 4×4 homogeneous transformation matrix from translation and Euler angles.

Parameters:

  • translation (list) – [x, y, z] translation vector

  • rotation_deg (list) – [roll, pitch, yaw] in degrees

Returns:

  • np.ndarray – 4×4 extrinsic transformation matrix

Parameters:
Return type:

ndarray

Vision._euler_to_rotation_matrix(euler_deg)[source]#

Convert [roll_deg, pitch_deg, yaw_deg] to a rotation matrix via manual multiplication, or you can chain cv2.Rodrigues calls. We’ll do a direct approach here for clarity.

Converts Euler angles to 3×3 rotation matrix using ZYX convention.

Parameters:

  • euler_deg (list) – [roll_deg, pitch_deg, yaw_deg]

Returns:

  • np.ndarray – 3×3 rotation matrix (float32)

Mathematical Implementation:

\[R = R_z(\text{yaw}) \cdot R_y(\text{pitch}) \cdot R_x(\text{roll})\]
Parameters:

euler_deg (List[float] | ndarray)

Return type:

ndarray

Image Acquisition#

PyBullet Integration#

Vision.capture_image(camera_index=0)[source]#

Captures an RGB and depth image from PyBullet cameras.

Captures RGB and depth images from PyBullet cameras with automatic depth scaling.

Parameters:

  • camera_index (int, optional) – Camera identifier (default: 0)

Returns:

  • rgb (np.ndarray) – RGB image array (H, W, 3) uint8

  • depth (np.ndarray) – Depth image array (H, W) float32

Processing Pipeline:

  1. Validates camera_index existence in self.cameras

  2. Constructs view matrix from camera translation and fixed target

  3. Builds projection matrix from camera parameters

  4. Calls pb.getCameraImage() with computed matrices

  5. Processes RGBA to RGB conversion

  6. Applies depth scaling: depth = near + (far - near) × normalized_depth

Parameters:

camera_index (int)

Return type:

Tuple[ndarray | None, ndarray | None]

Object Detection#

YOLO Integration#

Vision.detect_obstacles(depth_image, rgb_image, depth_threshold=5.0, camera_index=0, step=5)[source]#

Detect obstacles with YOLO (if available) and estimate 3D positions via depth.

Returns:

  • positions ((N, 3) float32 array)

  • orientations ((N,) float32 array (yaw angles in degrees in XY plane))

Parameters:
Return type:

Tuple[ndarray, ndarray]

YOLO-based object detection with 3D position estimation using depth information.

Parameters:

  • depth_image (np.ndarray) – Depth image for 3D positioning

  • rgb_image (np.ndarray) – RGB image for YOLO detection

  • depth_threshold (float, optional) – Maximum depth (m) to consider for obstacles. Default: 5.0.

Changed in version 1.3.2: Default raised from 0.0 → 5.0 — the prior default silently filtered every detection.

  • camera_index (int, optional) – Camera for intrinsic parameters (default: 0)

  • step (int, optional) – Depth downsampling stride (default: 5)

Returns:

  • positions (np.ndarray) – Shape (N, 3) 3D object positions

  • orientations (np.ndarray) – Shape (N,) XY-plane orientation angles

Detection Pipeline:

  1. YOLO Inference: self.yolo_model(rgb_image, conf=0.3)

  2. Bounding Box Extraction: box.xyxy[0].tolist() coordinates

  3. Depth Analysis: Median depth within bounding box region

  4. 3D Reprojection: Camera intrinsics-based coordinate transformation

  5. Orientation Estimation: np.arctan2(y_cam, x_cam) in XY plane

3D Transformation:

\[\begin{split}\begin{align} x_{cam} &= (c_x - c_{x,intrinsic}) \cdot \frac{d}{f_x} \\ y_{cam} &= (c_y - c_{y,intrinsic}) \cdot \frac{d}{f_y} \\ z_{cam} &= d \end{align}\end{split}\]

Where (c_x, c_y) is bounding box center, d is median depth, and f_x, f_y are focal lengths.

Stereo Vision Pipeline#

Rectification Setup#

Vision.compute_stereo_rectification_maps(image_size=(640, 480))[source]#

Compute and cache stereo rectification maps for configured cameras.

Computes stereo rectification maps using OpenCV stereoRectify algorithm.

Parameters:

  • image_size (tuple, optional) – (width, height) for rectification (default: (640, 480))

Implementation Details:

  1. Parameter Extraction: Intrinsics K1, K2 and distortions D1, D2

  2. Relative Geometry: R_lr = R_right @ R_left.T, t_lr = t_right - R_lr @ t_left

  3. Type Unification: Convert all inputs to float64 for cv2.stereoRectify

  4. Rectification Computation: cv2.stereoRectify with CALIB_ZERO_DISPARITY

  5. Map Generation: cv2.initUndistortRectifyMap for both cameras

  6. Q Matrix Storage: Disparity-to-depth transformation matrix

Parameters:

image_size (Tuple[int, int])

Return type:

None

Image Processing#

Vision.rectify_stereo_images(left_img, right_img)[source]#

Remap left and right images to their rectified forms.

Applies rectification maps to stereo image pairs.

Parameters:

  • left_img (np.ndarray) – Left camera image

  • right_img (np.ndarray) – Right camera image

Returns:

  • left_rect (np.ndarray) – Rectified left image

  • right_rect (np.ndarray) – Rectified right image

Requirements:

  • stereo_enabled = True

  • Rectification maps computed via compute_stereo_rectification_maps()

Parameters:
Return type:

Tuple[ndarray, ndarray]

Vision.compute_disparity(left_rect, right_rect)[source]#

Compute a disparity map from rectified stereo images using StereoSGBM (or BM).

Computes disparity map using StereoSGBM algorithm.

Parameters:

  • left_rect (np.ndarray) – Rectified left image

  • right_rect (np.ndarray) – Rectified right image

Returns:

  • disparity (np.ndarray) – Float32 disparity map

StereoSGBM Configuration:

  • minDisparity: 0

  • numDisparities: 64

  • blockSize: 7

  • P1: 8 × 3 × 7²

  • P2: 32 × 3 × 7²

  • Fixed-point scaling: division by 16.0

Parameters:
Return type:

ndarray

3D Reconstruction#

Vision.disparity_to_pointcloud(disparity)[source]#

Reproject a disparity map to 3D points using the Q matrix from stereoRectify.

Reprojects disparity map to 3D point cloud using Q matrix.

Parameters:

  • disparity (np.ndarray) – Disparity map from compute_disparity()

Returns:

  • cloud_filtered (np.ndarray) – Shape (N, 3) valid 3D points

Filtering Criteria:

  • Positive disparity values (disp_flat > 0)

  • Finite coordinates (np.isfinite(cloud[:, 0]))

  • Reasonable depth (cloud[:, 2] < 10.0)

Parameters:

disparity (ndarray)

Return type:

ndarray

Vision.get_stereo_point_cloud(left_img, right_img)[source]#

High-level pipeline: rectify, compute disparity, reproject to 3D.

High-level stereo processing pipeline combining rectification, disparity, and reprojection.

Parameters:

  • left_img (np.ndarray) – Left camera image

  • right_img (np.ndarray) – Right camera image

Returns:

  • point_cloud (np.ndarray) – Shape (N, 3) 3D point coordinates

Pipeline Sequence:

  1. rectify_stereo_images(left_img, right_img)

  2. compute_disparity(left_rect, right_rect)

  3. disparity_to_pointcloud(disparity)

Parameters:
Return type:

ndarray

PyBullet Debug Interface#

Debug Slider Setup#

Vision._setup_pybullet_debug_sliders()[source]#

Creates PyBullet debug sliders for a single ‘virtual’ camera.

Creates PyBullet debug sliders for interactive virtual camera control.

Slider Categories:

View Matrix Parameters:

  • target_x, target_y, target_z: Camera target position

  • distance: Camera distance from target

  • yaw, pitch, roll: Camera orientation angles

  • upAxisIndex: Up axis selection (0 or 1)

Projection Matrix Parameters:

  • width, height: Image resolution

  • fov: Field of view angle

  • near_val, far_val: Clipping plane distances

  • print: Debug output trigger

Return type:

None

Vision._get_pybullet_view_proj()[source]#

Reads debug sliders, returns (view_mtx, proj_mtx, width, height).

Reads debug slider values and constructs view/projection matrices.

Returns:

  • view_mtx (list) – PyBullet view matrix

  • proj_mtx (list) – PyBullet projection matrix

  • width (int) – Image width

  • height (int) – Image height

Matrix Construction:

  • View: pb.computeViewMatrixFromYawPitchRoll()

  • Projection: pb.computeProjectionMatrixFOV()

  • Debug logging on print button toggle

Return type:

Tuple[Any, Any, int, int]

Resource Management#

Cleanup Methods#

Vision.release()[source]#

Release resources (e.g., OpenCV capture devices).

Releases OpenCV capture device resources.

Operations:

  • Iterates through self.capture_devices

  • Calls cap.release() for each VideoCapture object

  • Clears capture_devices dictionary

  • Logs cleanup status for each device

Return type:

None

Vision.__del__()[source]#

Destructor: ensure we release resources gracefully.

Destructor with robust error handling for graceful resource cleanup.

Safety Features:

  • Validates logger existence and handler availability

  • Checks stream closure status before logging

  • Silent exception handling to prevent destructor failures

  • Guaranteed resource cleanup regardless of logging success

Return type:

None

Utility Functions#

ManipulaPy.vision.read_debug_parameters(dbg_params)[source]#

Utility to read current slider values from PyBullet debug interface.

Utility function for reading PyBullet debug parameter values.

Parameters:

  • dbg_params (dict) – Dictionary mapping parameter names to PyBullet IDs

Returns:

  • values (dict) – Dictionary mapping parameter names to current values

Implementation:

Iterates through dbg_params and calls pb.readUserDebugParameter() for each ID.

Parameters:

dbg_params (Dict[str, int])

Return type:

Dict[str, float]

Data Structures and Configuration#

Internal Storage Format#

Camera Storage (self.cameras):

self.cameras[index] = {
    "name": str,
    "translation": [x, y, z],
    "rotation": [roll_deg, pitch_deg, yaw_deg],
    "fov": float,
    "near": float,
    "far": float,
    "intrinsic_matrix": np.ndarray(3, 3),
    "distortion_coeffs": np.ndarray(5,),
    "use_opencv": bool,
    "device_index": int,
    "extrinsic_matrix": np.ndarray(4, 4)
}

Stereo Configuration Attributes:

# Stereo processing state
self.stereo_enabled: bool
self.left_cam_cfg: dict
self.right_cam_cfg: dict

# Rectification maps
self.left_map_x: np.ndarray
self.left_map_y: np.ndarray
self.right_map_x: np.ndarray
self.right_map_y: np.ndarray

# 3D reconstruction
self.Q: np.ndarray  # 4x4 disparity-to-depth matrix
self.stereo_matcher: cv2.StereoSGBM

Default Camera Configuration#

When no camera_configs provided, uses:

default_config = {
    "name": "default_camera",
    "translation": [0, 0, 0],
    "rotation": [0, 0, 0],
    "fov": 60,
    "near": 0.1,
    "far": 5.0,
    "intrinsic_matrix": [[500, 0, 320],
                        [0, 500, 240],
                        [0, 0, 1]],
    "distortion_coeffs": [0, 0, 0, 0, 0],
    "use_opencv": False,
    "device_index": 0
}

Error Handling and Validation#

YOLO Model Management#

  • Loading Failure: Sets yolo_model = None, continues operation

  • Detection Fallback: Returns empty arrays when YOLO unavailable

  • Input Validation: Checks rgb_image and depth_image for None/invalid

Stereo Processing Errors#

  • Configuration Validation: Checks required keys in stereo configs

  • Runtime State Checking: Validates stereo_enabled before operations

  • Map Initialization: Ensures rectification maps computed before use

OpenCV Device Handling#

  • Device Access Failure: Raises RuntimeError with descriptive message

  • Capture Validation: Uses cap.isOpened() to verify device accessibility

  • Resource Cleanup: Automatic release in destructor and explicit method

PyBullet Integration Safety#

  • Parameter Reading: Safe handling of missing debug parameters

  • Matrix Validation: Debug logging for view/projection matrix inspection

  • Client State: No assumptions about PyBullet connection status

Performance Considerations#

Memory Management#

  • Image Arrays: Contiguous memory layout for OpenCV operations

  • Rectification Maps: Persistent storage for repeated stereo processing

  • Point Clouds: Filtered arrays to reduce memory footprint

Computational Efficiency#

  • YOLO Inference: Single forward pass per image

  • Stereo Matching: StereoSGBM optimized for quality/speed balance

  • Depth Scaling: Vectorized operations for range conversion

Threading Considerations#

  • OpenCV Capture: Single-threaded device access

  • YOLO Processing: GPU acceleration when available

  • PyBullet Integration: Main thread simulation access required

See Also#

External Dependencies#

  • OpenCV – Computer vision algorithms and stereo processing

  • PyBullet – Physics simulation and camera rendering

  • Ultralytics YOLO – Object detection framework

  • NumPy – Numerical array operations

  • Matplotlib – Debug visualization