.. _user_guide_vision:

Vision User Guide
=================
This guide covers the Vision module in ManipulaPy, which provides camera configuration, YOLO-based object detection, stereo vision, and PyBullet virtual-camera integration for robotic perception.
.. contents:: **Quick Navigation**
   :local:
   :depth: 2
Overview
--------
The Vision module is a unified computer vision system that brings together:
- **Monocular and stereo camera support** with flexible configuration
- **YOLO-based object detection** for real-time obstacle identification
- **PyBullet virtual cameras** with interactive debugging sliders
- **Stereo vision pipeline** for 3D reconstruction and depth estimation
- **Camera calibration utilities** for precise geometric measurements

**Feature highlights:**

- 📷 **Multi-Camera Support**: Configure multiple cameras with individual intrinsics, extrinsics, and distortion parameters
- 🤖 **YOLO Integration**: Real-time object detection with YOLOv8 for robust obstacle identification
- 🎮 **PyBullet Debug**: Interactive virtual cameras with real-time parameter adjustment
- 👁️ **Stereo Vision**: Complete stereo pipeline from rectification to 3D point cloud generation

Getting Started
---------------
Basic Camera Setup
~~~~~~~~~~~~~~~~~~
The simplest way to start with the Vision module:
.. code-block:: python
from ManipulaPy.vision import Vision
import numpy as np
# Create a basic vision system with default settings
vision = Vision()
# Capture an image (requires PyBullet environment)
rgb_image, depth_image = vision.capture_image()
print(f"📸 Captured RGB image: {rgb_image.shape}")
print(f"📏 Captured depth image: {depth_image.shape}")
Custom Camera Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~
For more control, configure your cameras explicitly:
.. code-block:: python
# Define camera parameters
camera_config = {
    "name": "workspace_camera",
    "translation": [0.0, 0.0, 1.0],  # 1 meter above workspace
    "rotation": [0, 45, 0],          # Look down at 45 degrees
    "fov": 60,                       # Field of view in degrees
    "near": 0.1,                     # Near clipping plane
    "far": 10.0,                     # Far clipping plane
    "intrinsic_matrix": np.array([
        [500, 0, 320],  # fx, 0, cx
        [0, 500, 240],  # 0, fy, cy
        [0, 0, 1]       # 0, 0, 1
    ], dtype=np.float32),
    "distortion_coeffs": np.zeros(5, dtype=np.float32),  # k1, k2, p1, p2, k3
    "use_opencv": False,             # Use PyBullet cameras
    "device_index": 0                # Camera device index
}
# Create vision system with custom configuration
vision = Vision(camera_configs=[camera_config])
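The ``fov`` value and the focal lengths in ``intrinsic_matrix`` describe the same optics, so it can help to check that they agree. The following is a minimal sketch of the pinhole relation, not part of the ManipulaPy API. It assumes a 640x480 image (implied by ``cx = 320`` and ``cy = 240``), square pixels, and that ``fov`` is the vertical field of view, as in PyBullet's projection matrix. The helper names are hypothetical.

```python
import math

import numpy as np


def intrinsics_from_fov(width, height, vertical_fov_deg):
    """Build a pinhole intrinsic matrix from image size and vertical FOV."""
    # Pinhole relation: tan(fov / 2) = (height / 2) / fy
    fy = (height / 2.0) / math.tan(math.radians(vertical_fov_deg) / 2.0)
    fx = fy  # assumption: square pixels
    return np.array(
        [[fx, 0.0, width / 2.0],
         [0.0, fy, height / 2.0],
         [0.0, 0.0, 1.0]],
        dtype=np.float64,
    )


def vertical_fov_from_intrinsics(K, height):
    """Recover the vertical FOV in degrees implied by an intrinsic matrix."""
    return math.degrees(2.0 * math.atan((height / 2.0) / float(K[1, 1])))


# Intrinsics that match a 60 degree vertical FOV at 640x480
K = intrinsics_from_fov(640, 480, 60.0)
print(K)

# FOV implied by the fy = 500 used in the configuration above
K_config = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
print(vertical_fov_from_intrinsics(K_config, 480))
```

Under these assumptions, ``fy = 500`` corresponds to a vertical FOV of about 51 degrees rather than 60, so one of the two values may need adjusting if rendered images and back-projected 3D positions have to agree.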
Core Features
-------------
Object Detection with YOLO
~~~~~~~~~~~~~~~~~~~~~~~~~~
The Vision module integrates YOLOv8 for robust object detection:
.. code-block:: python
# Capture images
rgb_image, depth_image = vision.capture_image(camera_index=0)
# Detect obstacles with 3D positioning
obstacle_positions, orientations = vision.detect_obstacles(
    depth_image=depth_image,
    rgb_image=rgb_image,
    depth_threshold=5.0,  # Only consider objects within 5 meters
    camera_index=0,
    step=2                # Depth sampling step for efficiency
)
# Process detected obstacles
print(f"🔍 Detected {len(obstacle_positions)} obstacles")
for i, (pos, orientation) in enumerate(zip(obstacle_positions, orientations)):
    print(f"Obstacle {i+1}:")
    print(f"  📍 Position: [{pos[0]:.2f}, {pos[1]:.2f}, {pos[2]:.2f}] meters")
    print(f"  🧭 Orientation: {orientation:.1f} degrees")
The object detection pipeline:
1. **YOLO Detection**: Identifies objects in RGB images with bounding boxes
2. **Depth Analysis**: Uses depth information within bounding boxes
3. **3D Positioning**: Converts 2D detections to 3D world coordinates
4. **Orientation Estimation**: Computes object orientation in the XY plane
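Steps 2 to 4 above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the code ``detect_obstacles`` actually runs: it assumes metric depth, a pinhole camera, and PCA as the orientation estimate, and it returns points in the camera frame (a further extrinsic transform would be needed for world coordinates). The function names and the bounding box are hypothetical.

```python
import numpy as np


def backproject_box(depth, box, K, depth_threshold=5.0, step=2):
    """Back-project valid depth pixels inside a box to camera-frame 3D points."""
    x1, y1, x2, y2 = box
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Visit every `step`-th pixel inside the bounding box
    vs, us = np.mgrid[y1:y2:step, x1:x2:step]
    z = depth[vs, us]
    valid = (z > 0) & (z < depth_threshold)
    us, vs, z = us[valid], vs[valid], z[valid]
    # Pinhole back-projection
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def xy_orientation_deg(points):
    """Angle of the dominant XY axis of a point set (PCA), in [0, 180) degrees."""
    xy = points[:, :2] - points[:, :2].mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(xy.T))
    major = eigvecs[:, np.argmax(eigvals)]
    return float(np.degrees(np.arctan2(major[1], major[0])) % 180.0)


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 2.0)   # synthetic flat surface 2 m from the camera
box = (280, 230, 360, 250)         # hypothetical detection: x1, y1, x2, y2
points = backproject_box(depth, box, K)
position = points.mean(axis=0)     # one 3D position per detection
angle = xy_orientation_deg(points)
print(position, angle)
```

Averaging the back-projected points gives one position per detection, and a larger ``step`` reduces the number of pixels visited at the cost of a coarser estimate.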
PyBullet Virtual Cameras
~~~~~~~~~~~~~~~~~~~~~~~~
For simulation and debugging, use PyBullet's virtual cameras:
.. code-block:: python
# Create an interactive debug camera system
debug_vision = Vision(
    use_pybullet_debug=True,  # Enable PyBullet debug sliders
    show_plot=True            # Display camera feed in matplotlib
)
# The debug interface provides real-time sliders for:
# - Camera position (target_x, target_y, target_z)
# - Camera orientation (yaw, pitch, roll)
# - View parameters (distance, up axis)
# - Projection settings (width, height, FOV, near/far planes)
**Debug Interface Features:**
- **Real-time parameter adjustment** via PyBullet GUI sliders
- **Live camera feed** displayed in matplotlib window
- **Matrix visualization** for view and projection matrices
- **Interactive positioning** for optimal camera placement
Stereo Vision Pipeline
~~~~~~~~~~~~~~~~~~~~~~
For 3D reconstruction, configure a stereo camera pair:
.. code-block:: python
# Configure left camera
left_camera_config = {
    "name": "left_camera",
    "translation": [0.0, 0.0, 0.5],
    "rotation": [0, 0, 0],
    "intrinsic_matrix": np.array([
        [600, 0, 320],
        [0, 600, 240],
        [0, 0, 1]
    ], dtype=np.float32),
    "distortion_coeffs": np.zeros(5, dtype=np.float32)
}
# Configure right camera (10cm baseline)
right_camera_config = left_camera_config.copy()
right_camera_config["name"] = "right_camera"
right_camera_config["translation"] = [0.1, 0.0, 0.5] # 10cm to the right
# Create stereo vision system
stereo_vision = Vision(stereo_configs=(left_camera_config, right_camera_config))
# Compute rectification maps (do this once)
stereo_vision.compute_stereo_rectification_maps(image_size=(640, 480))
# Capture stereo images
left_image, _ = stereo_vision.capture_image(0) # Left camera
right_image, _ = stereo_vision.capture_image(1) # Right camera
# Process stereo pipeline
left_rect, right_rect = stereo_vision.rectify_stereo_images(left_image, right_image)
disparity_map = stereo_vision.compute_disparity(left_rect, right_rect)
point_cloud = stereo_vision.disparity_to_pointcloud(disparity_map)
print(f"🌐 Generated point cloud with {len(point_cloud)} 3D points")
**Stereo Pipeline Steps:**
1. **Image Rectification**: Align stereo images for disparity computation
2. **Disparity Calculation**: Use StereoSGBM for robust disparity estimation
3. **3D Reconstruction**: Convert disparity to 3D points using camera geometry
4. **Point Cloud Filtering**: Remove invalid and distant points
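Steps 3 and 4 rest on the relation ``Z = f * B / d`` between depth, focal length, baseline, and disparity. The sketch below shows that relation with NumPy only. It is an illustration under assumed parameters (rectified images, disparity in pixels, a hypothetical ``max_depth`` cutoff) and is not the implementation behind ``disparity_to_pointcloud``.

```python
import numpy as np


def disparity_to_points(disparity, fx, fy, cx, cy, baseline, max_depth=10.0):
    """Convert a disparity map in pixels to an (N, 3) array of camera-frame points."""
    h, w = disparity.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0                      # zero or negative disparity is invalid
    z = np.zeros(disparity.shape, dtype=np.float64)
    z[valid] = fx * baseline / disparity[valid]
    valid &= z < max_depth                     # drop distant points
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)


# Synthetic disparity map: 30 px everywhere, with invalid and far regions
disparity = np.full((480, 640), 30.0)
disparity[:10, :] = 0.0    # invalid matches
disparity[-10:, :] = 2.0   # 600 * 0.1 / 2 = 30 m, beyond max_depth

cloud = disparity_to_points(disparity, fx=600.0, fy=600.0, cx=320.0, cy=240.0,
                            baseline=0.1)
print(cloud.shape)
print(cloud[:, 2].min(), cloud[:, 2].max())
```

With the 600-pixel focal length and 10 cm baseline from the configuration above, a 30-pixel disparity corresponds to a depth of 2 m.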
Advanced Usage
--------------
Multiple Camera Systems
~~~~~~~~~~~~~~~~~~~~~~~
Configure and manage multiple cameras simultaneously:
.. code-block:: python
# Define multiple camera configurations
camera_configs = [
    {  # Overview camera
        "name": "overview_camera",
        "translation": [0, 0, 2.0],
        "rotation": [0, 90, 0],  # Look straight down
        "fov": 80,
        "intrinsic_matrix": np.array([[400, 0, 320], [0, 400, 240], [0, 0, 1]], dtype=np.float32),
        "distortion_coeffs": np.zeros(5, dtype=np.float32)
    },
    {  # Side view camera
        "name": "side_camera",
        "translation": [1.0, 0, 0.5],
        "rotation": [0, 0, 90],  # Look sideways
        "fov": 60,
        "intrinsic_matrix": np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float32),
        "distortion_coeffs": np.zeros(5, dtype=np.float32)
    }
]
# Create multi-camera vision system
multi_vision = Vision(camera_configs=camera_configs)
# Capture from different cameras
overview_rgb, overview_depth = multi_vision.capture_image(camera_index=0)
side_rgb, side_depth = multi_vision.capture_image(camera_index=1)
# Detect obstacles from multiple viewpoints
obstacles_overview, _ = multi_vision.detect_obstacles(overview_depth, overview_rgb, camera_index=0)
obstacles_side, _ = multi_vision.detect_obstacles(side_depth, side_rgb, camera_index=1)
print(f"📷 Overview camera detected {len(obstacles_overview)} obstacles")
print(f"📷 Side camera detected {len(obstacles_side)} obstacles")
OpenCV Camera Integration
~~~~~~~~~~~~~~~~~~~~~~~~~
Use real hardware cameras with OpenCV:
.. code-block:: python
# Configure real camera with OpenCV
real_camera_config = {
    "name": "usb_camera",
    "translation": [0, 0, 0],
    "rotation": [0, 0, 0],
    "fov": 60,
    "intrinsic_matrix": np.array([
        [800, 0, 320],  # Values from camera calibration
        [0, 800, 240],
        [0, 0, 1]
    ], dtype=np.float32),
    "distortion_coeffs": np.array([-0.1, 0.05, 0, 0, 0], dtype=np.float32),  # From calibration
    "use_opencv": True,  # Enable OpenCV capture
    "device_index": 0    # USB camera device ID
}
# Create vision system with real camera
real_vision = Vision(camera_configs=[real_camera_config])
# Note: capture_image() will use OpenCV for image acquisition
# when use_opencv=True
Camera Calibration Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Understanding the camera configuration parameters:
.. code-block:: python
# Intrinsic matrix format:
# [fx 0 cx]
# [0 fy cy]
# [0 0 1]
#
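# Where:
# fx, fy = focal lengths in pixels
# cx, cy = principal point in pixels (usually close to the image center)
# focal_x, focal_y, center_x, center_y are placeholders for your calibration values
intrinsic_matrix = np.array([
    [focal_x, 0, center_x],
    [0, focal_y, center_y],
    [0, 0, 1]
], dtype=np.float32)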
# Distortion coefficients: [k1, k2, p1, p2, k3]
# k1, k2, k3 = radial distortion coefficients
# p1, p2 = tangential distortion coefficients
distortion_coeffs = np.array([k1, k2, p1, p2, k3], dtype=np.float32)
# Extrinsic parameters (pose in world coordinates):
# translation = [x, y, z] position in meters
# rotation = [roll, pitch, yaw] in degrees
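To show how these parameters act together, here is a minimal sketch that projects a camera-frame 3D point to pixel coordinates using the five-coefficient radial and tangential distortion model (the same coefficient order as above, which is also the order OpenCV uses). The function name and the sample point are hypothetical; the matrix and coefficients are the example values from the OpenCV camera configuration.

```python
import numpy as np


def project_point(point_cam, K, dist):
    """Project a camera-frame 3D point to pixels, applying lens distortion."""
    k1, k2, p1, p2, k3 = dist
    # Normalized image coordinates
    x = point_cam[0] / point_cam[2]
    y = point_cam[1] / point_cam[2]
    r2 = x * x + y * y
    # Radial term, then tangential terms
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # Apply the intrinsic matrix
    u = K[0, 0] * x_d + K[0, 2]
    v = K[1, 1] * y_d + K[1, 2]
    return np.array([u, v])


K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
point = np.array([0.2, 0.1, 1.0])  # 1 m in front of the camera

ideal = project_point(point, K, np.zeros(5))
distorted = project_point(point, K, np.array([-0.1, 0.05, 0.0, 0.0, 0.0]))
print(ideal)      # undistorted pixel location
print(distorted)  # pulled slightly toward the principal point by negative k1
```

The difference between the two results grows with distance from the principal point, which is why uncorrected distortion mostly affects detections near the image edges.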
Performance Optimization
------------------------
Memory Management
~~~~~~~~~~~~~~~~~
.. code-block:: python
# For long-running applications, manage resources carefully
vision = Vision(camera_configs=configs)
try:
    while True:
        # Capture and process images
        rgb, depth = vision.capture_image()
        obstacles, _ = vision.detect_obstacles(depth, rgb)
        # Process obstacles...
        # Clean up large arrays if needed
        del rgb, depth
finally:
    # Always release resources
    vision.release()
Efficient Object Detection
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
# Optimize detection parameters for performance
obstacles, orientations = vision.detect_obstacles(
    depth_image=depth,
    rgb_image=rgb,
    depth_threshold=3.0,  # Limit detection range
    camera_index=0,
    step=4                # Increase step size for speed (lower accuracy)
)
# For real-time applications, consider:
# - Reducing image resolution
# - Increasing step size
# - Limiting depth threshold
# - Processing every nth frame
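Two of the suggestions above are easy to quantify. The sketch below assumes that ``step`` means "visit every ``step``-th row and column", which the parameter comments suggest but the guide does not state outright. The helper names are hypothetical and the depth image is synthetic.

```python
import numpy as np

# Synthetic 640x480 depth image (illustrative data only)
depth = np.zeros((480, 640))


def sampled_pixel_count(depth, step):
    """Number of depth pixels visited when sampling every `step`-th row and column."""
    return depth[::step, ::step].size


def every_nth(frames, n):
    """Yield only every n-th item from an iterable of frames."""
    for i, frame in enumerate(frames):
        if i % n == 0:
            yield frame


counts = {step: sampled_pixel_count(depth, step) for step in (1, 2, 4)}
print(counts)                         # {1: 307200, 2: 76800, 4: 19200}
print(list(every_nth(range(10), 3)))  # [0, 3, 6, 9]
```

Doubling ``step`` cuts the sampled pixels by a factor of four, so the depth-analysis cost falls quickly while YOLO inference time stays the same.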
Error Handling and Debugging
----------------------------
Robust Error Handling
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
try:
    # Create vision system
    vision = Vision(camera_configs=configs)
    # Attempt image capture
    rgb, depth = vision.capture_image(camera_index=0)
    if rgb is None or depth is None:
        print("❌ Failed to capture images")
        raise RuntimeError("Image capture failed")
    # Attempt object detection
    obstacles, orientations = vision.detect_obstacles(depth, rgb)
    if len(obstacles) == 0:
        print("⚠️ No obstacles detected")
    else:
        print(f"✅ Detected {len(obstacles)} obstacles")
except RuntimeError as e:
    print(f"❌ Vision system error: {e}")
except Exception as e:
    print(f"❌ Unexpected error: {e}")
finally:
    if 'vision' in locals():
        vision.release()
Debugging Tips
~~~~~~~~~~~~~~
.. code-block:: python
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Create vision with detailed logging
vision = Vision(camera_configs=configs, logger_name="DebugVision")
# Check YOLO model status
if vision.yolo_model is None:
    print("⚠️ YOLO model not loaded - object detection disabled")
else:
    print("✅ YOLO model loaded successfully")
# Verify camera configuration
for idx, camera in vision.cameras.items():
    print(f"📷 Camera {idx}: {camera['name']}")
    print(f"  Position: {camera['translation']}")
    print(f"  Rotation: {camera['rotation']}")
    print(f"  FOV: {camera['fov']}°")
Common Issues and Solutions
---------------------------
**Issue: No objects detected by YOLO**
.. code-block:: python
# Solutions:
# 1. Check if YOLO model loaded properly
if vision.yolo_model is None:
    print("Install ultralytics: pip install ultralytics")
# 2. Verify image quality
rgb, depth = vision.capture_image()
if rgb.max() == 0:
    print("Image is completely black - check lighting/camera")
# 3. Adjust detection confidence
# Lower confidence threshold in detect_obstacles()
**Issue: Poor stereo reconstruction**
.. code-block:: python
# Solutions:
# 1. Ensure proper camera calibration
# 2. Check baseline distance (should be 5-15% of working distance)
# 3. Verify image rectification quality
left_rect, right_rect = vision.rectify_stereo_images(left, right)
# Rectified images should be aligned horizontally
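The baseline check can be made concrete with the first-order depth error of a stereo pair. Differentiating ``Z = f * B / d`` gives ``dZ ≈ Z**2 / (f * B) * dd``, so the error grows with the square of the range and shrinks as the baseline grows. The sketch below evaluates that expression. The 0.25-pixel disparity error is an assumed value for illustration, and the function name is hypothetical.

```python
def depth_error(z, focal_px, baseline, disparity_error_px=0.25):
    """Approximate depth uncertainty in meters at range z (first-order model)."""
    return (z ** 2) / (focal_px * baseline) * disparity_error_px


# 600 px focal length and 10 cm baseline, as in the stereo example
for z in (0.5, 1.0, 2.0):
    print(f"range {z:.1f} m -> depth error ~{depth_error(z, 600.0, 0.1) * 1000:.1f} mm")
```

If the error at your working distance is too large, widening the baseline or using a longer focal length are the two direct remedies.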
**Issue: Inaccurate 3D positions**
.. code-block:: python
# Solutions:
# 1. Calibrate intrinsic matrix precisely
# 2. Verify depth image scaling
# 3. Check coordinate frame conventions
# Debug depth values
camera = vision.cameras[0]  # configuration of the camera that produced `depth`
print(f"Depth range: {depth.min():.3f} - {depth.max():.3f}")
print(f"Near/far planes: {camera['near']} - {camera['far']}")
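When checking depth scaling, keep in mind that PyBullet's ``getCameraImage`` returns a non-linear depth buffer in the range 0 to 1 rather than distances in meters. Whether ``capture_image()`` already converts it is not stated in this guide, so treat the following as a sketch of the standard conversion to apply only if the printed depth range stays within 0 to 1. The function name is hypothetical.

```python
import numpy as np


def depth_buffer_to_meters(depth_buffer, near, far):
    """Convert a PyBullet/OpenGL depth buffer in [0, 1] to metric depth."""
    depth_buffer = np.asarray(depth_buffer, dtype=np.float64)
    return far * near / (far - (far - near) * depth_buffer)


# Buffer value 0 maps to the near plane and 1 maps to the far plane
buffer_values = np.array([0.0, 0.5, 1.0])
meters = depth_buffer_to_meters(buffer_values, near=0.1, far=10.0)
print(meters)
```

The mapping is strongly non-linear: with these planes, a buffer value of 0.5 lies only about 0.2 m from the camera.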
Real-World Applications
-----------------------
Robot Navigation
~~~~~~~~~~~~~~~~
.. code-block:: python
from ManipulaPy.path_planning import TrajectoryPlanning
# Integrated obstacle detection for path planning
def safe_navigation():
    # Detect current obstacles
    rgb, depth = vision.capture_image()
    obstacles, _ = vision.detect_obstacles(depth, rgb, depth_threshold=2.0)
    # Create the trajectory planner
    # (robot, urdf_file, dynamics, and joint_limits are defined elsewhere)
    planner = TrajectoryPlanning(robot, urdf_file, dynamics, joint_limits)
    # Plan collision-free trajectory
    safe_trajectory = planner.joint_trajectory(
        thetastart=current_position,
        thetaend=target_position,
        Tf=5.0,
        N=100,
        method=5  # quintic time scaling
    )
    # Return the detected obstacles so the caller can validate the trajectory against them
    return safe_trajectory, obstacles
Pick and Place Operations
~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
def pick_and_place_with_vision():
    # Detect objects in workspace
    rgb, depth = vision.capture_image(camera_index=0)  # Overhead camera
    objects, orientations = vision.detect_obstacles(depth, rgb, depth_threshold=1.0)
    if len(objects) == 0:
        print("No objects found to pick")
        return
    # Select the object closest to the origin of the coordinate frame
    closest_idx = np.argmin([np.linalg.norm(obj) for obj in objects])
    target_object = objects[closest_idx]
    target_orientation = orientations[closest_idx]
    print(f"🎯 Targeting object at: {target_object}")
    print(f"🧭 Object orientation: {target_orientation:.1f}°")
    # Plan approach trajectory
    # ... (integrate with kinematics and planning)
Best Practices
--------------
1. **Camera Placement**

   - Position cameras for optimal workspace coverage
   - Avoid backlighting and reflective surfaces
   - Ensure sufficient lighting for object detection

2. **Calibration**

   - Use high-quality calibration patterns (checkerboards)
   - Capture calibration images from multiple angles
   - Verify calibration accuracy before deployment

3. **Performance**

   - Choose appropriate image resolutions for your application
   - Balance detection accuracy with processing speed
   - Use temporal filtering for stable object tracking

4. **Robustness**

   - Implement proper error handling for all vision operations
   - Use multiple cameras for redundancy when possible
   - Validate detection results before using in control loops

5. **Integration**

   - Coordinate vision frame rates with control loop timing
   - Transform coordinates to robot base frame consistently
   - Use vision confidence scores in decision making
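
As an illustration of the frame-transformation practice, the sketch below builds a homogeneous transform from a camera's ``translation`` and ``rotation`` and uses it to express a camera-frame point in the robot base frame. The Euler convention (rotations about fixed X, Y, Z axes, composed as ``Rz @ Ry @ Rx``) is an assumption; confirm it against the convention your Vision configuration actually uses. The function names are hypothetical.

```python
import numpy as np


def pose_to_matrix(translation, rotation_deg):
    """Build a 4x4 base-from-camera transform from [x, y, z] and [roll, pitch, yaw]."""
    roll, pitch, yaw = np.radians(rotation_deg)
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx  # assumed convention: fixed-axis roll, pitch, yaw
    T[:3, 3] = translation
    return T


def camera_to_base(point_cam, T_base_cam):
    """Express a camera-frame point in the base frame."""
    return (T_base_cam @ np.append(point_cam, 1.0))[:3]


# Camera 1 m above the base origin, yawed by 90 degrees
T_base_cam = pose_to_matrix([0.0, 0.0, 1.0], [0.0, 0.0, 90.0])
point_base = camera_to_base(np.array([1.0, 0.0, 0.0]), T_base_cam)
print(np.round(point_base, 6))
```

Keeping one such transform per camera and applying it in a single place helps every consumer of the detections work in the same frame.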
See Also
--------
- :doc:`../api/vision` - Complete Vision API reference
- :doc:`Perception` - Higher-level perception capabilities
- :doc:`../tutorials/index` - Vision and perception tutorials
- :doc:`Simulation` - PyBullet integration guide