.. _user_guide_vision: Vision User Guide ================= This comprehensive guide covers the Vision module in ManipulaPy, which provides advanced computer vision capabilities for robotic perception, including stereo vision, object detection, and PyBullet integration. .. contents:: **Quick Navigation** :local: :depth: 2 Overview -------- The Vision module is a unified computer vision system that brings together: - **Monocular and stereo camera support** with flexible configuration - **YOLO-based object detection** for real-time obstacle identification - **PyBullet virtual cameras** with interactive debugging sliders - **Stereo vision pipeline** for 3D reconstruction and depth estimation - **Camera calibration utilities** for precise geometric measurements .. raw:: html
📷

Multi-Camera Support

Configure multiple cameras with individual intrinsics, extrinsics, and distortion parameters

🤖

YOLO Integration

Real-time object detection with YOLOv8 for robust obstacle identification

🎮

PyBullet Debug

Interactive virtual cameras with real-time parameter adjustment

👁️

Stereo Vision

Complete stereo pipeline from rectification to 3D point cloud generation

Getting Started --------------- Basic Camera Setup ~~~~~~~~~~~~~~~~~~ The simplest way to start with the Vision module: .. code-block:: python from ManipulaPy.vision import Vision import numpy as np # Create a basic vision system with default settings vision = Vision() # Capture an image (requires PyBullet environment) rgb_image, depth_image = vision.capture_image() print(f"📸 Captured RGB image: {rgb_image.shape}") print(f"📏 Captured depth image: {depth_image.shape}") Custom Camera Configuration ~~~~~~~~~~~~~~~~~~~~~~~~~~~ For more control, configure your cameras explicitly: .. code-block:: python # Define camera parameters camera_config = { "name": "workspace_camera", "translation": [0.0, 0.0, 1.0], # 1 meter above workspace "rotation": [0, 45, 0], # Look down at 45 degrees "fov": 60, # Field of view in degrees "near": 0.1, # Near clipping plane "far": 10.0, # Far clipping plane "intrinsic_matrix": np.array([ [500, 0, 320], # fx, 0, cx [0, 500, 240], # 0, fy, cy [0, 0, 1] # 0, 0, 1 ], dtype=np.float32), "distortion_coeffs": np.zeros(5, dtype=np.float32), # k1,k2,p1,p2,k3 "use_opencv": False, # Use PyBullet cameras "device_index": 0 # Camera device index } # Create vision system with custom configuration vision = Vision(camera_configs=[camera_config]) Core Features ------------- Object Detection with YOLO ~~~~~~~~~~~~~~~~~~~~~~~~~~ The Vision module integrates YOLOv8 for robust object detection: .. code-block:: python # Capture images rgb_image, depth_image = vision.capture_image(camera_index=0) # Detect obstacles with 3D positioning obstacle_positions, orientations = vision.detect_obstacles( depth_image=depth_image, rgb_image=rgb_image, depth_threshold=5.0, # Only consider objects within 5 meters camera_index=0, step=2 # Depth sampling step for efficiency ) # Process detected obstacles print(f"🔍 Detected {len(obstacle_positions)} obstacles") for i, (pos, orientation) in enumerate(zip(obstacle_positions, orientations)): print(f"Obstacle {i+1}:") print(f" 📍 Position: [{pos[0]:.2f}, {pos[1]:.2f}, {pos[2]:.2f}] meters") print(f" 🧭 Orientation: {orientation:.1f} degrees") The object detection pipeline: 1. **YOLO Detection**: Identifies objects in RGB images with bounding boxes 2. **Depth Analysis**: Uses depth information within bounding boxes 3. **3D Positioning**: Converts 2D detections to 3D world coordinates 4. **Orientation Estimation**: Computes object orientation in the XY plane PyBullet Virtual Cameras ~~~~~~~~~~~~~~~~~~~~~~~~ For simulation and debugging, use PyBullet's virtual cameras: .. code-block:: python # Create an interactive debug camera system debug_vision = Vision( use_pybullet_debug=True, # Enable PyBullet debug sliders show_plot=True # Display camera feed in matplotlib ) # The debug interface provides real-time sliders for: # - Camera position (target_x, target_y, target_z) # - Camera orientation (yaw, pitch, roll) # - View parameters (distance, up axis) # - Projection settings (width, height, FOV, near/far planes) **Debug Interface Features:** - **Real-time parameter adjustment** via PyBullet GUI sliders - **Live camera feed** displayed in matplotlib window - **Matrix visualization** for view and projection matrices - **Interactive positioning** for optimal camera placement Stereo Vision Pipeline ~~~~~~~~~~~~~~~~~~~~~~ For 3D reconstruction, configure a stereo camera pair: .. code-block:: python # Configure left camera left_camera_config = { "name": "left_camera", "translation": [0.0, 0.0, 0.5], "rotation": [0, 0, 0], "intrinsic_matrix": np.array([ [600, 0, 320], [0, 600, 240], [0, 0, 1] ], dtype=np.float32), "distortion_coeffs": np.zeros(5, dtype=np.float32) } # Configure right camera (10cm baseline) right_camera_config = left_camera_config.copy() right_camera_config["name"] = "right_camera" right_camera_config["translation"] = [0.1, 0.0, 0.5] # 10cm to the right # Create stereo vision system stereo_vision = Vision(stereo_configs=(left_camera_config, right_camera_config)) # Compute rectification maps (do this once) stereo_vision.compute_stereo_rectification_maps(image_size=(640, 480)) # Capture stereo images left_image, _ = stereo_vision.capture_image(0) # Left camera right_image, _ = stereo_vision.capture_image(1) # Right camera # Process stereo pipeline left_rect, right_rect = stereo_vision.rectify_stereo_images(left_image, right_image) disparity_map = stereo_vision.compute_disparity(left_rect, right_rect) point_cloud = stereo_vision.disparity_to_pointcloud(disparity_map) print(f"🌐 Generated point cloud with {len(point_cloud)} 3D points") **Stereo Pipeline Steps:** 1. **Image Rectification**: Align stereo images for disparity computation 2. **Disparity Calculation**: Use StereoSGBM for robust disparity estimation 3. **3D Reconstruction**: Convert disparity to 3D points using camera geometry 4. **Point Cloud Filtering**: Remove invalid and distant points Advanced Usage -------------- Multiple Camera Systems ~~~~~~~~~~~~~~~~~~~~~~~ Configure and manage multiple cameras simultaneously: .. code-block:: python # Define multiple camera configurations camera_configs = [ { # Overview camera "name": "overview_camera", "translation": [0, 0, 2.0], "rotation": [0, 90, 0], # Look straight down "fov": 80, "intrinsic_matrix": np.array([[400, 0, 320], [0, 400, 240], [0, 0, 1]], dtype=np.float32), "distortion_coeffs": np.zeros(5, dtype=np.float32) }, { # Side view camera "name": "side_camera", "translation": [1.0, 0, 0.5], "rotation": [0, 0, 90], # Look sideways "fov": 60, "intrinsic_matrix": np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1]], dtype=np.float32), "distortion_coeffs": np.zeros(5, dtype=np.float32) } ] # Create multi-camera vision system multi_vision = Vision(camera_configs=camera_configs) # Capture from different cameras overview_rgb, overview_depth = multi_vision.capture_image(camera_index=0) side_rgb, side_depth = multi_vision.capture_image(camera_index=1) # Detect obstacles from multiple viewpoints obstacles_overview, _ = multi_vision.detect_obstacles(overview_depth, overview_rgb, camera_index=0) obstacles_side, _ = multi_vision.detect_obstacles(side_depth, side_rgb, camera_index=1) print(f"📷 Overview camera detected {len(obstacles_overview)} obstacles") print(f"📷 Side camera detected {len(obstacles_side)} obstacles") OpenCV Camera Integration ~~~~~~~~~~~~~~~~~~~~~~~~~ Use real hardware cameras with OpenCV: .. code-block:: python # Configure real camera with OpenCV real_camera_config = { "name": "usb_camera", "translation": [0, 0, 0], "rotation": [0, 0, 0], "fov": 60, "intrinsic_matrix": np.array([ [800, 0, 320], # Values from camera calibration [0, 800, 240], [0, 0, 1] ], dtype=np.float32), "distortion_coeffs": np.array([-0.1, 0.05, 0, 0, 0], dtype=np.float32), # From calibration "use_opencv": True, # Enable OpenCV capture "device_index": 0 # USB camera device ID } # Create vision system with real camera real_vision = Vision(camera_configs=[real_camera_config]) # Note: capture_image() will use OpenCV for image acquisition # when use_opencv=True Camera Calibration Parameters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Understanding the camera configuration parameters: .. code-block:: python # Intrinsic matrix format: # [fx 0 cx] # [0 fy cy] # [0 0 1] # # Where: # fx, fy = focal lengths in pixels # cx, cy = principal point (image center) in pixels intrinsic_matrix = np.array([ [focal_x, 0, center_x], [0, focal_y, center_y], [0, 0, 1] ], dtype=np.float32) # Distortion coefficients: [k1, k2, p1, p2, k3] # k1, k2, k3 = radial distortion coefficients # p1, p2 = tangential distortion coefficients distortion_coeffs = np.array([k1, k2, p1, p2, k3], dtype=np.float32) # Extrinsic parameters (pose in world coordinates): # translation = [x, y, z] position in meters # rotation = [roll, pitch, yaw] in degrees Performance Optimization ------------------------ Memory Management ~~~~~~~~~~~~~~~~~ .. code-block:: python # For long-running applications, manage resources carefully vision = Vision(camera_configs=configs) try: while True: # Capture and process images rgb, depth = vision.capture_image() obstacles, _ = vision.detect_obstacles(depth, rgb) # Process obstacles... # Clean up large arrays if needed del rgb, depth finally: # Always release resources vision.release() Efficient Object Detection ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Optimize detection parameters for performance obstacles, orientations = vision.detect_obstacles( depth_image=depth, rgb_image=rgb, depth_threshold=3.0, # Limit detection range camera_index=0, step=4 # Increase step size for speed (lower accuracy) ) # For real-time applications, consider: # - Reducing image resolution # - Increasing step size # - Limiting depth threshold # - Processing every nth frame Error Handling and Debugging ---------------------------- Robust Error Handling ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python try: # Create vision system vision = Vision(camera_configs=configs) # Attempt image capture rgb, depth = vision.capture_image(camera_index=0) if rgb is None or depth is None: print("❌ Failed to capture images") raise RuntimeError("Image capture failed") # Attempt object detection obstacles, orientations = vision.detect_obstacles(depth, rgb) if len(obstacles) == 0: print("⚠️ No obstacles detected") else: print(f"✅ Detected {len(obstacles)} obstacles") except RuntimeError as e: print(f"❌ Vision system error: {e}") except Exception as e: print(f"❌ Unexpected error: {e}") finally: if 'vision' in locals(): vision.release() Debugging Tips ~~~~~~~~~~~~~~ .. code-block:: python # Enable debug logging import logging logging.basicConfig(level=logging.DEBUG) # Create vision with detailed logging vision = Vision(camera_configs=configs, logger_name="DebugVision") # Check YOLO model status if vision.yolo_model is None: print("⚠️ YOLO model not loaded - object detection disabled") else: print("✅ YOLO model loaded successfully") # Verify camera configuration for idx, camera in vision.cameras.items(): print(f"📷 Camera {idx}: {camera['name']}") print(f" Position: {camera['translation']}") print(f" Rotation: {camera['rotation']}") print(f" FOV: {camera['fov']}°") Common Issues and Solutions --------------------------- **Issue: No objects detected by YOLO** .. code-block:: python # Solutions: # 1. Check if YOLO model loaded properly if vision.yolo_model is None: print("Install ultralytics: pip install ultralytics") # 2. Verify image quality rgb, depth = vision.capture_image() if rgb.max() == 0: print("Image is completely black - check lighting/camera") # 3. Adjust detection confidence # Lower confidence threshold in detect_obstacles() **Issue: Poor stereo reconstruction** .. code-block:: python # Solutions: # 1. Ensure proper camera calibration # 2. Check baseline distance (should be 5-15% of working distance) # 3. Verify image rectification quality left_rect, right_rect = vision.rectify_stereo_images(left, right) # Rectified images should be aligned horizontally **Issue: Inaccurate 3D positions** .. code-block:: python # Solutions: # 1. Calibrate intrinsic matrix precisely # 2. Verify depth image scaling # 3. Check coordinate frame conventions # Debug depth values print(f"Depth range: {depth.min():.3f} - {depth.max():.3f}") print(f"Near/far planes: {camera['near']} - {camera['far']}") Real-World Applications ----------------------- Robot Navigation ~~~~~~~~~~~~~~~~ .. code-block:: python from ManipulaPy.path_planning import TrajectoryPlanning # Integrated obstacle detection for path planning def safe_navigation(): # Detect current obstacles rgb, depth = vision.capture_image() obstacles, _ = vision.detect_obstacles(depth, rgb, depth_threshold=2.0) # Update robot's environmental model planner = TrajectoryPlanning(robot, urdf_file, dynamics, joint_limits) # Plan collision-free trajectory safe_trajectory = planner.joint_trajectory( thetastart=current_position, thetaend=target_position, Tf=5.0, N=100, method=5 ) return safe_trajectory, obstacles Pick and Place Operations ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python def pick_and_place_with_vision(): # Detect objects in workspace rgb, depth = vision.capture_image(camera_index=0) # Overhead camera objects, orientations = vision.detect_obstacles(depth, rgb, depth_threshold=1.0) if len(objects) == 0: print("No objects found to pick") return # Select closest object closest_idx = np.argmin([np.linalg.norm(obj) for obj in objects]) target_object = objects[closest_idx] target_orientation = orientations[closest_idx] print(f"🎯 Targeting object at: {target_object}") print(f"🧭 Object orientation: {target_orientation:.1f}°") # Plan approach trajectory # ... (integrate with kinematics and planning) Best Practices -------------- 1. **Camera Placement** - Position cameras for optimal workspace coverage - Avoid backlighting and reflective surfaces - Ensure sufficient lighting for object detection 2. **Calibration** - Use high-quality calibration patterns (checkerboards) - Capture calibration images from multiple angles - Verify calibration accuracy before deployment 3. **Performance** - Choose appropriate image resolutions for your application - Balance detection accuracy with processing speed - Use temporal filtering for stable object tracking 4. **Robustness** - Implement proper error handling for all vision operations - Use multiple cameras for redundancy when possible - Validate detection results before using in control loops 5. **Integration** - Coordinate vision frame rates with control loop timing - Transform coordinates to robot base frame consistently - Use vision confidence scores in decision making See Also -------- - :doc:`../api/vision` - Complete Vision API reference - :doc:`Perception` - Higher-level perception capabilities - :doc:`../tutorials/index` - Vision and perception tutorials - :doc:`Simulation` - PyBullet integration guide .. raw:: html