Perception API Reference#

This page documents ManipulaPy.perception, the module for higher-level perception capabilities including obstacle detection, 3D point cloud generation, clustering, and environmental understanding for robotic systems.

Tip

For conceptual explanations, see Perception User Guide.

β€”

Quick Navigation#

β€”

Perception Class#

class ManipulaPy.perception.Perception(vision_instance=None, logger_name='PerceptionLogger')[source]#

Bases: object

A higher-level perception module that uses a Vision instance to handle tasks like obstacle detection, 3D point cloud generation, and clustering.

Parameters:
  • vision_instance (Any)

  • logger_name (str)

vision#

A Vision instance for camera tasks (capturing images, stereo, etc.).

Type:

Vision

logger#

Logger for debugging and status messages.

Type:

logging.Logger

Higher-level perception module that integrates Vision capabilities with machine learning algorithms for comprehensive environmental understanding and obstacle analysis.

Constructor

__init__(vision_instance=None, logger_name='PerceptionLogger')[source]#

Initialize the Perception system with a Vision instance.

Parameters:
  • vision_instance (Vision, optional) – A Vision instance for camera tasks (monocular/stereo). Must be provided or else a ValueError is raised.

  • logger_name (str) – The name for this Perception logger.

Return type:

None

Parameters:

  • vision_instance (Vision) – Required Vision instance for camera operations and low-level processing

  • logger_name (str, optional) – Logger identifier (default: β€œPerceptionLogger”)

Validation:

  • Raises ValueError if vision_instance is None

  • Initializes logging with configurable debug levels

  • Establishes Vision module dependency relationship

β€”

Primary Perception Methods#

Obstacle Detection Pipeline#

Perception.detect_and_cluster_obstacles(camera_index=0, depth_threshold=5.0, step=2, eps=0.1, min_samples=3)[source]#

Capture an image from the Vision instance, detect 3D obstacle points, and then cluster those points using DBSCAN.

Parameters:
  • camera_index (int) – Index of the camera to capture from.

  • depth_threshold (float) – Maximum depth (in meters) to consider for obstacles.

  • step (int) – Step size for downsampling the depth image.

  • eps (float) – Maximum distance between two points to be considered neighbors in DBSCAN.

  • min_samples (int) – Minimum number of points required to form a cluster in DBSCAN.

Returns:

  • obstacle_points (np.ndarray of shape (N, 3)) – 3D points representing detected obstacles.

  • labels (np.ndarray) – Cluster labels for each point (with -1 for noise).

Return type:

Tuple[ndarray, ndarray]

Complete environmental analysis pipeline combining vision-based detection with machine learning clustering.

Parameters:

  • camera_index (int, optional) – Camera identifier for image capture (default: 0)

  • depth_threshold (float, optional) – Maximum depth for obstacle consideration in meters (default: 5.0)

  • step (int, optional) – Depth image downsampling stride (default: 2)

  • eps (float, optional) – DBSCAN neighbor distance threshold (default: 0.1)

  • min_samples (int, optional) – DBSCAN minimum cluster size (default: 3)

Returns:

  • obstacle_points (np.ndarray) – Shape (N, 3) array of 3D obstacle coordinates

  • labels (np.ndarray) – Shape (N,) cluster labels with -1 indicating noise

Processing Pipeline:

  1. Image Acquisition: vision.capture_image(camera_index)

  2. Depth Validation: Checks depth.ndim >= 2 and non-None status

  3. Obstacle Detection: vision.detect_obstacles() with YOLO integration

  4. Point Validation: Verifies non-None and non-empty point arrays

  5. Clustering Analysis: DBSCAN application to 3D point coordinates

  6. Result Logging: Debug information and cluster statistics

β€”

Stereo Vision Integration#

Disparity Computation#

Perception.compute_stereo_disparity(left_img, right_img)[source]#

Compute a stereo disparity map from two images.

Parameters:
  • left_img (np.ndarray) – Image from the left camera.

  • right_img (np.ndarray) – Image from the right camera.

Returns:

disparity – Computed disparity map.

Return type:

np.ndarray of type float32

Wrapper for stereo disparity computation with error handling and logging.

Parameters:

  • left_img (np.ndarray) – Left camera image array

  • right_img (np.ndarray) – Right camera image array

Returns:

  • disparity (np.ndarray) – Float32 disparity map

Dependencies:

  • Requires vision.stereo_enabled = True

  • Delegates to vision.rectify_stereo_images()

  • Delegates to vision.compute_disparity()

Point Cloud Generation#

Perception.get_stereo_point_cloud(left_img, right_img)[source]#

Generate a 3D point cloud from a stereo pair of images.

Parameters:
  • left_img (np.ndarray) – Image from the left camera.

  • right_img (np.ndarray) – Image from the right camera.

Returns:

point_cloud – The 3D point cloud in world coordinates.

Return type:

np.ndarray of shape (N, 3)

3D point cloud generation from stereo image pairs with comprehensive error handling.

Parameters:

  • left_img (np.ndarray) – Left camera image array

  • right_img (np.ndarray) – Right camera image array

Returns:

  • point_cloud (np.ndarray) – Shape (N, 3) 3D point coordinates

Error Handling:

  • Returns empty (0, 3) array if stereo not enabled

  • Logs point cloud size for debugging

  • Delegates processing to vision.get_stereo_point_cloud()

β€”

Machine Learning Components#

Clustering Analysis#

Perception.cluster_obstacles(points, eps=0.1, min_samples=3)[source]#

Cluster the 3D points using DBSCAN.

Parameters:
  • points (np.ndarray of shape (N, 3)) – The 3D points to cluster.

  • eps (float) – The maximum distance between two points for them to be considered neighbors.

  • min_samples (int) – The minimum number of points required to form a dense region.

Returns:

  • labels (np.ndarray of shape (N,)) – The cluster label for each point (-1 indicates noise).

  • num_clusters (int) – The number of clusters found (excluding noise).

Return type:

Tuple[ndarray, int]

DBSCAN-based clustering of 3D obstacle points for spatial grouping and noise filtering.

Parameters:

  • points (np.ndarray) – Shape (N, 3) array of 3D coordinates

  • eps (float, optional) – Maximum neighbor distance (default: 0.1)

  • min_samples (int, optional) – Minimum cluster density (default: 3)

Returns:

  • labels (np.ndarray) – Shape (N,) cluster assignments

  • num_clusters (int) – Count of identified clusters excluding noise

Raises:

ImportError – if scikit-learn is not installed. Install the optional ML extra with pip install "ManipulaPy[ml]".

Parameters:
Return type:

Tuple[ndarray, int]

Changed in version 1.3.2: import ManipulaPy.perception now succeeds without scikit-learn β€” only cluster_obstacles (and pipelines that call it) raise ImportError at call time.

DBSCAN Implementation:

  • Uses sklearn.cluster.DBSCAN with L2 distance metric

  • Noise points labeled as -1

  • Cluster counting excludes noise category

  • Comprehensive logging of cluster statistics

Algorithm Details:

  1. Input Validation: Empty array handling with early return

  2. Model Instantiation: DBSCAN(eps=eps, min_samples=min_samples)

  3. Fitting: dbscan_model.fit(points) on 3D coordinates

  4. Label Extraction: dbscan_model.labels_ array

  5. Statistics Computation: Unique label counting and noise analysis

β€”

Resource Management#

Cleanup Methods#

Perception.release()[source]#

Release resources held by the Vision instance.

Explicit resource cleanup for Vision instance dependencies.

Implementation:

  • Safely calls vision.release() with exception handling

  • Logs cleanup operations for debugging

  • Handles None vision instances gracefully

Return type:

None

Perception.__del__()[source]#

Destructor to ensure Vision resources are released.

Destructor ensuring automatic resource cleanup on object destruction.

Safety Features:

  • hasattr() validation before vision access

  • Silent exception handling to prevent destructor failures

  • Automatic cleanup without user intervention

Return type:

None

β€”

Logging Infrastructure#

Logger Configuration#

Perception._setup_logger(name)[source]#

Configure and return a logger for this Perception module.

Configures structured logging with consistent formatting across the perception pipeline.

Parameters:

  • name (str) – Logger identifier for message routing

Returns:

  • logger (logging.Logger) – Configured logger instance

Configuration Details:

  • Level: DEBUG for comprehensive information

  • Handler: StreamHandler for console output

  • Format: β€œ%(asctime)s - %(name)s - %(levelname)s - %(message)s”

  • Duplicate Prevention: Checks existing handlers before addition

Parameters:

name (str)

Return type:

Logger

β€”

Error Handling Hierarchy#

  1. Vision Instance Validation: Constructor-level checking

  2. Image Availability: Depth image validation with fallbacks

  3. Detection Results: None-checking with empty array returns

  4. Clustering Input: Empty point array handling

  5. Resource Cleanup: Exception-safe cleanup operations

β€”

Dependencies and Integration#

Required Dependencies#

Module

Purpose

Integration Point

sklearn.cluster.DBSCAN (optional, since v1.3.2)

Point clustering

cluster_obstacles() method β€” install via pip install "ManipulaPy[ml]"

numpy

Array operations

All coordinate manipulations

logging

Debug/status output

Logger configuration and usage

ManipulaPy.vision

Low-level vision

vision_instance dependency

Note

As of v1.3.2, import ManipulaPy.perception no longer requires scikit-learn β€” the import is guarded internally. Only Perception.cluster_obstacles() (and the Perception.detect_and_cluster_obstacles() pipeline that calls it) raise ImportError if the ml extra is missing.

Vision Module Interface#

Perception requires Vision instance with specific methods:

  • capture_image(): RGB/depth image acquisition

  • detect_obstacles(): YOLO-based object detection

  • stereo_enabled: Boolean stereo capability flag

  • rectify_stereo_images(): Stereo rectification

  • compute_disparity(): Disparity map computation

  • get_stereo_point_cloud(): 3D reconstruction

  • release(): Resource cleanup

β€”

Error Recovery and Robustness#

Vision Failure Handling#

  • Missing Depth: Empty array return with warning logs

  • Detection Failures: None-result handling with error logs

  • Stereo Unavailability: Runtime exception with clear messaging

Clustering Robustness#

  • Empty Input: Graceful handling with early return

  • Insufficient Points: DBSCAN handles minimum sample requirements

  • Parameter Validation: Implicit through sklearn parameter checking

Resource Management Safety#

  • Constructor Validation: Immediate failure for invalid Vision instances

  • Cleanup Exceptions: Silent handling in destructor context

  • Multiple Cleanup Calls: Safe repeated release() invocation

β€”

See Also#

External Dependencies#