Perception API Reference

This page documents ManipulaPy.perception, the module for higher-level perception capabilities including obstacle detection, 3D point cloud generation, clustering, and environmental understanding for robotic systems.

Tip

For conceptual explanations, see Perception User Guide.

—

Perception Class

class ManipulaPy.perception.Perception(vision_instance=None, logger_name='PerceptionLogger')[source]

Bases: object

A higher-level perception module that uses a Vision instance to handle tasks like obstacle detection, 3D point cloud generation, and clustering.

vision

A Vision instance for camera tasks (capturing images, stereo, etc.).

Type:: Vision

logger

Logger for debugging and status messages.

Type:: logging.Logger

Higher-level perception module that integrates Vision capabilities with machine learning algorithms for comprehensive environmental understanding and obstacle analysis.

Constructor

__init__(vision_instance=None, logger_name='PerceptionLogger')[source]

Initialize the Perception system with a Vision instance.

Parameters:

vision_instance (Vision, optional) – A Vision instance for camera tasks (monocular/stereo). Must be provided or else a ValueError is raised.
logger_name (str) – The name for this Perception logger.

Parameters:

vision_instance (Vision) – Required Vision instance for camera operations and low-level processing
logger_name (str, optional) – Logger identifier (default: “PerceptionLogger”)

Validation:

Raises ValueError if vision_instance is None
Initializes logging with configurable debug levels
Establishes Vision module dependency relationship

detect_and_cluster_obstacles(camera_index=0, depth_threshold=5.0, step=2, eps=0.1, min_samples=3)[source]

Capture an image from the Vision instance, detect 3D obstacle points, and then cluster those points using DBSCAN.

Parameters:

camera_index (int) – Index of the camera to capture from.
depth_threshold (float) – Maximum depth (in meters) to consider for obstacles.
step (int) – Step size for downsampling the depth image.
eps (float) – Maximum distance between two points to be considered neighbors in DBSCAN.
min_samples (int) – Minimum number of points required to form a cluster in DBSCAN.

Returns:

obstacle_points (np.ndarray of shape (N, 3)) – 3D points representing detected obstacles.
labels (np.ndarray) – Cluster labels for each point (with -1 for noise).

compute_stereo_disparity(left_img, right_img)[source]

Compute a stereo disparity map from two images.

Parameters:

left_img (np.ndarray) – Image from the left camera.
right_img (np.ndarray) – Image from the right camera.

Returns:

disparity – Computed disparity map.

Return type:

np.ndarray of type float32

get_stereo_point_cloud(left_img, right_img)[source]

Generate a 3D point cloud from a stereo pair of images.

Parameters:

left_img (np.ndarray) – Image from the left camera.
right_img (np.ndarray) – Image from the right camera.

Returns:

point_cloud – The 3D point cloud in world coordinates.

Return type:

np.ndarray of shape (N, 3)

cluster_obstacles(points, eps=0.1, min_samples=3)[source]

Cluster the 3D points using DBSCAN.

Parameters:

points (np.ndarray of shape (N, 3)) – The 3D points to cluster.
eps (float) – The maximum distance between two points for them to be considered neighbors.
min_samples (int) – The minimum number of points required to form a dense region.

Returns:

labels (np.ndarray of shape (N,)) – The cluster label for each point (-1 indicates noise).
num_clusters (int) – The number of clusters found (excluding noise).

release()[source]: Release resources held by the Vision instance.

__del__()[source]: Destructor to ensure Vision resources are released.

—

Primary Perception Methods

Obstacle Detection Pipeline

Perception.detect_and_cluster_obstacles(camera_index=0, depth_threshold=5.0, step=2, eps=0.1, min_samples=3)[source]

Capture an image from the Vision instance, detect 3D obstacle points, and then cluster those points using DBSCAN.

Parameters:

camera_index (int) – Index of the camera to capture from.
depth_threshold (float) – Maximum depth (in meters) to consider for obstacles.
step (int) – Step size for downsampling the depth image.
eps (float) – Maximum distance between two points to be considered neighbors in DBSCAN.
min_samples (int) – Minimum number of points required to form a cluster in DBSCAN.

Returns:

obstacle_points (np.ndarray of shape (N, 3)) – 3D points representing detected obstacles.
labels (np.ndarray) – Cluster labels for each point (with -1 for noise).

Complete environmental analysis pipeline combining vision-based detection with machine learning clustering.

Parameters:

camera_index (int, optional) – Camera identifier for image capture (default: 0)
depth_threshold (float, optional) – Maximum depth for obstacle consideration in meters (default: 5.0)
step (int, optional) – Depth image downsampling stride (default: 2)
eps (float, optional) – DBSCAN neighbor distance threshold (default: 0.1)
min_samples (int, optional) – DBSCAN minimum cluster size (default: 3)

Returns:

obstacle_points (np.ndarray) – Shape (N, 3) array of 3D obstacle coordinates
labels (np.ndarray) – Shape (N,) cluster labels with -1 indicating noise

Processing Pipeline:

Image Acquisition: vision.capture_image(camera_index)
Depth Validation: Checks depth.ndim >= 2 and non-None status
Obstacle Detection: vision.detect_obstacles() with YOLO integration
Point Validation: Verifies non-None and non-empty point arrays
Clustering Analysis: DBSCAN application to 3D point coordinates
Result Logging: Debug information and cluster statistics

—

Stereo Vision Integration

Disparity Computation

Perception.compute_stereo_disparity(left_img, right_img)[source]

Compute a stereo disparity map from two images.

Parameters:

left_img (np.ndarray) – Image from the left camera.
right_img (np.ndarray) – Image from the right camera.

Returns:

disparity – Computed disparity map.

Return type:

np.ndarray of type float32

Wrapper for stereo disparity computation with error handling and logging.

Parameters:

left_img (np.ndarray) – Left camera image array
right_img (np.ndarray) – Right camera image array

Returns:

disparity (np.ndarray) – Float32 disparity map

Dependencies:

Requires vision.stereo_enabled = True
Delegates to vision.rectify_stereo_images()
Delegates to vision.compute_disparity()

Point Cloud Generation

Perception.get_stereo_point_cloud(left_img, right_img)[source]

Generate a 3D point cloud from a stereo pair of images.

Parameters:

left_img (np.ndarray) – Image from the left camera.
right_img (np.ndarray) – Image from the right camera.

Returns:

point_cloud – The 3D point cloud in world coordinates.

Return type:

np.ndarray of shape (N, 3)

3D point cloud generation from stereo image pairs with comprehensive error handling.

Parameters:

left_img (np.ndarray) – Left camera image array
right_img (np.ndarray) – Right camera image array

Returns:

point_cloud (np.ndarray) – Shape (N, 3) 3D point coordinates

Error Handling:

Returns empty (0, 3) array if stereo not enabled
Logs point cloud size for debugging
Delegates processing to vision.get_stereo_point_cloud()

—

Machine Learning Components

Clustering Analysis

Perception.cluster_obstacles(points, eps=0.1, min_samples=3)[source]

Cluster the 3D points using DBSCAN.

Parameters:

points (np.ndarray of shape (N, 3)) – The 3D points to cluster.
eps (float) – The maximum distance between two points for them to be considered neighbors.
min_samples (int) – The minimum number of points required to form a dense region.

Returns:

labels (np.ndarray of shape (N,)) – The cluster label for each point (-1 indicates noise).
num_clusters (int) – The number of clusters found (excluding noise).

DBSCAN-based clustering of 3D obstacle points for spatial grouping and noise filtering.

Parameters:

points (np.ndarray) – Shape (N, 3) array of 3D coordinates
eps (float, optional) – Maximum neighbor distance (default: 0.1)
min_samples (int, optional) – Minimum cluster density (default: 3)

Returns:

labels (np.ndarray) – Shape (N,) cluster assignments
num_clusters (int) – Count of identified clusters excluding noise

DBSCAN Implementation:

Uses sklearn.cluster.DBSCAN with L2 distance metric
Noise points labeled as -1
Cluster counting excludes noise category
Comprehensive logging of cluster statistics

Algorithm Details:

Input Validation: Empty array handling with early return
Model Instantiation: DBSCAN(eps=eps, min_samples=min_samples)
Fitting: dbscan_model.fit(points) on 3D coordinates
Label Extraction: dbscan_model.labels_ array
Statistics Computation: Unique label counting and noise analysis

—

Resource Management

Cleanup Methods

Perception.release()[source]

Release resources held by the Vision instance.

Explicit resource cleanup for Vision instance dependencies.

Implementation:

Safely calls vision.release() with exception handling
Logs cleanup operations for debugging
Handles None vision instances gracefully

Perception.__del__()[source]

Destructor to ensure Vision resources are released.

Destructor ensuring automatic resource cleanup on object destruction.

Safety Features:

hasattr() validation before vision access
Silent exception handling to prevent destructor failures
Automatic cleanup without user intervention

—

Logging Infrastructure

Logger Configuration

Perception._setup_logger(name)[source]

Configure and return a logger for this Perception module.

Configures structured logging with consistent formatting across the perception pipeline.

Parameters:

name (str) – Logger identifier for message routing

Returns:

logger (logging.Logger) – Configured logger instance

Configuration Details:

Level: DEBUG for comprehensive information
Handler: StreamHandler for console output
Format: “%(asctime)s - %(name)s - %(levelname)s - %(message)s”
Duplicate Prevention: Checks existing handlers before addition

—

Error Handling Hierarchy

Vision Instance Validation: Constructor-level checking
Image Availability: Depth image validation with fallbacks
Detection Results: None-checking with empty array returns
Clustering Input: Empty point array handling
Resource Cleanup: Exception-safe cleanup operations

—

Dependencies and Integration

Required Dependencies

Module	Purpose	Integration Point
`sklearn.cluster.DBSCAN`	Point clustering	cluster_obstacles() method
`numpy`	Array operations	All coordinate manipulations
`logging`	Debug/status output	Logger configuration and usage
`ManipulaPy.vision`	Low-level vision	vision_instance dependency

Vision Module Interface

Perception requires Vision instance with specific methods:

capture_image(): RGB/depth image acquisition
detect_obstacles(): YOLO-based object detection
stereo_enabled: Boolean stereo capability flag
rectify_stereo_images(): Stereo rectification
compute_disparity(): Disparity map computation
get_stereo_point_cloud(): 3D reconstruction
release(): Resource cleanup

—

Error Recovery and Robustness

Vision Failure Handling

Missing Depth: Empty array return with warning logs
Detection Failures: None-result handling with error logs
Stereo Unavailability: Runtime exception with clear messaging

Clustering Robustness

Empty Input: Graceful handling with early return
Insufficient Points: DBSCAN handles minimum sample requirements
Parameter Validation: Implicit through sklearn parameter checking

Resource Management Safety

Constructor Validation: Immediate failure for invalid Vision instances
Cleanup Exceptions: Silent handling in destructor context
Multiple Cleanup Calls: Safe repeated release() invocation

—

External Dependencies

scikit-learn – DBSCAN clustering algorithm
NumPy – Numerical array operations
Python logging – Structured logging framework

Perception API Reference

Quick Navigation

Perception Class

Primary Perception Methods

Obstacle Detection Pipeline

Stereo Vision Integration

Disparity Computation

Point Cloud Generation

Machine Learning Components

Clustering Analysis

Resource Management

Cleanup Methods

Logging Infrastructure

Logger Configuration

Error Handling Hierarchy

Dependencies and Integration

Required Dependencies

Vision Module Interface

Error Recovery and Robustness

Vision Failure Handling

Clustering Robustness

Resource Management Safety

See Also

External Dependencies