ZUMA: Training-Free Zero-Shot Unified Multimodal Anomaly Detection
Summary
ZUMA is a training-free framework for multimodal anomaly detection (MAD), the task of identifying unusual patterns using image and 3D data together. Because it needs no labeled training examples, it addresses privacy concerns. It builds on CLIP (a model trained on images and text) and introduces two components: cross-domain calibration, which bridges the gap between the data CLIP was trained on and 3D point cloud data, and dynamic semantic interaction, which uses natural language descriptions as reference points for spotting anomalies. Together these let it detect defects in 2D images, 3D objects, or both, without any training.
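The idea of using text descriptions as reference points can be illustrated with a generic CLIP-style zero-shot scoring sketch. This is not ZUMA's actual method, and the summary does not give its formulas. The function names (`anomaly_score`, `fuse_scores`), the temperature value, the equal-weight fusion, and the toy embeddings are all assumptions made for illustration. Real use would take embeddings from a CLIP image and text encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def anomaly_score(feature, normal_text, anomalous_text, temperature=0.07):
    """Return the softmax probability that `feature` matches the anomalous prompt.

    All three inputs are 1-D embeddings of the same dimension (hypothetical
    stand-ins for CLIP image/text embeddings). The temperature is an assumed value.
    """
    f = l2_normalize(np.asarray(feature, dtype=float))
    t = l2_normalize(np.stack([normal_text, anomalous_text]).astype(float))
    logits = t @ f / temperature          # cosine similarity to each prompt
    logits = logits - logits.max()        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[1])

def fuse_scores(score_2d, score_3d, weight_2d=0.5):
    """Combine per-modality scores with a simple weighted average (an assumption)."""
    return weight_2d * score_2d + (1.0 - weight_2d) * score_3d

# Toy 3-D embeddings standing in for encoder outputs.
normal_prompt = np.array([1.0, 0.0, 0.0])     # e.g. "a photo of a flawless object"
anomalous_prompt = np.array([0.0, 1.0, 0.0])  # e.g. "a photo of a damaged object"
image_feature = np.array([0.1, 0.9, 0.0])     # 2D feature close to the anomalous prompt
point_feature = np.array([0.9, 0.1, 0.0])     # 3D feature close to the normal prompt

s2d = anomaly_score(image_feature, normal_prompt, anomalous_prompt)
s3d = anomaly_score(point_feature, normal_prompt, anomalous_prompt)
print(s2d > 0.5, s3d < 0.5, fuse_scores(s2d, s3d))
```

In this sketch the 2D and 3D scores are computed independently and averaged. How ZUMA calibrates CLIP for point clouds and how it fuses the two modalities are not described in the summary.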
Original source: http://ieeexplore.ieee.org/document/11367454
First tracked: May 8, 2026 at 08:01 PM