MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes
The paper introduces MR2-ByteTrack, a novel video object detection method designed for AI-augmented embedded vision sensor nodes. The approach addresses the limitations of conventional detection methods on ultra-low-power microcontrollers by alternating between full- and low-resolution inference. The reported results combine significant energy savings with mAP scores of up to 49.0, making real-time detection feasible on MCU-class devices.
- MR2-ByteTrack is designed for on-device video processing in smart vision sensors.
- The method reduces computational costs by alternating between full- and low-resolution inference.
- Experiments show that MR2-ByteTrack achieves mAP scores of up to 49.0 for CNN models and 48.7 for Transformer models.
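To make the alternation idea concrete, here is a minimal sketch of a frame schedule that interleaves full- and low-resolution inference and estimates the resulting compute saving. It is not the authors' implementation: the excerpt does not give the actual schedule, the resolutions, or how detections are linked across frames, so the `period` and `scale` parameters, the function names, and the assumption that inference cost grows with pixel count are all hypothetical.

```python
from typing import List


def resolution_schedule(num_frames: int, period: int) -> List[str]:
    """Label each frame 'full' or 'low'.

    Hypothetical policy: one full-resolution frame every `period` frames,
    low resolution for the frames in between.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    return ["full" if i % period == 0 else "low" for i in range(num_frames)]


def relative_cost(schedule: List[str], scale: float) -> float:
    """Average per-frame cost relative to always running at full resolution.

    Assumption (not from the paper): cost is proportional to pixel count,
    so a frame downscaled by `scale` per side costs scale**2 of a full frame.
    """
    if not schedule:
        raise ValueError("schedule must not be empty")
    costs = [1.0 if mode == "full" else scale ** 2 for mode in schedule]
    return sum(costs) / len(costs)


schedule = resolution_schedule(num_frames=8, period=4)
print(schedule)
print(relative_cost(schedule, scale=0.5))
```

Under these assumed settings (one full-resolution frame in four, low-resolution frames at half the side length), the average per-frame cost is a bit under half that of running every frame at full resolution. Real savings depend on the model, the hardware, and the overhead of associating detections across frames, none of which this sketch models.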
Opening excerpt (first ~120 words)
arXiv:2605.15423 (cs), Computer Science > Computer Vision and Pattern Recognition. Submitted on 14 May 2026.
Title: MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes
Authors: Luca Bompani, Manuele Rusci, Luca Benini, Daniele Palossi, Francesco Conti
Abstract: Modern smart vision sensors need on-device intelligence to process video streams, as cloud computing is often impractical due to bandwidth, latency, and privacy constraints.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.