We introduce TASTE-Rob: 1) a dataset with 100,856 task-oriented hand-object interaction videos, 2) a three-stage pose-refinement video generation pipeline. With the above contributions, TASTE-Rob is ...
Abstract: Nowadays, high-resolution remote sensing images provide rich data sources and deep learning models show powerful feature representation capability for remote sensing object detection.
Abstract: Object detection is a fundamental task in computer vision, involving the prediction of bounding boxes and class labels for Regions of Interest (ROI) within images. Traditionally, ...