Abstract:
Pedestrian detection is one of the most critical modules for autonomous driving. Cameras are commonly used for this task, but their image quality degrades severely in low-light nighttime driving, whereas a thermal camera's image quality is largely unaffected under such conditions. This paper proposes an end-to-end multimodal fusion model for pedestrian detection that uses both RGB and thermal images. Its novel spatio-contextual deep network architecture efficiently exploits the multimodal input. Two deformable ResNeXt-50 encoders extract features from the two modalities, and a multimodal feature embedding module (MuFEm), consisting of several groups of Graph Attention Networks and a feature fusion unit, fuses the encoded features. The output of the last feature fusion unit of MuFEm is spatially refined by two Conditional Random Fields (CRFs). The features are further enhanced with channel-wise attention and contextual information extracted by four RNNs traversing in four directions. Finally, a single-stage decoder generates the pedestrian bounding boxes and score map from these feature maps. We extensively evaluated the proposed framework on three publicly available multimodal pedestrian detection benchmark datasets, namely KAIST, CVC-14, and UTokyo, and improved the state-of-the-art performance on each of them. A brief overview of this work with its qualitative results is available at https://youtu.be/FDJdSifuuCs. Our source code will be released with the paper.
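For illustration, the sketch below shows the high-level two-stream RGB + thermal structure described above in PyTorch. It is a minimal stand-in, not the authors' implementation: standard (non-deformable) ResNeXt-50 backbones replace the deformable encoders, and a simple channel-attention fusion block replaces MuFEm and the CRF/RNN refinement stages. All class and layer names here (e.g. SimpleFusionDetector, score_head, box_head) are hypothetical.

```python
# Minimal sketch of a two-stream RGB + thermal single-stage detector.
# Assumptions: torchvision backbones, concatenation + channel attention
# as a stand-in for the paper's MuFEm, CRF, and RNN modules.
import torch
import torch.nn as nn
from torchvision.models import resnext50_32x4d


class SimpleFusionDetector(nn.Module):
    def __init__(self, num_anchors: int = 9):
        super().__init__()
        # Two modality-specific encoders (classification head removed).
        rgb = resnext50_32x4d(weights=None)
        thermal = resnext50_32x4d(weights=None)
        self.rgb_encoder = nn.Sequential(*list(rgb.children())[:-2])
        self.thermal_encoder = nn.Sequential(*list(thermal.children())[:-2])

        # Fuse the two 2048-channel feature maps, then apply a
        # squeeze-and-excitation style channel attention.
        self.fuse = nn.Conv2d(4096, 512, kernel_size=1)
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(512, 32, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 512, kernel_size=1),
            nn.Sigmoid(),
        )

        # Single-stage heads: per-location score map and box offsets.
        self.score_head = nn.Conv2d(512, num_anchors, kernel_size=3, padding=1)
        self.box_head = nn.Conv2d(512, num_anchors * 4, kernel_size=3, padding=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        f_rgb = self.rgb_encoder(rgb)              # (B, 2048, H/32, W/32)
        f_thermal = self.thermal_encoder(thermal)  # (B, 2048, H/32, W/32)
        fused = self.fuse(torch.cat([f_rgb, f_thermal], dim=1))
        fused = fused * self.channel_attn(fused)
        return self.score_head(fused), self.box_head(fused)


if __name__ == "__main__":
    model = SimpleFusionDetector()
    rgb = torch.randn(1, 3, 512, 640)      # RGB frame
    thermal = torch.randn(1, 3, 512, 640)  # thermal frame replicated to 3 channels
    scores, boxes = model(rgb, thermal)
    print(scores.shape, boxes.shape)
```

In the actual framework the fusion and refinement stages (MuFEm's graph attention groups, the two CRFs, and the four directional RNNs) would replace the simple attention block used here; the sketch only conveys the overall encoder-fusion-decoder layout.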