Abstract:
Local key action regions can substantially improve CNN-based visual action recognition. Self-attention, which focuses on informative details while suppressing irrelevant ones, is therefore well suited to this task. However, existing self-attention methods ignore the correlations among local feature vectors at different spatial positions in CNN feature maps. We propose an effective interaction-aware self-attention model that learns attention maps from the interactions of these feature vectors. Because network layers capture feature maps at different scales, we use a spatial pyramid for attention modeling, so that attention scores are improved with multi-scale information. These attention scores are then used to weight the local feature vectors of the feature maps and compute attentional feature maps. Since the spatial pyramid attention layer accepts an arbitrary number of input feature maps, it extends naturally to a spatio-temporal version. Our model can be embedded in any CNN to build a video-level, end-to-end attention network for action recognition, and we combine the RGB and optical-flow streams in several ways to predict human actions. Our method yields top results on UCF101, HMDB51, Kinetics-400, and the untrimmed Charades dataset.
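As a rough illustration of the core idea, the sketch below computes attention scores from pairwise interactions of the local feature vectors of a single CNN feature map and uses them to weight those vectors. The function name, the dot-product interaction, and the sum aggregation are illustrative assumptions, not the paper's exact formulation, and the multi-scale pyramid and temporal extension are omitted.

```python
import numpy as np

def interaction_attention(feature_map):
    """Hypothetical sketch: derive an attention map from pairwise
    interactions of local feature vectors, then weight the map.

    feature_map: array of shape (C, H, W), one local C-dim vector
    per spatial position.
    """
    C, H, W = feature_map.shape
    X = feature_map.reshape(C, H * W)       # columns = local feature vectors
    interactions = X.T @ X                  # (HW, HW) pairwise dot products
    scores = interactions.sum(axis=1)       # aggregate interactions per position
    scores -= scores.max()                  # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over positions
    weighted = X * attn                     # weight each local feature vector
    return weighted.reshape(C, H, W), attn.reshape(H, W)

fmap = np.random.rand(8, 4, 4).astype(np.float32)
out, attn = interaction_attention(fmap)
```

In the full model, such attention would be computed at several pyramid scales (and across frames for the spatio-temporal version) before the weighted features are fused.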