R. Gnana Praveen

I am a post-doctoral researcher at the Computer Research Institute of Montreal (CRIM), working on audio-visual learning for person verification. I completed my PhD in artificial intelligence (focused on computer vision and affective computing) at the LIVIA lab, ETS Montreal, Canada, in 2023, under the supervision of Prof. Eric Granger and Prof. Patrick Cardinal. In my thesis, I developed weakly supervised learning (multiple instance learning) models for facial expression recognition in videos and novel attention models for audio-visual fusion in dimensional emotion recognition.

Before my PhD, I had 5 years of industrial research experience in computer vision, working for large companies as well as start-ups, including Samsung Research India, Synechron India, and upGradCampus India. I also had the privilege of working with Prof. R. Venkatesh Babu at the Indian Institute of Science, Bangalore, on crowd flow analysis in videos. I completed my Master's at the Indian Institute of Technology Guwahati under the supervision of Prof. Kannan Karthik in 2012.

In my free time, I like to play rhythm instruments. I also enjoy reading books and occasionally blog. Here is a collection of my musings.

Email  /  CV  /  Google Scholar  /  Twitter  /  ResearchGate  /  Github  /  LinkedIn

profile photo
News
  • New!! Apr 2024: Our work on "Recursive Joint Cross-Modal Attention" has been accepted at the ABAW@CVPR 2024 Workshop
  • Mar 2024: Achieved 2nd place in the valence-arousal challenge of the 6th ABAW@CVPR 2024 competition
  • Mar 2024: Our research article on joint cross-attention for audio-visual fusion was featured in the March issue of the IEEE Biometrics Newsletter
  • Mar 2024: One paper accepted at ICME 2024 (CORE A)!
  • Mar 2024: Two papers accepted at FG 2024!
  • Oct 2023: Paper accepted at the 3rd Workshop on ENLSP at NeurIPS 2023
  • Sep 2023: Serving as a reviewer for WACV 2024
  • Jun 2023: Serving as a reviewer for ACM MM 2023
  • Jun 2023: Presented our work "Recurrent Joint Attention for Audio-Visual Fusion in Regression-based Emotion Recognition" at ICASSP 2023
  • May 2023: Successfully defended my Ph.D. thesis, titled "Deep Regression Models for Spatiotemporal Expression Recognition in Videos"
  • Mar 2023: Started working as a post-doctoral researcher at the Computer Research Institute of Montreal (CRIM)
Research

I'm interested in computer vision, affective computing, deep learning, and multimodal video understanding. Most of my research revolves around video analytics, weakly supervised learning, facial behavior analysis, and audio-visual fusion. Selected publications can be found below; representative papers are highlighted.

Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition
R Gnana Praveen, Jahangir Alam
IEEE/CVF CVPRW, 2024  

In this work, we introduced recursive fusion of joint cross-attention across the audio, visual, and text modalities for multimodal dimensional emotion recognition. We participated in the valence-arousal challenge of the 6th ABAW competition and achieved second place.

Cross-Attention is not always needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition
R Gnana Praveen, Jahangir Alam
IEEE ICME, 2024   (Oral Presentation)

In this work, we addressed the problem of weak complementary relationships between the audio and visual modalities, which can arise from sarcastic or conflicting emotions. To handle such cases, we introduced a novel Dynamic Cross-Attention framework that overcomes the limitations of cross-attention when the modalities do not complement each other.
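The core gating idea can be sketched as follows. This is a minimal illustration under my own naming assumptions (the module, parameters, and gating form are not taken from the paper): a learned gate blends cross-attended features with the raw unimodal features, so the model can fall back on the original features when the other modality offers no complement.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    """Illustrative sketch (names and details are assumptions, not the
    paper's code): a learned gate blends cross-attended features with the
    raw features, backing off when the modalities are not complementary."""

    def __init__(self, dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x_a, x_v):
        # x_a: audio features (B, Ta, D); x_v: visual features (B, Tv, D)
        attended, _ = self.attn(query=x_a, key=x_v, value=x_v)
        g = self.gate(torch.cat([x_a, attended], dim=-1))  # per-feature gate in [0, 1]
        return g * attended + (1 - g) * x_a  # blend attended and raw audio features
```

Applying the same block in the other direction (visual attending to audio) and combining both outputs gives a symmetric fusion variant.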

Dynamic Cross Attention for Audio-Visual Person Verification
R Gnana Praveen, Jahangir Alam
IEEE FG, 2024  

In this work, we addressed the problem of weak complementary relationships between the audio and visual modalities in audio-visual fusion for person verification, using dynamic cross-attention.

Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention
R Gnana Praveen, Jahangir Alam
IEEE FG, 2024  

Proposed a recursive joint cross-attention model for effective audio-visual fusion in person verification, applying attention recursively across the audio and visual modalities of a video.

Recursive Joint Cross-Attention for Audio-Visual Speaker Verification
R Gnana Praveen, Jahangir Alam
NeurIPS Workshop, 2023  

Proposed a recursive joint cross-attention model for effective audio-visual fusion in speaker verification, applying attention recursively across the audio and visual modalities of a video.

Recursive Joint Attention for Audio-Visual Fusion in Regression-based Emotion Recognition
R Gnana Praveen, Patrick Cardinal, Eric Granger
IEEE ICASSP, 2023   (Oral Presentation)
Code

Proposed a recursive joint cross-attention model for effective fusion of the audio and visual modalities, leveraging intra-modal relationships using LSTMs and inter-modal relationships using recursive attention across the two modalities of a video.
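A rough sketch of the recursion, under assumed details (layer sizes, residual updates, and names are mine, and the paper's joint attention block differs from the plain cross-attention used here): LSTMs first model each modality's temporal dynamics, then a shared attention block is applied for a fixed number of iterations, feeding the attended features back as input.

```python
import torch
import torch.nn as nn

class RecursiveJointAttention(nn.Module):
    """Simplified sketch (assumed details, not the paper's code): LSTMs
    capture intra-modal temporal dynamics, then cross-attention is applied
    recursively, feeding attended features back for `iters` steps."""

    def __init__(self, dim=512, heads=4, iters=3):
        super().__init__()
        self.lstm_a = nn.LSTM(dim, dim, batch_first=True)
        self.lstm_v = nn.LSTM(dim, dim, batch_first=True)
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.iters = iters

    def forward(self, x_a, x_v):
        # assumes frame-aligned sequences: x_a, x_v are (B, T, D)
        x_a, _ = self.lstm_a(x_a)    # intra-modal temporal modeling
        x_v, _ = self.lstm_v(x_v)
        for _ in range(self.iters):  # recursive inter-modal attention
            a, _ = self.attn_a(x_a, x_v, x_v)  # audio attends to visual
            v, _ = self.attn_v(x_v, x_a, x_a)  # visual attends to audio
            x_a, x_v = x_a + a, x_v + v        # residual update, fed back
        return torch.cat([x_a, x_v], dim=-1)   # fused representation
```

Reusing the same attention blocks across iterations keeps the parameter count fixed regardless of the recursion depth.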

Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
R Gnana Praveen, Patrick Cardinal, Eric Granger
IEEE Trans. on BIOM, 2023   (Best of FG2021)
Code

Investigated the prospect of leveraging both intra- and inter-modal relationships using joint cross-attentional audio-visual fusion. The robustness of the proposed model is further validated under a missing audio modality, along with an interpretability analysis.

A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro L. Koerich, Simon Bacon, Patrick Cardinal, Eric Granger
IEEE CVPRW, 2022   (Oral Presentation)
Code

Proposed a joint cross-attention model for effective fusion of the audio and visual modalities, leveraging both intra- and inter-modal relationships across the two modalities of a video.
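The joint attention mechanism can be illustrated roughly as below (a simplified sketch; the projection names and exact correlation form are assumptions): each modality attends over the concatenation of the audio and visual features, so the attention weights capture intra-modal as well as inter-modal correlations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCrossAttention(nn.Module):
    """Illustrative sketch (projection names and correlation form are
    assumptions): each modality attends over the joint audio-visual
    features, exposing intra- and inter-modal correlations at once."""

    def __init__(self, dim=512):
        super().__init__()
        self.w_a = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def attend(self, x, joint, proj):
        # correlation between one modality and the joint representation
        corr = torch.tanh(proj(x) @ joint.transpose(1, 2) * self.scale)
        return x + F.softmax(corr, dim=-1) @ joint  # attended features + residual

    def forward(self, x_a, x_v):
        joint = torch.cat([x_a, x_v], dim=1)  # (B, Ta + Tv, D) joint features
        return self.attend(x_a, joint, self.w_a), self.attend(x_v, joint, self.w_v)
```

Because the keys and values include a modality's own features alongside the other modality's, attending over the joint representation is what distinguishes this from plain cross-attention.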

Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition
R Gnana Praveen, Eric Granger, Patrick Cardinal
IEEE FG, 2021   (Full Oral Presentation)
Code

Proposed a cross-attentional model that leverages the inter-modal characteristics across the audio and visual modalities for effective audio-visual fusion.

Holistic Guidance for Occluded Person Re-Identification
Madhu Kiran, R Gnana Praveen, Le Thanh Nguyen-Meidine, Soufiane Belharbi, Louis-Antoine Blais-Morin, Eric Granger
BMVC, 2021   (Oral Presentation)
Code

Proposed a Holistic Guidance (HG) method that relies on holistic (non-occluded) data and its distribution in the dissimilarity space to train on occluded datasets without the need for any external data source.

Deep domain adaptation with ordinal regression for pain assessment using weakly-labeled videos
R Gnana Praveen, Eric Granger, Patrick Cardinal
Image and Vision Computing, 2021 [Impact Factor: 4.7]
Code

Proposed a deep learning model for weakly supervised domain adaptation with ordinal regression, using coarse sequence-level labels of videos. In particular, we enforced the ordinal relationship among labels in the proposed model using Gaussian distributions.
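The Gaussian encoding of ordinal labels can be pictured with a short sketch (a generic formulation of the idea, not the paper's implementation): each discrete intensity level is replaced by a soft target distribution centered on the true level, so predictions on neighboring levels are penalized less than distant ones.

```python
import torch
import torch.nn.functional as F

def gaussian_ordinal_targets(labels, num_levels, sigma=1.0):
    """Generic sketch: soft targets for ordinal labels, a Gaussian centered
    on the true level and normalized over all levels, so nearby levels
    incur a smaller penalty than distant ones."""
    levels = torch.arange(num_levels, dtype=torch.float32)    # (K,)
    dist = levels.unsqueeze(0) - labels.float().unsqueeze(1)  # (B, K) distances
    soft = torch.exp(-dist.pow(2) / (2 * sigma ** 2))
    return soft / soft.sum(dim=1, keepdim=True)               # normalize per sample

# Train with KL divergence between predictions and the soft targets:
logits = torch.randn(4, 6)  # e.g., 6 ordinal pain levels (hypothetical)
targets = gaussian_ordinal_targets(torch.tensor([0, 2, 3, 5]), num_levels=6)
loss = F.kl_div(F.log_softmax(logits, dim=1), targets, reduction="batchmean")
```

As sigma tends to zero this reduces to ordinary one-hot cross-entropy; larger values soften the penalty on adjacent levels.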

Weakly Supervised Learning for Facial Behavior Analysis: A Review
R Gnana Praveen, Eric Granger, Patrick Cardinal
IEEE Trans. on Affective Computing (Submitted), 2021

In this paper, we presented a comprehensive taxonomy of weakly supervised learning models for facial behavior analysis, along with their challenges and potential research directions.

Deep Weakly Supervised Domain Adaptation for Pain Localization in Videos
R Gnana Praveen, Eric Granger, Patrick Cardinal
IEEE FG, 2020
arXiv / BibTeX (Citation) / Teaser Video / Video Presentation / Slides / Poster

In this paper, we proposed a novel framework for weakly supervised domain adaptation (WSDA) with limited sequence-level labels for pain localization in videos.

Super-pixel based crowd flow segmentation in H.264 compressed videos
Sovan Biswas, R Gnana Praveen, R. Venkatesh Babu
IEEE ICIP, 2014

In this paper, we proposed a simple yet robust approach for segmenting high-density crowd flows based on super-pixels in H.264 compressed videos.


Website template taken from here