Arsha Nagrani
Research Scientist, Google
Verified email at google.com
Title
Cited by
Year
Voxceleb: a large-scale speaker identification dataset
A Nagrani, JS Chung, A Zisserman
arXiv preprint arXiv:1706.08612, 2017
Cited by: 2659, Year: 2017
Voxceleb2: Deep speaker recognition
JS Chung, A Nagrani, A Zisserman
arXiv preprint arXiv:1806.05622, 2018
Cited by: 2535, Year: 2018
Frozen in time: A joint video and image encoder for end-to-end retrieval
M Bain, A Nagrani, G Varol, A Zisserman
Proceedings of the IEEE/CVF international conference on computer vision …, 2021
Cited by: 951, Year: 2021
Voxceleb: Large-scale speaker verification in the wild
A Nagrani, JS Chung, W Xie, A Zisserman
Computer Speech & Language 60, 101027, 2020
Cited by: 720, Year: 2020
Attention bottlenecks for multimodal fusion
A Nagrani, S Yang, A Arnab, A Jansen, C Schmid, C Sun
Advances in neural information processing systems 34, 14200-14213, 2021
Cited by: 569, Year: 2021
Use what you have: Video retrieval using representations from collaborative experts
Y Liu, S Albanie, A Nagrani, A Zisserman
arXiv preprint arXiv:1907.13487, 2019
Cited by: 436, Year: 2019
Utterance-level aggregation for speaker recognition in the wild
W Xie, A Nagrani, JS Chung, A Zisserman
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 415, Year: 2019
Epic-fusion: Audio-visual temporal binding for egocentric action recognition
E Kazakos, A Nagrani, A Zisserman, D Damen
Proceedings of the IEEE/CVF international conference on computer vision …, 2019
Cited by: 404, Year: 2019
Emotion recognition in speech using cross-modal transfer in the wild
S Albanie, A Nagrani, A Vedaldi, A Zisserman
Proceedings of the 26th ACM international conference on Multimedia, 292-301, 2018
Cited by: 320, Year: 2018
Seeing voices and hearing faces: Cross-modal biometric matching
A Nagrani, S Albanie, A Zisserman
Proceedings of the IEEE conference on computer vision and pattern …, 2018
Cited by: 250, Year: 2018
Chimpanzee face recognition from videos in the wild using deep learning
D Schofield, A Nagrani, A Zisserman, M Hayashi, T Matsuzawa, D Biro, ...
Science advances 5 (9), eaaw0736, 2019
Cited by: 200, Year: 2019
Localizing visual sounds the hard way
H Chen, W Xie, T Afouras, A Nagrani, A Vedaldi, A Zisserman
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by: 195, Year: 2021
End-to-end generative pretraining for multimodal video captioning
PH Seo, A Nagrani, A Arnab, C Schmid
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by: 180, Year: 2022
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning
A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by: 167, Year: 2023
Spot the conversation: speaker diarisation in the wild
JS Chung, J Huh, A Nagrani, T Afouras, A Zisserman
arXiv preprint arXiv:2007.01216, 2020
Cited by: 165, Year: 2020
Learnable pins: Cross-modal embeddings for person identity
A Nagrani, S Albanie, A Zisserman
Proceedings of the European conference on computer vision (ECCV), 71-88, 2018
Cited by: 162, Year: 2018
Cough against covid: Evidence of covid-19 signature in cough sounds
P Bagad, A Dalmia, J Doshi, A Nagrani, P Bhamare, A Mahale, S Rane, ...
arXiv preprint arXiv:2009.08790, 2020
Cited by: 137, Year: 2020
Pali-x: On scaling up a multilingual vision and language model
X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ...
arXiv preprint arXiv:2305.18565, 2023
Cited by: 123, Year: 2023
Disentangled speech embeddings using cross-modal self-supervision
A Nagrani, JS Chung, S Albanie, A Zisserman
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 106, Year: 2020
Condensed movies: Story based retrieval with contextual embeddings
M Bain, A Nagrani, A Brown, A Zisserman
Proceedings of the Asian Conference on Computer Vision, 2020
Cited by: 97, Year: 2020