Paul Röttger

Cited by

	All	Since 2019
Citations	607	607
h-index	10	10
i10-index	10	10

Citations per year
2021	18
2022	118
2023	319
2024	149

Public access

1 article available, 0 articles not available (based on funding mandates)

Co-authors

Bertie Vidgen - Oxford, Turing - Verified email at rewire.online
Hannah Rose Kirk - University of Oxford - Verified email at oii.ox.ac.uk
Dirk Hovy - Bocconi University - Verified email at unibocconi.it
Janet B. Pierrehumbert - Prof. of Language Modelling, Univ. of Oxford Dept. of Engineering Science - Verified email at oerc.ox.ac.uk
Helen Margetts - Professor of Society and the Internet, University of Oxford - Verified email at oii.ox.ac.uk
Giuseppe Attanasio - Postdoctoral Researcher, Instituto de Telecomunicações - Verified email at unibocconi.it
Debora Nozza - Assistant Professor, Bocconi University - Verified email at unibocconi.it

Paul Röttger

Postdoctoral Researcher, Bocconi University

Verified email at unibocconi.it

Natural Language Processing, Large Language Models, Online Harms, AI Safety


Title	Cited by	Year
HateCheck: Functional Tests for Hate Speech Detection Models - P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert - ACL 2021 (Main), 2021	191	2021
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks - P Röttger, B Vidgen, D Hovy, JB Pierrehumbert - NAACL 2022 (Main), 2022	93	2022
SemEval-2023 Task 10: Explainable Detection of Online Sexism - HR Kirk, W Yin, B Vidgen, P Röttger - ACL 2023 (Main), 2023	73	2023
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media - P Röttger, JB Pierrehumbert - EMNLP 2021 (Findings), 2021	52	2021
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate - HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale - NAACL 2022 (Main), 2021	44	2021
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback - HR Kirk, B Vidgen, P Röttger, SA Hale - arXiv preprint arXiv:2303.05453, 2023	42	2023
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models - P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen - NAACL 2022 (WOAH), 2022	29	2022
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models - P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy - NAACL 2024 (Main), 2023	23	2023
Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions - F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ... - ICLR 2024 (Poster), 2023	22	2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values - HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale - EMNLP 2023 (Main), 2023	10	2023
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages - P Röttger, D Nozza, F Bianchi, D Hovy - EMNLP 2022 (Main), 2022	8	2022
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models - X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ... - arXiv preprint arXiv:2402.14499, 2024	4	2024
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics - M Orlikowski, P Röttger, P Cimiano, D Hovy - ACL 2023 (Main), 2023	4	2023
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models - P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy - arXiv preprint arXiv:2402.16786, 2024	3	2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models - B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger - arXiv preprint arXiv:2311.08370, 2023	3	2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models - HR Kirk, B Vidgen, P Röttger, SA Hale - NeurIPS 2023 (SoLaR Workshop), 2023	2	2023
The benefits, risks and bounds of personalizing the alignment of large language models to individuals - HR Kirk, B Vidgen, P Röttger, SA Hale - Nature Machine Intelligence, 1-10, 2024	1	2024
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ - C Holtermann, P Röttger, T Dill, A Lauscher - arXiv preprint arXiv:2403.03814, 2024	1	2024
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore - J Haber, B Vidgen, M Chapman, V Agarwal, RKW Lee, YK Yap, P Röttger - ACL 2023 (Main), 2023	1	2023
Tracking abuse on Twitter against football players in the 2021–22 Premier League Season - B Vidgen, YL Chung, P Johansson, HR Kirk, A Williams, SA Hale, ... - Available at SSRN 4403913, 2022	1	2022
