Title, authors, venue | Cited by | Year |
Scaling language models: Methods, analysis & insights from training Gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 776 | 2021 |
Ethical and social risks of harm from language models L Weidinger, J Mellor, M Rauh, C Griffin, J Uesato, PS Huang, M Cheng, ... arXiv preprint arXiv:2112.04359, 2021 | 633 | 2021 |
Artificial intelligence, values, and alignment I Gabriel Minds and Machines 30 (3), 411-437, 2020 | 489 | 2020 |
Taxonomy of risks posed by language models L Weidinger, J Uesato, M Rauh, C Griffin, PS Huang, J Mellor, A Glaese, ... Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022 | 336 | 2022 |
Improving alignment of dialogue agents via targeted human judgements A Glaese, N McAleese, M Trębacz, J Aslanides, V Firoiu, T Ewalds, ... arXiv preprint arXiv:2209.14375, 2022 | 323 | 2022 |
Effective altruism and its critics I Gabriel Journal of Applied Philosophy 34 (4), 457-473, 2017 | 141 | 2017 |
Power to the people? Opportunities and challenges for participatory AI A Birhane, W Isaac, V Prabhakaran, M Diaz, MC Elish, I Gabriel, ... Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms …, 2022 | 124 | 2022 |
Alignment of language agents Z Kenton, T Everitt, L Weidinger, I Gabriel, V Mikulik, G Irving arXiv preprint arXiv:2103.14659, 2021 | 115 | 2021 |
Model evaluation for extreme risks T Shevlane, S Farquhar, B Garfinkel, M Phuong, J Whittlestone, J Leung, ... arXiv preprint arXiv:2305.15324, 2023 | 74 | 2023 |
In conversation with artificial intelligence: aligning language models with human values A Kasirzadeh, I Gabriel Philosophy & Technology 36 (2), 27, 2023 | 71 | 2023 |
Toward a theory of justice for artificial intelligence I Gabriel Daedalus 151 (2), 218-231, 2022 | 47 | 2022 |
Sociotechnical safety evaluation of generative AI systems L Weidinger, M Rauh, N Marchal, A Manzini, LA Hendricks, ... arXiv preprint arXiv:2310.11986, 2023 | 39 | 2023 |
The Challenge of Value Alignment I Gabriel, V Ghazavi The Oxford Handbook of Digital Ethics, 2022 | 37* | 2022 |
A human rights-based approach to responsible AI V Prabhakaran, M Mitchell, T Gebru, I Gabriel arXiv preprint arXiv:2210.02667, 2022 | 34* | 2022 |
Characteristics of harmful text: Towards rigorous benchmarking of language models M Rauh, J Mellor, J Uesato, PS Huang, J Welbl, L Weidinger, S Dathathri, ... Advances in Neural Information Processing Systems 35, 24720-24739, 2022 | 26 | 2022 |
Beyond privacy trade-offs with structured transparency A Trask, E Bluemke, B Garfinkel, CG Cuervas-Mons, A Dafoe arXiv preprint arXiv:2012.08347, 2020 | 24 | 2020 |
Using the Veil of Ignorance to align AI systems with principles of justice L Weidinger, KR McKee, R Everett, S Huang, TO Zhu, MJ Chadwick, ... Proceedings of the National Academy of Sciences 120 (18), e2213709120, 2023 | 21 | 2023 |
Permissible secrets H Lazenby, I Gabriel The Philosophical Quarterly 68 (271), 265-285, 2018 | 18* | 2018 |
Effective Altruism, Global Poverty, and Systemic Change I Gabriel, B McElwee Effective Altruism, 99-114, 2019 | 12 | 2019 |
Representation in AI evaluations AS Bergman, LA Hendricks, M Rauh, B Wu, W Agnew, M Kunesch, I Duan, ... Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023 | 11 | 2023 |