DataPerf: Benchmarks for data-centric AI development M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ... Advances in Neural Information Processing Systems 36, 2024 | 73 | 2024 |
The People's Speech: A large-scale diverse English speech recognition dataset for commercial usage D Galvez, G Diamos, J Ciro, JF Cerón, K Achorn, A Gopi, D Kanter, M Lam, ... arXiv preprint arXiv:2111.09344, 2021 | 55 | 2021 |
Multilingual spoken words corpus M Mazumder, S Chitlangia, C Banbury, Y Kang, JM Ciro, K Achorn, ... Thirty-fifth Conference on Neural Information Processing Systems Datasets …, 2021 | 41 | 2021 |
Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora A Warstadt, A Mueller, L Choshen, E Wilcox, C Zhuang, J Ciro, ... Proceedings of the BabyLM Challenge at the 27th Conference on Computational …, 2023 | 35 | 2023 |
DataPerf: Benchmarks for data-centric AI development, 2022 M Mazumder, C Banbury, X Yao, B Karlaš, WG Rojas, S Diamos, ... URL https://arxiv.org/abs/2207.10062 | 6 | |
Adversarial nibbler: A data-centric challenge for improving the safety of text-to-image models A Parrish, HR Kirk, J Quaye, C Rastogi, M Bartolo, O Inel, J Ciro, ... arXiv preprint arXiv:2305.14384, 2023 | 3 | 2023 |
LSH methods for data deduplication in a Wikipedia artificial dataset J Ciro, D Galvez, T Schlippe, D Kanter arXiv preprint arXiv:2112.11478, 2021 | 1 | 2021 |
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, ... arXiv preprint arXiv:2404.16019, 2024 | | 2024 |
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation J Quaye, A Parrish, O Inel, C Rastogi, HR Kirk, M Kahng, E van Liemt, ... arXiv preprint arXiv:2403.12075, 2024 | | 2024 |
Speech Wikimedia: A 77 Language Multilingual Speech Dataset RM Gómez, J Eusse, J Ciro, D Galvez, R Hileman, K Bollacker, D Kanter arXiv preprint arXiv:2308.15710, 2023 | | 2023 |
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning A Warstadt, A Mueller, L Choshen, E Wilcox, C Zhuang, J Ciro, R Mosquera, ... BabyLM Challenge at the 27th Conference on Computational Natural Language …, 2023 | | 2023 |