‪Juan Ciro‬ - ‪Google Scholar‬

Cited by

	All	Since 2019
Citations	214	214
h-index	5	5
i10-index	4	4

Citations per year: 2021: 5; 2022: 21; 2023: 135; 2024: 50

Juan Ciro

Software engineer, MLCommons

Verified email at unal.edu.co

machine learning, NLP


Title	Cited by	Year
Dataperf: Benchmarks for data-centric ai development M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ... Advances in Neural Information Processing Systems 36, 2024	73	2024
The people's speech: A large-scale diverse english speech recognition dataset for commercial usage D Galvez, G Diamos, J Ciro, JF Cerón, K Achorn, A Gopi, D Kanter, M Lam, ... arXiv preprint arXiv:2111.09344, 2021	55	2021
Multilingual spoken words corpus M Mazumder, S Chitlangia, C Banbury, Y Kang, JM Ciro, K Achorn, ... Thirty-fifth Conference on Neural Information Processing Systems Datasets …, 2021	41	2021
Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora A Warstadt, A Mueller, L Choshen, E Wilcox, C Zhuang, J Ciro, ... Proceedings of the BabyLM Challenge at the 27th Conference on Computational …, 2023	35	2023
Dataperf: Benchmarks for data-centric ai development, 2022 M Mazumder, C Banbury, X Yao, B Karlaš, WG Rojas, S Diamos, ... URL https://arxiv.org/abs/2207.10062	6
Adversarial nibbler: A data-centric challenge for improving the safety of text-to-image models A Parrish, HR Kirk, J Quaye, C Rastogi, M Bartolo, O Inel, J Ciro, ... arXiv preprint arXiv:2305.14384, 2023	3	2023
LSH methods for data deduplication in a Wikipedia artificial dataset J Ciro, D Galvez, T Schlippe, D Kanter arXiv preprint arXiv:2112.11478, 2021	1	2021
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, ... arXiv preprint arXiv:2404.16019, 2024		2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation J Quaye, A Parrish, O Inel, C Rastogi, HR Kirk, M Kahng, E van Liemt, ... arXiv preprint arXiv:2403.12075, 2024		2024
Speech Wikimedia: A 77 Language Multilingual Speech Dataset RM Gómez, J Eusse, J Ciro, D Galvez, R Hileman, K Bollacker, D Kanter arXiv preprint arXiv:2308.15710, 2023		2023
Speech Wikimedia: A 77 Language Multilingual Speech Dataset R Mosquera Gómez, J Eusse, J Ciro, D Galvez, R Hileman, K Bollacker, ... arXiv e-prints, arXiv: 2308.15710, 2023		2023
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning A Warstadt, A Mueller, L Choshen, E Wilcox, C Zhuang, J Ciro, R Mosquera, ... BabyLM Challenge at the 27th Conference on Computational Natural Language …, 2023		2023
