Colin Leong
Title
Cited by
Year
BLOOM: A 176B-parameter open-access multilingual language model
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
Cited by 1131, 2023
Quality at a glance: An audit of web-crawled multilingual datasets
J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ...
Transactions of the Association for Computational Linguistics 10, 50-72, 2022
Cited by 88, 2022
Quality at a glance: An audit of web-crawled multilingual datasets
I Caswell, J Kreutzer, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ...
arXiv e-prints, arXiv: 2103.12028, 2021
Cited by 32, 2021
A few thousand translations go a long way! Leveraging pre-trained models for African news translation
DI Adelani, JO Alabi, A Fan, J Kreutzer, X Shen, M Reid, D Ruiter, ...
arXiv preprint arXiv:2205.02022, 2022
Cited by 31, 2022
BLOOM: A 176b-parameter open-access multilingual language model. CoRR, abs/2211.05100, 2022. doi: 10.48550
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilic, D Hesslow, R Castagné, ...
arXiv preprint arXiv:2211.05100
Cited by 19
Bloom Library: Multimodal datasets in 300+ languages for a variety of downstream tasks
C Leong, J Nemecek, J Mansdorfer, A Filighera, A Owodunni, ...
arXiv preprint arXiv:2210.14712, 2022
Cited by 12, 2022
Documenting geographically and contextually diverse data sources: The BigScience catalogue of language data and resources
A McMillan-Major, Z Alyafeai, S Biderman, K Chen, F De Toni, G Dupont, ...
arXiv preprint arXiv:2201.10066, 2022
Cited by 12, 2022
Guyo Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir
D Adelani, J Alabi, A Fan, J Kreutzer, X Shen, M Reid, D Ruiter, D Klakow, ...
Cited by 11, 2022
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
J Meyer, DI Adelani, E Casanova, A Öktem, DWJ Weber, S Kabongo, ...
arXiv preprint arXiv:2207.03546, 2022
Cited by 9, 2022
BLOOM: A 176B-parameter open-access multilingual language model
BS Workshop, TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, ...
arXiv preprint arXiv:2211.05100, 2022
Cited by 2, 2022
Phone-ing it in: Towards flexible multi-modal language model training by phonetic representations of data
C Leong, D Whitenack
Proceedings of the 60th Annual Meeting of the Association for Computational …, 2022
Cited by 2, 2022
JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing
S Gueuwou, S Siake, C Leong, M Müller
arXiv preprint arXiv:2311.10174, 2023
Cited by 1, 2023
Adapting to the Low-Resource Double-Bind: Investigating Low-Compute Methods on Low-Resource African Languages
C Leong, H Shandilya, BFP Dossou, AL Tonja, J Mathew, AH Omotayo, ...
arXiv preprint arXiv:2303.16985, 2023
Cited by 1, 2023
Characterization of CNN classifier performance with respect to variation in optical contrast, using synthetic electro-optical data
C Menart, C Leong, O Mendoza-Schrock, E Zelnio
Automatic Target Recognition XXIX 10988, 143-153, 2019
Cited by 1, 2019
Enhancing Multi-Domain Automatic Short Answer Grading through an Explainable Neuro-Symbolic Pipeline
F Künnecke, A Filighera, C Leong, T Steuer
arXiv preprint arXiv:2403.01811, 2024
2024
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages
V Akerman, D Baines, D Daspit, U Hermjakob, T Jang, C Leong, M Martin, ...
arXiv preprint arXiv:2304.09919, 2023
2023
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
D Ifeoluwa Adelani, J Oluwadara Alabi, A Fan, J Kreutzer, X Shen, M Reid, ...
arXiv e-prints, arXiv: 2205.02022, 2022
2022