BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1506 | 2023 |
Mixtral of experts AQ Jiang, A Sablayrolles, A Roux, A Mensch, B Savary, C Bamford, ... arXiv preprint arXiv:2401.04088, 2024 | 812 | 2024 |
Mistral 7B AQ Jiang, A Sablayrolles, A Mensch, C Bamford, DS Chaplot, D Casas, ... arXiv preprint arXiv:2310.06825, 2023 | 692 | 2023 |
OBELICS: An open web-scale filtered dataset of interleaved image-text documents H Laurençon, L Saulnier, L Tronchon, S Bekman, A Singh, A Lozhkov, ... Advances in Neural Information Processing Systems 36, 2024 | 166 | 2024 |
The BigScience ROOTS corpus: A 1.6 TB composite multilingual dataset H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 153 | 2022 |
What language model to train if you have one million GPU hours? T Le Scao, T Wang, D Hesslow, L Saulnier, S Bekman, MS Bari, ... arXiv preprint arXiv:2210.15424, 2022 | 96 | 2022 |
Distributed deep learning in open collaborations M Diskin, A Bukhtiyarov, M Ryabinin, L Saulnier, A Sinitsin, D Popov, ... Advances in Neural Information Processing Systems 34, 7879-7897, 2021 | 47 | 2021 |
Mistral 7B AQ Jiang, A Sablayrolles, A Mensch, C Bamford, DS Chaplot, ... arXiv preprint arXiv:2310.06825, 2023 | 43 | 2023 |
Mistral 7B F Bressand, G Lengyel, G Lample, L Saulnier arXiv preprint arXiv:2310.06825, 2023 | 38 | 2023 |
BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... CoRR, abs/2211.05100, 2022. doi: 10.48550/arXiv.2211.05100 | 22 | 2022 |
Loubna Ben allal H Laurençon, L Saulnier, T Wang, C Akiki, AV del Moral, T Le Scao, ... | 14 | 2022 |
Training transformers together A Borzunov, M Ryabinin, T Dettmers, Q Lhoest, L Saulnier, M Diskin, ... NeurIPS 2021 Competitions and Demonstrations Track, 335-342, 2022 | 9 | 2022 |
Pixtral 12B P Agrawal, S Antoniak, EB Hanna, D Chaplot, J Chudnovsky, S Garg, ... arXiv preprint arXiv:2410.07073, 2024 | | 2024 |