The Pile: An 800GB Dataset of Diverse Text for Language Modeling L Gao, S Biderman, S Black, L Golding, T Hoppe, C Foster, J Phang, H He, ... arXiv preprint arXiv:2101.00027, 2020 | 1629* | 2020 |
BLOOM: A 176B-parameter open-access multilingual language model TL Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... Transactions on Machine Learning Research, 2022 | 1605 | 2022 |
Multitask prompted training enables zero-shot task generalization V Sanh, A Webson, C Raffel, SH Bach, L Sutawika, Z Alyafeai, A Chaffin, ... The Tenth International Conference on Learning Representations (ICLR), 2022 | 1604 | 2022 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... Transactions on Machine Learning Research (TMLR), 2022 | 1176* | 2022 |
The Language Model Evaluation Harness L Gao, J Tow, S Biderman, S Black, A DiPofi, C Foster, L Golding, J Hsu, ... GitHub Repository, 2021 | 780* | 2021 |
Pythia: A suite for analyzing large language models across training and scaling S Biderman, H Schoelkopf, Q Anthony, H Bradley, K O'Brien, E Hallahan, ... International Conference on Machine Learning (ICML), 2023 | 739 | 2023 |
GPT-NeoX-20B: An Open-Source Autoregressive Language Model S Black, S Biderman, E Hallahan, Q Anthony, L Gao, L Golding, H He, ... ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022 | 722 | 2022 |
GPT-Neo: Large scale autoregressive language modeling with Mesh-TensorFlow S Black, L Gao, P Wang, C Leahy, S Biderman GitHub Repository, 2021 | 668* | 2021 |
Crosslingual generalization through multitask finetuning N Muennighoff, T Wang, L Sutawika, A Roberts, S Biderman, TL Scao, ... 61st Annual Meeting of the Association for Computational Linguistics, 2023 | 574 | 2023 |
VQGAN-CLIP: Open domain image generation and editing with natural language guidance K Crowson, S Biderman, D Kornis, D Stander, E Hallahan, L Castricato, ... European Conference on Computer Vision (ECCV), 2022 | 434* | 2022 |
RWKV: Reinventing RNNs for the Transformer Era B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, H Cao, X Cheng, ... Findings of the Association for Computational Linguistics: EMNLP, 2023 | 382* | 2023 |
Quality at a glance: An audit of web-crawled multilingual datasets J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... Transactions of the Association for Computational Linguistics 10, 50-72, 2022 | 265* | 2022 |
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization G Ahdritz, N Bouatta, C Floristean, S Kadyan, Q Xia, W Gerecke, ... Nature Methods, 1-11, 2024 | 177* | 2024 |
Llemma: An open language model for mathematics Z Azerbayev, H Schoelkopf, K Paster, MD Santos, S McAleer, AQ Jiang, ... International Conference on Learning Representations, 2023 | 168 | 2023 |
The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset H Laurençon, L Saulnier, T Wang, C Akiki, A Villanova del Moral, ... Advances in Neural Information Processing Systems 35, 31809-31826, 2022 | 167 | 2022 |
trlX: A framework for large scale reinforcement learning from human feedback A Havrilla, M Zhuravinskyi, D Phung, A Tiwari, J Tow, S Biderman, ... Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023 | 134* | 2023 |
The Annotated Transformer S Rush, A Huang, S Subramanian, J Sum, K Almubarak, S Biderman Workshop for NLP Open Source Software (NLP-OSS), 2022 | 115* | 2022 |
Emergent and predictable memorization in large language models S Biderman, US Prashanth, L Sutawika, H Schoelkopf, Q Anthony, ... Advances in Neural Information Processing Systems, 2023 | 111 | 2023 |
Eliciting latent predictions from transformers with the tuned lens N Belrose, Z Furman, L Smith, D Halawi, I Ostrovsky, L McKinney, ... arXiv preprint arXiv:2303.08112, 2023 | 101 | 2023 |
You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings Z Talat, A Névéol, S Biderman, M Clinciu, M Dey, S Longpre, S Luccioni, ... ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022 | 96 | 2022 |