Noam Shazeer
Character.ai
Verified email at character.ai
Title · Cited by · Year
Attention is all you need
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
Advances in Neural Information Processing Systems 30, 2017
Cited by 126875 · 2017
Exploring the limits of transfer learning with a unified text-to-text transformer
C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ...
Journal of Machine Learning Research 21 (140), 1-67, 2020
Cited by 15877 · 2020
PaLM: Scaling language modeling with pathways
A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ...
Journal of Machine Learning Research 24 (240), 1-113, 2023
Cited by 3767 · 2023
Scheduled sampling for sequence prediction with recurrent neural networks
S Bengio, O Vinyals, N Jaitly, N Shazeer
Advances in Neural Information Processing Systems 28, 2015
Cited by 2256 · 2015
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer
N Shazeer, A Mirhoseini, K Maziarz, A Davis, Q Le, G Hinton, J Dean
arXiv preprint arXiv:1701.06538, 2017
Cited by 1918 · 2017
Image transformer
N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran
International Conference on Machine Learning, 4055-4064, 2018
Cited by 1865 · 2018
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity
W Fedus, B Zoph, N Shazeer
Journal of Machine Learning Research 23 (120), 1-39, 2022
Cited by 1411 · 2022
Exploring the limits of language modeling
R Jozefowicz, O Vinyals, M Schuster, N Shazeer, Y Wu
arXiv preprint arXiv:1602.02410, 2016
Cited by 1374 · 2016
LaMDA: Language models for dialog applications
R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ...
arXiv preprint arXiv:2201.08239, 2022
Cited by 1197 · 2022
Attention is all you need. arXiv 2017
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
arXiv preprint arXiv:1706.03762, 2023
Cited by 1101 · 2023
Generating Wikipedia by summarizing long sequences
PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer
arXiv preprint arXiv:1801.10198, 2018
Cited by 930 · 2018
Adafactor: Adaptive learning rates with sublinear memory cost
N Shazeer, M Stern
International Conference on Machine Learning, 4596-4604, 2018
Cited by 824 · 2018
End-to-end text-dependent speaker verification
G Heigold, I Moreno, S Bengio, N Shazeer
2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016
Cited by 764 · 2016
GShard: Scaling giant models with conditional computation and automatic sharding
D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ...
arXiv preprint arXiv:2006.16668, 2020
Cited by 750 · 2020
Gomez Aidan N., Kaiser Łukasz, and Polosukhin Illia. 2017
V Ashish, S Noam, P Niki, U Jakob, J Llion
Attention is all you need. In Advances in neural information processing …, 2017
Cited by 732 · 2017
How much knowledge can you pack into the parameters of a language model?
A Roberts, C Raffel, N Shazeer
arXiv preprint arXiv:2002.08910, 2020
Cited by 722 · 2020
Tensor2Tensor for neural machine translation
A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ...
arXiv preprint arXiv:1803.07416, 2018
Cited by 621 · 2018
Attention is all you need (2017)
A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ...
arXiv preprint arXiv:1706.03762, 2019
Cited by 539 · 2019
GLU variants improve transformer
N Shazeer
arXiv preprint arXiv:2002.05202, 2020
Cited by 408 · 2020
Serving content-relevant advertisements with client-side device support
D Anderson, P Buchheit, JA Dean, GR Harik, CL Gonsalves, N Shazeer, ...
US Patent 8,086,559, 2011
Cited by 405 · 2011