HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts. G Do, K Le, Q Pham, T Nguyen, TN Doan, BT Nguyen, C Liu, ... arXiv preprint arXiv:2312.07035, 2023. Cited by 6.
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition. Q Pham, G Do, H Nguyen, TT Nguyen, C Liu, M Sartipi, BT Nguyen, ... arXiv preprint arXiv:2402.02526, 2024.