Enabling efficient preemption for SIMT architectures with lightweight context switching. Z Lin, L Nyland, H Zhou. SC'16: Proceedings of the International Conference for High Performance …, 2016. Cited by 47.
Accelerate GPU concurrent kernel execution by mitigating memory pipeline stalls. H Dai, Z Lin, C Li, C Zhao, F Wang, N Zheng, H Zhou. 2018 IEEE International Symposium on High Performance Computer Architecture …, 2018. Cited by 46.
Automatic data placement into GPU on-chip memory resources. C Li, Y Yang, Z Lin, H Zhou. 2015 IEEE/ACM International Symposium on Code Generation and Optimization …, 2015. Cited by 42.
Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems. J Gu, M Zhu, Z Zhou, F Zhang, Z Lin, Q Zhang, M Breternitz. Proceedings of the 5th Asia-Pacific Workshop on Systems, 1-7, 2014. Cited by 35.
In-place zero-space memory protection for CNN. H Guan, L Ning, Z Lin, X Shen, H Zhou, SH Lim. Advances in Neural Information Processing Systems 32, 2019. Cited by 26.
Scatter-and-gather revisited: High-performance side-channel-resistant AES on GPUs. Z Lin, U Mathur, H Zhou. Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2-11, 2019. Cited by 14.
Selectively GPU cache bypassing for un-coalesced loads. C Zhao, F Wang, Z Lin, H Zhou, N Zheng. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016. Cited by 14.
Coordinated CTA combination and bandwidth partitioning for GPU concurrent kernel execution. Z Lin, H Dai, M Mantor, H Zhou. ACM Transactions on Architecture and Code Optimization (TACO) 16 (3), 1-27, 2019. Cited by 13.
Exploring memory persistency models for GPUs. Z Lin, M Alshboul, Y Solihin, H Zhou. 2019 28th International Conference on Parallel Architectures and Compilation …, 2019. Cited by 12.
GPU performance vs. thread-level parallelism: Scalability analysis and a novel way to improve TLP. Z Lin, M Mantor, H Zhou. ACM Transactions on Architecture and Code Optimization (TACO) 15 (1), 1-21, 2018. Cited by 10.
GLES: A practical GPGPU optimizing compiler using data sharing and thread coarsening. Z Lin, X Gao, H Wan, B Jiang. Languages and Compilers for Parallel Computing: 27th International Workshop …, 2015. Cited by 7.
The Demand for a Sound Baseline in GPU Memory Architecture Research. H Dai, C Li, Z Lin, H Zhou. Proceedings of the Workshop on Duplicating, Deconstructing and Debunking (WDDD), 2017. Cited by 4.
Poster: Accelerate GPU concurrent kernel execution by mitigating memory pipeline stalls. H Dai, Z Lin, C Li, C Zhao, F Wang, N Zheng, H Zhou. 2017 26th International Conference on Parallel Architectures and Compilation …, 2017. Cited by 3.