SALSA: Systematic logic synthesis of approximate circuits S Venkataramani, A Sabne, V Kozhikkottu, K Roy, A Raghunathan Proceedings of the 49th Annual Design Automation Conference, 796-801, 2012 | 408 | 2012 |
A learned performance model for tensor processing units S Kaufman, P Phothilimthana, Y Zhou, C Mendis, S Roy, A Sabne, ... Proceedings of Machine Learning and Systems 3, 387-400, 2021 | 76 | 2021 |
High performance model based image reconstruction X Wang, A Sabne, S Kisner, A Raghunathan, C Bouman, S Midkiff ACM SIGPLAN Notices 51 (8), 1-12, 2016 | 70 | 2016 |
Fast distributed bandits for online recommendation systems K Mahadik, Q Wu, S Li, A Sabne Proceedings of the 34th ACM international conference on supercomputing, 1-13, 2020 | 67 | 2020 |
XLA: Compiling machine learning for peak performance A Sabne Google Res, 2020 | 61 | 2020 |
Pagoda: Fine-grained GPU resource virtualization for narrow tasks TT Yeh, A Sabne, P Sakdhnagool, R Eigenmann, TG Rogers ACM SIGPLAN Notices 52 (8), 221-234, 2017 | 59 | 2017 |
Model-based iterative CT image reconstruction on GPUs A Sabne, X Wang, SJ Kisner, CA Bouman, A Raghunathan, SP Midkiff ACM SIGPLAN Notices 52 (8), 207-220, 2017 | 47 | 2017 |
Overlap communication with dependent computation via decomposition in large deep learning models S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ... Proceedings of the 28th ACM International Conference on Architectural …, 2022 | 43 | 2022 |
Massively parallel 3D image reconstruction X Wang, A Sabne, P Sakdhnagool, SJ Kisner, CA Bouman, SP Midkiff Proceedings of the International Conference for High Performance Computing …, 2017 | 42 | 2017 |
Evaluating performance portability of OpenACC A Sabne, P Sakdhnagool, S Lee, JS Vetter Languages and Compilers for Parallel Computing: 27th International Workshop …, 2015 | 42 | 2015 |
Scaling large-data computations on multi-GPU accelerators A Sabne, P Sakdhnagool, R Eigenmann Proceedings of the 27th international ACM conference on International …, 2013 | 35 | 2013 |
HeteroDoop: A MapReduce programming system for accelerator clusters A Sabne, P Sakdhnagool, R Eigenmann Proceedings of the 24th International Symposium on High-Performance Parallel …, 2015 | 28 | 2015 |
A flexible approach to autotuning multi-pass machine learning compilers PM Phothilimthana, A Sabne, N Sarda, KS Murthy, Y Zhou, ... 2021 30th International Conference on Parallel Architectures and Compilation …, 2021 | 27 | 2021 |
A generic low power scan chain wrapper for designs using scan compression A Sabne, R Tiwari, A Shrivastava, S Ravi, R Parekhji 2010 28th VLSI Test Symposium (VTS), 135-140, 2010 | 21 | 2010 |
Logic synthesis of approximate circuits S Venkataramani, VJ Kozhikkottu, A Sabne, K Roy, A Raghunathan IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2019 | 20 | 2019 |
Confluence analysis and loop fast-forwarding for improving SIMD execution efficiency AJ Sabne, Y Lin, V Grover US Patent 9,612,811, 2017 | 20 | 2017 |
Understanding portability of a high-level programming model on contemporary heterogeneous architectures A Sabne, P Sakdhnagool, S Lee, JS Vetter IEEE Micro 35 (4), 48-58, 2015 | 16 | 2015 |
System and method for compiling or runtime executing a fork-join data parallel program with function calls on a single-instruction-multiple-thread processor Y Lin, G Chakrabarti, J Marathe, O Kwon, A Sabne US Patent 9,747,107, 2017 | 14 | 2017 |
XLA: Compiling machine learning for peak performance (2020) A Sabne, 2020 | 13 | 2020 |
System and method for executing sequential code using a group of threads and single-instruction, multiple-thread processor incorporating the same G Chakrabarti, Y Lin, J Marathe, O Kwon, A Sabne US Patent 9,436,475, 2016 | 13 | 2016 |