Publications
Collaborations always make the whole greater than the sum of its parts! I was lucky to learn from and write papers with the following 84 co-authors (ordered alphabetically by last name):
Mustafa Abbas, Mohamed A. Abd El Ghany, Tanmay Anand, Anupreetham, Pranavi Appana, Aman Arora, Charles Augustine, Valeria Bertacco, Vaughn Betz, Kwadwo Boateng, Aatman Borda, Prerna Budhkar, Yu Cao, Gregory Chen, Paul Chow, Seyed Alireza Damaghani, Reetuparna Das, Aravind Dasu, Mohamed Eldafrawy, Mohamed A. Elgammal, Hongxiang Fan, Evangelos Georganas, Barbara Georgey, Diana Groehringer, Vidushi Goyal, Brett Grady, Sergey Gribok, Karthik Gururaj, Mathew Hall, Alexander Heinecke, Salma Hesham, James C. Hoe, Suyeon Hur, Mohamed Ibrahim, Ravi Iyer, Ali Jafari, Lizy K. John, Sangram Kate, Kenneth B. Kent, Jangwoo Kim, Joonsung Kim, Phil Knag, Ram Krishnamurthy, Raghavan Kumar, Ajay Kuzhively, Dongup Kwon, Martin Langhammer, Wayne Luk, Rui Ma, Fatemehsadat Mahmoudi, Debbie Marr, Karan Mathur, Samidh Mehta, Jiuxi Meng, Amin Mohaghegh, Abinash Mohanty, Vedant Mohanty, Stephen More, Seongmin Na, Mishali Naik, Hiroki Nakahara, Xinyu Niu, Eriko Nurvitadhi, Nicolas Papernot, Bogdan Pasca, Pragnesh Patel, Abirami Prabhakaran, Zhiqiang Que, Aishwarya Rajen, Daniel Rauch, Jens Rettkowski, Jae-sun Seo, David Sheffield, Jaewoong Sim, Srivatsan Srinivasan, Huseyin Sumbul, Phil Tompson, Kuen Hung Tsoi, Xiaowei Wang, Sadegh Yazdanshenas, Jiecao Yu, Chenglong Zeng, Taikun Zhang, Zhipeng Zhao
2024
- A Software-Programmable Neural Processing Unit for Graph Neural Network Inference on FPGAs. In IEEE International Conference on Field-Programmable Logic and Applications (FPL), 2024
- Field-Programmable Gate Array Architecture for Deep Learning: Survey & Future Directions. In arXiv preprint arXiv:2404.10076, 2024
- High Throughput FPGA-Based Object Detection via Algorithm-Hardware Co-Design. In ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2024
2023
- Into the Third Dimension: Architecture Exploration Tools for 3D Reconfigurable Acceleration Devices. In IEEE International Conference on Field-Programmable Technology (FPT), 2023
- A Whole New World: How to Architect Beyond-FPGA Reconfigurable Acceleration Devices? In IEEE International Conference on Field-Programmable Logic and Applications (FPL), 2023
- Placement Optimization for NoC-Enhanced FPGAs. In IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023
- Koios 2.0: Open-Source Deep Learning Benchmarks for FPGA Architecture and CAD Research. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2023
- A Fast and Flexible FPGA-Based Accelerator for Natural Language Processing Neural Networks. In ACM Transactions on Architecture and Code Optimization (TACO), 2023
2022
- Architecture and Application Co-Design for Beyond-FPGA Reconfigurable Acceleration Devices. In IEEE Access, 2022
- FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems. In IEEE Computer Architecture Letters (CAL), 2022
2021
- Recurrent Neural Networks with Column-Wise Matrix-Vector Multiplication on FPGAs. In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2021
- Specializing for Efficiency: Customizing AI Inference Processors on FPGAs. In IEEE International Conference on Microelectronics (ICM), 2021
- Koios: A Deep Learning Benchmark Suite for FPGA Architecture and CAD Research. In IEEE International Conference on Field-Programmable Logic and Applications (FPL), 2021
- End-to-End FPGA-Based Object Detection using Pipelined CNN and Non-Maximum Suppression. In IEEE International Conference on Field-Programmable Logic and Applications (FPL), 2021
- FPGA Architecture: Principles and Progression. In IEEE Circuits and Systems Magazine (CAS-M), 2021
- Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs. In IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021
2020
- Neighbors From Hell: Voltage Attacks Against Deep Learning Accelerators on Multi-Tenant FPGAs. In IEEE International Conference on Field-Programmable Technology (FPT), 2020
- Beyond Peak Performance: Comparing the Real Performance of AI-Optimized FPGAs and GPUs. In IEEE International Conference on Field-Programmable Technology (FPT), 2020
- FPGA Logic Block Architectures for Efficient Deep Learning Inference. In ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2020
2019
- Scalable Low-Latency Persistent Neural Machine Translation on CPU Server with Multiple FPGAs. In IEEE International Conference on Field-Programmable Technology (FPT), 2019
- Why Compete When You Can Work Together: FPGA-ASIC Integration for Persistent RNNs. In IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2019
- Math Doesn’t Have to Be Hard: Logic Block Architectures to Enhance Low-Precision Multiply-Accumulate on FPGAs. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019
2018
- You Cannot Improve What You Do Not Measure: FPGA vs. ASIC Efficiency Gaps for Convolutional Neural Network Inference. In ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2018
- Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs. In IEEE International Conference on Field-Programmable Logic and Applications (FPL), 2018
2017
- Hardware Acceleration of Novel Chaos-Based Image Encryption for IoT Applications. In IEEE International Conference on Microelectronics (ICM), 2017
- Build Fast, Trade Fast: FPGA-Based High-Frequency Trading Using High-Level Synthesis. In IEEE International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2017
- HW/SW Co-Design of The HOG Algorithm on a Xilinx Zynq SoC. In Journal of Parallel and Distributed Computing (JPDC), 2017
2015
- Real-Time Pedestrian Detection on a Xilinx Zynq Using The HOG Algorithm. In IEEE International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2015