Prabhu, K. et al. CHIMERA: a 0.92-TOPS, 2.2-TOPS/W edge AI accelerator with 2-Mbyte on-chip foundry resistive RAM for efficient training and inference. IEEE J. Solid-State Circuits 57, 1013–1026 (2022).
Jain, V. et al. TinyVers: A 0.8-17 TOPS/W, 1.7 μW-20 mW, tiny versatile system-on-chip with state-retentive eMRAM for machine learning inference at the extreme edge. In Proc. 2022 IEEE Symposium on VLSI Technology and Circuits 20–21 (IEEE, 2022).
Rossi, D. et al. 4.4 A 1.3TOPS/W @ 32GOPS fully integrated 10-core SoC for IoT end-nodes with 1.7μW cognitive wake-up from MRAM-based state-retentive sleep mode. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 60–62 (IEEE, 2021).
Ueyoshi, K. et al. DIANA: an end-to-end energy-efficient digital and analog hybrid neural network SoC. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yue, J. et al. A 28nm 16.9-300TOPS/W computing-in-memory processor supporting floating-point NN inference/training with intensive-CIM sparse-digital architecture. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2023).
Yue, J. et al. 15.2 A 2.75-to-75.9TOPS/W computing-in-memory NN processor supporting set-associate block-wise zero skipping and ping-pong CIM with simultaneous computation and weight updating. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 238–240 (IEEE, 2021).
Liu, S. et al. 16.2 A 28nm 53.8TOPS/W 8b sparse transformer accelerator with in-memory butterfly zero skipper for unstructured-pruned NN and CIM-based local-attention-reusable engine. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 250–252 (IEEE, 2023).
Tu, F. et al. 16.1 MuITCIM: a 28nm 2.24μJ/token attention-token-bit hybrid sparse digital CIM-based accelerator for multimodal transformers. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 248–250 (IEEE, 2023).
Huang, W.-H. et al. A nonvolatile Al-edge processor with 4MB SLC-MLC hybrid-mode ReRAM compute-in-memory macro and 51.4-251 TOPS/W. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 15–17 (IEEE, 2023).
Wen, T.-H. et al. A 28nm nonvolatile AI edge processor using 4Mb analog-based near-memory-compute ReRAM with 27.2 TOPS/W for tiny AI edge devices. In Proc. 2023 IEEE Symposium on VLSI Technology and Circuits 1–2 (IEEE, 2023).
Wen, T.-H. et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science 384, 325–332 (2024).
Lele, A. S. et al. A heterogeneous RRAM in-memory and SRAM near-memory SoC for fused frame and event-based target identification and tracking. IEEE J. Solid-State Circuits 59, 52–64 (2024).
Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60–67 (2018).
Khwa, W.-S. et al. A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5-65.0TOPS/W for tiny-Al edge devices. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
Wu, P.-C. et al. A 22nm 832Kb hybrid-domain floating-point SRAM in-memory-compute macro with 16.2-70.2TFLOPS/W for high-accuracy AI-edge devices. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 126–128 (IEEE, 2023).
Guo, A. et al. A 28-nm 64-kb 31.6-TFLOPS/W digital-domain floating-point-computing-unit and double-bit 6T-SRAM computing-in-memory macro for floating-point CNNs. IEEE J. Solid-State Circuits 59, 3032–3044 (2024).
Sinangil, M. E. et al. A 7-nm compute-in-memory SRAM macro supporting multi-bit input, weight and output and achieving 351 TOPS/W and 372.4 GOPS. IEEE J. Solid-State Circuits 56, 188–198 (2021).
Mori, H. et al. A 4nm 6163-TOPS/W/b 4790−TOPS/mm2/b SRAM based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 132–134 (IEEE, 2023).
Chih, Y.-D. et al. 16.4 An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 252–254 (IEEE, 2021).
Fujiwara, H. et al. A 5-nm 254-TOPS/W 221-TOPS/mm2 fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Lee, C.-F. et al. A 12nm 121-TOPS/W 41.6-TOPS/mm2 all digital full precision SRAM-based compute-in-memory with configurable bit-width for AI edge applications. In Proc. 2022 IEEE Symposium on VLSI Technology and Circuits 24–25 (IEEE, 2022).
Chiu, Y. C. et al. A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices. Nat. Electron. 6, 534–543 (2023).
Jung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601, 211–216 (2022).
Wen, T.-H. et al. 34.8 A 22nm 16Mb floating-point ReRAM compute-in-memory macro with 31.2TFLOPS/W for AI edge devices. In Proc. 2024 IEEE International Solid-State Circuits Conference (ISSCC) 580–582 (IEEE, 2024).
Chang, M. et al. A 40nm 60.64TOPS/W ECC-capable compute-in-memory/digital 2.25MB/768KB RRAM/SRAM system with embedded cortex M3 microprocessor for edge recommendation systems. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
Hung, J. M. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 4, 921–930 (2021).
Hung, J.-M. 8-b precision 8-Mb ReRAM compute-in-memory macro using direct-current-free time-domain readout scheme for AI edge devices. IEEE J. Solid-State Circuits 58, 303–315 (2022).
Chiu, Y.-C. et al. A 22nm 4Mb STT-MRAM data-encrypted near-memory computation macro with a 192GB/s read-and-decryption bandwidth and 25.1-55.1TOPS/W 8b MAC for AI operations. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 178–180 (IEEE, 2022).
Spetalnick, S. D. et al. A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM binary/compute-in-memory macro with 4.23x improvement in density and >75% use of sensing dynamic range. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yoon, J.-H. et al. A 40nm 100Kb 118.44TOPS/W ternary-weight compute-in-memory RRAM macro with voltage-sensing read and write verification for reliable multi-bit RRAM operation. In Proc. 2021 IEEE Custom Integrated Circuits Conference (CICC) 1–2 (IEEE, 2022).
Khwa, W. S. et al. MLC PCM techniques to improve nerual network inference retention time by 105X and reduce accuracy degradation by 10.8X. In Proc. 2021 Symposium on VLSI Technology 1–2 (IEEE, 2021).
Xue, C.-X. et al. 15.4 A 22nm 2Mb ReRAM compute-in-memory macro with 121-28TOPS/W for multibit MAC computing for tiny AI edge devices. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 244–246 (IEEE, 2020).
Xue, C.-X. et al. 24.1 A 1Mb multibit ReRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors. In Proc. 2019 IEEE International Solid-State Circuits Conference (ISSCC) 388–390 (IEEE, 2019).
Xue, C.-X. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat. Electron. 4, 81–90 (2020).
Chen, W.-H. et al. A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors. In Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC) 494–496 (IEEE, 2018).
Chen, W. H. et al. CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat. Electron. 2, 420–428 (2019).
Mochida, R. et al. A 4M synapses integrated analog ReRAM based 66.5 TOPS/W neural-network processor with cell current controlled writing and flexible network architecture. In Proc. 2018 IEEE Symposium on VLSI Technology 175–176 (IEEE, 2018).
Wan, W. et al. 33.1 A 74 TMACS/W CMOS-RRAM neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 498–500 (IEEE, 2020).
Liu, Q. et al. 33.2 A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 500–502 (IEEE, 2020).
Cai, F. et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat. Electron. 2, 290–299 (2019).
Li, C. et al. Analogue signal and image processing with large memristor crossbars. Nat. Electron. 1, 52–59 (2018).
Ielmini, D. & Wong, H. S. P. In-memory computing with resistive switching devices. Nat. Electron. 1, 333–343 (2018).
Chou, C.-C. et al. A 22nm 96KX144 RRAM macro with a self-tracking reference and a low ripple charge pump to achieve a configurable read window and a wide operating voltage range. In Proc. 2020 IEEE Symposium on VLSI Circuits 1–2 (IEEE, 2020).
Boybat, I. et al. Neuromorphic computing with multi-memristive synapses. Nat. Commun. 9, 2514 (2018).
Le Gallo, M. et al. Mixed-precision in-memory computing. Nat. Electron. 1, 246–253 (2018).
Qu, Z., Zhou, Z., Cheng, Y. & Thiele, L. Adaptive loss-aware quantization for multi-bit networks. In Proc. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 7985–7994 (Computer Vision Foundation, 2020).
Mishra, A. & Marr, D. Apprentice: using knowledge distillation techniques to improve low-precision network accuracy. In Proc. Sixth International Conference on Learning Representations (ICLR, 2018).
Chu, T. et al. Mixed-precision quantized neural networks with progressively decreasing bitwidth. Pattern Recognit. 111, 107647 (2021).
Lecun, Y. et al. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s thesis, Univ. Toronto (2009).
Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
Houshmand, P. et al. Opportunities and limitations of emerging analog in-memory compute DNN architectures. In Proc. 2020 IEEE International Electron Devices Meeting (IEDM) 29.1.1–29.1.4 (IEEE, 2020).
Shamsoshoara, A. et al. The FLAME dataset: Aerial Imagery Pile burn detection using drones (UAVs). IEEE DataPort (2020).
Warden, P. Speech commands: a dataset for limited-vocabulary speech recognition. Preprint at https://arxiv.org/abs/1804.03209 (2018).
Chowdhery, A. et al. Visual wake words dataset. Preprint at https://arxiv.org/abs/1906.05721 (2019).
Markus, N. et al. A white paper on neural network quantization. Preprint at https://arxiv.org/abs/2106.08295 (2021).
Jacob, B. et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR) 2704–2713 (Computer Vision Foundation, 2018).
Sun, S., Bai, J., Shi, Z., Zhao, W. & Kang, W. CIM2PQ: an arraywise and hardware-friendly mixed precision quantization method for analog computing-in-memory. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43, 2084–2097 (2024).
Chen, Y.-W. et al. SUN: dynamic hybrid-precision SRAM-based CIM accelerator with high macro utilization using structured pruning mixed-precision networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43, 2163–2176 (2024).
Agrawal, A. et al. A 7nm 4-core AI chip with 25.6TFLOPS hybrid FP8 training, 102.4TOPS INT4 inference and workload-aware throttling. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 144–145 (IEEE, 2021).
Wang, J. et al. A compute SRAM with bit-serial integer/floating-point operations for programmable in-memory vector acceleration. In Proc. 2019 IEEE International Solid-State Circuits Conference (ISSCC) 224–225 (IEEE, 2019).