Seunghyun Park (Integrated Ph.D. Student)


Ph.D. Candidate, AI Processor Accelerator
AI-Embedded System/Software on Chip (AI-SoC) Lab
School of Electronics Engineering, Kyungpook National University
Phone: +82 53 940 8648
E-mail: ijjh0435 [@] gmail[DOT] com
[Homepage] [Google Scholar] [SVN] [CV]

Repository Commit History


Introduction

Brief Introduction

A holistic AI accelerator designer from low-level to high-level

Full Bio Sketch

Mr. Park received his B.S. degree in Electronics Engineering from Kyungpook National University, Daegu, Republic of Korea, in 2023. He is currently an integrated Ph.D. student in the School of Electronic and Electrical Engineering at Kyungpook National University, Daegu, Republic of Korea. His research interests include artificial intelligence (AI) accelerator design. He conducts research on low-power, small-area, high-speed accelerator architectures and on design/verification methodology. Currently, he is developing an architecture that lets AI operations run with high efficiency and low power on edge devices such as MCUs, using techniques such as tiling and improved off-chip communication.

His previous research focused primarily on digital signal processing (DSP). In particular, he concentrated on acoustic signals, covering areas such as active noise cancellation and 3D audio. Notably, he addressed the shortcomings of existing noise-canceling algorithms by using artificial intelligence models for real-time noise processing, and he designed accelerators for binaural reproduction, which resulted in the publication of several papers.

Research Topic

CNN Acceleration for LiDAR Signal Processing Systems

In artificial intelligence, an image is commonly represented as a matrix. The hardware passes the input matrix through a filter (kernel) and outputs the processed image as a matrix. Because an image consists of three color channels (R, G, and B), the data that a convolutional neural network (CNN) processes is a 3-dimensional tensor. Efficient image processing therefore calls for a dedicated CNN acceleration processor. This research designs the convolution processor using Verilog RTL simulation. Here, acceleration means that the processor can start computing as soon as data arrives, so the research concentrates on the control logic that calculates memory addresses. Starting from assembly code for a matrix multiplier, the memory address calculation can be automated. Combining automated address calculation with loop unrolling yields a high-performance processor implementation. In addition, minimizing the number of LEs (logic elements) enables a low-power, small-area design. One such minimization technique is the MAC (multiply-and-accumulate) unit. A MAC is powerful because a single instruction performs the multiplication and the addition together in a combinational logic circuit. A processor with a MAC unit completes the computation in about 60% less time than one without.
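The address calculation and MAC pattern described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `conv2d_mac`, the flat row-major buffer layout, and the single-channel, stride-1, no-padding setup are hypothetical choices for illustration and are not taken from the actual RTL design.

```python
def conv2d_mac(image, kernel, height, width, k):
    """Single-channel 2D convolution (CNN-style, no kernel flip) over flat buffers.

    image  : list of height*width values, row-major (assumed layout)
    kernel : list of k*k values, row-major
    Returns a flat list of (height-k+1)*(width-k+1) outputs.
    """
    out_h, out_w = height - k + 1, width - k + 1
    out = [0] * (out_h * out_w)
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0  # accumulator register of the MAC unit
            for ky in range(k):
                # Address of the first pixel in this kernel row: the part a
                # hardware address generator would compute automatically.
                row_base = (oy + ky) * width + ox
                for kx in range(k):
                    # One multiply-and-accumulate step per kernel tap.
                    acc += image[row_base + kx] * kernel[ky * k + kx]
            out[oy * out_w + ox] = acc
    return out


if __name__ == "__main__":
    img = [1, 2, 3,
           4, 5, 6,
           7, 8, 9]
    ker = [1, 0,
           0, 1]
    print(conv2d_mac(img, ker, 3, 3, 2))
```

In hardware, the innermost loop over `kx` is a natural candidate for loop unrolling, since each tap is an independent multiply feeding the same accumulator.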

CNN Accelerator for Noise Canceller

Convolutional neural networks (CNNs) are prevalent in image processing systems, but studies on acoustic systems remain insufficient. This research focuses on the acoustic system, specifically a low-power hardware implementation of noise cancellation. Conventional adaptive noise cancellation suffers from slow convergence, and existing CNNs face memory and power bottlenecks. In the present work, we propose an efficient acoustic noise cancellation architecture that accelerates processing and reduces power consumption. The proposed architecture combines an efficient data transfer technique based on an even-odd buffer with low-power CNN noise cancellation algorithms. Simulation results show that, compared to a single buffer, the overall processing time was reduced by 20.3% and the power consumption was reduced by 6.1%.
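The benefit of an even-odd buffer over a single buffer can be seen in a simple cycle-count model. The sketch below assumes the even-odd buffer behaves like generic double (ping-pong) buffering, where the next block is loaded while the current one is processed. The function names and the cycle figures in the usage example are hypothetical and do not reproduce the measured 20.3% result.

```python
def single_buffer_cycles(n_blocks, load, compute):
    """One buffer: every block must be fully loaded before it is processed."""
    return n_blocks * (load + compute)


def even_odd_buffer_cycles(n_blocks, load, compute):
    """Two buffers: loading block i+1 overlaps with computing block i."""
    if n_blocks == 0:
        return 0
    # First load cannot be hidden, last compute cannot be hidden;
    # in between, each step costs the slower of the two activities.
    return load + (n_blocks - 1) * max(load, compute) + compute


def run_even_odd(blocks, process):
    """Functional model: alternate between buffer 0 (even) and buffer 1 (odd)."""
    buffers = [None, None]
    results = []
    if not blocks:
        return results
    buffers[0] = blocks[0]  # prefetch the first block into the even buffer
    for i in range(len(blocks)):
        cur = i % 2
        if i + 1 < len(blocks):
            # In hardware this transfer would run concurrently with process().
            buffers[1 - cur] = blocks[i + 1]
        results.append(process(buffers[cur]))
    return results


if __name__ == "__main__":
    print(single_buffer_cycles(4, 10, 10), even_odd_buffer_cycles(4, 10, 10))
    print(run_even_odd([[1, 2], [3, 4], [5]], sum))
```

The model also shows the limit of the technique: when loading is much slower than computing, the total time approaches the pure transfer time, so the gain depends on how balanced the two phases are.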

Tile-Connected AI Computation Optimization

Designing systems that go beyond specific applications such as active noise cancellation (ANC) and apply to more general accelerator architectures is currently one of the most important topics in the chip design field. Because of resource constraints in artificial intelligence operations and high-performance computing, performance is often bound by either I/O bandwidth or the number of computing elements. Structures are therefore being proposed that can be implemented at low cost rather than by simply widening bandwidth or adding processing elements. Research is underway to create a more efficient data control path that is low-power without compromising performance or accuracy, either by using the radix-4 Booth algorithm in a bit-separable manner or through a tightly coupled software/hardware structure for off-chip communication.
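As background for the tile-based approach named in this topic's title, the sketch below shows generic loop tiling on a square matrix multiplication: each step touches only tile-sized blocks, which is what lets a small on-chip memory be reused instead of streaming the full operands repeatedly over a bandwidth-limited interface. The function name `tiled_matmul` and the tile size are hypothetical, and this is not the tile-connection scheme of the actual design.

```python
def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) block by block."""
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, n, tile):      # tile column of the output
            for k0 in range(0, n, tile):  # tile along the reduction axis
                # Only a[i0:i0+tile][k0:k0+tile] and b[k0:k0+tile][j0:j0+tile]
                # are needed here; in hardware they would sit in on-chip memory.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(tiled_matmul(m, m, 3, 2))
```

The result is identical to an untiled multiplication; only the order of memory accesses changes, which is why tiling can relieve an I/O bound without adding processing elements.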

Bit-Separable Multipliers and Dynamic Range Decoder in CNN Accelerators

The integration of AI into modern devices demands advancements in hardware to achieve high computational power and low latency. One of the most promising solutions to these challenges is the implementation of bit-separable radix-4 Booth multipliers (BSM) combined with dynamic range decoding (DRD) in CNN accelerators. This research focuses on developing and optimizing these technologies to enhance the performance and energy efficiency of AI applications. By structurally dividing the multiplier and selectively processing only the necessary bits, the BSM significantly improves computational speed and power efficiency. The innovative use of DRD allows for skipping redundant computations, thus maximizing the use of available memory bandwidth and hardware resources. Experiments with various CNN architectures, including MobileNet, have shown notable improvements, such as a 29% increase in processing speed and a 28% reduction in power consumption. This research demonstrates how hardware innovations can lead to substantial software performance enhancements, enabling efficient computation within existing AI frameworks. Our study delves into the detailed design and practical implementation of BSM and DRD within CNN basic blocks, showcasing their seamless integration and effectiveness across different AI models. The consistent pattern of zero activation in ReLU layers contributes significantly to power savings, making this approach particularly effective for edge devices with limited resources. Future research directions include optimizing BSM integration in sequential models and exploring variable DRD techniques to further enhance performance.
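To make the multiplier discussion concrete, the following sketch models textbook radix-4 Booth recoding in software and skips partial products whose Booth digit is zero. It is an illustrative assumption, not the BSM or DRD circuit itself: the function names, the 8-bit operand width, and the digit-level skipping are hypothetical stand-ins for the idea of processing only the necessary bits.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide integer into radix-4 Booth digits, LSB first.

    Each digit lies in {-2, -1, 0, 1, 2} and is derived from three
    overlapping bits: y[2i+1], y[2i], y[2i-1], with y[-1] = 0.
    """
    assert bits % 2 == 0
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1))
    u = y & ((1 << bits) - 1)  # two's complement bit pattern
    digits = []
    prev = 0  # implicit bit y[-1]
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Return (x * y, number of partial products actually generated)."""
    product = 0
    partials = 0
    for i, d in enumerate(booth_radix4_digits(y, bits)):
        if d == 0:
            continue  # zero digit: this partial product can be skipped
        product += (d * x) << (2 * i)  # digit i has weight 4**i
        partials += 1
    return product, partials


if __name__ == "__main__":
    print(booth_multiply(13, 7))   # small operand: few nonzero digits
    print(booth_multiply(13, 0))   # zero activation: no partial products
```

In this model a zero operand, such as an activation zeroed by ReLU, produces no partial products at all, and small-magnitude operands leave the upper digits at zero, which mirrors why skipping redundant computation can save both time and power.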

Chip Design

Tile-based CNN Accelerator (2023-05-01)

IEEE COOL Chips 2024, Power-Efficient CNN Accelerator Design with Bit-Separable Radix-4 Booth Multiplier

  • Commercial MCU design

  • DMA module design for Off-chip Interface

  • GCC compile environment setup for SW execution

  • Stimulus generator in C code

  • Custom UART driver

  • SPI PHY design and verification

  • Custom radix-4 Booth multiplier design

  • AXI-4 based FIFO and on-chip bus design

  • Linker script adjustment for CNN data alignment

  • PCB design with various IO peripherals

Full Custom Bit-Separable Radix-4 Booth Multiplier

IEIE 2024, Design of Bit-Separable Radix-4 Booth Multiplier

  • Full custom multiplier design with bit-separable mechanism

  • Experience with full custom design tools: Cadence Virtuoso, Synopsys Custom Compiler, Laker

  • Power consumption simulation with Synopsys PrimePower

  • Various primitive cell designs (NAND, NOR, FF, INV, MUX…)

Full Custom SRAM PIM for Edge CNN Training

  • SRAM Cell design (using thin cell) from scratch

  • SPICE simulation using digital vector file format

  • Standard cell design flow

  • 500nm fabrication process, maximum 75MHz clock speed

Publications

Journal Publications (KCI 1, SCI 3)

  • Seunghyun Park and Daejin Park. Lightweighted FPGA Implementation of Symmetric Buffer-based Active Noise Canceller with On-Chip Convolution Acceleration Units (KCI) Journal of the Korea Institute of Information and Communication Engineering, 2022.

  • Seunghyun Park and Daejin Park. Low-Power FPGA Realization of Lightweight Active Noise Cancellation with CNN Noise Classification (SCI) Electronics, 12(11):2511-2526, 2023.

  • Seunghyun Park and Daejin Park. Low-Power Scalable TSPI: A Modular Off-Chip Network for Edge AI Accelerators (SCI) IEEE Access, 2024.

  • Seunghyun Park and Daejin Park. Bit-Separable Multiplier in CNN Accelerator: Analyzing Partial Results for Post-Optimization (SCI) (Under Revision) IEEE Micro, 2024.

Conference Publications (Intl. 5, Dom. 1)

  • Seunghyun Park and Daejin Park. Low-Power LiDAR Signal Processor with Point-of-Cloud Transformation Accelerator In IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW), 2022.

  • Seunghyun Park and Daejin Park. Lightweighted FPGA Implementation of Even/Odd-Buffered Active Noise Canceller with On-Chip Convolution Acceleration Unit In IEEE ICEIC 2023, 2023.

  • Seunghyun Park, Dongkyu Lee, and Daejin Park. Tcl-based Simulation Platform for Light-weight ResNet Implementation In IEEE ISOCC 2023, 2023.

  • Seunghyun Park and Daejin Park. Integrated 3D Active Noise Cancellation Simulation and Synthesis Platform Using Tcl In IEEE International Conference on Embedded Multicore Manycore Systems-on-Chip (MCSoC 2023), 2023.

  • Seunghyun Park and Daejin Park. Power-Efficient CNN Accelerator Design with Bit-Separable Radix-4 Booth Multiplier In IEEE COOLChips 2024, 2024.

  • Seunghyun Park and Daejin Park. Design of Bit-Separable Radix-4 Booth Multiplier In IEIE Summer Conference 2024, 2024.

Participation in International Conferences

  • IEEE ICCE-TW 2022, Taipei, Taiwan

  • IEEE A-SSCC 2022, Taipei, Taiwan

  • IEEE ASP-DAC 2023, Tokyo, Japan

  • IEEE ICEIC 2023, Singapore, Singapore

  • IEEE COOLChips 2023, Tokyo, Japan

  • IEEE ISOCC 2023, Jeju, Korea

  • IEEE MCSoC 2023, Singapore, Singapore

  • IEEE COOLChips 2024, Tokyo, Japan

  • IEEE EMSOFT 2024, Raleigh, USA

Last Updated, 2024.10.12