Hardware Implementation of a Bfloat16 Exponential Function for Softmax Computation
R. Feiglewicz, A. Kos (AGH University of Krakow, Poland)
Transformer-based artificial intelligence models are increasingly deployed in mobile robotic systems. Many applications require computations to be performed locally under an edge computing paradigm, which necessitates the development of customized AI accelerators for robotics that increase throughput, reduce latency, and minimize energy consumption. This paper presents a hardware-efficient implementation of the exponential function based on piecewise linear (PWL) approximation to accelerate softmax computation within the attention mechanism during transformer inference. The proposed design targets resource-constrained edge devices used in robotic platforms. The implementation was evaluated using the MMLU-Pro benchmark, comparing the proposed custom solution operating in bfloat16 (brain floating point 16) precision with a float32 reference implementation. The results demonstrate that the proposed approach achieves inference accuracy comparable to the float32 baseline while reducing computational complexity. Furthermore, the design was synthesized using high-level synthesis (HLS) to estimate FPGA resource utilization, providing insight into the feasibility and efficiency of the proposed accelerator for edge robotic applications.
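For context, the sketch below models one common way a PWL exponential can be combined with bfloat16 softmax: exp(x) is rewritten as 2 raised to x*log2(e), the integer part of the exponent is handled by scaling (an exponent-field shift in hardware), and 2 raised to the fractional part is approximated by a small table of linear segments. The paper does not publish its exact segmentation, breakpoints, or rounding mode, so the four-segment table, truncation-based bfloat16 rounding, and the pwl_exp/softmax_pwl helper names here are illustrative assumptions, not the authors' design.

```cpp
// Software model of a PWL exponential for softmax (illustrative only).
// Segment count, breakpoints, and bfloat16 truncation are assumptions.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Truncate a float32 value to bfloat16 precision (keep the upper 16 bits;
// real hardware may use round-to-nearest-even instead of truncation).
static float to_bfloat16(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFF0000u;   // drop the low 16 mantissa bits
    std::memcpy(&x, &bits, sizeof(bits));
    return x;
}

// PWL approximation of exp(x) for x <= 0 (softmax inputs are shifted by the
// row maximum, so only the non-positive range is needed):
//   exp(x) = 2^(x*log2e); the integer part selects the binary exponent,
//   and 2^f for the fractional part f in [0,1) is a chord through the
//   segment endpoints: slope[seg]*f + intercept[seg].
static float pwl_exp(float x) {
    const float log2e = 1.4426950408889634f;
    float t  = x * log2e;            // exponent in base 2
    float ti = std::floor(t);        // integer part
    float f  = t - ti;               // fractional part in [0,1)

    static const float slope[4]     = {0.756828f, 0.900028f, 1.070316f, 1.272828f};
    static const float intercept[4] = {1.000000f, 0.964200f, 0.879056f, 0.727172f};
    int seg = std::min(static_cast<int>(f * 4.0f), 3);   // segment index 0..3
    float frac2 = slope[seg] * f + intercept[seg];        // approx 2^f

    return to_bfloat16(std::ldexp(frac2, static_cast<int>(ti)));
}

// Softmax over one row of attention scores using the PWL exponential.
static std::vector<float> softmax_pwl(const std::vector<float>& scores) {
    float mx = scores[0];
    for (float s : scores) mx = std::max(mx, s);

    std::vector<float> e(scores.size());
    float sum = 0.0f;
    for (size_t i = 0; i < scores.size(); ++i) {
        e[i] = pwl_exp(to_bfloat16(scores[i] - mx));   // max-shifted argument
        sum += e[i];
    }
    for (float& v : e) v = to_bfloat16(v / sum);
    return e;
}

int main() {
    std::vector<float> scores = {1.2f, -0.3f, 2.5f, 0.0f};
    for (float p : softmax_pwl(scores)) std::printf("%.4f ", p);
    std::printf("\n");
    return 0;
}
```

Subtracting the row maximum keeps every argument non-positive, so the integer exponent only ever shifts the result downward; in an FPGA datapath this scaling reduces to an adjustment of the bfloat16 exponent field, leaving only one small multiply-add per PWL segment evaluation.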


