FPGA-based neural network accelerator outperforms GPUs

Xilinx Developer Forum: Claimed to be the highest-performance convolutional neural network (CNN) implementation on an FPGA, Omnitek’s CNN is available now. The deep learning processing unit (DPU) is future-proofed, explained CEO Roger Fawcett, thanks to the programmability of the FPGA.

It was demonstrated running a GoogLeNet Inception-v1 CNN at eight-bit integer resolution. It achieved 16.8 tera operations per second (TOPS) and can run inference on more than 5,300 images per second on a Xilinx Virtex UltraScale+ XCVU9P-3 FPGA. The modular, scalable approach makes it suitable for object detection and video processing applications at the edge and in the cloud, explained Fawcett, as well as for inference in data centres and intelligent cameras.
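
As a rough sanity check, the quoted TOPS and image rate are consistent with each other if GoogLeNet Inception-v1 is assumed to need roughly 3.2 billion operations per 224x224 image (counting each multiply-accumulate as two operations); that per-image workload is an assumption for the sketch below, not a figure Omnitek quoted.

```python
# Sketch of a consistency check on the quoted numbers. The ~3.2 GOP per image
# for GoogLeNet Inception-v1 is an assumed figure, not one given in the article.
OPS_PER_IMAGE = 3.2e9      # assumed GoogLeNet Inception-v1 workload per image
CLAIMED_TOPS = 16.8e12     # 16.8 TOPS from the demonstration

implied_images_per_second = CLAIMED_TOPS / OPS_PER_IMAGE
print(f"Implied throughput: {implied_images_per_second:,.0f} images/s")
# Prints roughly 5,250 images/s, in line with the quoted "over 5,300".
```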

The DPU can be configured to provide optimal compute performance for different neural network topologies in machine learning applications, exploiting the FPGA's parallel DSP architecture, distributed memory, and the ability to reconfigure logic and connectivity for different algorithms.
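
For readers unfamiliar with the eight-bit integer arithmetic used in the demonstration, the sketch below illustrates the general idea: weights and activations are quantised to int8 and multiply-accumulates are carried out in a wider integer accumulator, the kind of operation an FPGA's DSP slices perform in parallel. The quantisation scheme, scales and sizes here are illustrative assumptions, not details of Omnitek's DPU pipeline.

```python
import numpy as np

# Illustrative 8-bit integer dot product with a 32-bit accumulator; the
# quantisation scheme and sizes are assumptions for this sketch only.
def quantize_int8(x, scale):
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

activations = np.random.rand(1024).astype(np.float32)
weights = 0.05 * np.random.randn(1024).astype(np.float32)

a_scale = activations.max() / 127.0
w_scale = np.abs(weights).max() / 127.0
a_q = quantize_int8(activations, a_scale)
w_q = quantize_int8(weights, w_scale)

acc = np.dot(a_q.astype(np.int32), w_q.astype(np.int32))  # wide accumulator
approx = acc * a_scale * w_scale                           # rescale to real values
exact = float(np.dot(activations, weights))
print(f"int8 result {approx:.4f} vs float32 result {exact:.4f}")
```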

The DPU achieves over 50% higher performance than any competing CNN implementation and outperforms GPUs for a given power or cost budget, claims the company. “The FPGA is a world-beating platform and architecture, which is very flexible for future-proofing and can outperform GPUs in AI, with lower latency,” added Fawcett.

The company has also announced it is sponsoring a DPhil (PhD) at Oxford University to study techniques for implementing deep learning acceleration on FPGAs. The work will be carried out in collaboration with Omnitek’s own research into AI compute engines and algorithms.
