Artificial Neural Networks on the Edge

Introduction

As Artificial Intelligence becomes increasingly prevalent, Beta Solutions investigates how these advancements can create new capabilities for client projects. This article surveys AI fundamentals, explores Neural Networks as core building blocks, and examines implementing Neural Networks on Edge devices.

Background of Neural Networks

Neural network research dates back to the late 1940s. The term "Artificial Intelligence" was coined by John McCarthy, who co-founded MIT's AI Lab in 1959. Early progress was limited by insufficient processing power. Today, cloud computing arrays and GPU farms make it practical to train large-scale networks rapidly, and mobile devices now have enough processing capacity to run smaller neural networks for voice recognition and similar tasks.

Current AI remains in the "Narrow AI" era, solving specific problems like image identification or speech recognition rather than exhibiting general intelligence. A "Narrow AI" system requires substantial training data and performs only as effectively as that data allows.

How does an Artificial Neural Network work?

Artificial Neural Networks loosely emulate how the brain learns. A child learns to identify cats through repeated exposure and labeling, eventually becoming proficient at distinguishing cats from non-cats.

In a simplified network, input data (such as camera data) is presented to multiple hidden nodes, each with its own weights. Each node multiplies its inputs by those weights and adds the results, and the combined outputs of the hidden nodes produce a final weighted output that identifies the objects of interest.
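
The multiply-and-add flow described above can be sketched in a few lines of plain Python. This is only an illustration: the layer sizes, weights, biases, and the ReLU activation are hypothetical values chosen for the example, not taken from any real network.

```python
def relu(x):
    """Simple activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: multiply inputs by weights, add, then activate."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output.
inputs = [0.5, 0.25, 1.0]
hidden_weights = [[0.5, -1.0, 0.25], [1.0, 2.0, -0.5]]
hidden_biases = [0.0, 0.5]
output_weights = [[2.0, -1.0]]
output_biases = [0.25]

hidden = dense(inputs, hidden_weights, hidden_biases, activation=relu)
output = dense(hidden, output_weights, output_biases)
print(hidden, output)
```

In a trained network the weights would come from the training process rather than being written by hand.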

[Figure: Artificial Neural Network]

Convolutional Neural Networks (CNNs) are typically used for image or speech processing. They capture spatial relationships, for example where the eyes sit relative to the nose, which significantly improves image recognition.
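
To make the spatial aspect concrete, here is a minimal sketch of the sliding-window operation at the heart of a CNN layer. The tiny image and the hand-written vertical-edge kernel are assumptions for illustration; a real CNN learns its kernels during training.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D sliding-window multiply-and-add, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, vertical_edge)
print(feature_map)
```

The non-zero column in the feature map marks where in the image the edge sits, which is the positional information a plain fully connected layer does not preserve.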

[Figure: Convolutional Neural Network]

Basic Training Steps

  1. Capture data
  2. Label data
  3. Train neural network
  4. Run neural network

For cat identification, one would need many labeled cat images and a similar number of non-cat images. An optimizing algorithm adjusts the weights of the hidden nodes until the outputs correctly classify the images. The quality of the results depends directly on the quality of the training data.

A primary disadvantage is that neural networks function as "black boxes": users rarely gain insight into their internal operations. Input data goes through "magic" multiplication and addition, and an output emerges. ANNs also never achieve 100% accuracy; statistical outliers always exist, though performance is usually acceptable.
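
As a rough sketch of what "an optimizing algorithm adjusts weights" means, the example below trains a single artificial neuron with plain gradient descent. Everything here is an assumption made for illustration: the two-number "features" stand in for real images, and the learning rate and epoch count are arbitrary. Real image classifiers use many layers and far more data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability (0..1) that the sample is a cat."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def mean_loss(weights, bias, samples):
    """Average cross-entropy loss: lower means better agreement with the labels."""
    total = 0.0
    for features, label in samples:
        p = predict(weights, bias, features)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(samples)


def train(samples, learning_rate=0.5, epochs=1000):
    """Batch gradient descent: nudge weights to reduce the average loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in samples:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical labeled data: (features, label) with 1 = cat, 0 = not a cat.
samples = [
    ([1.0, 0.9], 1),
    ([0.8, 1.0], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.0], 0),
]
weights, bias = train(samples)
print(mean_loss(weights, bias, samples))
```

If the labels in `samples` were wrong or unrepresentative, the same loop would faithfully learn the wrong thing, which is why data quality matters so much.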

Artificial Neural Network Tools

High-level cloud-based APIs include Google Cloud AutoML, Microsoft Azure Computer Vision, and Amazon Rekognition. Users upload images, classify them, then call simple web APIs for identification. These services typically employ CNNs with various optimization techniques and require minimal computer vision knowledge. However, they lock users into a specific provider and are not suited to running on portable edge devices.

Custom ANN training uses primarily open-source tools: Google's TensorFlow and Berkeley's Caffe. TensorFlow offers easier Python interfaces with abundant tutorials and diverse use cases.

For embedded devices, CNN models can be converted to TensorFlow Lite, which quantizes the model by converting 32-bit floating-point weights to 8-bit integers. This enables operation on smaller embedded processors. Although quantization reduces accuracy by roughly 3%, the memory savings (~4x) and energy savings are substantial. Pruning can shrink a model further by removing weights that contribute little to the output.
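
The idea behind 8-bit quantization can be sketched as follows. This is a simplified symmetric scheme written for illustration, not TensorFlow Lite's actual converter; the weight values are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical float weights from one layer.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error introduced by quantization.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage cost: 4 bytes per float32 weight versus 1 byte per int8 weight.
float32_bytes = len(weights) * 4
int8_bytes = len(weights) * 1
print(q, max_error, float32_bytes / int8_bytes)
```

The small rounding error per weight is the source of the accuracy loss, and the 4-to-1 byte ratio is where the ~4x memory saving comes from.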

Artificial Neural Network Training

Training ANNs demands immense cloud processing and vast labeled image collections. Transfer learning, retraining already-trained CNNs on new specific images, reduces cloud computing requirements.

Two transfer learning approaches exist:

  • Fine-tuning continues training a pre-trained model on the new images. It is susceptible to overfitting and requires a substantial number of additional training images.
  • Feature transfer exposes a pre-trained CNN to the specialized training set and extracts information from its intermediate layers, which capture low-level to high-level image features. This extracted information is then used to train a simpler machine learning system such as a Support Vector Machine (SVM). Feature transfer with SVMs works better for specialized tasks that differ from the original task, requires smaller image sets, is computationally more efficient, and is less susceptible to overfitting.
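
The feature-transfer approach can be illustrated with a toy sketch. The "pre-trained network" here is just two hypothetical fixed filters, and the SVM is a minimal linear one trained by sub-gradient descent on the hinge loss; a real project would more likely use an actual pre-trained CNN and an established SVM library. The 4-pixel images and all parameters are assumptions for the example.

```python
def relu(x):
    return max(0.0, x)


# Hypothetical frozen filters standing in for a pre-trained CNN's layers.
FROZEN_FILTERS = [
    [1.0, 1.0, -1.0, -1.0],   # responds when the first half is brighter
    [-1.0, -1.0, 1.0, 1.0],   # responds when the second half is brighter
]


def extract_features(image):
    """Run the frozen network; its weights are never updated."""
    return [relu(sum(w * p for w, p in zip(f, image))) for f in FROZEN_FILTERS]


def train_linear_svm(features, labels, learning_rate=0.1, reg=0.01, epochs=50):
    """Sub-gradient descent on the regularized hinge loss (labels are +1 or -1)."""
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            if margin < 1:
                # Sample is inside the margin: move the boundary away from it.
                weights = [w + learning_rate * (y * xi - reg * w)
                           for w, xi in zip(weights, x)]
            else:
                # Sample is safely classified: apply only the regularization step.
                weights = [w - learning_rate * reg * w for w in weights]
    return weights


def classify(weights, image):
    score = sum(w * xi for w, xi in zip(weights, extract_features(image)))
    return 1 if score >= 0 else -1


# Small specialized training set: +1 = first half bright, -1 = second half bright.
images = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.2, 1.0, 1.0],
]
labels = [1, 1, -1, -1]
svm_weights = train_linear_svm([extract_features(im) for im in images], labels)
print(classify(svm_weights, [0.9, 1.0, 0.1, 0.0]))
```

Only the two SVM weights are trained here; the feature extractor stays fixed, which is why this approach needs fewer images and less compute than retraining the whole network.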

Many pre-trained networks originate from the ImageNet challenges, which provide labeled images for CNN accuracy competitions. Winning networks demonstrate error rates below 10%, comparable to human-level performance on that benchmark. Common networks include AlexNet, GoogLeNet, VGGNet, and ResNet.

Embedded Processors

A look at the Edge

"The Edge" means processing data on embedded devices rather than in cloud servers. The main advantage is reduced data transmission, which matters for cellular-connected devices with cameras: only relevant data is sent to cloud servers rather than every image. The main disadvantage is that the embedded device needs sufficient computational power, which is particularly important when identifying objects in high-frame-rate video.
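
A minimal sketch of the "only relevant data reaches the cloud" idea is shown below. The frame records, the confidence scores, and the 0.8 threshold are all hypothetical; in a real device the confidence would come from the on-device neural network.

```python
def select_for_upload(frames, threshold=0.8):
    """Keep only frames where the on-device model is confident it saw the target."""
    return [f for f in frames if f["confidence"] >= threshold]


# Hypothetical frames already scored by a model running on the edge device.
frames = [
    {"id": 1, "confidence": 0.05, "size_kb": 120},
    {"id": 2, "confidence": 0.92, "size_kb": 120},
    {"id": 3, "confidence": 0.40, "size_kb": 120},
    {"id": 4, "confidence": 0.85, "size_kb": 120},
]

uploaded = select_for_upload(frames)
total_kb = sum(f["size_kb"] for f in frames)
saved_kb = total_kb - sum(f["size_kb"] for f in uploaded)
print([f["id"] for f in uploaded], saved_kb)
```

The threshold is a trade-off: set it too high and interesting frames are dropped, too low and the cellular link carries mostly irrelevant images.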

Can ANNs operate on the Edge?

Short answer: Yes. Advancing embedded processing power, low-power processors, and efficient neural networks enable many ANNs to run on Edge devices.

Longer answer: It depends. Power restrictions, memory limitations, processing capacity, and overall cost require balancing when determining Edge ANN feasibility.

Embedded System on Chips (SoCs)

Running neural networks on embedded systems

Emerging SoCs, boards, and accelerators capable of Edge ANN processing include:

  • Raspberry Pi
  • Movidius NCS
  • NCS2
  • Nvidia Jetson
  • Coral USB Accelerator (plug-in USB peripheral compatible with the Raspberry Pi)

Benchmarking shows these ANN accelerators approach the performance of a high-performance PC. However, their substantial power consumption makes them unlikely candidates for battery-powered devices, and their relative expense limits the applications they suit.

The Raspberry Pi 4 offers the most affordable, accessible entry point for embedded machine learning. It can be used on its own with TensorFlow Lite or paired with Google's Coral USB Accelerator for higher performance. Idle current draw ranges from 410 to 600 mA (~4 hours on AA batteries), with peak current of 860 to 1430 mA @ 5 V (~3 hours on AA batteries).
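
The battery-life figures can be approximated with a simple capacity-over-current calculation. The 2500 mAh capacity is an assumption (a typical figure for a good AA cell, with enough cells in series to reach the supply voltage); the estimate is idealized and ignores regulator losses and capacity drop at high currents.

```python
def battery_hours(capacity_mah, current_ma):
    """Idealized runtime: capacity divided by average current draw."""
    return capacity_mah / current_ma


AA_CAPACITY_MAH = 2500  # assumed capacity, not a figure from the article

idle_best = battery_hours(AA_CAPACITY_MAH, 410)    # lowest idle current
idle_worst = battery_hours(AA_CAPACITY_MAH, 600)   # highest idle current
peak_best = battery_hours(AA_CAPACITY_MAH, 860)    # lowest peak current
peak_worst = battery_hours(AA_CAPACITY_MAH, 1430)  # highest peak current

print(round(idle_best, 1), round(idle_worst, 1))
print(round(peak_best, 1), round(peak_worst, 1))
```

Under this assumed capacity, the quoted ~4 hours corresponds to the upper end of the idle range and the quoted ~3 hours to the lower end of the peak range.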

Nvidia's Jetson device provides a commercial alternative to the Raspberry Pi. Google's Coral SoC achieves 4 TOPS on 8-bit operations at an efficiency of 2 TOPS per watt, and requires a PCIe or USB 2.0 interface. Despite its lower power usage, it still needs a higher-end embedded Linux device with substantial memory as a host.

ARM Cortex-M4 processors offer an approach for small ANNs, permitting ~1 µA sleep current (exceeding 10 years on AA batteries) with efficient µA/MHz power usage. Limitations include restricted flash and RAM, typically a maximum of 512 KB of RAM and 1 MB of internal flash, which constrains ANN size, though external SPI flash can store larger networks. This approach suits smaller ANNs or applications where lower operating speeds are acceptable. The CMSIS-NN software framework for ARM Cortex-M provides up to 5x performance improvements. ST Micro's STM32Cube.AI framework simplifies running neural networks on low-power embedded devices and supports various processors in the STM32 lineup.
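
A quick back-of-the-envelope check shows why memory limits matter on this class of device. The network size and the firmware footprint below are hypothetical; only the 1 MB internal flash figure comes from the text above.

```python
def model_bytes(num_weights, bits_per_weight):
    """Storage needed for the weights alone (ignores model metadata)."""
    return num_weights * bits_per_weight // 8


def fits(num_weights, bits_per_weight, storage_bytes, reserved_bytes=0):
    """True if the weights fit in storage after reserving space for other code."""
    return model_bytes(num_weights, bits_per_weight) <= storage_bytes - reserved_bytes


FLASH_BYTES = 1024 * 1024     # 1 MB internal flash
FIRMWARE_BYTES = 256 * 1024   # assumed space taken by application firmware
NUM_WEIGHTS = 500_000         # hypothetical small CNN

fits_float32 = fits(NUM_WEIGHTS, 32, FLASH_BYTES, FIRMWARE_BYTES)
fits_int8 = fits(NUM_WEIGHTS, 8, FLASH_BYTES, FIRMWARE_BYTES)
print(fits_float32, fits_int8)
```

In this example the 32-bit version needs about 2 MB and does not fit, while the 8-bit quantized version needs about 0.5 MB and does. RAM for intermediate activations would need a similar check.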

New dedicated CNN SoCs like China's Kendryte K210 offer low-power approaches. This "low-end" MCU delivers an impressive 1 tera-operations per second (TOPS) with 8-bit quantized networks at approximately 0.3 W. Using onboard memory, it runs small quantized CNNs (up to roughly 5-6 MB) at QVGA@60fps or VGA@30fps; larger CNNs can be stored in external flash at reduced speeds. It costs significantly less than most SoCs. The processor features two 64-bit RISC-V CPUs with independent floating-point units. Typical power consumption stays below 1 W, with a current draw of 200 to 300 mA.
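
The TOPS and power figures quoted for the Coral and the K210 can be put on a common footing with some simple arithmetic. This is only a sketch using the headline numbers from the text; real efficiency depends heavily on the network and workload.

```python
def tops_per_watt(tops, watts):
    """Efficiency: operations delivered per unit of power."""
    return tops / watts


def watts_at_peak(tops, efficiency_tops_per_watt):
    """Implied power draw when running at the quoted peak throughput."""
    return tops / efficiency_tops_per_watt


# Coral: 4 TOPS at 2 TOPS per watt.
coral_watts = watts_at_peak(4.0, 2.0)

# K210: 1 TOPS at approximately 0.3 W.
k210_efficiency = tops_per_watt(1.0, 0.3)

print(coral_watts, round(k210_efficiency, 2))
```

On these headline numbers the Coral draws about 2 W at full throughput, while the K210 delivers less total throughput but roughly 3.3 TOPS per watt.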

FPGAs and ASICs can also accelerate Edge CNNs. FPGAs require network quantization, which reduces accuracy by roughly 3%. Most FPGA research employs small CNNs, achieving a 15% speedup versus MCU solutions. FPGAs are complex to implement and have significant power supply and power consumption requirements.

Numerous Edge AI modules exist. Consult GitHub repositories for edge-ai device collections or edge-ai-vision.com for vision-based solutions. Increasingly, processors incorporate CNN processing accelerators.

List of Embedded Vision AI ICs

Conclusion

Operating Artificial Neural Networks on Edge devices now proves feasible even with fairly low-power MCUs, unlocking previously impossible AI-based applications like image analysis and classification.

However, numerous trade-offs require consideration when developing "AI on the Edge" products. Power consumption, processing power limitations, and overall cost represent three critical factors.

Do you have a product concept that might benefit from embedded artificial intelligence? Contact Beta Solutions to explore possibilities.
