“More power”: that is what AI, and computer vision in particular, is often associated with. Indeed, the algorithms need processing power, which can result in a complex and costly infrastructure. However, recent advances in microcontroller architectures can make AI, and especially computer vision, a lot more accessible, with costs per device as low as $20.
Computer vision systems can reduce operating costs by supporting personnel in standard or repetitive tasks such as monitoring machines or facilities. Most of the recent advances in deep learning for computer vision rely on expensive graphics cards, such as the NVIDIA Titan V ($3000). Moreover, most architectures are based on cloud infrastructure. This may be problematic due to latency concerns, complete lack of network infrastructure, or security concerns. These drawbacks can be addressed by employing cost-effective inference on the edge.
Recent advances in microcontroller architectures are enabling deep learning on smaller, cheaper, and lower-power devices, bringing down the cost to as low as $20. These devices form the basis of near real-time applications that can run for months or even years without maintenance. Examples of these applications include preventive maintenance and quality control in production lines, as well as surveillance for security and safety applications.
Microcontroller devices for deep learning inference
There is currently a variety of microcontroller devices capable of running deep learning inference. These include full camera/microcontroller packages, such as the OpenMV camera, processor/co-processor products, such as the ARM Cortex-M55/Ethos-U55 combination, and other options in between with varying selections of peripherals. Below is a table of interesting options and their relevant characteristics:
It is important to note that these options are all intended to run inference models, as opposed to training a model. This means training must still be done on more powerful hardware, but this hardware does not need to be purchased outright.
How to develop edge computer vision applications
Development of an edge computer vision application begins just like that of a traditional deep learning computer vision application. There is an initial data collection phase, during which images are collected under the same conditions as they would be in the completed solution. The data is then cleaned and preprocessed before training a model appropriate to the inference device.
Once the trained model achieves the expected results in the validation environment, it is prepared for deployment on the microcontroller through an optimization process that takes advantage of the microcontroller’s features. For certain models, there are further optimizations that remove unnecessary overhead, making them even more efficient on small devices.
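The article does not name a specific optimization technique. One common choice for microcontrollers, assumed here purely for illustration, is post-training quantization of the model weights to 8-bit integers. The following is a minimal sketch of the underlying arithmetic in plain Python; the function names `quantize_int8` and `dequantize` are hypothetical and not part of any particular toolchain.

```python
def quantize_int8(weights):
    """Affine (scale + zero point) quantization of floats to int8.

    Illustrative sketch only; real toolchains work per tensor or per
    channel and also calibrate activations.
    """
    lo, hi = min(weights), max(weights)
    # Widen the range to include 0.0 so that zero is represented exactly.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # 255 steps span the int8 range [-128, 127]; fall back to 1.0 if all
    # weights are zero to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights, chosen only to demonstrate the round trip.
weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is then stored in one byte instead of four, at the price of a small reconstruction error (`max_error` above), which is the kind of trade-off that makes models fit into the limited memory of a microcontroller.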
This optimized model can then be deployed to the same platform that was used for the initial data capture, and energy consumption benchmarks can be performed. These platforms can be highly optimized for power consumption, depending on requirements, which opens the possibility of running them on battery power alone for extended periods of time without human intervention.
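To see how benchmark results translate into battery runtime, a rough estimate can be made from the average current of a duty-cycled device that wakes up, runs inference, and sleeps again. The sketch below assumes this duty-cycle model; all figures in the example (currents, timings, battery capacity) are hypothetical placeholders rather than measurements, and effects such as battery self-discharge are ignored.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Time-weighted average current over one wake/sleep cycle."""
    if not 0 <= active_s <= period_s:
        raise ValueError("active_s must lie between 0 and period_s")
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


def battery_life_days(capacity_mah, avg_ma):
    """Idealized runtime in days for a battery of the given capacity."""
    if avg_ma <= 0:
        raise ValueError("avg_ma must be positive")
    return capacity_mah / avg_ma / 24.0


# Hypothetical device: 100 mA for 1 s of capture and inference every
# 10 minutes, 0.05 mA while asleep, powered by a 2000 mAh battery.
avg_ma = average_current_ma(
    active_ma=100.0, sleep_ma=0.05, active_s=1.0, period_s=600.0
)
days = battery_life_days(capacity_mah=2000.0, avg_ma=avg_ma)
```

Replacing the placeholder values with the currents measured during the energy benchmarks gives a first-order estimate of how long a deployment can run unattended.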
Hybrid edge/server architecture
In certain use cases, the microcontroller devices may record new situations that they are unable to classify. This presents an opportunity to use a hybrid edge/server architecture, commonly used in self-driving cars, to collect and make use of new data, as described by Mohanbir Sawhney. Edge devices (microcontrollers) run inference models and collect new data, and a server retrains the models using the new data in order to expand the abilities of the edge devices. Having implemented such a pipeline in a couple of projects, we found that developing a proof-of-concept in this area is not overly complex.
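The edge side of such a pipeline can be sketched as follows. The article does not specify how a device decides that it cannot classify a situation; the sketch assumes a simple confidence threshold on the model's class probabilities. The class `EdgeClassifier`, its parameters, and the sample identifiers are hypothetical names used for illustration only.

```python
from collections import deque


class EdgeClassifier:
    """Accepts confident predictions and queues uncertain samples.

    Assumption: a prediction counts as "unable to classify" when the
    highest class probability falls below `threshold`.
    """

    def __init__(self, labels, threshold=0.8, max_queue=100):
        self.labels = labels
        self.threshold = threshold
        # Bounded queue: the oldest entries are dropped when memory is full.
        self.upload_queue = deque(maxlen=max_queue)

    def handle(self, sample_id, probabilities):
        """Return the predicted label, or None if the sample was queued."""
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        if probabilities[best] >= self.threshold:
            return self.labels[best]
        # Low confidence: keep the sample for upload to the server,
        # where it can be labeled and used for retraining.
        self.upload_queue.append(sample_id)
        return None


clf = EdgeClassifier(labels=["ok", "defect"], threshold=0.8)
confident = clf.handle("img-001", [0.95, 0.05])  # accepted on the device
uncertain = clf.handle("img-002", [0.55, 0.45])  # queued for the server
```

The server side, which would receive the queued samples, retrain the model, and ship the updated model back to the devices, is outside the scope of this sketch.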
Similar to traditional deep learning solutions
Developing such a solution with microcontrollers is an effective way to reduce the solution’s computational requirements, energy footprint, and hardware cost without requiring significant new infrastructure, in contrast to current approaches based on powerful hardware. The process remains similar to that of a traditional deep learning solution, with some extra optimizations to bring the application to a smaller device with a smaller footprint. A bonus is that these deployments do not require connectivity unless continuous data collection is necessary. For continuous data collection, it is also possible to create an unsupervised learning pipeline, similar to the one described in the blog post by my colleague Kevin Henrichs.
Davinder Chandhok is a Professional Data Scientist with a focus on computer vision. He has been involved in the development of computer vision solutions for industrial equipment since 2018 and recently began developing skills in cloud technologies to enhance data science projects with cloud connectivity. Davinder Chandhok holds a Bachelor of Engineering in Automation and Control Engineering.