One of the greatest challenges faced by users who are visually impaired is identifying packaged foods, both in a grocery store and also in their kitchen cupboard at home. This is because many foods share the same packaging, such as boxes, tins, bottles and jars, and only differ in the text and imagery printed on the label. However, the ubiquity of smart mobile devices provides an opportunity to address such challenges using machine learning (ML).
In recent years, there have been significant improvements in the accuracy of on-device neural networks for various perception tasks. When coupled with the increased computing power in modern smartphones, it is now possible for many vision tasks to yield high performance while running entirely on a mobile device. The development of on-device models such as MnasNet and MobileNets (based on resource-aware architecture search) in combination with on-device indexing allows one to run a full computer vision system, such as labeled product recognition, entirely on-device, in real time.
Leveraging developments such as these, we recently released Lookout, an Android app that uses computer vision to make the physical world more accessible for users who are visually impaired. When the user aims their smartphone camera at the product, Lookout identifies it and speaks aloud the brand name and product size. To accomplish this, Lookout includes a supermarket product detection and recognition model with an on-device product index, along with MediaPipe object tracking and an optical character recognition model. The resulting architecture is efficient enough to run in real-time entirely on-device.
A completely on-device system has the benefit of being low latency and with no reliance on network connectivity. However, this means that for a product recognition system to be truly useful to the users, it must have a on-device database with good product coverage. These requirements drive the design of the datasets used by Lookout, which consist of two million popular products chosen dynamically according to the user’s geographic location.
To continue reading this article, click here.