The HBO show Silicon Valley released a real AI app that identifies hotdogs — and not hotdogs — like the one shown in season 4’s 4th episode. (The app is now available on Android as well as iOS!)
To achieve this, we designed a bespoke neural architecture that runs directly on your phone, and trained it with TensorFlow, Keras & Nvidia GPUs.
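To give a flavor of what that looks like in code, here is a minimal Keras sketch of a binary hotdog / not-hotdog classifier. It is illustrative only: the app's actual bespoke architecture differs, and the input size and layer choices below are assumptions for demonstration.

```python
# Illustrative sketch only: a tiny binary image classifier in Keras.
# The real app used a custom architecture; the 224x224 input size and
# layer choices here are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),        # RGB photo, resized
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),    # outputs P(hotdog)
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",     # two classes: hotdog / not
              metrics=["accuracy"])
```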
While the use case is farcical, the app is an approachable example of both deep learning and edge computing. All AI work is powered 100% by the user’s device, and images are processed without ever leaving their phone. This gives users a snappier experience (no round trip to the cloud), offline availability, and better privacy. It also allows us to run the app at a cost of $0, even under the load of a million users, providing significant savings compared to traditional cloud-based AI approaches.
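As a sketch of what fully on-device inference involves (the shipped app embeds TensorFlow differently; the model file name and input size below are hypothetical), here is how a converted TensorFlow Lite model can classify a photo without any network call:

```python
# Illustrative only: on-device inference with TensorFlow Lite, so the
# image never leaves the phone. "hotdog.tflite" is a hypothetical
# converted model file, not the app's actual artifact.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="hotdog.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A preprocessed photo: 1 x 224 x 224 x 3, float32 in [0, 1].
image = np.random.rand(1, 224, 224, 3).astype(np.float32)

interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()

probability = interpreter.get_tensor(output_details[0]["index"])[0][0]
print("hotdog" if probability > 0.5 else "not hotdog")
```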
The app was developed in-house by the show, by a single developer, running on a single laptop & attached GPU, using hand-curated data. In that respect, it may provide a sense of what can be achieved today, with a limited amount of time & resources, by non-technical companies, individual developers, and hobbyists alike. In that spirit, this article attempts to give a detailed overview of the steps involved, to help others build their own apps.
If you haven’t seen the show or tried the app (you should!), the app lets you snap a picture and then tells you whether it thinks that image is of a hotdog or not. It’s a straightforward use case that pays homage to recent AI research and applications, in particular ImageNet.
While we’ve probably dedicated more engineering resources to recognizing hotdogs than anyone else, the app still fails in horrible and/or subtle ways.