Changing what your app displays based on a live camera stream makes it feel like magic. I wanted to try my hand at exactly this for my app that simulates the vision of a cat.
UPDATE for 2022: it turns out the package I chose below was deprecated rather quickly. The currently recommended package is google_ml_kit.
After surveying the situation for a bit, I found a variety of packages that looked relevant.
To me, the official ML Vision package for Firebase seemed most promising because:
- Model is on device and easy to install
- Easy to integrate into an existing flutter app
- Option to switch to a custom model in the future using firebase_ml_custom
It didn’t take long until I ran into the first complication. When trying the example app for the firebase_ml_vision image labeler, my test phone (a Pixel 3a) only ever produced the labels "metal" and "pattern".
When working with ML models previously, I had seen similar issues when the input was garbled, for example when the image width was set too small and pixels from line n leaked onto line n+1. I didn’t have to search far until I found others with the same issue (open source ftw!). According to the comment by benjastudio, this behavior was indeed caused by some phones adding padding to each row of their image data for certain resolution settings.
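To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the workaround from the issue thread and not Flutter code, just an illustration with made-up numbers. The function name strip_row_padding and the parameter bytes_per_row are my own; the assumption is a single-channel image plane with one byte per pixel, where each row occupies bytes_per_row bytes but only the first width bytes are real pixels.

```python
def strip_row_padding(data: bytes, width: int, height: int, bytes_per_row: int) -> bytes:
    """Return tightly packed rows from a 1-byte-per-pixel plane with padded rows.

    Assumption: every row starts at a multiple of bytes_per_row, and only the
    first `width` bytes of each row are pixel data.
    """
    if bytes_per_row < width:
        raise ValueError("bytes_per_row must be at least width")
    # The last row may come without its trailing padding, so only require
    # enough bytes to read its `width` pixels.
    if len(data) < bytes_per_row * (height - 1) + width:
        raise ValueError("buffer too small for the given dimensions")

    rows = []
    for y in range(height):
        start = y * bytes_per_row  # step by the stride, not by the width
        rows.append(data[start:start + width])
    return b"".join(rows)


# Hypothetical 3x2 plane whose rows are padded to 4 bytes each.
padded = bytes([1, 2, 3, 0,
                4, 5, 6, 0])

# Correct: skip the padding byte at the end of each row.
packed = strip_row_padding(padded, width=3, height=2, bytes_per_row=4)

# Naive: treat the buffer as tightly packed. The second row now starts with
# the padding byte, so every following pixel is shifted by one.
naive = padded[:3 * 2]
```

In the naive read, the second row becomes 0, 4, 5 instead of 4, 5, 6. On a real image that shift grows with every row, so the picture ends up skewed into diagonal stripes, which would be consistent with a labeler that only ever reports something like "pattern".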
I implemented the workaround described by benjastudio and boom: the detector started returning reasonable labels. But I didn’t want to keep relying on workarounds: they are brittle and tend to break when the underlying assumptions change. Instead, I sent a PR to the owners of the repository. Review still pending!
For now, I’m using my patched version of firebase_ml_vision, detecting dogs reliably.