The image stabilization technology that is now standard on most phone cameras uses small springs to hold the lens suspended in liquid. When someone speaks near the phone, the sound makes these springs vibrate, which moves the lens and bends the light reaching the image sensor ever so slightly.
The rolling shutter technique that most phone cameras use to capture images also plays a role in Side Eye's ability to extract audio.
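With a rolling shutter, the image is captured one row at a time, from top to bottom, so each row records the scene at a slightly different moment. A single frame therefore samples the lens vibrations many times rather than once, which lets Side Eye recover far more of the audio signal than the video frame rate alone would allow.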
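To make the sampling argument concrete, here is a minimal Python sketch of an idealized rolling shutter. The frame rate, row count, readout fraction, and every function name below are hypothetical values and helpers chosen for illustration; none of them come from the Side Eye research. The sketch assumes rows are read at evenly spaced instants during part of each frame period, and it treats each row as one sample of a pure-tone lens vibration.

```python
import math

# Hypothetical camera parameters, chosen for illustration only;
# they are not taken from the Side Eye research.
FRAME_RATE_HZ = 30        # video frames per second
ROWS = 1080               # sensor rows read out per frame
READOUT_FRACTION = 0.8    # share of each frame period spent reading rows


def row_capture_times(frame_rate_hz, rows, readout_fraction, n_frames):
    """Return the time (s) at which each row of each frame is captured.

    Idealized rolling shutter: rows are read top to bottom at evenly
    spaced instants during the first `readout_fraction` of the frame
    period, followed by an idle gap before the next frame starts.
    """
    frame_period = 1.0 / frame_rate_hz
    row_interval = frame_period * readout_fraction / rows
    return [
        f * frame_period + r * row_interval
        for f in range(n_frames)
        for r in range(rows)
    ]


def sample_lens_vibration(times, freq_hz, amplitude):
    """Sample a pure-tone lens displacement at each row's capture time."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t) for t in times]


def samples_per_second(frame_rate_hz, samples_per_frame):
    """Average number of vibration samples captured per second."""
    return frame_rate_hz * samples_per_frame


if __name__ == "__main__":
    global_rate = samples_per_second(FRAME_RATE_HZ, 1)      # one sample per frame
    rolling_rate = samples_per_second(FRAME_RATE_HZ, ROWS)  # one sample per row
    print(f"global shutter:  {global_rate} samples/s, Nyquist {global_rate / 2} Hz")
    print(f"rolling shutter: {rolling_rate} samples/s, Nyquist {rolling_rate / 2} Hz")

    # Two frames of a 200 Hz tone (within the range of human voice pitch),
    # observed one row at a time.
    times = row_capture_times(FRAME_RATE_HZ, ROWS, READOUT_FRACTION, n_frames=2)
    samples = sample_lens_vibration(times, freq_hz=200.0, amplitude=1.0)
    print(f"{len(samples)} row samples from 2 frames")
```

Under these assumed numbers, treating each frame as a single sample gives 30 samples per second, which can only represent vibrations below 15 Hz. Treating each row as a sample gives 32,400 samples per second on average, enough in principle to cover much of the speech band. The model deliberately ignores real-world effects such as the idle gap making the samples non-uniform, sensor noise, and the conversion from lens motion to pixel shifts.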
The audio that Side Eye recovers is muffled, but it still carries a lot of useful information.
For example, Kevin Fu, a professor of electrical and computer engineering and computer science at Northeastern University, has been able to train an AI tool to identify specific words and phrases, even when they are spoken very quietly.
He has also trained Side Eye to identify who is speaking, although this capability is not yet as accurate.
Side Eye has a number of potential applications, both good and bad.
On the one hand, it could be used to create new forms of surveillance technology. For example, it could be used to monitor people's conversations in public places without their knowledge or consent.
On the other hand, the tool could also be used to gather digital evidence for use in court. For example, it could be used to prove that someone was at a crime scene even if they claim to have been elsewhere.