The line between what is real and what is not is becoming ever thinner thanks to a new AI tool from Microsoft.
The technology, known as VASA-1, turns a still photo of a person's face into an animated video of them singing or talking.
The software company claims that lip movements are 'exquisitely synchronized' with sounds to create the impression that the person has come to life.
In one demonstration, for instance, the Mona Lisa, Leonardo da Vinci's 16th-century masterwork, begins rapping crudely in an American accent.
Microsoft is keeping the technology private while acknowledging that it may be "misused for impersonating humans."
VASA-1 takes a still image of a face, whether a photograph of a real person or a fictitious figure depicted in a painting or other artwork.
It then 'carefully' synchronizes this with speech 'from any individual' to bring the face to life.
Because it was trained on a database of facial expressions, the AI can even animate the still image in real time as the speech is generated.
In a blog post, researchers from Microsoft characterize VASA as a 'framework for generating lifelike talking faces of virtual characters'.
They say that "it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours."
“Our method is capable of not only producing precious lip-audio synchronisation but also capturing a large spectrum of emotions and expressive facial nuances and natural head motions that contribute to the perception of realism and liveliness.”
As for use cases, the team believes VASA-1 could allow digital AI avatars to "engage with us in ways that are as natural and intuitive as interactions with real humans."
Fraud is another possible concern, as individuals could be tricked online by a phoney communication that appears to come from a person they trust.
"Seeing is most definitely not believing anymore," according to ESET security specialist Jake Moore.
“As this technology improves, it is a race against time to make sure everyone is fully aware of what it is capable of and that they should think twice before they accept correspondence as genuine,” he told MailOnline.
VASA-1 is "not intended to create content that is used to mislead or deceive," the Microsoft researchers stated, anticipating public concerns about misuse.
"However, like other related content generation techniques, it could still potentially be misused for impersonating humans," they add.
"We are interested in applying our technique to advance forgery detection, and we oppose any behaviour that creates misleading or harmful contents of real people."
“Currently, the videos generated by this method still contain identifiable artefacts, and the numerical analysis shows that there's still a gap to achieve the authenticity of real videos.”
Microsoft acknowledges that current techniques fall short of "achieving the authenticity of natural talking faces," but AI is developing rapidly.