Deepfake for good: CannyAI develops VDR tech to engage audiences in any language

It all started with Netflix, says Jonathan Heimann, co-founder of Israeli VDR (video dialogue replacement) company CannyAI, about the origins of the startup. Founded in 2017, CannyAI harnesses deep tech to localize videos into any language and any dialogue, eliminating the need for subtitles.

“When Netflix started streaming in Israel, they did some experiments with how localization would work best here,” Heimann tells NoCamels, explaining that he and his co-founder, Omer Ben-Ami, felt that watching shows that were obviously dubbed was an “awful experience.”

They were immediately moved to address the challenge and were surprised to find that no one else was doing so. The pair quickly learned that the central problem was the inability to change a video’s dialogue after recording, and that any solution would have to rely on emerging technologies. Drawing on their software development backgrounds from the IDF and Tel Aviv University, Heimann and Ben-Ami founded and bootstrapped CannyAI three years ago.

CannyAI offers VDR services to answer the increasing need for customized, personalized, and localized content, such as adapting commercials to different languages or dialects. VDR can easily be confused with deepfakes, originally the name of an open-source face-swapping project and now an umbrella term for certain video and audio manipulations using artificial intelligence. But CannyAI says it is not the same. The difference is subtle: face swapping (as in deepfakes) replaces faces in a video, whereas VDR replaces audio “while keeping the lips and facial expressions in sync to a source video.”

This CannyAI video of world leaders singing John Lennon’s “Imagine” is a good demonstration of the tech, and one with a rather positive message.

So is a famous 2019 video, created with CannyAI’s VDR tech, of Facebook founder Mark Zuckerberg seemingly bragging about having “total control of billions of people’s stolen data.” The original footage comes from an address Zuckerberg gave in 2017 about Russia’s use of Facebook to interfere in US elections.

The widespread use of deepfakes in fake news, often in the midst of national political campaigns, is a cause for concern for the company. This and other abuses of the technology, including its use for pornographic content, have caused an aversion to the field. With the “Imagine” video, Ben-Ami told FX Guide last April, “There’s a lot of hype on that, around the fake news with this technology and we wanted to do something with a strong unifying message, to show some positive uses for this technology.”

“Of course, we’re taking these ethical considerations into account,” Heimann tells NoCamels. CannyAI says an in-house team reviews and approves all target and source videos against its terms and conditions and guidelines, to ensure that the proper ownership and rights are in place and that there is no malice behind the videos. The full process includes uploading target and source videos, a compliance review, application of AI algorithms, quality assurance, and watermarking before producing the final high-quality video.
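
For readers who think in code, the workflow described above can be pictured as a chain of gated stages, where a video that fails the compliance review never reaches the AI step. The Python sketch below is only an illustration under stated assumptions, not CannyAI's implementation: every name in it (Job, Rejected, run_pipeline, and the stage functions) is hypothetical, and each stage merely records that it ran rather than processing any video.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical VDR job: a target video plus a source video with the new dialogue."""
    target_video: str
    source_video: str
    rights_confirmed: bool  # assumption: the reviewer's decision is reduced to one flag
    log: List[str] = field(default_factory=list)


class Rejected(Exception):
    """Raised when a stage refuses to pass the job on."""


def compliance_review(job: Job) -> Job:
    # Gate: stop here unless ownership and rights have been confirmed.
    if not job.rights_confirmed:
        raise Rejected("ownership and rights not confirmed")
    job.log.append("compliance_review")
    return job


def apply_ai_algorithms(job: Job) -> Job:
    # Placeholder for the lip-sync step; no real model is invoked.
    job.log.append("apply_ai_algorithms")
    return job


def quality_assurance(job: Job) -> Job:
    job.log.append("quality_assurance")
    return job


def watermark(job: Job) -> Job:
    job.log.append("watermark")
    return job


# Stage order follows the process as described in the article.
STAGES: List[Callable[[Job], Job]] = [
    compliance_review,
    apply_ai_algorithms,
    quality_assurance,
    watermark,
]


def run_pipeline(job: Job) -> Job:
    """Record the upload, then run each stage in order; any stage may raise Rejected."""
    job.log.append("upload")
    for stage in STAGES:
        job = stage(job)
    return job
```

Calling run_pipeline on a job with rights_confirmed=False raises Rejected before any later stage runs, which mirrors the idea that the review happens ahead of the AI processing.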

The company’s original direction was to provide a product for better dubbing of TV shows, reuse of existing footage, and conversion of training videos to different languages. However, CannyAI is currently developing a solution that will eliminate customers’ need for a target video altogether.

Having recognized its customers’ high expenses for video production, which include hiring a studio, a camera team, an editor, visual effects, and so on, CannyAI is working on high-quality stock video footage, so that “like Shutterstock, everyone would be able to choose a video of a person talking, but also change the dialogue in the footage,” explains Heimann. This will allow small businesses to create quality productions of testimonials, explainer videos, and live-action presentations that would otherwise be beyond their budget. For this purpose, the company is currently looking for actors, animators, studios, and creators to join it.

Projects doing similar work today include Face2Face, which does facial reenactment and which served as a source of inspiration for CannyAI’s founders, and its spin-off company Synthesia, which addresses language dubbing. Various research groups are working on similar solutions.

CannyAI distinguishes itself from its competition by focusing on high-quality or complex productions and achieving a high-quality lip-sync effect, whereas other solutions target lower-end productions and mimic only rough facial expressions, for example. CannyAI also uses a different technical implementation. Achieving what CannyAI does with traditional computer graphics manipulations is extremely tedious, to the point that it is rarely done, and even then only to improve lip-syncing for a word or two.

Despite being in the early stages of business development, the company has already worked with MIT, Warner Music Group, and Keshet, and even collaborated on a short film that was presented at the International Documentary Film Festival Amsterdam (IDFA) last April. CannyAI has also received significant mentions in Israeli mainstream media such as Channel 12 and Mako.

Heimann and Ben-Ami have two advisors on their side: one holds a PhD in statistics from Yale, and the other works in audio post-production in the film industry.

In 2018, CannyAI participated in the first cycle of the accelerator launched by the Israel Security Agency (ISA, better known by its Hebrew acronym, Shin Bet) and TAU Ventures, the investment arm of Tel Aviv University.

This article first appeared in NoCamels, which covers innovations from Israel for a global audience.