A fresh name has emerged in the rapidly expanding landscape of text-to-video generation tools: Shengshu Technology’s Vidu AI. Launched globally on July 30, Vidu AI lets users convert text (in both Chinese and English) and images into crisp video clips of 4 or 8 seconds. The platform also offers upscaling, bringing those clips to full 1080p resolution and setting a high bar for quality.
Vidu’s arrival marks another step forward in China’s push to develop generative artificial intelligence. In tandem with Kuaishou’s Kling AI, MiniMax’s Hailuo AI, and a growing roster of others, Shengshu’s tool is joining the fray just as China’s AI developers work to match, if not outpace, more established international names like OpenAI’s Sora and Google’s Veo.
Media reports have highlighted several key features that make Vidu AI stand out. First is its efficiency—Vidu can reportedly generate a 4-second video clip in just 30 seconds, making it one of the fastest models of its kind.
There’s also the “reference-to-video” feature, which ensures consistency in subjects, settings, and visual styles across multiple clips—an especially useful tool for creators working in dynamic formats like films and games, where coherence is essential.
Vidu’s ability to generate anime-style videos has also drawn attention and could make it a preferred tool for creators aiming to replicate the distinct aesthetic of Japanese anime. But how does Vidu AI stack up in real-world tests?
How Vidu AI compares
To gauge Vidu AI’s performance, KrASIA put the tool through a series of prompts previously tested on Kling AI and Hailuo AI. The comparisons were designed to evaluate not only the quality of the generated videos but also the coherence, creativity, and speed of each tool.
When prompted to generate a video of a “realistic puppy driving a car,” Vidu AI delivered a clip that, while visually engaging, leaned more toward a toy-like representation of the car rather than the real thing.
The puppy was well-positioned behind the wheel, but it didn’t quite feel like the driver—it seemed more placed there than actively interacting with the scene. Hailuo AI encountered a similar issue, struggling to achieve full realism. It seems that, for prompts like this, adding more detailed inputs—such as specifying the car model or dog breed—could result in a stronger output.
Next, Vidu AI was tasked with a more playful challenge: a “cute kitten eating lunch like a human.”
Here, the tool performed on par with both Kling AI and Hailuo AI, producing a charming scene of a kitten mimicking human behavior at the table.
There was no significant difference in visual quality across the models, but Vidu AI’s output was smooth, hitting the mark for anthropomorphic charm. Interestingly, Kling AI showed a slight edge in realism, particularly in its depiction of the kitten using utensils in a human-like fashion.
Things took a more complex turn when Vidu was prompted to create “astronauts repairing a space station orbiting Earth.”
Vidu AI’s version of the space station leaned conservative, resembling a satellite more than Hailuo AI’s orbital hub-like structure. However, where Vidu AI shone was in the movement of the astronauts: they appeared to actively manipulate wiring, lending the scene a convincing dynamism. Despite some fuzziness in the frames introduced by upscaling, Vidu’s output felt more alive than Kling AI’s, whose movements appeared far more static.
When asked to generate “medieval knights in combat,” Vidu AI struggled to produce a convincing fight sequence. The knights moved stiffly, and the action lacked fluidity. Hailuo AI faced similar difficulties, producing incoherent action sequences.
However, after refining the prompt to specify “two medieval knights in combat,” Vidu’s performance improved, with more defined characters and movements, though it still didn’t reach the desired level of dynamism.
To test Vidu AI’s much-touted consistency and anime-style rendering, the test was extended with a prompt that played to its strengths: “samurais in combat, anime style.” Vidu AI outperformed its competitors here, generating striking visuals that captured the classic anime aesthetic. The movements were fluid and highly stylized, mirroring the conventions of Japanese animation.
Kling AI, by contrast, struggled to evoke the anime style, delivering a samurai that looked too realistic for the prompt. Hailuo AI fared better with its art style but stumbled in portraying the expected combat scenario.
One final test assessed Vidu AI’s reference-to-video capability, the feature meant to maintain visual consistency when transferring styles between formats. Using an image generated by Kling AI—a profile of a woman with blonde hair and blue eyes—as a reference, Vidu AI was prompted to create a video placing the woman in a beach setting.
Vidu AI excelled here, nearly perfectly replicating the woman’s facial features and outfit while seamlessly transitioning her into a sunlit beach environment. Natural shadows and lighting effects added depth, showcasing Vidu AI’s ability to maintain continuity across formats.
Across these prompts, Vidu AI completed each task within a minute, ahead of both Hailuo AI and Kling AI, with Kling AI the slowest of the three. Though Vidu AI sometimes exceeded its advertised 30-second generation time, it consistently delivered outputs in under a minute. Of course, factors like platform traffic and network latency could have influenced these times, meaning Vidu AI may perform even faster under optimal conditions.
The tech and team behind Vidu AI
Vidu AI’s capabilities are powered by Shengshu’s universal vision transformer (U-ViT) model. Developed by chief scientist Zhu Jun and his team, U-ViT was first introduced in a 2022 research paper. The architecture uses a transformer as the backbone of a diffusion model, resulting in a flexible and powerful design capable of generating a wide array of video outputs.
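The core idea behind the U-ViT line of work is to represent everything the diffusion model conditions on, including the timestep, the text prompt, and the noisy image patches, as tokens in a single transformer sequence. The NumPy sketch below illustrates only that tokenization step, not Shengshu’s actual system: the function names are hypothetical, and the transformer itself is replaced by a placeholder identity map.

```python
import numpy as np

def patchify(x, p):
    # Split an (H, W, C) array into (H/p * W/p) flattened patch tokens of width p*p*C.
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * C)
    return x

def unpatchify(tokens, H, W, C, p):
    # Inverse of patchify: rebuild the (H, W, C) array from patch tokens.
    x = tokens.reshape(H // p, W // p, p, p, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    return x

def uvit_step(noisy_image, t_embed, text_embed, p=4):
    # One hypothetical denoising step: timestep, text, and image patches
    # all become tokens of equal width in a single sequence.
    H, W, C = noisy_image.shape
    img_tokens = patchify(noisy_image, p)  # shape (num_patches, D)
    D = img_tokens.shape[1]
    seq = np.concatenate(
        [t_embed.reshape(1, D), text_embed.reshape(-1, D), img_tokens]
    )
    # Placeholder for the transformer blocks (identity map for illustration).
    seq = seq @ np.eye(D)
    # Only the image tokens are decoded back; the predicted noise
    # has the same shape as the input image.
    num_patches = img_tokens.shape[0]
    return unpatchify(seq[-num_patches:], H, W, C, p)
```

The point of this design, as reported for U-ViT, is that conditioning signals need no special pathways: they are just extra tokens attended to alongside the patches.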
Since its launch, Vidu AI has begun making waves, particularly in the film industry. Notably, Chinese director Li Ning is said to be incorporating Vidu AI and other generative AI tools into the production of what is set to become China’s first fully AI-generated movie, scheduled for release later this year. Vidu AI’s ability to maintain visual consistency across scenes is likely to prove essential to this groundbreaking project, offering a glimpse into how AI could transform future filmmaking.
Shengshu itself is a young company, founded in March 2023 by a team from Tsinghua University’s Institute for AI Industry Research (AIR). Despite its youth, Shengshu has quickly managed to secure substantial financial backing. In March, the company completed a funding round led by Qiming Venture Partners, with participation from several other investors. Just a few months later, Shengshu announced a pre-Series A round co-led by Baidu and the Beijing Artificial Intelligence Industry Investment Fund. Ant Group, Alibaba’s financial affiliate, is also on board, adding to Shengshu’s impressive list of backers.
While Vidu AI has made a good first impression, it’s far from alone in the burgeoning generative AI space. The competition is heating up fast. In July, Zhipu AI’s video-generating tool, Ying, hit the market. At the same time, ByteDance’s Faceu Technology—best known for its video editing app CapCut—has also entered the fray. Faceu’s Jimeng AI, still in its early rollout stages, could soon emerge as another serious contender in the race to dominate the AI-generated video market.
Regardless, Shengshu is making its ambitions clear. CEO Tang Jiayu has his sights set on challenging global giants like OpenAI and Google. Accordingly, Shengshu’s current focus is on refining its core applications in film production, anime creation, and the digital restoration of cultural relics—areas that align with China’s broader push to lead in AI-driven industries.