
Shengshu and Kuaishou unveil new AI systems to improve generative video consistency

Written by T. K. Lin · 4 mins read

Graphic by KrASIA.
Vidu Q2 and Kling O1 aim to keep AI-generated videos consistent from frame to frame.

Shengshu Technology and Kuaishou each introduced new multimodal artificial intelligence systems this week that emphasize consistency across characters, scenes, and shots. Shengshu expanded its Vidu Q2 model with a full image generation stack, while Kuaishou launched Kling O1, a unified engine for text, images, and video. Although the releases differ in scope, both aim to solve the same problem: making AI systems reliably maintain continuity across images and video.

This marks a shift from earlier stages of generative video development. In 2024, comparisons between models focused on realism, speed, and prompt execution. Tests by KrASIA that year showed that many systems could produce visually striking clips but struggled with consistency, especially across multiple shots. Shengshu’s and Kuaishou’s latest updates point toward a new priority: whether models can handle the structured, multi-shot sequences that creative teams work with every day.

A push toward reliable video creation

Even as video generation models improve in visual fidelity, creators using them continue to encounter issues with structural coherence. Characters can shift between frames, props may disappear, and styles sometimes change without clear cause. Short sequences can lose continuity as soon as the camera moves or a scene changes.

Vidu Q2 and Kling O1 both frame this as the core barrier to wider adoption, especially in short-form video, advertising, and early-stage film production. In previous tests, Vidu’s reference-to-video feature showed how subject and style consistency could improve results. Kling, meanwhile, often produced realistic stills but struggled with temporal consistency during fast action. The new releases target these weaknesses.

Why Vidu and Kling are prioritizing continuity

Unlike general-purpose AI labs, Shengshu and Kuaishou operate in ecosystems where short-form video and commerce-driven content are central. Their users depend on workflow stability as much as visual quality:

  • Shengshu’s earlier Vidu iterations emphasized fast generation, cost efficiency, and template-driven tools to help creators produce frequent short clips with consistent characters or brand assets.
  • Kuaishou, whose platform depends on video-led engagement across advertising, live streaming, e-commerce, and short dramas, follows a similar logic. Kling O1 is built not just to generate new video but to edit and refine existing footage, aligning with how creators work on the platform.

This shared background explains why both companies now emphasize models that can connect still images, references, and video sequences without introducing character drift or structural inconsistencies.

What the latest releases add

Vidu Q2’s new image generation stack expands the model from a video-focused tool into a unified system for still and moving images. It supports text-to-image generation, multi-reference identity preservation, and detailed image editing. Shengshu said these features build on Vidu’s strength in varied visual styles while helping creators keep characters consistent across assets.

Vidu Q2 now produces high-resolution outputs up to 4K, with image generation times as fast as five seconds, depending on complexity. Crucially, the still images can be used as references for video sequences, allowing creators to move from character design to motion without recreating assets.

Kling O1 represents a significant architectural shift from earlier versions. Built on a “multimodal visual language” framework, it can process text, images, subjects, and video clips in a single workflow. The system supports combined editing and generation tasks, such as inserting a subject while adjusting the background.

One of the model’s key features is what Kuaishou describes as “director-like memory,” intended to maintain character identity during dynamic camera movements and multi-subject scenes. It supports generation lengths of three to ten seconds, with first- and last-frame control expected to cover the same range. Unlike earlier versions that required manual masking or frame-by-frame fixes, Kling O1 aims to deliver pixel-level edits through natural language prompts.

How the new releases compare with other video models

The generative video landscape is starting to split into clearer segments:

  • Film-oriented models like OpenAI’s Sora, Google’s Veo, and Runway’s Gen-4.5, which focus on longer shots and realistic physics.
  • Short-form and stylized generators, where Vidu Q2 fits, optimized for social content, advertising, and animation.
  • Unified creation-and-editing engines, represented by Kling O1, combining reference-based generation, video editing, and restructuring.
  • Avatar-driven systems such as HeyGen and Synthesia.
  • Previsualization and storyboarding tools that rely on reusable characters and stable asset pipelines, a direction now shared by Vidu Q2 and Kling O1.

Across these categories, consistency involves several technical challenges: maintaining character identity across shots and angles, preserving props and spatial layouts, keeping color and style coherent during edits, handling multi-subject scenes, and supporting image-to-video transitions without drift. These capabilities determine whether AI-generated video can move from experimentation to regular production.

Vidu Q2’s unified image-video system and Kling O1’s multimodal engine both attempt to standardize this within their respective workflows.

The releases of Vidu Q2 and Kling O1 indicate that competition in generative video is moving away from headline-grabbing realism and toward workflow stability and continuity. Rather than trying to solve every use case, models are beginning to specialize in formats and production styles where they can deliver reliable, repeatable results.

For platforms built around short-form video and high-volume content creation, the next phase of AI video generation may depend less on the quality of individual clips and more on whether models can maintain coherent characters, scenes, and assets across the broader creative process.
