Kling AI, an artificial intelligence platform for creatives, has announced a major upgrade with the launch of its Video 2.6 model, which now supports native audio generation. The release reflects the company’s push to simplify and consolidate the content production workflow by offering a more complete end-to-end solution.
Video 2.6 integrates the generation of video, dialogue, sound effects, and ambient audio into a single step. This removes the need for creators to use separate tools for visuals and sound and then manually synchronize them, eliminating a bottleneck that has long limited the efficiency of AI-driven media production.
According to Kling, the model’s enhanced semantic understanding enables it to interpret a wide range of inputs, from simple text descriptions and spoken commands to complex, multi-scene storylines. This helps ensure that the visuals and audio it generates align with a creator’s intended direction.
A key upgrade is the system’s expanded audio capabilities and tighter audiovisual synchronization. Video 2.6 can generate a broad spectrum of sounds, including lifelike human voices for speech, singing, and rapping, as well as detailed environmental effects such as breaking glass, crackling fire, and ocean waves. The model also supports granular control through prompts, letting creators specify emotions, tone, rhythm, and even volume—for example, shifting from a whisper to a dramatic scream.
By consolidating these features into a single workflow, Kling expects Video 2.6 to reduce production costs and shorten turnaround times for studios, influencers, and other creative professionals. The company positions the platform as a tool for creators who want to deliver immersive visual and auditory experiences with less reliance on post-production.
This article was published in partnership with Kling AI.
