Veo 3.1

Veo 3.1 is an AI video generation model released by Google DeepMind in January 2026. It combines strong semantic understanding of prompts with multimodal reference capabilities.

Core Features

Audio Synchronization: Generates ambient sound and dialogue together with the video, producing natural lip-sync.
Deep Semantic Understanding: Building on Gemini's language processing capabilities, it can follow complex instructions that use professional camera language (such as dolly zoom or low-angle tracking).
Reference Image Locking: Supports uploading 1-3 reference images (character sketches, product photos, or scene settings) to extract textures, tones, and features as "visual anchors," ensuring character and scene consistency.
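
As a rough illustration of the reference-image limit described above, the sketch below validates a request before it would be sent. `VideoRequest`, its fields, and the file names are hypothetical and are not part of any official Veo SDK; the only detail taken from the text is the cap of three reference images. Treating reference images as optional (for text-only prompts) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

MAX_REFERENCE_IMAGES = 3  # upper bound stated in the feature list


@dataclass
class VideoRequest:
    """Hypothetical request container; not a real Veo API type."""

    prompt: str
    # Paths to reference images; assumed optional for text-only prompts.
    reference_images: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            raise ValueError(
                f"at most {MAX_REFERENCE_IMAGES} reference images are supported, "
                f"got {len(self.reference_images)}"
            )


# Example: a prompt using camera language plus two visual anchors.
request = VideoRequest(
    prompt="Low-angle tracking shot of a runner at dawn, ending in a dolly zoom",
    reference_images=["character_sketch.png", "scene_setting.png"],
)
```

Checking the limit on the client side gives an immediate, readable error instead of a failed upload.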

A single generation produces a clip of about 8 seconds. The "Extend" function continues a scene from an existing clip, so clips can be chained into narrative videos of over 1 minute.
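
The arithmetic behind the one-minute figure can be sketched as follows. This assumes each "Extend" step adds roughly the same 8 seconds as an initial generation, which the text does not state; the function name is illustrative only.

```python
import math

CLIP_SECONDS = 8  # approximate length of one generation, per the text


def generations_needed(target_seconds: float, clip_seconds: float = CLIP_SECONDS) -> int:
    """Return how many clips (initial generation plus extensions) cover the target.

    Assumes every extension adds about `clip_seconds` of footage.
    """
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(target_seconds / clip_seconds)


total = generations_needed(60)
print(total)       # 8 clips in total
print(total - 1)   # 7 of them are "Extend" steps after the first generation
```

Under that assumption, a 60-second video takes one initial generation followed by seven extensions.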
