Google unveils Gemini Omni, a multimodal AI model that generates video from text, images, and audio

Editorial Team · May 23, 2026 · 11:15 AM UTC · 3 min read

#technology #artificial intelligence #video generation

TL;DR · WeSearch summary

Google has introduced Gemini Omni, a multimodal AI model that generates video from text, images, and audio. The model produces short clips with synchronized audio. It is set to replace the earlier Veo model and adds a conversational interface for editing.

Key facts

▪ Gemini Omni can generate short video clips of approximately 10 seconds.
▪ The model emphasizes improvements in world understanding, physics simulation, and character consistency.
▪ Google plans to extend the clip length over time but has not provided a specific timeline.

Original article

Crypto Briefing · Editorial Team

Opening excerpt (first ~120 words)

The multimodal model turns text, images, audio, and existing footage into realistic video clips, with implications that ripple well beyond Mountain View. Google DeepMind just dropped what might be the most capable video generation model…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Crypto Briefing.
