Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

Rebecca Bellan · 5 min read
⚡ TL;DR · AI summary

Google has introduced Gemini Omni, a new family of multimodal models that generates video from a combination of images, audio, video, and text. Rather than stitching those inputs together, Omni reasons across them to produce consistent, high-quality videos that reflect an understanding of physics, culture, history, and science. The long-term vision for Omni includes generating images from audio and vice versa, expanding the possibilities of content creation.

Original article
TechCrunch · Rebecca Bellan
Opening excerpt (first ~120 words)

When Google launched Gemini three years ago, the goal was to build a multimodal large language model — a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats. Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to “create anything from any input.” Omni will start with video. Users can now combine images, audio, video, and text, and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The result is high-quality videos that reflect an understanding of physics, culture, history, and science.

Excerpt limited to ~120 words for fair-use compliance. The full article is at TechCrunch.
