#multimodal — Tagged Stories

Every story in the WeSearch catalog tagged with #multimodal, listed newest first by publish time, with view counts. 60 stories currently carry the tag, and the page updates as new stories are ingested. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

RELATED TAGS

#ai (8) · #ml (5) · #technology (3) · #multimodal-systems (2) · #api (2) · #python (2) · #open-source (1) · #multimodal-models (1) · #xiaomi (1) · #ai-models (1) · #google-deepmind (1) · #model-architecture (1)

ARXIV.ORG

MedRealMM: A Real-World Multimodal Benchmark for Chinese Online Medical Consultation

arXiv:2607.09142v1. Large language models (LLMs) are increasingly deployed in online medical consultation, yet existing benchmarks remain poorly aligned…

27 views · Mon, 13 Jul 2026 06:15:33 GMT

#medrealmm #real-world

ARXIV CS.AI

Multimodal Reward Hacking in Reinforcement Learning

Reinforcement learning (RL) is increasingly used to align multimodal large language models (MLLMs), but higher rewards do not always imply better task performance. This risk is amp…

18 views · Mon, 13 Jul 2026 04:20:37 GMT

#reward #hacking

ARXIV CS.AI

SAGEAgent: A Self-Evolving Agent for Cost-Aware Modality Acquisition in Multimodal Survival Prediction

Does every cancer patient truly need a complete diagnostic workup for accurate survival prediction? In multimodal clinical oncology, diagnostic modalities follow a clinically manda…

14 views · Mon, 13 Jul 2026 04:20:37 GMT

#sageagent #self-evolving #agent

ARXIV.ORG

MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thinking Synergy

arXiv:2606.27652v1. We find that explicit reasoning does not necessarily translate into better multimodal emotion recognition (MER) accuracy, even thoug…

41 views · Mon, 29 Jun 2026 07:20:58 GMT

#mer-r #emotion

ARXIV.ORG

What We are Missing in Multimodal LLM Evaluation?

arXiv:2606.26348v1. Multimodal large language models (MLLMs) can process diverse inputs, e.g., text, images, audio, and video, and generate textual resp…

34 views · Fri, 26 Jun 2026 05:20:40 GMT

#what #missing

R/LOCALLLAMA

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

36 views · Wed, 03 Jun 2026 17:22:53 GMT

TECHMEME

Alibaba releases Qwen3.7-Plus, a multimodal proprietary model with a 1M-token context window, costing $2 per 1M tokens, 60% less than text-only Qwen3.7-Max (Carl Franzen/VentureBeat)

28 views · Wed, 03 Jun 2026 10:02:03 GMT

ARXIV CS.AI

CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection

The rapid rise of generative AI has made multimodal fake news increasingly realistic and pervasive, posing severe threats to public trust and social stability. Existing detection m…

46 views · Wed, 03 Jun 2026 04:11:55 GMT

#artificial intelligence #fake news #machine learning

ARXIV CS.AI

CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations

Cell Painting combines multiplexed fluorescent staining, high-content imaging, and quantitative analysis to generate high-dimensional phenotypic readouts to support diverse downstr…

46 views · Wed, 03 Jun 2026 04:11:55 GMT

#artificial intelligence #drug discovery #cell biology

INVESTING.COM — NEWS

Tempus AI presents multimodal foundation model results at ASCO

24 views · Fri, 29 May 2026 13:30:01 GMT

HINDUSTAN TIMES — TOP

India needs 216 multimodal logistics parks by 2047 for smooth freight movement: CII

A Union government official had told HT that the government had now downscaled its approach to build only nine of them, primarily due to land acquisition constraints.

40 views · Fri, 29 May 2026 09:45:00 GMT

#logistics #infrastructure #transportation

STEPFUN

Step 3.7 Flash – Open-source multimodal model for speed and agents

22 views · Fri, 29 May 2026 04:29:41 GMT

#technology #artificial intelligence #open-source

ARXIV CS.AI

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Multimodal large language model (MLLM)-based embodied agents have shown strong potential for solving complex tasks in physical environments. However, personalized assistance requir…

37 views · Wed, 27 May 2026 04:07:56 GMT

#artificial intelligence #machine learning #personalization

ARXIV CS.AI

Advancing Creative Physical Intelligence in Large Multimodal Models

Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities generalize to discovering visually grounded…

44 views · Wed, 27 May 2026 04:07:56 GMT

#artificial intelligence #machine learning #creativity

ARXIV CS.AI

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

Polymer discovery is central to fields ranging from energy storage to biomedicine, but it is hindered by an astronomically large chemical design space and fragmented representation…

32 views · Wed, 27 May 2026 04:07:56 GMT

#artificial intelligence #machine learning #polymer science

ARXIV CS.AI

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

Advanced Large Multimodal Models (LMMs) have demonstrated impressive performance in K-12 reasoning tasks, exhibiting great promise as intelligent tutors. Realizing this potential r…

28 views · Wed, 27 May 2026 04:07:56 GMT

#artificial intelligence #education #machine learning

DEV.TO (TOP)

Quick Tip: Benchmarking Multimodal APIs in Under 10 Minutes

Look, I’m a backend engineer. I don’t have time to read through 40 pages of model cards before…

28 views · Tue, 26 May 2026 04:37:43 GMT

#api #ai #python

ARXIV CS.AI

ConceptM$^3$oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology

Healthcare models are transitioning from unimodal prediction toward multimodal reasoning over heterogeneous diagnostic inputs. In computational pathology, for complex tumor subtype…

27 views · Tue, 26 May 2026 04:07:43 GMT

#artificial intelligence #healthcare #computational pathology

ARXIV CS.AI

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist mode…

28 views · Tue, 26 May 2026 04:07:43 GMT

#artificial intelligence #machine learning #technology

GITHUB

Show HN: Gemini Omni – A curated list of native multimodal guides and showcases

A curated list of awesome Google Gemini Omni prompt guides, interactive platforms, and creative showcases. - cnemri/awesome-gemini-omni…

34 views · Mon, 25 May 2026 10:37:36 GMT

#technology #artificial intelligence #video editing

ARXIV CS.AI

Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieve…

25 views · Mon, 25 May 2026 04:07:35 GMT

#artificial intelligence #machine learning

ARXIV CS.AI

LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding

Multimodal Retrieval-Augmented Generation (RAG) has emerged as an effective paradigm for enhancing Large Language Models (LLMs) with external knowledge. However, existing multimoda…

30 views · Mon, 25 May 2026 04:07:35 GMT

#information retrieval #artificial intelligence #machine learning

ARXIV CS.AI

RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis

Chronic osteomyelitis presents substantial prognostic challenges due to its high recurrence risk and complex postoperative recovery trajectories. Traditional assessment often relie…

19 views · Mon, 25 May 2026 04:07:35 GMT

#healthcare #artificial intelligence #machine learning

DEV.TO (TOP)

Gemma 4: The 128K Multimodal Powerhouse in Your Terminal

A raw, developer-first look at Google’s new open-weight Gemma 4 family, featuring a hands-on local…

30 views · Mon, 25 May 2026 02:37:35 GMT

#ai #technology #development

DEV.TO (TOP)

Quick Tip: Benchmarking Multimodal APIs in Under 10 Minutes

Look, I’m a backend engineer. I don’t have time to read through 40 pages of model cards before…

28 views · Sat, 23 May 2026 23:37:28 GMT

#api #benchmarking

CRYPTO BRIEFING

Google unveils Gemini Omni, a multimodal AI model that generates video from text, images, and audio

Google DeepMind unveiled Gemini Omni at Google I/O, a multimodal AI model family for video generation with implications for decentralized compute and Web3 media.…

21 views · Sat, 23 May 2026 11:37:26 GMT

#technology #artificial intelligence #video generation

DEV.TO (TOP)

When AI Reads Blueprints: The Hidden Attack Surface of Multimodal Engineering Intelligence

A security analysis of steganographic prompt injection and data poisoning…

25 views · Sat, 23 May 2026 09:07:26 GMT

#cybersecurity #ai #engineering

DEV.TO (TOP)

Gemma 4 is Here: The Dawn of Local Multimodal Reasoning

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. Gemma 4 is Here: The…

19 views · Sat, 23 May 2026 07:37:25 GMT

#technology #artificial intelligence #software development

DEV.TO (TOP)

The Edge AI Revolution: Why Gemma 4 E4B is a Game-Changer for Offline Multimodality

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. The Cloud is Great, But…

30 views · Fri, 22 May 2026 19:02:02 GMT

#ai #technology #disaster response

DEV.TO (TOP)

Replicating a Language-Learning Comedy Short with Claude Code — Gemini as a Multimodal Sub-Agent

Building a local GPU + Gemini 3.1 Pro hybrid pipeline that generates publishable comedy Shorts from a single line of text in under 60 seconds.…

23 views · Fri, 22 May 2026 11:32:01 GMT

#ai #comedy #language-learning

ARXIV CS.AI

Evaluating multimodal emotion recognition in proactive conversational agents: A user study

This article presents a multimodal emotion recognition module integrated into a proactive Socially Interactive Agent (SIA) powered by generative artificial intelligence. The system…

29 views · Fri, 22 May 2026 04:02:00 GMT

#artificial intelligence #human-computer interaction #emotion recognition

ARXIV CS.AI

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

Real-world time series come with text: metadata, descriptions, news, reports. Yet time series foundation models process numerical sequences in isolation, and the multimodal text-an…

29 views · Fri, 22 May 2026 04:02:00 GMT

#machine learning #artificial intelligence #natural language processing

ARXIV CS.AI

JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), enabling diverse human instructions beyond detection, particularly through visually …

30 views · Fri, 22 May 2026 04:02:00 GMT

#computer vision #artificial intelligence #machine learning

ARXIV CS.AI

Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities

Multimodal semantic segmentation benefits remote sensing analysis by combining complementary information from different sensor modalities. In real-world remote sensing applications…

35 views · Fri, 22 May 2026 04:02:00 GMT

#computer vision #artificial intelligence #remote sensing

ARXIV CS.AI

SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Multimodal IE in social media is difficult because a post may attach multiple images that are weakly related, redundant, or even misleading with respect to the text. In this settin…

31 views · Fri, 22 May 2026 04:02:00 GMT

#computer vision #artificial intelligence #machine learning

ARXIV CS.AI

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

Multimodal agents use screenshots, documents, and webpages to choose tool calls. When a false visual claim triggers a click, email, extraction, or transfer, hallucination becomes a…

35 views · Wed, 20 May 2026 04:04:59 GMT

#artificial intelligence #security #multimodal agents

ARXIV CS.AI

GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

Existing affective-computing, social-signal-processing, and meeting corpora capture important parts of human interaction, but they rarely support analysis of affect in co-located g…

28 views · Wed, 20 May 2026 04:04:59 GMT

#artificial intelligence #datasets #collaboration

ARXIV CS.AI

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are prone to no…

31 views · Wed, 20 May 2026 04:04:59 GMT

#machine learning #artificial intelligence #language models

CRYPTO BRIEFING

Google unveils Gemini Omni, its first native multimodal AI model built for enterprises

Google unveiled Gemini Omni at I/O, its first native multimodal AI model for enterprises that processes video, audio, images, and text from a single architecture.…

34 views · Tue, 19 May 2026 19:34:57 GMT

#technology #artificial intelligence #enterprise

TOWARDS DATA SCIENCE

Deploying a Multistage Multimodal Recommender System on Amazon Elastic Kubernetes Service

A practical walkthrough of building and deploying a multistage, multimodal recommender system on Amazon EKS, covering data pipelines, model training, Bloom filters, feature caching…

46 views · Tue, 19 May 2026 18:19:57 GMT

#machine learning #ecommerce #cloud computing

TOM'S GUIDE

Gemini Omni Flash can create and edit videos with your voice and it feels like the future of multimodal AI

Gemini Omni Flash sounds like it’ll be an essential new AI content creation tool…

27 views · Tue, 19 May 2026 18:14:57 GMT

#ai #video #technology

TECHMEME

Google launches the Gemini Omni multimodal model, saying it can "create anything from any input", starting with video generation, for Google AI subscribers (Carl Franzen/VentureBeat)

39 views · Tue, 19 May 2026 18:09:59 GMT

CNET — NEWS

Google Introduces Gemini Omni, a Multimodal AI That Knows the World

Starting with video, Omni will eventually be able to create any output from any input.…

37 views · Tue, 19 May 2026 18:09:57 GMT

#technology #ai #video

ARXIV CS.AI

TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens

Recent research has demonstrated that Universal Multimodal Embedding (UME) benefits significantly from Chain-of-Thought (CoT) reasoning. In this paradigm, a generative model produc…

28 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #machine learning

ARXIV CS.AI

Learning to Learn from Multimodal Experience

Experience-driven learning has emerged as a promising paradigm for enabling agents to improve from interaction trajectories by accumulating and reusing past experience. However, ex…

24 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #machine learning

ARXIV CS.AI

F2IND-IT! -- Multimodal Fuzzy Fake Indian News Detection using Images and Text

Biased manipulation of facts across regional and national media outlets complicates misinformation detection in diverse landscapes like India. This paper introduces a novel multimo…

29 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #fake news #media

ARXIV CS.AI

CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials

Property prediction and inverse structural design of catalytic materials are typically modeled as two independent tasks: the former predicts target properties from given structures…

27 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #machine learning #catalytic materials

ARXIV CS.AI

Multimodal Cultural Heritage Knowledge Graph Extension with Language and Vision Models

The preservation and interpretation of cultural heritage increasingly rely on digital technologies, among which Knowledge Graphs (KGs) stand out for their ability to structure vast…

34 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #cultural heritage #knowledge graphs

ARXIV CS.AI

EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

While increasing research focuses on the emotional well-being of agile team members, a significant gap remains in emotion monitoring studies for Scrum Masters and meeting organizer…

36 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #scrum #emotions

ARXIV CS.AI

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieve…

33 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #machine learning #computer vision

ARXIV CS.AI

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Multimodal large language models (MLLMs) often fail to transfer safety capabilities learned in the text modality to semantically equivalent non-text inputs, revealing a persistent …

26 views · Tue, 19 May 2026 04:04:57 GMT

#artificial intelligence #safety #machine learning

HUGGINGFACE

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

32 views · Mon, 18 May 2026 05:04:56 GMT

#artificial intelligence #machine learning #computer vision

ARXIV CS.AI

Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation

We introduce Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time, rather than relying on static P…

31 views · Mon, 18 May 2026 04:04:54 GMT

#artificial intelligence #information retrieval #multimodal systems

ARXIV CS.AI

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics …

31 views · Mon, 18 May 2026 04:04:54 GMT

#machine learning #artificial intelligence #data science

ARXIV CS.AI

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

Multimodal large language models (MLLMs) may memorize sensitive cross-modal information during pretraining, making machine unlearning (MU) crucial. Existing methods typically evalu…

32 views · Mon, 18 May 2026 04:04:54 GMT

#machine learning #artificial intelligence #language models

R/HOMELAB

Single-mode fiber (SMF) vs. multimode fiber (MMF): advice

36 views · Sun, 17 May 2026 13:52:16 GMT

DEV.TO (TOP)

Gemma.Witness - Offline Multimodal Evidence Capture with Gemma 4

An offline-first multimodal evidence capture system built on Gemma 4, designed for environments where cloud access and chain-of-custody assumptions fail.…

23 views · Sun, 17 May 2026 04:03:58 GMT

#technology #evidence #offline

DEV.TO (TOP)

Gemma 4: From Raspberry Pi to Research Workstation — One Architecture, No Quality Compromise

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. TL;DR: Gemma 4 is four…

41 views · Sun, 17 May 2026 02:40:19 GMT

#ai models #machine learning #google deepmind

TECHMEME

Nvidia launches Nemotron 3 Nano Omni, an open multimodal model with a 30B-A3B hybrid MoE architecture; the Nemotron 3 family saw 50M+ downloads in the past year (Kyt Dotson/SiliconANGLE)

27 views · Tue, 28 Apr 2026 22:51:15 GMT

HUGGINGFACE

Xiaomi open-sources MiMo-V2.5: 311B A15B 1M-context omnimodal model

34 views · Tue, 28 Apr 2026 05:39:00 GMT

#artificial intelligence #machine learning #open source
