WeSearch
TAG · #MULTIMODAL

Multimodal coverage.

Every story in the WeSearch catalog tagged with #multimodal, listed chronologically with view counts. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.

55 stories are tagged with #multimodal, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.

RELATED TAGS
#ai (8) · #ml (5) · #technology (3) · #multimodal-systems (2) · #api (2) · #python (2) · #open-source (1) · #multimodal-models (1) · #xiaomi (1) · #ai-models (1) · #google-deepmind (1) · #model-architecture (1)
R/LOCALLLAMA

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

12 views ·
TECHMEME

Alibaba releases Qwen3.7-Plus, a multimodal proprietary model with a 1M-token context window, costing $2 per 1M tokens, 60% less than text-only Qwen3.7-Max (Carl Franzen/VentureBeat)

5 views ·
ARXIV CS.AI

CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection

The rapid rise of generative AI has made multimodal fake news increasingly realistic and pervasive, posing severe threats to public trust and social stability. Existing detection m…

10 views ·
#artificial intelligence#fake news#machine learning
ARXIV CS.AI

CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations

Cell Painting combines multiplexed fluorescent staining, high-content imaging, and quantitative analysis to generate high-dimensional phenotypic readouts to support diverse downstr…

15 views ·
#artificial intelligence#drug discovery#cell biology
INVESTING.COM — NEWS

Tempus AI presents multimodal foundation model results at ASCO

10 views ·
HINDUSTAN TIMES — TOP

India needs 216 multimodal logistics parks by 2047 for smooth freight movement: CII

A Union government official had told HT that the government has since scaled back its plan to build only nine of them, primarily due to land acquisition constraints.

23 views ·
#logistics#infrastructure#transportation
STEPFUN

Step 3.7 Flash – Open-source multimodal model for speed and agents

9 views ·
#technology#artificial intelligence#open-source
ARXIV CS.AI

Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions

Multimodal large language model (MLLM)-based embodied agents have shown strong potential for solving complex tasks in physical environments. However, personalized assistance requir…

21 views ·
#artificial intelligence#machine learning#personalization
ARXIV CS.AI

Advancing Creative Physical Intelligence in Large Multimodal Models

Large multimodal models (LMMs) have rapidly advanced in perception and reasoning; however, it remains unclear whether these capabilities generalize to discovering visually grounded…

22 views ·
#artificial intelligence#machine learning#creativity
ARXIV CS.AI

PolyFusionAgent: A Multimodal Foundation Model and Autonomous AI Assistant for Polymer Property Prediction and Inverse Design

Polymer discovery is central to fields ranging from energy storage to biomedicine, but it is hindered by an astronomically large chemical design space and fragmented representation…

14 views ·
#artificial intelligence#machine learning#polymer science
ARXIV CS.AI

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

Advanced Large Multimodal Models (LMMs) have demonstrated impressive performance in K-12 reasoning tasks, exhibiting great promise as intelligent tutors. Realizing this potential r…

12 views ·
#artificial intelligence#education#machine learning
DEV.TO (TOP)

Quick Tip: Benchmarking Multimodal APIs in Under 10 Minutes

Look, I’m a backend engineer. I don’t have time to read through 40 pages of model cards before…

11 views ·
#api#ai#python
ARXIV CS.AI

ConceptM³oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology

Healthcare models are transitioning from unimodal prediction toward multimodal reasoning over heterogeneous diagnostic inputs. In computational pathology, for complex tumor subtype…

8 views ·
#artificial intelligence#healthcare#computational pathology
ARXIV CS.AI

PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist mode…

9 views ·
#artificial intelligence#machine learning#technology
GITHUB

Show HN: Gemini Omni – A curated list of native multimodal guides and showcases

A curated list of awesome Google Gemini Omni prompt guides, interactive platforms, and creative showcases. - cnemri/awesome-gemini-omni…

16 views ·
#technology#artificial intelligence#video editing
ARXIV CS.AI

Beyond Binary Edits: Robust Multimodal Knowledge Editing with Adversarial Subspace Alignment

Multimodal large language models (MLLMs) need efficient mechanisms to update knowledge without degrading existing capabilities. While intrinsic multimodal knowledge editing achieve…

10 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding

Multimodal Retrieval-Augmented Generation (RAG) has emerged as an effective paradigm for enhancing Large Language Models (LLMs) with external knowledge. However, existing multimoda…

9 views ·
#information retrieval#artificial intelligence#machine learning
ARXIV CS.AI

RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis

Chronic osteomyelitis presents substantial prognostic challenges due to its high recurrence risk and complex postoperative recovery trajectories. Traditional assessment often relie…

10 views ·
#healthcare#artificial intelligence#machine learning
DEV.TO (TOP)

Gemma 4: The 128K Multimodal Powerhouse in Your Terminal

A raw, developer-first look at Google’s new open-weight Gemma 4 family, featuring a hands-on local…

9 views ·
#ai#technology#development
DEV.TO (TOP)

Quick Tip: Benchmarking Multimodal APIs in Under 10 Minutes

Look, I’m a backend engineer. I don’t have time to read through 40 pages of model cards before…

13 views ·
#api#benchmarking
CRYPTO BRIEFING

Google unveils Gemini Omni, a multimodal AI model that generates video from text, images, and audio

At Google I/O, Google DeepMind unveiled Gemini Omni, a multimodal AI model family for video generation with implications for decentralized compute and Web3 media.…

8 views ·
#technology#artificial intelligence#video generation
DEV.TO (TOP)

When AI Reads Blueprints: The Hidden Attack Surface of Multimodal Engineering Intelligence

A security analysis of steganographic prompt injection and data poisoning…

9 views ·
#cybersecurity#ai#engineering
DEV.TO (TOP)

Gemma 4 is Here: The Dawn of Local Multimodal Reasoning

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. Gemma 4 is Here: The…

10 views ·
#technology#artificial intelligence#software development
DEV.TO (TOP)

The Edge AI Revolution: Why Gemma 4 E4B is a Game-Changer for Offline Multimodality

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. The Cloud is Great, But…

12 views ·
#ai#technology#disaster response
DEV.TO (TOP)

Replicating a Language-Learning Comedy Short with Claude Code — Gemini as a Multimodal Sub-Agent

Building a local GPU + Gemini 3.1 Pro hybrid pipeline that generates publishable comedy Shorts from a single line of text in under 60 seconds.…

10 views ·
#ai#comedy#language-learning
ARXIV CS.AI

Evaluating multimodal emotion recognition in proactive conversational agents: A user study

This article presents a multimodal emotion recognition module integrated into a proactive Socially Interactive Agent (SIA) powered by generative artificial intelligence. The system…

12 views ·
#artificial intelligence#human-computer interaction#emotion recognition
ARXIV CS.AI

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

Real-world time series come with text: metadata, descriptions, news, reports. Yet time series foundation models process numerical sequences in isolation, and the multimodal text-an…

10 views ·
#machine learning#artificial intelligence#natural language processing
ARXIV CS.AI

JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), enabling diverse human instructions beyond detection, particularly through visually …

12 views ·
#computer vision#artificial intelligence#machine learning
ARXIV CS.AI

Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities

Multimodal semantic segmentation benefits remote sensing analysis by combining complementary information from different sensor modalities. In real-world remote sensing applications…

16 views ·
#computer vision#artificial intelligence#remote sensing
ARXIV CS.AI

SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Multimodal IE in social media is difficult because a post may attach multiple images that are weakly related, redundant, or even misleading with respect to the text. In this settin…

12 views ·
#computer vision#artificial intelligence#machine learning
ARXIV CS.AI

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

Multimodal agents use screenshots, documents, and webpages to choose tool calls. When a false visual claim triggers a click, email, extraction, or transfer, hallucination becomes a…

19 views ·
#artificial intelligence#security#multimodal agents
ARXIV CS.AI

GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction

Existing affective-computing, social-signal-processing, and meeting corpora capture important parts of human interaction, but they rarely support analysis of affect in co-located g…

12 views ·
#artificial intelligence#datasets#collaboration
ARXIV CS.AI

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are prone to no…

13 views ·
#machine learning#artificial intelligence#language models
CRYPTO BRIEFING

Google unveils Gemini Omni, its first native multimodal AI model built for enterprises

At I/O, Google unveiled Gemini Omni, its first native multimodal AI model for enterprises, which processes video, audio, images, and text in a single architecture.…

14 views ·
#technology#artificial intelligence#enterprise
TOWARDS DATA SCIENCE

Deploying a Multistage Multimodal Recommender System on Amazon Elastic Kubernetes Service

A practical walkthrough of building and deploying a multistage, multimodal recommender system on Amazon EKS, covering data pipelines, model training, Bloom filters, feature caching…

14 views ·
#machine learning#ecommerce#cloud computing
TOM'S GUIDE

Gemini Omni Flash can create and edit videos with your voice and it feels like the future of multimodal AI

Gemini Omni Flash sounds like it’ll be an essential new AI content creation tool…

11 views ·
#ai#video#technology
TECHMEME

Google launches the Gemini Omni multimodal model, saying it can "create anything from any input", starting with video generation, for Google AI subscribers (Carl Franzen/VentureBeat)

15 views ·
CNET — NEWS

Google Introduces Gemini Omni, a Multimodal AI That Knows the World

Starting with video, Omni will eventually be able to create any output from any input.…

15 views ·
#technology#ai#video
ARXIV CS.AI

TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens

Recent research has demonstrated that Universal Multimodal Embedding (UME) benefits significantly from Chain-of-Thought (CoT) reasoning. In this paradigm, a generative model produc…

13 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

Learning to Learn from Multimodal Experience

Experience-driven learning has emerged as a promising paradigm for enabling agents to improve from interaction trajectories by accumulating and reusing past experience. However, ex…

11 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

F2IND-IT! -- Multimodal Fuzzy Fake Indian News Detection using Images and Text

Biased manipulation of facts across regional and national media outlets complicates misinformation detection in diverse landscapes like India. This paper introduces a novel multimo…

11 views ·
#artificial intelligence#fake news#media
ARXIV CS.AI

CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials

Property prediction and inverse structural design of catalytic materials are typically modeled as two independent tasks: the former predicts target properties from given structures…

11 views ·
#artificial intelligence#machine learning#catalytic materials
ARXIV CS.AI

Multimodal Cultural Heritage Knowledge Graph Extension with Language and Vision Models

The preservation and interpretation of cultural heritage increasingly rely on digital technologies, among which Knowledge Graphs (KGs) stand out for their ability to structure vast…

14 views ·
#artificial intelligence#cultural heritage#knowledge graphs
ARXIV CS.AI

EGI: A Multimodal Emotional AI Framework for Enhancing Scrum Master Real-time Self-Awareness

While increasing research focuses on the emotional well-being of agile team members, a significant gap remains in emotion monitoring studies for Scrum Masters and meeting organizer…

14 views ·
#artificial intelligence#scrum#emotions
ARXIV CS.AI

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieve…

14 views ·
#artificial intelligence#machine learning#computer vision
ARXIV CS.AI

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Multimodal large language models (MLLMs) often fail to transfer safety capabilities learned in the text modality to semantically equivalent non-text inputs, revealing a persistent …

11 views ·
#artificial intelligence#safety#machine learning
HUGGINGFACE

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

14 views ·
#artificial intelligence#machine learning#computer vision
ARXIV CS.AI

Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation

We introduce Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time, rather than relying on static P…

13 views ·
#artificial intelligence#information retrieval#multimodal systems
ARXIV CS.AI

DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics …

12 views ·
#machine learning#artificial intelligence#data science
ARXIV CS.AI

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

Multimodal large language models (MLLMs) may memorize sensitive cross-modal information during pretraining, making machine unlearning (MU) crucial. Existing methods typically evalu…

13 views ·
#machine learning#artificial intelligence#language models
R/HOMELAB

Single-mode (SMF) vs. multimode (MMF) fiber: advice?

20 views ·
DEV.TO (TOP)

Gemma.Witness - Offline Multimodal Evidence Capture with Gemma 4

An offline-first multimodal evidence capture system built on Gemma 4, designed for environments where cloud access and chain-of-custody assumptions fail.…

9 views ·
#technology#evidence#offline
DEV.TO (TOP)

Gemma 4: From Raspberry Pi to Research Workstation — One Architecture, No Quality Compromise

This is a submission for the Gemma 4 Challenge: Write About Gemma 4. TL;DR: Gemma 4 is four…

15 views ·
#ai models#machine learning#google deepmind
TECHMEME

Nvidia launches Nemotron 3 Nano Omni, an open multimodal model with a 30B-A3B hybrid MoE architecture; the Nemotron 3 family saw 50M+ downloads in the past year (Kyt Dotson/SiliconANGLE)

10 views ·
HUGGINGFACE

Xiaomi open-sources MiMo-V2.5: 311B A15B 1M-context omnimodal model

18 views ·
#artificial intelligence#machine learning#open source