10 results for "multimodal data"
PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging
Multimodal Large Language Models (MLLMs) rely on multimodal pre-training over diverse data sources, where different datasets often induce complementary cross-modal alignment capabilities. Model mergin…
ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes
Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is …
Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion
Multimodal affective computing analyzes user-generated social media content to predict emotional states. However, a critical gap remains in understanding how visual content shapes cognitive interpreta…
FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision…
MIMIC: A Generative Multimodal Foundation Model for Biomolecules
Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or f…
Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics
Microscopy-based phenotypic profiling is scalable for drug discovery but lacks the mechanistic depth of transcriptomics, which remains costly and scarce. Existing multimodal approaches either use imag…
From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation
Precise Event Spotting (PES) is essential in fast-paced sports such as tennis, where fine-grained events occur within very short temporal windows. Accurate frame-level localization is challenging beca…
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents
AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to the other. Unveiled today, NVIDIA Nemotron 3 Nano Omni is an…
StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning
Current video moment retrieval excels at action-centric tasks but struggles with narrative content. Models can see "what is happening" but fail to reason "why it matters". This semantic …
SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing
Refereeing is vital in sports, where fair, accurate, and explainable decisions are fundamental. While intelligent assistant technologies are being widely adopted in soccer refereeing, current AI-assis…