WeSearch
Hub / Search / ai datasets
SEARCH · AI DATASETS

Results for "ai datasets".

17 stories match your query across our 700+ source catalog. Ranked by relevance and recency.

17 results for "ai datasets"

DEV COMMUNITY

JSONL Explained: The Line-by-Line Format Powering AI Datasets

You're trying to load a 500,000-record dataset into your script. You reach for JSON — it's universal,...…

· 3 views
ARXIV.ORG

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

Industrial maintenance environments increasingly rely on AI systems to assist operators in understanding asset behavior, diagnosing failures, and evaluating interventions. Although large language mode…

· 3 views
ARXIV.ORG

MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automate…

· 3 views
ARXIV.ORG

Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion

Multimodal affective computing analyzes user-generated social media content to predict emotional states. However, a critical gap remains in understanding how visual content shapes cognitive interpreta…

· 3 views
ARXIV.ORG

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision…

· 3 views
ARXIV.ORG

Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions

We instruct an AI agent to construct two separate agentic AI platforms: one for autonomous training of predictive ML models for human-human and virus-human PPI, and the other for inducing explicit gen…

· 2 views
ARXIV.ORG

An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness

Artificial Intelligence and Machine Learning (AI/ML) models used in clinical settings are increasingly deployed to support clinical decision-making. However, when training data become stale due to cha…

· 3 views
ARXIV.ORG

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such…

· 3 views
REDDIT

LabelSets — open quality standard for AI training data (LQS v3.1) [D]

Built a third-party quality rating system for ML datasets. Multi-oracle (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certs, and a contaminat…

· 6 views
ARXIV.ORG

Does Point Cloud Boost Spatial Reasoning of Large Language Models?

3D Large Language Models (LLMs) leveraging spatial information in point clouds for 3D spatial reasoning attract great attention. Despite some promising results, the role of point clouds in 3D spatial …

· 3 views
ARXIV.ORG

AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting

Accurate long-term time series forecasting (LTSF) requires the capture of complex long-range dependencies and dynamic periodic patterns. Recent advances in frequency-domain analysis offer a global per…

· 3 views
ARXIV.ORG

Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

Large language models (LLMs) have shown strong performance on legal benchmarks, including multiple-choice components of bar exams. However, their capacity for generating open-ended legal reasoning in …

· 3 views
ARXIV.ORG

Does Machine Unlearning Preserve Clinical Safety? A Risk Analysis for Medical Image Classification

The application of Deep Learning in medical diagnosis must balance patient safety with compliance with data protection regulations. Machine Unlearning enables the selective removal of training data fr…

· 2 views
ARXIV.ORG

The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability

This study introduces the Kerimov-Alekberli model, a novel information-geometric framework that redefines AI safety by formally linking non-equilibrium thermodynamics to stochastic control for the eth…

· 3 views
ARXIV.ORG

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…

· 2 views
ARXIV.ORG

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of el…

· 3 views
ARXIV.ORG

Microsoft TRELLIS.2: An Open-Source, 4B-Parameter, Image-to-3D Model [pdf]

Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with com…

· 4 views