15 stories tagged with #datasets, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection
In nature, events that affect some individuals or groups but not others constitute an implicit intervention and are known as natural experiments. For example, the COVID-19 pandemic…
Before we spend months processing open-source robotics datasets, tell us why this is a bad idea [D]
noisekit - CLI for generating realistic degraded speech datasets for ASR benchmarking [P]
Silicon Valley VC Backs Startup That Gathers AI Datasets From Head-Mounted Cameras on Workers in India
Human Archive believes its technology "will become foundational infrastructure for automating manual labor."…
How are people doing prompt optimization with datasets safely?
Auditing Model Bias with Balanced Datasets with Mimesis
Learn how to use Mimesis library to generate a balanced, counterfactual dataset that helps analyze potential bias in your models.…
Testing a Cold War-Era AI on Satellite Image Datasets
Data Fundamentals Primer for Learning LLM
The minimum data plumbing every ML pipeline needs — samples, features and labels, the train/val/test split, text encoding (ASCII and UTF-8), and preprocessing.…
Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
This work investigates the "small-vs-large gap", where repeating on fewer samples can lead to compute savings during training compared to using a larger dataset. This is observed …
I benchmarked my AI agent runtime firewall against 3 public academic datasets — here are the honest results including where it fails
AI tool fuses five satellite datasets to help track harmful algal blooms
GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction
Existing affective-computing, social-signal-processing, and meeting corpora capture important parts of human interaction, but they rarely support analysis of affect in co-located g…
Built an address-level Calgary civic data explorer by connecting multiple public datasets
Take-Two's CEO says AI's not in the business of making hits, 'datasets by their very nature are backward looking', but that doesn't mean AI can't be 'super helpful'
"Clones don't sell".…