WeSearch

Slop Bucket Idea – a dataset of AI slop (train AI what not to do)

#artificial intelligence #data #research #Microsoft #arXiv
⚡ TL;DR · AI summary

The article discusses the prevalence of low-quality AI-generated content, often referred to as 'AI slop.' It proposes the creation of a public dataset to catalog and explain these issues, potentially aiding in the training of better language models. The author expresses uncertainty about the technical feasibility of this idea.

Original article: Ycombinator
Opening excerpt (first ~120 words)

I just had this idea. You read it all the time: AI slop is so prevalent that people are getting banned from arXiv for a year for submitting science papers containing it, developers are moaning in angst, Microsoft even ran its own study showing AI degrading the quality of simple documents, and then there's the beloved em-dash.

I don't really have the know-how or the time, but it occurred to me: if we created a public dataset that anyone could submit to, we could catalog and organize all the AI slop, the different types, with explanations of why each one is slop and why not to do it. Then we could train a large language model with this dataset included, to help it correct itself.

I don't really know the technical details of training a large language model. Is this even possible?

Excerpt limited to ~120 words for fair-use compliance. The full article is at Ycombinator.
