WeSearch

Abroad They Ship Datasets, at Home They Talk Big: A Sharp Review of the Nüwa, Musk, and Jobs Skills

#ai research #open source #data transparency #china tech #ethical ai #EleutherAI #The Pile #LAION #Zenodo #Hugging Face #Musk #Jobs #DeepSeek
⚡ TL;DR · AI summary

The article criticizes certain Chinese AI projects for lacking genuine openness: they present elaborate documentation but often fail to release the actual training data. It contrasts this with Western projects such as EleutherAI's The Pile and LAION's datasets, which share both the data and the scripts needed to reproduce the work. The author argues that a genuine open-source contribution requires transparency, not just a polished narrative.

Original article: DEV Community
Opening excerpt (first ~120 words)

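GokuScraper悟空爬虫 · Posted on May 2

To say something that will offend people: some projects in China's AI scene are redefining what "open source" means. They write READMEs like epics, yet they don't dare release a single piece of raw data.

This is not a gap in technology. It is a gap in sincerity.

1. Abroad, "open source" means showing your face without makeup; ours means chanting scripture under heavy makeup

Open-source AI projects abroad are about "delivering the goods." What does delivering mean?

You say you've open-sourced a model? Fine, give me the data. Every line of JSON and every CSV of the training data gets put out there. When EleutherAI released The Pile, it was 800 GB of raw text, with the download scripts already written for you, in case you couldn't reproduce it otherwise. When LAION released its image-text pair datasets, it published the scripts for filtering out NSFW content alongside the data itself. The logic is simple: open-sourcing without handing over the data is like selling a car without an engine. What the hell, am I supposed to push it?

Now look at certain domestic projects. They are about "handing in homework." What does that mean?

You click in and the data/ folder is empty: no raw corpus, no training data, no annotation files.

There isn't a single grain of rice in the kitchen, yet the README has already recited every dish of the Manchu-Han Imperial Feast. …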

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV Community.
