WeSearch

Usual implementation of attention transformers (SDPA) is kind of bad, actually

262588213843476 · 20 min read
#artificial intelligence #machine learning #technology
⚡ TL;DR · AI summary

The article critiques scaled dot-product attention (SDPA), the usual attention implementation in transformers, arguing that it may not be as effective as commonly believed. The author suggests that large AI companies promote expensive-to-run models because their GPU capacity gives them a competitive advantage in that tier of the market. While not dismissing SDPA entirely, the piece questions whether it is necessary and hints that better alternatives are possible.

Original article
Gist · 262588213843476
Opening excerpt (first ~120 words)

Introduction

I was writing a note to a friend that mentioned my tedious opinions on “AI” discourse. It veered off into my usual argument that big “AI” companies are shaping the industry ecosystem to their own ends by setting up a situation where expensive-to-run models are overvalued. I think they’re doing this because they have a competitive advantage in that tier of the market, having bought (time on) a lot of GPUs. It’s like how a company that owns diamond mines will probably promote the idea that large, mined diamonds are important and valuable, and that there’s something off about running a sub-industrial mine or lab-growing diamonds. You can do this without lying at all, but I still dislike it. Large mined diamonds here are $O(n^2)$ models.

Excerpt limited to ~120 words for fair-use compliance. The full article is at Gist.
