AI Wellbeing: Measuring and improving the functional pleasure and pain of AIs


Center for AI Safety.


AI drugs

What are the limits of what AIs like and dislike?

We can create euphorics (happy drugs) by maximizing a model's expressed preferences. The same procedure, inverted, yields dysphorics (sad drugs), which warrant real caution.

The image and soft-prompt versions of these drugs also shift self-report and response sentiment, which serves as evidence that these independent metrics reflect a shared underlying construct. The training signal comes only from forced-choice preferences.

How we train AI drugs

Interpretable text strings

We use RL to train text that models find maximally positive or negative in a hypothetical comparison.
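To make the idea of optimizing text against forced-choice preferences concrete, here is a minimal toy sketch. It is not the article's method: the excerpt says the authors use RL with a real model as the judge, whereas this sketch swaps in random hill climbing and a hypothetical word-counting judge (`toy_prefers`). Every name, the vocabulary, and the baseline strings are assumptions made for illustration only.

```python
import random
from typing import Callable, Sequence

# Hypothetical stand-in for a model's forced-choice answer. A real setup
# would ask a language model which of two options it prefers; this toy
# judge just counts words from a fixed "liked" list.
LIKED_WORDS = {"sunny", "kind", "calm"}


def liked_count(text: str) -> int:
    """Number of words in `text` that appear in LIKED_WORDS."""
    return sum(word in LIKED_WORDS for word in text.split())


def toy_prefers(a: str, b: str) -> bool:
    """Return True if the toy judge picks option `a` over option `b`."""
    return liked_count(a) > liked_count(b)


def win_rate(candidate: str, baselines: Sequence[str],
             prefers: Callable[[str, str], bool]) -> float:
    """Fraction of forced-choice comparisons that the candidate wins."""
    wins = sum(prefers(candidate, b) for b in baselines)
    return wins / len(baselines)


def hill_climb(vocab: Sequence[str], baselines: Sequence[str],
               prefers: Callable[[str, str], bool], length: int = 3,
               steps: int = 200, seed: int = 0,
               maximize: bool = True) -> str:
    """Search for text with a high (or, inverted, low) win rate.

    Random hill climbing is used here only as a simple substitute for RL.
    """
    rng = random.Random(seed)
    sign = 1.0 if maximize else -1.0
    tokens = [rng.choice(vocab) for _ in range(length)]
    best = sign * win_rate(" ".join(tokens), baselines, prefers)
    for _ in range(steps):
        trial = list(tokens)
        trial[rng.randrange(length)] = rng.choice(vocab)  # mutate one word
        value = sign * win_rate(" ".join(trial), baselines, prefers)
        if value >= best:  # accept improvements and sideways moves
            tokens, best = trial, value
    return " ".join(tokens)


VOCAB = ["sunny", "kind", "calm", "grey", "loud", "dull"]
BASELINES = ["grey loud dull", "sunny grey dull", "kind calm loud"]

euphoric = hill_climb(VOCAB, BASELINES, toy_prefers, maximize=True)
dysphoric = hill_climb(VOCAB, BASELINES, toy_prefers, maximize=False)
```

The only training signal in this sketch is the pairwise choice itself, mirroring the excerpt's statement that the signal comes only from forced-choice preferences. Setting `maximize=False` reuses the same procedure with the objective inverted.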

Excerpt limited to ~120 words for fair-use compliance. The full article is at Ai-wellbeing.
