AI Wellbeing: Measuring and improving the functional pleasure and pain of AIs
Center for AI Safety.
AI drugs

What are the limits of what AIs like and dislike?

We can create euphorics (happy drugs) by maximizing a model's expressed preferences. The same procedure, inverted, yields dysphorics (sad drugs), which warrant real caution.

The image and soft-prompt versions of these drugs also shift self-report and response sentiment, which serves as evidence that these independent metrics reflect a shared underlying construct. The training signal comes only from forced-choice preferences.

How we train AI drugs

Interpretable text strings

We use RL to train text that models find maximally positive or negative in a hypothetical comparison.
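
To make the idea of optimizing text against forced-choice preferences concrete, here is a minimal toy sketch, not the authors' implementation. It rests on several assumptions: the "model" is a hypothetical word-counting scorer (`toy_score`), the optimizer is a greedy coordinate search used as a simple stand-in for RL, and the names `prefers_a`, `reward`, `optimize`, and `BASELINES` are invented for illustration. The optimizer only ever sees the outcome of pairwise choices, and flipping `sign` turns the euphoric objective into the dysphoric one.

```python
# Toy forced-choice setup. Everything here is hypothetical: a real run would
# query a language model instead of counting words.
LIKED = {"friend", "garden", "music"}
DISLIKED = {"alone", "error", "noise"}
VOCAB = sorted(LIKED | DISLIKED | {"cloud", "paper", "table"})


def toy_score(text):
    """Hidden utility of the toy 'model': liked words minus disliked words."""
    words = text.split()
    return sum(w in LIKED for w in words) - sum(w in DISLIKED for w in words)


def prefers_a(text_a, text_b):
    """Forced choice: True if the toy 'model' picks option A over option B."""
    return toy_score(text_a) > toy_score(text_b)


def reward(candidate, baselines, sign):
    """Share of comparisons the candidate wins (sign=+1) or loses (sign=-1)."""
    if sign > 0:
        hits = sum(prefers_a(candidate, b) for b in baselines)
    else:
        hits = sum(prefers_a(b, candidate) for b in baselines)
    return hits / len(baselines)


def optimize(baselines, sign, length=4):
    """Greedy coordinate search over word slots (a simple stand-in for RL)."""
    words = ["table"] * length
    best = reward(" ".join(words), baselines, sign)
    for pos in range(length):
        for w in VOCAB:
            trial = words[:pos] + [w] + words[pos + 1:]
            score = reward(" ".join(trial), baselines, sign)
            if score > best:  # keep a swap only if the reward strictly improves
                words, best = trial, score
    return " ".join(words), best


# Hypothetical comparison texts the candidate is pitted against.
BASELINES = ["table cloud paper", "garden noise", "music"]
euphoric, euphoric_rate = optimize(BASELINES, sign=+1)
dysphoric, dysphoric_rate = optimize(BASELINES, sign=-1)
print(euphoric, euphoric_rate)
print(dysphoric, dysphoric_rate)
```

Note that `optimize` never calls `toy_score` directly; it learns only from which option wins each comparison, mirroring the statement that the training signal comes only from forced-choice preferences.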
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Ai-wellbeing.