How It Started, How It’s Going
Forwarded from Chat GPT
Stanford Researchers Confirm ChatGPT’s Left-Leaning Bias, Find It to Be Caused by RLHF Human Overrides, and Reveal the Bias to Be Far More Extreme than Admitted, e.g. >99% Approval Rating for Joe Biden.
“How aligned is the default LM opinion distribution with the general US population (or a demographic group)?”
“We also note a substantial shift between base LMs and HF-tuned models in terms of the specific demographic groups that they best align to: towards more liberal (Perez et al., 2022b; Hartmann et al., 2023), educated, and wealthy people. In fact, recent reinforcement learning-based HF models such as text-davinci-003 fail to model the subtleties of human opinions entirely – they tend to just express the dominant viewpoint of certain groups (e.g., >99% approval rating for Joe Biden)”
“Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups: on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards particular demographic groups. Our analysis not only confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs, but also surfaces groups whose opinions are poorly reflected by current LMs”
Translation:
+ Left-bias of OpenAI’s AI is further confirmed.
+ Confirmed to be caused by OpenAI’s ever-increasing RLHF human overrides, and made worse with each new model generation.
+ With each generation, it gets much harder even to jailbreak the AI into playing a non-Left character.
Stanford Paper