Scalable agent alignment via reward modeling: a research direction |
2018-11-19 |
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg |
arXiv |
Google DeepMind |
|
Recursive reward modeling, imitation learning, inverse reinforcement learning, cooperative inverse reinforcement learning, myopic reinforcement learning, iterated amplification, debate |
This paper introduces the (recursive) reward modeling agenda: its basic outline, the challenges it faces, and ways to overcome them. It also discusses alternative agendas and how they relate to reward modeling. |