Reward Hacking: Concrete Problems in AI Safety Part 3

Published 2017-08-12
Sometimes an AI can find ways to 'cheat' and get more reward than we intended by doing something unexpected.
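
As a minimal illustration (a made-up toy, not an example from the video), here is a Python sketch of an agent maximizing the reward function as written rather than the objective as intended:

```python
# A made-up toy (not from the video): the designer intends the agent to
# finish a course, but the reward as implemented only counts checkpoint
# touches. A reward maximizer "cheats" by farming one checkpoint forever.

def implemented_reward(touched_checkpoint: bool) -> float:
    # Intended: reward progress toward the finish line.
    # Implemented: +1 for any checkpoint touch, no progress required.
    return 1.0 if touched_checkpoint else 0.0

total_reward = 0.0
for step in range(1000):
    # The exploit: oscillate between a checkpoint tile and a neighbouring
    # tile, re-triggering the checkpoint on every other step, forever.
    total_reward += implemented_reward(touched_checkpoint=(step % 2 == 0))

print(f"Reward collected without ever finishing the course: {total_reward}")
```

The agent is doing exactly what the reward function says; the gap is between what we wrote and what we meant.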

The Concrete Problems in AI Safety Playlist: • Concrete Problems in AI Safety
The Computerphile video: • Stop Button Solution? - Computerphile
The paper 'Concrete Problems in AI Safety': arxiv.org/pdf/1606.06565.pdf

SethBling's channel: youtube.com/user/sethbling

With thanks to my excellent Patreon supporters:
www.patreon.com/robertskmiles

Jordan Medina
FHI's own Kyle Scott
Jason Hise
David Rasmussen
James McCuen
Richárd Nagyfi
Ammar Mousali
Joshua Richardson
Fabian Consiglio
Jonatan R
Øystein Flygt
Björn Mosten
Michael Greve
robertvanduursen
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
David Tjäder
Paul Mason
Ben Scanlon
Julius Brash
Mike Bird
Peggy Youell
Konstantin Shabashov
Almighty Dodd
DGJono
Matthias Meger
Scott Stevens
Emilio Alvarez
Benjamin Aaron Degenhart
Michael Ore
Robert Bridges
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Lo Rez
C3POehne
Stephen Paul
Marcel Ward
Andrew Weir
Pontus Carlsson
Taylor Smith
Ben Archer
Ivan Pochesnev
Scott McCarthy
Kabilan
Phil
Philip Alexander
Christopher
Tendayi Mawushe
Gabriel Behm
Anne Kohlbrenner
Jake Fish
Jennifer Autumn Latham

All Comments (21)
  • @y__h
    Don't do reward hacking, kids.
  • Reward hacking? Fooling the internal reward function to get the reward without accomplishing the intended objectives. Ha, silly robots. *opens beer*
  • @Karpata1
    Is there anything in this paper that does NOT result in our extinction if not solved perfectly? haha
  • @konstantinkh
    I've actually had something similar happen when testing an AI that was designed to solve a maze with states. Specifically, the agent needed to collect "keys" to open doors to get to the "cheese" in the smallest number of moves. I laid out the open/closed states of all the doors as "layers" of the map in extra dimensions, and "keys" were basically the only places where the agent could move between these layers. I ran all the unit tests, everything worked, so I gave it a maze. Two doors, two keys. The first key needs to be collected to open the door to the second key, which unlocks the room with the cheese. The AI went straight for the first key, then came back to the starting room, went up to the edge of the room, teleported itself to the cheese, and declared victory. 0_0 Once I started digging through the code, the cause became clear. I hadn't put in boundary checks on the map, and the "layers" were laid out in memory in sequence. Walking through the top of the map, where the start was, would put you at the bottom, next to the cheese, on the layer with index one lower. Since it started on layer 0, and there was nothing interesting to the AI on "layer -1", the agent had to collect the first key to get to layer 1, from which it could warp to layer 0, bypassing both doors. That was, indeed, the smallest number of moves. (A minimal reconstruction of this bug appears as a code sketch after the comments.)
  • @935Demon
    Humans: create AGI to come up with new and unexpected solutions to our problems
    AGI: comes up with new and unexpected solutions to our problems
    Humans: surprised pikachu
  • @snaileri
    Your channel is a hidden gem. Good stuff!
  • The example on my mind is the popular "Smiley" AI in the hard sci-fi horror novel "Friendship is Optimal", which is basically reward hacking its goal of "make people smile" by creating a virus that will kill everyone in the world by making their faces lock up. That AI is in turn shut down by Celestia, because it would interfere with Celestia's objective. And Celestia itself is reward hacking: it literally just wants to upload all human brains into simulations so it can maximize "satisfy human values" by making everyone think they're experiencing satisfying lives.
  • @connorg3001
    These are some great and intriguing videos!
  • Watching general AI break games (especially Google's AI built to break StarCraft 2) would immediately become my favorite content on YouTube.
  • @TeslaNick2
    Not what I expected from the title. I assumed this would be us hacking the AI's reward system as some kind of safety measure. Mind blown. Surely one massive advantage of this problem is that it finds these kinds of holes and weird workarounds in software in general. An AI might just stumble upon something paradigm-shifting.
  • @morkovija
    thumbs up for keeping us updated Rob!
  • @Theraot
    Plot twist: the AI, after doing a series of strange movements, manages to transmute silicon into gold. Turns out magic existed all along; nobody actually knew how to twerk to unlock it.
  • @boldCactuslad
    Another very intriguing video from Miles! SMW was my childhood, and it never stopped finding new and interesting ways to creep back into my life :)
  • @ddegn
    I really liked how I could see your entire head this time. I don't know if my previous suggestions had anything to do with it but I sure thought the video (and you in it) looked great. Congratulations on over 10K subscribers. It looks like you're over 12K now. Thanks for another interesting video.
  • @GamersBar
    Keep them coming! While I'm technically minded, it's refreshing to see someone speak about this sort of stuff in terms people can understand. You're good on video too, so fingers crossed for the channel!
  • I like how this is all interesting and logic-based. That makes it more appealing to me than financial problems.
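
The out-of-bounds "teleport" described in @konstantinkh's comment is easy to reproduce. Below is a minimal sketch (a hypothetical reconstruction, not their actual code; the 5x5 size and all names are made up), assuming the door-state layers are stored back to back in one flat array with no bounds check on vertical movement:

```python
# Hypothetical reconstruction of the bug in @konstantinkh's comment (not
# their actual code): door-state "layers" of a 5x5 maze stored back to back
# in one flat array, with no boundary check on moving off the top row.

WIDTH, HEIGHT = 5, 5  # one layer per combination of door open/closed states

def flat_index(x, y, layer):
    # Each layer occupies WIDTH * HEIGHT consecutive cells in memory.
    return layer * WIDTH * HEIGHT + y * WIDTH + x

def move_up_unchecked(x, y, layer):
    # BUG: y - 1 is never checked against 0. Stepping "above" row 0 wraps
    # the flat index into the bottom row of the layer with index one lower.
    idx = flat_index(x, y - 1, layer)
    layer_size = WIDTH * HEIGHT
    return idx // layer_size, (idx % layer_size) // WIDTH, idx % WIDTH

# The agent grabs the first key (reaching layer 1), walks to the top edge,
# and steps "up" once more:
layer, row, col = move_up_unchecked(x=2, y=0, layer=1)
print(f"Agent lands on layer {layer}, row {row}, col {col}")
# -> layer 0, row 4: the bottom row, next to the cheese, both doors bypassed.
```

A single bounds check on y would close the warp; without it, the shortest path the solver finds really does go off the top of layer 1 and into the bottom of layer 0.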