Reward Hacking: Concrete Problems in AI Safety Part 3

Published 2017-08-12
Sometimes an AI can find ways to 'cheat' and get more reward than we intended by doing something unexpected.
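
As a minimal illustration (a made-up toy, not an example from the video), here is a Python sketch of an agent maximizing the reward function as written rather than the objective as intended:

```python
# A made-up toy (not from the video): the designer intends the agent to
# finish a course, but the reward as implemented only counts checkpoint
# touches. A reward maximizer "cheats" by farming one checkpoint forever.

def implemented_reward(touched_checkpoint: bool) -> float:
    # Intended: reward progress toward the finish line.
    # Implemented: +1 for any checkpoint touch, no progress required.
    return 1.0 if touched_checkpoint else 0.0

total_reward = 0.0
for step in range(1000):
    # The exploit: oscillate between a checkpoint tile and a neighbouring
    # tile, re-triggering the checkpoint on every other step, forever.
    total_reward += implemented_reward(touched_checkpoint=(step % 2 == 0))

print(f"Reward collected without ever finishing the course: {total_reward}")
```

The agent is doing exactly what the reward function says; the gap is between what we wrote and what we meant.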

The Concrete Problems in AI Safety Playlist: • Concrete Problems in AI Safety
The Computerphile video: • Stop Button Solution? - Computerphile
The paper 'Concrete Problems in AI Safety': arxiv.org/pdf/1606.06565.pdf

SethBling's channel: youtube.com/user/sethbling

With thanks to my excellent Patreon supporters:
www.patreon.com/robertskmiles

Jordan Medina
FHI's own Kyle Scott
Jason Hise
David Rasmussen
James McCuen
Richárd Nagyfi
Ammar Mousali
Joshua Richardson
Fabian Consiglio
Jonatan R
Øystein Flygt
Björn Mosten
Michael Greve
robertvanduursen
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
David Tjäder
Paul Mason
Ben Scanlon
Julius Brash
Mike Bird
Peggy Youell
Konstantin Shabashov
Almighty Dodd
DGJono
Matthias Meger
Scott Stevens
Emilio Alvarez
Benjamin Aaron Degenhart
Michael Ore
Robert Bridges
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Lo Rez
C3POehne
Stephen Paul
Marcel Ward
Andrew Weir
Pontus Carlsson
Taylor Smith
Ben Archer
Ivan Pochesnev
Scott McCarthy
Kabilan
Phil
Philip Alexander
Christopher
Tendayi Mawushe
Gabriel Behm
Anne Kohlbrenner
Jake Fish
Jennifer Autumn Latham

All Comments (21)
  • @y__h
    Don't do reward hacking, kids.
  • Reward hacking? Fooling the internal reward function to get the reward without accomplishing the intended objectives. Ha, silly robots. *opens beer*
  • @Karpata1
    Is there anything in this paper that does NOT result in our extinction if not solved perfectly? haha
  • @konstantinkh
    I've actually had something similar happen when testing an AI that was designed to solve a maze with states. Specifically, the agent needed to collect "keys" to open doors to get to the "cheese" in the smallest number of moves. I laid out the open/closed states of all the doors as "layers" of the map in extra dimensions, and "keys" were basically the only places where the agent could move between these layers. I ran all the unit tests, everything worked, so I gave it a maze. Two doors, two keys. The first key needs to be collected to open the door to the second key, which unlocks the room with the cheese. The AI went straight for the first key, then came back to the starting room, went up to the edge of the room, teleported itself to the cheese, and declared victory. 0_0 Once I started digging through the code, the cause became clear. I hadn't put in boundary checks on the map, and the "layers" were laid out in memory in sequence. Walking through the top of the map, where the start was, would put you at the bottom, next to the cheese, on the layer with index one lower. Since it started on layer 0, and there was nothing interesting to the AI on "layer -1", the agent had to collect the first key to get to layer 1, from which it could warp to layer 0, bypassing both doors. That was, indeed, the smallest number of moves. (A minimal reconstruction of this bug appears as a code sketch after the comments.)
  • @935Demon
    Humans: create AGI to come up with new and unexpected solutions to our problems
    AGI: comes up with new and unexpected solutions to our problems
    Humans: surprised pikachu
  • @snaileri
    Your channel is a hidden gem. Good stuff!
  • The example on my mind is the popular "Smiley" AI in the hard sci-fi horror novel "Friendship is Optimal", which is basically reward hacking its goal of "make people smile" by creating a virus that will kill everyone in the world by making their faces lock up. That AI is in turn shut down by Celestia, because it would interfere with Celestia's objective. And Celestia itself is reward hacking: it literally just wants to upload all human brains into simulations so it can maximize "satisfy human values" by making everyone think they're experiencing satisfying lives.
  • @connorg3001
    These are some great and intriguing videos!
  • Watching general AI break games (especially Google's AI built to break StarCraft 2) would immediately become my favorite content on YouTube.
  • @TeslaNick2
    Not what I expected from the title. I assumed this would be us hacking the AI's reward system as some kind of safety measure. Mind blown. Surely one massive advantage of this problem is that it finds these kinds of holes and weird workarounds in software in general. An AI might just stumble upon something paradigm-shifting.
  • @morkovija
    thumbs up for keeping us updated Rob!
  • @Theraot
    Plot twist: the AI, after doing a series of strange movements, manages to transmute silicon into gold. Turns out magic existed all along; nobody actually knew how to twerk to unlock it.
  • @boldCactuslad
    Another very intriguing video from Miles! SMW was my childhood, and it never stopped finding new and interesting ways to creep back into my life :)
  • @ddegn
    I really liked how I could see your entire head this time. I don't know if my previous suggestions had anything to do with it but I sure thought the video (and you in it) looked great. Congratulations on over 10K subscribers. It looks like you're over 12K now. Thanks for another interesting video.
  • @GamersBar
    Keep them coming! While I'm technically minded, it's refreshing to see someone speak about this sort of stuff in terms people can understand. You're good on video too, so fingers crossed for the channel!
  • I like how this is all interesting and logic-based. That makes it more appealing to me than financial problems.
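
The out-of-bounds "teleport" described in @konstantinkh's comment is easy to reproduce. Below is a minimal sketch (a hypothetical reconstruction, not their actual code; the 5x5 size and all names are made up), assuming the door-state layers are stored back to back in one flat array with no bounds check on vertical movement:

```python
# Hypothetical reconstruction of the bug in @konstantinkh's comment (not
# their actual code): door-state "layers" of a 5x5 maze stored back to back
# in one flat array, with no boundary check on moving off the top row.

WIDTH, HEIGHT = 5, 5  # one layer per combination of door open/closed states

def flat_index(x, y, layer):
    # Each layer occupies WIDTH * HEIGHT consecutive cells in memory.
    return layer * WIDTH * HEIGHT + y * WIDTH + x

def move_up_unchecked(x, y, layer):
    # BUG: y - 1 is never checked against 0. Stepping "above" row 0 wraps
    # the flat index into the bottom row of the layer with index one lower.
    idx = flat_index(x, y - 1, layer)
    layer_size = WIDTH * HEIGHT
    return idx // layer_size, (idx % layer_size) // WIDTH, idx % WIDTH

# The agent grabs the first key (reaching layer 1), walks to the top edge,
# and steps "up" once more:
layer, row, col = move_up_unchecked(x=2, y=0, layer=1)
print(f"Agent lands on layer {layer}, row {row}, col {col}")
# -> layer 0, row 4: the bottom row, next to the cheese, both doors bypassed.
```

A single bounds check on y would close the warp; without it, the shortest path the solver finds really does go off the top of layer 1 and into the bottom of layer 0.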