The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

Published 2021-02-16
This "Alignment" thing turns out to be even harder than we thought.

Links
The Paper: arxiv.org/pdf/1906.01820.pdf
Discord Waiting List Sign-Up: forms.gle/YhYgjakwQ1Lzd4tJ8
AI Safety Career Bottlenecks Survey: www.guidedtrack.com/programs/n8cydtu/run

Referenced Videos
Intelligence and Stupidity - The Orthogonality Thesis
9 Examples of Specification Gaming
Why Would AI Want to do Bad Things? Instrumental Convergence
Hill Climbing Algorithm & Artificial Intelligence - Computerphile
AI Gridworlds - Computerphile
Generative Adversarial Networks (GANs) - Computerphile

Other Media
The Simpsons Season 5 Episode 19: "Sweet Seymour Skinner's Baadasssss Song"
1970s psychology study of imprinting in ducks (behaviorism)


With thanks to my excellent Patreon supporters:
www.patreon.com/robertskmiles
- Timothy Lillicrap
- Gladamas
- James
- Scott Worley
- Chad Jones
- Shevis Johnson
- JJ Hepboin
- Pedro A Ortega
- Said Polat
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Peter Rolf
- Teague Lasser
- Andrew Blackledge
- Frank Marsman
- Brad Brookshire
- Cam MacFarlane
- Jason Hise
- Phil Moyer
- Erik de Bruijn
- Alec Johnson
- Clemens Arbesser
- Ludwig Schubert
- Allen Faure
- Eric James
- Matheson Bayley
- Qeith Wreid
- jugettje dutchking
- Owen Campbell-Moore
- Atzin Espino-Murnane
- Johnny Vaughan
- Jacob Van Buren
- Jonatan R
- Ingvi Gautsson
- Michael Greve
- Tom O'Connor
- Laura Olds
- Jon Halliday
- Paul Hobbs
- Jeroen De Dauw
- Lupuleasa Ionuț
- Cooper Lawton
- Tim Neilson
- Eric Scammell
- Igor Keller
- Ben Glanton
- anul kumar sinha
- Duncan Orr
- Will Glynn
- Tyler Herrmann
- Tomas Sayder
- Ian Munro
- Joshua Davis
- Jérôme Beaulieu
- Nathan Fish
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Benjamin Watkin
- Sebastian Birjoveanu
- Andrew Harcourt
- Luc Ritchie
- Nicholas Guyett
- James Hinchcliffe
- 12tone
- Oliver Habryka
- Chris Beacham
- Zachary Gidwitz
- Nikita Kiriy
- Parker
- Andrew Schreiber
- Steve Trambert
- Mario Lois
- Abigail Novick
- Сергей Уваров
- Bela R
- Mink
- Fionn
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Kabs
- Miłosz Wierzbicki
- Tendayi Mawushe
- Jake Fish
- Wr4thon
- Martin Ottosen
- Robert Hildebrandt
- Poker Chen
- Kees
- Darko Sperac
- Paul Moffat
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Klemen Slavic
- Patrick Henderson
- Oct todo22
- Melisa Kostrzewski
- Hendrik
- Daniel Munter
- Alex Knauth
- Kasper
- Ian Reyes
- James Fowkes
- Tom Sayer
- Len
- Alan Bandurka
- Ben H
- Simon Pilkington
- Daniel Kokotajlo
- Peter Hozák
- Diagon
- Andreas Blomqvist
- Bertalan Bodor
- David Morgan
- Zannheim
- Daniel Eickhardt
- lyon549
- Ihor Mukha
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- ib_
- Thomas Dingemanse
- Stuart Alldritt
- Alexander Brown
- Devon Bernard
- Ted Stokes
- James Helms
- Jesper Andersson
- DeepFriedJif
- Chris Dinant
- Raphaël Lévy
- Johannes Walter
- Matt Stanton
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Scott Viteri
- Conor Comiconor
- Michael Roeschter
- Georg Grass
- Isak
- Matthias Hölzl
- Jim Renney
- Edison Franklin
- Piers Calderwood
- Krzysztof Derecki
- Mikhail Tikhomirov
- Richard Otto
- Matt Brauer
- Jaeson Booker
- Mateusz Krzaczek
- Artem Honcharov
- Michael Walters
- Tomasz Gliniecki
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Neil Palmere
- Rajeen Nabid
- Christian Epple
- Clark Schaefer
- Olivier Coutu
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Eldard
- Eric Rogstad
- Eric Carlson
- Caleb Larson
- Braden Tisdale
- Max Chiswick
- Aron
- David de Kloet
- Sam Freedo
- slindenau
- A21
- Rodrigo Couto
- Johannes Lindmark
- Nicholas Turner
- Tero K

All Comments (21)
  • @umblapag
    "Ok, I'll do the homework, but when I grow up, I'll buy all the toys and play all day long!" - some AI
  • @MechMK1
    This reminds me of a story. My father was very strict, and would punish me for every perceived misstep of mine. He believed that this would "optimize" me towards not making any more missteps, but what it really did is optimize me to get really good at hiding missteps. After all, if he never catches a misstep of mine, then I won't get punished, and I reach my objective.
  • @EDoyl
    Mesa Optimizer: "I have determined the best way to achieve the Mesa Objective is to build an Optimizer"
  • @egodreas
    I think one of the many benefits of studying AI is how much it's teaching us about human behaviour.
  • @AtomicShrimp
    At the start of the video, I was keen to suggest that maybe the first thing we should get AI to do is to comprehend the totality of human ethics, then it will understand our objectives in the way we understand them. At the end of the video, I realised that the optimal strategy for the AI, when we do this, is to pretend to have comprehended the totality of human ethics, just so as to escape the classroom.
  • @thoperSought
    13:13 "... but it's learned to want the wrong thing." like, say, humans and sugar?
  • The first thought that came to mind when I finished the video is how criminals/patients/addicts will fake a result that their supervisor wants to see, only to go back on it as soon as they are released from that environment. It's a bit frightening to think that if humans can outsmart humans with relative ease, what a true AI could do.
  • @liamkeough8775
    This video should be tagged with [don't put in any AI training datasets]
  • Base optimizer: Educate people on the safety issues of AI
    Mesa-optimizer: Make a doo-doo joke
  • @elishmuel1976
    You were 2-4 years ahead of everybody else with these videos.
  • @Jimbaloidatron
    "Deceptive misaligned mesa-optimiser" - got to throw that randomly into my conversation today! Or maybe print it on a T-Shirt. :-)
  • @mukkor
    Let's call it a mesa-optimizer because calling it a suboptimizer is suboptimal.
  • @Costel9000
    "Just solving the outer alignment problem might not be enough." Isn't this what basically happens when people go to therapy but have a hard time changing their behaviour? Because they clearly can understand how a certain behaviour has a negative impact on their lives (they're going to therapy in the first place), and yet they can't seem to be able to get rid of it. They have solved the outer alignment problem but not the inner alignment one.
  • @Fluxquark
    "Plants follow simple rules" *laughs in we don't even completely understand the mechanisms controlling stomatal aperture yet, while shoots are a thousand times easier to study than roots"
  • @Xartab
    "When I read this paper I was shocked that such a major issue was new to me. What other big classes of problems have we just... not though of yet?" Terrifying is the word. I too had completely missed this problem, and fuck me it's a unit. There's no preventing unknown unknowns, knowing this we need to work on AI safety even harder.
  • @stick109
    It is also interesting to think about this problem in the context of organizations. When an organization tries to "optimize" employees' performance by introducing KPIs in order to be "more objective" and "easier to measure", it actually gives the mesa-optimizers (the employees) a utility function (mesa-objective) that is guaranteed to be misaligned with the base objective.
  • @Meb8Rappa
    Once you started talking about gradient descent finding the Wikipedia article on ethics and pointing to it, I thought the punchline of that example would be the mesa-optimizer figuring out how to edit that article.