Synaptiq.ai

Our Blog

Exploration vs. Exploitation
Exploring with reinforcement learning

By Dr. Tim Oates, Chief Data Scientist

There’s a branch of machine learning called reinforcement learning (RL) that deals with making decisions under uncertainty. It works a little like people do when learning to play a new video game. You look at the screen, assess the situation, try a few things, see what works and what doesn’t, and over time get better and better at deciding what to do in each situation. (The big difference between RL and people is that people bring lots of knowledge about how things tend to work in the real world when learning a new video game.) Combined with deep neural networks, RL is behind the recent news you may have read about machines beating the best human Go players and about Google’s DeepMind building a system that learned to play Atari games.

RL is all about trading off exploration and exploitation. If I beat level 1-2 in Super Mario Bros. and happily head on to level 1-3, I may never discover the action sequence at the end of 1-2 that warps me directly to world 4. That’s favoring exploitation over exploration, and missing out on something; I know how to beat 1-2, and beating levels is what you’re supposed to do, so I do what I know. But exploration can get out of control as well. I could spend hours looking for a non-existent action sequence that warps me to world 8 at the end of 1-3. Life is a lot like that. Do I order my favorite dish at the Indian restaurant because I like it, or try something new that I might like even more (or like a lot less)?
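To make the trade-off concrete, here is a minimal sketch of epsilon-greedy, one simple and widely used way of balancing the two. It is an illustration only, not a description of any particular system: the function name, the dish payoffs, and the parameters (`epsilon`, `noise`, `seed`) are all made up for this example. Each "arm" is a dish on the menu; with probability `epsilon` the agent orders something at random, and otherwise it orders whatever has tasted best so far.

```python
import random


def epsilon_greedy(true_means, epsilon, steps, noise=1.0, seed=0):
    """Toy multi-armed bandit with an epsilon-greedy agent.

    true_means: hypothetical average enjoyment of each dish (unknown to the agent).
    epsilon:    probability of exploring (ordering a random dish) on each visit.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # running average reward observed per dish
    counts = [0] * n       # how many times each dish was ordered
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try any dish
        else:
            # exploit: order the dish with the best estimate so far
            arm = max(range(n), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # incremental update of the running average for this dish
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return estimates, counts, total


if __name__ == "__main__":
    menu = [1.0, 5.0, 2.0]  # assumed payoffs; the second dish is secretly the best
    for eps in (0.0, 0.1):
        _, counts, total = epsilon_greedy(menu, epsilon=eps, steps=1000, noise=0.0)
        print(f"epsilon={eps}: orders per dish={counts}, total enjoyment={total:.1f}")
```

With `epsilon=0.0` and no noise, the agent orders the first dish once, likes it well enough, and never tries anything else, so the best dish goes undiscovered. A small `epsilon` lets it stumble onto the better dish at the cost of an occasional disappointing meal.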

Rather than getting worked up about AI taking over the world, I worry much more about AI and ML going heavy on exploitation and missing out on exploration. There are AI systems that compose music and paint, but their best works are hand-picked by humans from a large output of varying quality. AI systems learn to help school children learn better, but they are driven by aggregate statistics over many learners. What happens to the children who learn differently from their peers? Others have worried for a while now about systems that get better and better at showing us content that we’ve liked in the past, while hiding that awesome article that you’d absolutely love because you’ve never clicked on anything like it before. Facebook shows me people to connect with who are close to my inner circle. But what about that random person I meet who turns out to be a great friend?

Don’t get me wrong, I’m really excited about the good that AI and ML can do, for businesses and for society as a whole.  I’m just saying that we, and our algorithms, need to do something unexpected every so often.