6 Insights About The Future Of Learning Inspired By How Engineers Train Robots


Tell an educator the names of the talks in the OpenAI Robotics Symposium 2019 conference. Don’t give them any context. What would they think about “Learning Dexterity,” “Learning from Play” or “What Should be Learned”? Even “Social-Emotional Intelligence in Human-Robot Interactions” sounds not only interesting, but prescient. And that goes for teachers, instructional designers and school administrators.


Could some of the main ideas and experimental settings become lessons worth considering when it comes to skilling the “wet robots” of which we’re in charge? What are the limitations, and the big no-nos involving teachers? Let’s try to find out.

1. It takes 10,000 hours to gain a new skill in the first try

Extensive teaching of simple tasks through watching and repetition, followed by practicing several of them at a time in randomly generated, rich environments, virtual or not, makes it possible to evaluate and refine “policies” that become instantly ready for real-life use by the learner. Are we talking machines or humans?

The main technological highlight, and the main reason for the conference, is “One-Shot Imitation Learning.” The process (also the name of a paper) aims to show a robot a single example of a task. Then, after one try, the robot is able to perform the task at the same level of skill it was shown.

As impressive as this might seem, there is one fact missing from the abstract. It took thousands of hours to create, test and train the neural networks capable of learning in one shot. The “teaching” (training and testing) did not focus only on their performance. It also evaluated the “policies” created, which can be thought of as the principles that guide action in real life. While the lessons use simple tasks, extensive testing follows in practice, which requires applying several previously acquired policies.

The policies, by the way, needed thousands of distributed CPU cores to train. They are at the core of OpenAI’s reinforcement learning approach, favored “because of its ease of use and good performance.”

One-Shot Imitation Learning: Teach Task A and Task B for thousands of hours, consolidate policies for a thousand hours more. Ta-da: The robot learns stuff in one shot!

The following gives some basic ideas of what learning inspired by this process would look like:

  • Teach simple lessons, one task at a time
  • Make students practice them broadly and extensively
  • Combine tasks in practice
  • Nurture their ability to grasp real-life problems fast

2. Watch and Learn (because we still don’t know any better)

Learning Dexterity by Wojciech Zaremba (PDF). The robots are not explicitly told what to learn. Rather, they are told when the skill (policy) has been acquired.

You don’t have to look up the definition of dexterity to see how difficult it is to pin down or to separate into components or steps. However, humans are capable of transferring dexterity skills to one another, albeit with limitations and through largely heuristic (educated guessing) processes.

3. Blur the division between supervised (classroom) and unsupervised (real life) learning. Play if you must

For decades now, research into machine learning algorithms has been forced into two classes: supervised and unsupervised learning. Clear challenges to that division have emerged lately. Semi-supervised learning, for example, adds labeled data only as needed: the training is intervened in only if the algorithm underperforms.

Learning From Play by Pierre Sermanet, Google Brain (PDF). The robot is not given all the goals from the start, possibly delaying their achievement, but making each individual attempt more valuable. It is also easier and potentially less costly to set up.

By introducing the idea of play, we arrive at a new concept: Self-Supervised Learning. It is a more powerful concept in human than in machine learning, because we human beings can define and refine goals autonomously.

This is, by the way, an essential difference between “Play” and “Game.” In a game, following the rules to accomplish a given goal matters most, whereas in play, rules and goals can be seen as excuses to move forward with the exploration. Another possible difference is the preference for continuous experiences in play, and discrete (step-by-step) ones in games.

In any case, it all converges on one idea: Learning does not have to be merely about teaching which rules to follow. It can provide tools for the learner to understand, adopt, modify and drop rules, and to adopt their own.

4. Never mind the learning styles: Use them all at the same time

One thing you can never accomplish while teaching humans: Instructing them on what to disregard. It is fairly simple to choose the inputs of a robotic neural network. You can deliberately manipulate a learning process by changing the importance of certain data points or streams, or tweak the learning after it’s done.

You cannot do that with human beings.

When we talk about technology, personalization and interaction, there should be a clear disclaimer upfront. Learning is inevitably interactive, and its outcomes are, by necessity, personal. What we call by those names might be better described as “Interactivity-modifying technology” and “Personalization-aware learning.”

5. Visualize the learning process, even if it makes it all go slower

A distinctive trait across OpenAI’s projects is their “flashiness.” Robots that Learn was no exception.

By devoting the time to make proper visualizations not only of the outcomes, but of the learning processes themselves, OpenAI gives its research a far better chance of being seen, first of all. Eventually, that visibility should lead to the work being examined more critically, as has already started to happen with GPT-2, and make its useful findings, if any, more likely to be put into practice.

Pedagogically speaking, visualization is essential. Or put more poetically, “Beauty is fundamental.”

6. VR, or no VR?

Imagine the following VR game/experience. You have to complete a mission within a given score, limited by time and other resources. You will follow a “shadow,” or the overlay of an ideal player, until your actions closely resemble those of the ideal player. The game is never over: you can play it non-stop to marginally improve your score.

This is the game the neural networks for robots are playing.
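As a rough illustration of the “shadow” game, the sketch below scores a player by how far their actions sit from those of an ideal player, then repeats practice rounds that close part of the gap each time. Everything here is an assumption made for illustration: the function names `imitation_error` and `practice_round`, the use of a mean squared gap as the score, and the toy action values are hypothetical, not OpenAI’s actual training setup.

```python
def imitation_error(player_actions, ideal_actions):
    """Mean squared gap between the player's actions and the ideal "shadow"."""
    if not ideal_actions or len(player_actions) != len(ideal_actions):
        raise ValueError("both action sequences must be non-empty and equally long")
    gaps = [(p - i) ** 2 for p, i in zip(player_actions, ideal_actions)]
    return sum(gaps) / len(gaps)


def practice_round(player_actions, ideal_actions, step=0.5):
    """Move each action a fraction `step` of the way toward the ideal one."""
    return [p + step * (i - p) for p, i in zip(player_actions, ideal_actions)]


# Hypothetical toy data: the ideal player's actions and a learner's first try.
ideal = [1.0, 1.0, 1.0]
player = [0.0, 0.0, 0.0]

errors = []
for _ in range(4):
    errors.append(imitation_error(player, ideal))  # score before practicing
    player = practice_round(player, ideal)         # close part of the gap
```

Each round leaves a smaller error than the one before, yet the error never quite reaches zero, which mirrors the “never over” nature of the game: there is always a marginal improvement left to chase.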

Is it a good idea for you or your students to play this VR game? If you are open to it, here’s another potential advantage: By building several virtual training cases, researchers reduced the need for real-life examples. This can help save resources and speed up the rate of learning. It is referenced across OpenAI as “Domain Randomization.”

It is not a new concept. Games and other simulated environments can be rich generators of training data for machine learning and AI algorithms. Furthermore, the specifications of the data can be decided beforehand to make it compatible or compliant from the start.

Translating this idea to human learners, virtual environments for VR learning have the potential to generate tons of data about how students learn, across a larger number of dimensions than we had access to before. Basically, if you have no training data, just build a visual world, an AI to play around in it, and keep track of everything!

The idea of “Domain Randomization” suggests a process to automatically generate simulation examples by iterating over ranges of situational likelihood. This would enable designers to create countless virtual environments for practice. It is an example of “Procedural Generation.”
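A minimal sketch of that idea, assuming each practice environment can be described by a handful of numeric parameters drawn from fixed ranges. The parameter names, the ranges and the uniform sampling are hypothetical choices made for illustration, not values or methods taken from OpenAI’s work.

```python
import random

# Hypothetical parameter ranges for a simulated practice scene.
# Names and bounds are illustrative assumptions only.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "object_mass_kg": (0.05, 0.5),
    "light_intensity": (0.2, 1.0),
}


def sample_environment(rng):
    """Draw one environment by sampling each parameter uniformly from its range."""
    return {
        name: rng.uniform(low, high)
        for name, (low, high) in PARAM_RANGES.items()
    }


def generate_environments(count, seed=0):
    """Procedurally generate `count` randomized environments, reproducibly."""
    rng = random.Random(seed)  # a fixed seed makes the same set reproducible
    return [sample_environment(rng) for _ in range(count)]


environments = generate_environments(1000)
```

Widening a range or adding a new parameter yields a new family of practice scenes without hand-building any of them, which is what makes the approach cheap to scale.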

About OpenAI

OpenAI LP is an Open Source “capped-profit” organization designed to build general-purpose AI tools and applications. It is funded by large tech companies and Silicon Valley VCs. From time to time, some of its code is made public on a limited basis.

OpenAI developed GPT-2 and limited its public release.
