Robots from simulation into reality

Transfer Learning for Robotics with Generative Adversarial Networks

Mar 6, 2020

As a kid, I remember watching Star Wars and thinking “damn, those robots are crazy-sophisticated”. They were smart and able to walk through all kinds of environments. Unlike most people, I wasn’t scared by this; it got me really excited … but I was shocked to learn that machines aren’t actually that intelligent in real life (yet).

Throughout our evolution, humans have gained general knowledge and applied it to solve problems in a wide range of situations. General intelligence lets us combine broad cognitive abilities and switch between situations to solve problems. This isn’t fully possible with machines today, but a field of study is trying to crack it.

Artificial General Intelligence (AGI)

This is what we’re trying to accomplish with artificial intelligence and robotics in a field called Artificial General Intelligence (AGI), which focuses on generalizing human cognitive abilities in software. So far we’ve tried a variety of techniques in unsupervised learning, generative models and reinforcement learning (RL).

I’ve personally done some work with RL, using OpenAI’s robotics environment to simulate a robotic arm that lifts, slides and moves objects to defined targets. You can learn more about that here.

But what’s got me excited recently is unsupervised learning, specifically generative models, and being able to train robots to think and act more like humans by using past experience to make decisions.

Lots of companies are working on this type of research. Here are two examples in robotics:

Vicarious.ai: applying a neuroscience approach using generative probabilistic models.
SingularityNET: research in unsupervised learning, neural-symbolic systems and designing a language for a cognitive architecture.

Transfer Learning for Robot Control

One of the key aspects of human intelligence is the capacity to adapt to new situations based on similar past experiences.

For example, when I was younger I used to play a lot of badminton 🏸 so it would be easier for me to adapt to playing tennis than a person who has no experience in racket sports.

Being able to transfer knowledge between different task domains is a highly desirable trait of intelligence → it’s what makes us unique as humans and how we’ve advanced our cultures and society. But it is still extremely difficult to come up with a general algorithm that lets a computer do this effectively without being explicitly programmed for it.

I’ll be explaining how we can train robots in simulation and transfer the knowledge over to a robot in real life within different environments (I will highlight only some of the key architectures and how we can do this).

But before we get into that, we need to understand why we can’t do this today. This is because:

robots can go out of control during the training stage.
a large amount of time is needed to supervise the robot and create relevant training data.
simulation environments often provide annotations for the training data (e.g. joint states, object orientation/pose etc.), but these aren’t always accurate and don’t translate perfectly to real life.

Essentially, this is a problem of transferring knowledge from a source domain to a related target domain.

In this article, I’ll walk you through how we can control a 3-degree-of-freedom (3DOF) robot arm to move towards a cube. This is a regression problem: joint velocities are predicted from an input image. The goal is to reduce the dependence on target domain data.

We can do this by:

using generative adversarial networks (GANs) for image translation, creating image pairs between the source and target domains so that more target domain data can easily be generated from source domain data.
training the GANs for image translation by minimizing the task loss on the generated target domain images.

Simulation Set-Up

The interesting problem here is to predict joint velocities for a 3DOF arm from an input image. The joint velocities form a vector in which each entry describes the angular velocity of one segment relative to an adjacent segment.

V-REP provides a remote API as a way to communicate with V-REP’s simulation environment. Using the remote API, the inverse kinematics module is able to provide a list of joint states. This list of joint states can be followed iteratively to create the behaviour of the robot arm moving towards a cube.

These are the 16 steps taken to reach the cube (the arm gets closer from left to right):

The joint states are then converted into joint velocities, which capture the effect of moving the arm at each step:

joint velocities = next joint state - current joint state.
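
To make the formula above concrete, here is a minimal sketch in plain Python. The function name `joint_velocities` and the trajectory values are made up for illustration; they are not real V-REP output. A trajectory of N joint states yields N - 1 velocity labels.

```python
from typing import List


def joint_velocities(joint_states: List[List[float]]) -> List[List[float]]:
    """Per-step velocity label: next joint state minus current joint state."""
    return [
        [nxt - cur for cur, nxt in zip(current, following)]
        for current, following in zip(joint_states, joint_states[1:])
    ]


# Hypothetical 3DOF trajectory (joint angles in radians), not real simulator data.
trajectory = [
    [0.00, 0.00, 0.00],
    [0.10, 0.05, -0.02],
    [0.25, 0.10, -0.05],
]
velocities = joint_velocities(trajectory)
```

Each entry of `velocities` can then serve as the regression target for the image captured at the corresponding step.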

GANs for Image Translation and Pairing across Domains

Generative adversarial networks (GANs) have been used to generate pictures of a particular domain from random vectors. Here, instead, the input is an image from the source domain and the output is an image from the target domain. In our case, that means the source and target domain images need to be semantically related.

GANs for translation and pairing across domains.

We can train GANs to perform image-to-image translation without needing source-target domain pairs x-y.

We learn the mapping function G : X → Y, and the inverse mapping function F : Y → X is learnt as well. Training the inverse mapping requires a separate discriminator, which learns to differentiate between real images from domain X and the generated images F(y). An additional loss is added to the objective function to make use of the inverse mapping: it penalizes the L1 distance between the original image and the image reconstructed by passing it through the generator and then the inverse generator. This encourages the generators to map each image to a related output in the other domain.

Top: target domain image y, middle: generated source domain image F(y), bottom: reconstructed target domain image G(F(y)).
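
The reconstruction penalty described above can be sketched roughly like this. It is a toy illustration, not the actual training code: images are flattened lists of floats, and `G` and `F` here are simple stand-in functions I made up in place of real generator networks.

```python
from typing import Callable, List

Image = List[float]  # a flattened image; stand-in for a real tensor


def l1_loss(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x: Image,
    y: Image,
    G: Callable[[Image], Image],
    F: Callable[[Image], Image],
) -> float:
    """L1 distance of x to F(G(x)) plus L1 distance of y to G(F(y))."""
    return l1_loss(F(G(x)), x) + l1_loss(G(F(y)), y)


# Toy stand-ins for the generators (assumption: real ones are neural networks).
def G(image: Image) -> Image:
    """Source -> target."""
    return [p + 0.5 for p in image]


def F(image: Image) -> Image:
    """Target -> source, deliberately an imperfect inverse of G."""
    return [p - 0.4 for p in image]


x = [0.0, 0.2, 0.4]  # hypothetical source domain image
y = [0.5, 0.7, 0.9]  # hypothetical target domain image
loss = cycle_consistency_loss(x, y, G, F)
```

Because `F` does not exactly undo `G` in this toy setup, the loss is non-zero; it would shrink towards zero as the two mappings become inverses of each other.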

Training Robot with Generated Source-Target Pairs

The labels from the source domain can be used for the paired generated target domain image. We can do this by extracting a feature vector from an image using a neural network.

Example of the source domain and the various target domains being tested on.

Source domain translation: The source domain images x, the generated target domain images G(x), and the reconstructed source domain images F(G(x)) are trained on the task with the labels of x. The L2 loss between feature layers is minimized.
Target domain translation: For the target domain images y, the generated source domain images F(y), and the reconstructed target domain images G(F(y)), the L2 loss between feature layers is minimized.
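
Here is one way the two loss terms above could be put together. Treat it as a sketch under my own assumptions: the network interface (image in, feature vector and prediction out), the use of L2 for the task loss, comparing features against the original image, and the equal weighting of all terms are illustrative choices, not details taken from the original work.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: image -> (feature vector, predicted joint velocities)
Network = Callable[[Vector], Tuple[Vector, Vector]]
Generator = Callable[[Vector], Vector]


def l2_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally sized vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def source_translation_loss(
    x: Vector, label: Vector, G: Generator, F: Generator, net: Network
) -> float:
    """Task loss on x, G(x), F(G(x)) with the label of x, plus feature L2 loss."""
    feats_x, pred_x = net(x)
    task = l2_loss(pred_x, label)
    feature = 0.0
    generated = G(x)
    reconstructed = F(generated)
    for image in (generated, reconstructed):
        feats, pred = net(image)
        task += l2_loss(pred, label)         # same label as the source image
        feature += l2_loss(feats, feats_x)   # keep features close to those of x
    return task + feature


def target_translation_loss(
    y: Vector, G: Generator, F: Generator, net: Network
) -> float:
    """Feature L2 loss only (assumption: y has no label in this sketch)."""
    feats_y, _ = net(y)
    generated = F(y)
    reconstructed = G(generated)
    return sum(l2_loss(net(img)[0], feats_y) for img in (generated, reconstructed))


# Toy stand-ins so the sketch runs end to end.
def toy_net(image: Vector) -> Tuple[Vector, Vector]:
    return [2.0 * p for p in image], [sum(image)] * 3  # 3 joint velocities


def G(image: Vector) -> Vector:
    return [p + 1.0 for p in image]


def F(image: Vector) -> Vector:
    return [p - 1.0 for p in image]


source_loss = source_translation_loss([0.0, 1.0], [1.0, 1.0, 1.0], G, F, toy_net)
target_loss = target_translation_loss([1.0, 2.0], G, F, toy_net)
```

In a real setup the network and both generators would be trained jointly to reduce these values; here they are fixed functions, so the sketch only shows how the terms are computed.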

Putting it all together

So far we’ve talked about training a GAN for image translation and pairing across domains.

We can do this by:

training a neural network fY on a task in the target domain Y using some number of source-target domain pairs x-y.
generating target domain images G(x) that recreate target domain characteristics.

A possible limitation of this is that the transformations can be unsuccessful even with a well-trained fY and many x-y pairs. We would need improvements in the generator and discriminator architectures.

In this model there was still some dependence on x-y pairs, but in the future we want to reduce that dependency or even remove it entirely (a fully unsupervised model).
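
As a rough illustration of the second step, labeled target domain training data can be produced by translating each source image and reusing its label. The helper name `make_target_dataset` and the toy generator below are hypothetical; a trained G would take the place of the stand-in.

```python
from typing import Callable, List, Tuple

Image = List[float]   # flattened image, stand-in for a real tensor
Label = List[float]   # joint velocities for a 3DOF arm
Sample = Tuple[Image, Label]


def make_target_dataset(
    source_data: List[Sample], G: Callable[[Image], Image]
) -> List[Sample]:
    """Translate each source image with G and reuse its source label unchanged."""
    return [(G(image), label) for image, label in source_data]


def toy_generator(image: Image) -> Image:
    """Stand-in for a trained generator: just brightens the image, capped at 1.0."""
    return [min(1.0, p + 0.25) for p in image]


# Hypothetical source samples: (image, joint velocity label).
source_data = [
    ([0.0, 0.5, 1.0], [0.10, 0.05, -0.02]),
    ([0.25, 0.25, 0.5], [0.15, 0.05, -0.03]),
]
target_data = make_target_dataset(source_data, toy_generator)
```

The resulting `target_data` has the same labels as `source_data`, which is what lets fY train on target-like images without new annotations.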

Once we crack this, we can build fully unsupervised models that we can use to create intelligent machines!!