Inverse Kinematics in Robotics using Reinforcement Learning

Jan 26, 2020

We see robotic mechanisms in our daily lives. Robotics is a huge industry, and one that won’t stop appearing in our future: it is expected to reach 210 billion U.S. dollars by 2025.

I got interested in understanding how robots work on a fundamental level and how we can advance some of the complex operations today. One of the questions I got curious about was: how do robots know exactly where to go and where to pick up things?

Inverse Kinematics is essentially how robots calculate exactly how they can move their joints to reach a desired target. Currently, this is done through very complex math and takes a lot of computational power.

Later, I’ll be talking about using a branch of Machine Learning called Reinforcement Learning to automate this process. Reinforcement Learning has been used by researchers at Google to teach robot arms fine movements such as opening a door.

It’s a really interesting application, but before we get into how the Machine Learning side of things works, let’s go through what Inverse Kinematics is and why it’s so useful.

Inverse Kinematics

The typical approach to reaching goals in robotics environments is Inverse Kinematics. Inverse Kinematics is the... inverse of forward kinematics, which computes where the end effector ends up for a given set of joint angles. This is what we’re trying to achieve with it:

Given an end position for an effector (just a fancy word for finger), what joint angles do we need so that the end effector reaches it?

This might seem like an easy problem, but finding these joint angles is very complex. Solving for them explicitly isn’t a very efficient way to program robots, especially if we’re trying to figure out things like:

How does the movement of a hip influence the position of your finger?

Applications of Inverse Kinematics

Inverse Kinematics has been used in a lot of real-world applications other than just bringing a robot from point A to point B.

Computer Graphics. Style-based IK is an inverse kinematics system based on a learned model of human poses. Given a set of constraints, it can produce the most likely pose satisfying those constraints in real time. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system.

Robotics Applications. Calculating the angle needed at each joint for the end effector to reach a target.

Computer-Aided Ergonomics.

Protein science for protein structure prediction.

Traditional Ways of Solving IK

1. Jacobian solution

This was one of the first solutions to the Inverse Kinematics problem, and it was widely used to check whether a humanoid arm could reach an object. The method is very powerful, but it can be computationally expensive and unstable at times.

This is the equation used to get from the initial pose to the target position for the end effector:

T = O + dO

O is a pose vector that represents the initial orientation of every joint and T is the pose vector which represents the final orientation of every joint. dO is the vector that represents the change in orientation for each joint so that the articulated body reaches T from O. For example, O would be (45°, 15°, -60°) in the diagram below.

In this case, only the initial pose vector O is known. We want to find T so that the end effector can reach the target position. To do this, we need to find dO first.

Jacobian methods use an iterative approach to calculate dO, similar to gradient descent: you keep updating the current solution in small steps until the error reaches a local minimum. Now, the equation becomes:

T = O + dO * h

Where h is just a simulation step size that can be tuned: smaller values are more stable but need more iterations.
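To make the update rule concrete, here is a minimal sketch for a planar arm. It assumes the Jacobian transpose variant (dO = Jᵀ · error), which is only one member of the Jacobian family and is not necessarily the one the original method used. The function names, the two-link arm, and the default values of h, tol and max_iters are all illustrative assumptions.

```python
import math

def forward(angles, lengths):
    """Forward kinematics: (x, y) of the end effector of a planar arm.
    Each angle is measured relative to the previous link."""
    x = y = total = 0.0
    for theta, length in zip(angles, lengths):
        total += theta
        x += length * math.cos(total)
        y += length * math.sin(total)
    return x, y

def jacobian(angles, lengths):
    """Columns of the 2 x n Jacobian: column i is d(end effector)/d(angle i)."""
    cumulative, total = [], 0.0
    for theta in angles:
        total += theta
        cumulative.append(total)
    n = len(angles)
    columns = []
    for i in range(n):
        # Rotating joint i moves every link from i to the tip.
        dx = -sum(lengths[k] * math.sin(cumulative[k]) for k in range(i, n))
        dy = sum(lengths[k] * math.cos(cumulative[k]) for k in range(i, n))
        columns.append((dx, dy))
    return columns

def solve_ik(angles, lengths, target, h=0.1, tol=1e-3, max_iters=5000):
    """Repeat O <- O + dO * h until the end effector is within tol of the target."""
    angles = list(angles)  # O, the current pose vector
    for _ in range(max_iters):
        x, y = forward(angles, lengths)
        ex, ey = target[0] - x, target[1] - y  # error in Cartesian space
        if math.hypot(ex, ey) < tol:
            break
        # Assumption: Jacobian transpose step, dO = J^T * error.
        for i, (dx, dy) in enumerate(jacobian(angles, lengths)):
            d_o = dx * ex + dy * ey
            angles[i] += d_o * h
    return angles  # T, the final pose vector

final_pose = solve_ik([0.3, 0.3], [1.0, 1.0], target=(1.0, 1.0))
```

With a small h the arm creeps toward the target in many small steps. A larger h converges in fewer iterations but can overshoot and oscillate, which is the instability mentioned above.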

2. FABRIK (Forward And Backward Reaching Inverse Kinematics): a forward and backward iterative approach, and one of the most commonly used methods of the last decade.

Instead of using rotational angles or matrices, FABRIK finds each joint position by locating a point on a line.

We have the link lengths (l1, l2, l3), the start position (p0), the goal, and the previous points in our system (we need to know these to understand how the arm can move to the goal).

Assume P3 reaches the goal, then draw vectors back along the chain to determine where the other points would be in this case. This is the “backward” approach.

The “forward” approach is starting at P0 and getting a new P1, P2, P3. Our lengths will never change.
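The backward and forward passes can be sketched in a few lines for a 2D chain. This is a minimal sketch under stated assumptions: the function name fabrik, the helper place, the tolerance, and the iteration cap are illustrative choices, and joint limits are ignored.

```python
import math

def place(anchor, toward, length):
    """Return the point at distance `length` from `anchor` on the line toward `toward`."""
    d = math.dist(anchor, toward)
    if d == 0.0:
        # Degenerate case: the two points coincide, so pick an arbitrary direction.
        return (anchor[0] + length, anchor[1])
    t = length / d
    return (anchor[0] + t * (toward[0] - anchor[0]),
            anchor[1] + t * (toward[1] - anchor[1]))

def fabrik(points, goal, tol=1e-4, max_iters=200):
    """points: joint positions p0..pn as (x, y) tuples. Returns the new positions."""
    pts = [tuple(p) for p in points]
    lengths = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    base = pts[0]

    if math.dist(base, goal) > sum(lengths):
        # Goal out of reach: stretch the chain in a straight line toward it.
        for i in range(len(lengths)):
            pts[i + 1] = place(pts[i], goal, lengths[i])
        return pts

    for _ in range(max_iters):
        if math.dist(pts[-1], goal) < tol:
            break
        # Backward pass: pin the last point to the goal and work back to p0.
        pts[-1] = tuple(goal)
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = place(pts[i + 1], pts[i], lengths[i])
        # Forward pass: pin p0 back to the base and work out to the tip.
        pts[0] = base
        for i in range(len(pts) - 1):
            pts[i + 1] = place(pts[i], pts[i + 1], lengths[i])
    return pts

solved = fabrik([(0, 0), (1, 0), (2, 0), (3, 0)], goal=(1.5, 1.5))
```

Every update only moves a point along a line and rescales to the link length, so no angles or matrices appear anywhere, and the link lengths stay fixed by construction.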

All these approaches are still very complex and require a significant amount of computational power. The complexity of this problem comes from the robot’s geometry and the nonlinear trigonometric equations that describe the mapping between Cartesian space and joint space.

Reinforcement Learning approach to IK

Using Reinforcement Learning, we can solve goal oriented problems in robotics in a fairly simple way.

For simplicity’s sake, instead of using a hand with multiple fingers I’m going to explain how we can get a finger of a robot to reach a certain goal. In other words, our goal is to minimize the distance between the finger and the goal so we’ll output rewards close to 0 when they are close to each other and negative rewards if they are really far apart.

Quick Breakdown of Reinforcement Learning

Reinforcement learning is an area of Machine Learning which looks at agents trying to maximize their cumulative reward given the state they’re in by taking a sequence of actions.

You have an agent interacting with the environment by performing an action and the environment, in turn, returns a reward and the new state the agent finds itself in.

In our case, a 2D robotic arm:

The state consists of where the two arm joints are in space
The reward is the negative of distance between the finger and the goal
The actions consist of a real-valued up or down movement on each of the two joints
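The three ingredients above can be sketched as a tiny interaction loop. This is a minimal sketch, not the environment used in the article: the link lengths, the clipping limit max_delta, and the random "agent" are all assumptions made for illustration.

```python
import math
import random

LINK_LENGTHS = (1.0, 1.0)  # assumed lengths of the two arm segments

def finger_position(angles):
    """Position of the fingertip for the given joint angles (radians)."""
    x = y = total = 0.0
    for theta, length in zip(angles, LINK_LENGTHS):
        total += theta
        x += length * math.cos(total)
        y += length * math.sin(total)
    return x, y

def reward(angles, goal):
    """Negative distance between finger and goal: 0 is best, more negative is worse."""
    fx, fy = finger_position(angles)
    return -math.hypot(fx - goal[0], fy - goal[1])

def step(angles, action, goal, max_delta=0.1):
    """Apply a real-valued nudge to each joint (clipped), return new state and reward."""
    new_angles = tuple(
        a + max(-max_delta, min(max_delta, d)) for a, d in zip(angles, action)
    )
    return new_angles, reward(new_angles, goal)

def run_random_episode(steps=50, seed=0):
    """A placeholder agent that acts randomly, just to show the loop."""
    rng = random.Random(seed)
    state, goal = (0.0, 0.0), (1.0, 1.0)
    total_reward = 0.0
    for _ in range(steps):
        action = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        state, r = step(state, action, goal)
        total_reward += r
    return total_reward

episode_return = run_random_episode()
```

A learning agent would replace the random action with the output of its policy and use the returned rewards to improve that policy over many episodes.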

Many classic reinforcement learning algorithms, such as Q-learning, assume a small set of discrete actions and don’t work well out of the box with real-valued actions (if the action space is infinitely large, like in the real physical world, we’ll be waiting a long time before our arm learns anything at all).

But we can use Deep Deterministic Policy Gradients (DDPG) to make this work.

Deep Deterministic Policy Gradients

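Policy gradients are a family of reinforcement learning algorithms that attempt to find the optimal policy to reach a certain goal by directly adjusting the policy’s parameters. DDPG is an actor-critic method from this family that learns a deterministic policy over continuous actions.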

This works really well for robotic control problems:

Model free: the algorithm really only needs low level observations like the positions of joints
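For a feel of what DDPG computes during training, here is a small sketch of two ingredients described in the DDPG paper: the target value the critic is trained toward, y = r + γ · Q'(s', μ'(s')), and the slow "soft" update of the target networks. The actor and critic here are toy stand-in functions rather than neural networks, and the names and the values of gamma and tau are illustrative assumptions.

```python
def td_target(reward, next_state, done, target_actor, target_critic, gamma=0.99):
    """Critic target: y = r + gamma * Q'(s', mu'(s')), with no bootstrap at episode end."""
    if done:
        return reward
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)

def soft_update(target_params, online_params, tau=0.01):
    """Move each target parameter a small fraction tau toward the online parameter."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

# Toy stand-ins (hypothetical): a real implementation would use neural networks.
def toy_actor(state):
    return tuple(-0.5 * s for s in state)

def toy_critic(state, action):
    return -sum(abs(s) for s in state)

y = td_target(-1.0, (0.2, -0.4), False, toy_actor, toy_critic, gamma=0.9)
new_target = soft_update([0.0, 2.0], [1.0, 2.0], tau=0.1)
```

The actor outputs one real-valued action directly, so there is no need to search over an infinite action space, and the slowly moving targets keep the critic's training signal from shifting too quickly.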

If you’d like to learn more you can go through the original paper here.

Next, we need to represent the state of the arm environment and how to move the arm around. The arm environment needs to hold the following key information:

A viewer class
State dimensions for: whether the goal was reached, the position of the two joints on the screen, the distance of the joints to the goal.
Action dimension which consists of the two joints we’re operating on with a scalar value that would nudge each one up or down to make them more likely to reach the goal.
An arm info data structure which keeps track of the length of each arm and the angle (in radians) each arm makes with a horizontal line going through the center of the screen.
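The list above can be sketched as a small class. This is a minimal sketch under stated assumptions, not the article's actual environment: the names Link and ArmEnv, the screen coordinates, the goal radius, and the exact layout of the 9-value state vector are hypothetical, and the viewer class is left out because it only handles drawing.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Link:
    length: float  # length of this arm segment, in pixels
    angle: float   # angle to the horizontal, in radians

@dataclass
class ArmEnv:
    center: tuple = (200.0, 200.0)   # assumed base of the arm on screen
    goal: tuple = (300.0, 300.0)     # assumed goal position
    goal_radius: float = 20.0        # assumed: finger counts as "on goal" within this
    arm_info: list = field(
        default_factory=lambda: [Link(100.0, 0.0), Link(100.0, 0.0)])
    state_dim: int = 9   # 2 joint positions + 2 offsets to goal + reached flag
    action_dim: int = 2  # one scalar nudge per joint

    def joint_positions(self):
        """Screen position of the end of each arm segment."""
        x, y = self.center
        joints = []
        for link in self.arm_info:
            x += link.length * math.cos(link.angle)
            y += link.length * math.sin(link.angle)
            joints.append((x, y))
        return joints

    def state(self):
        joints = self.joint_positions()
        gx, gy = self.goal
        fx, fy = joints[-1]
        on_goal = 1.0 if math.hypot(fx - gx, fy - gy) <= self.goal_radius else 0.0
        positions = [c for j in joints for c in j]
        offsets = [c for j in joints for c in (gx - j[0], gy - j[1])]
        return positions + offsets + [on_goal]

    def step(self, action, dt=0.1):
        """Nudge each joint by a clipped scalar, then return (state, reward, done)."""
        for link, a in zip(self.arm_info, action):
            link.angle = (link.angle + dt * max(-1.0, min(1.0, a))) % (2 * math.pi)
        s = self.state()
        reward = -math.hypot(s[2] - self.goal[0], s[3] - self.goal[1])
        return s, reward, bool(s[-1])

env = ArmEnv()
initial_state = env.state()
```

Keeping the lengths and angles in one arm_info structure means the state, the reward, and any drawing code can all be derived from the same source of truth.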

Reinforcement Learning for Humanoid

Solving Inverse Kinematics (IK) while staying stable matters a lot for humanoid robots because they tend to lose balance. The approach in the paper below is based on the idea of exploring the entire configuration space of the robot and learning the best possible solutions using Deep Deterministic Policy Gradient (DDPG). The strategy was evaluated on the highly articulated upper body of a humanoid model, where the trained model was able to solve inverse kinematics for the end effectors with 90% accuracy while maintaining balance.

Check out the paper here: https://arxiv.org/pdf/1801.10425.pdf

OpenAI has been working on similar projects using Reinforcement Learning to train virtual characters like this humanoid, which is learning to walk.