Closed-Loop Vision-Based Manipulation Technique

Prosthetic Arm CAD Design

Problem of Generalization

Reinforcement learning (RL) offers a promising avenue for tackling this problem, but current work on reinforcement learning masters only individual skills. To achieve generalization that better meets the needs of real-world manipulation, we focus specifically on scalable learning with off-policy algorithms and study this question in the context of grasping.

Resulting Demo

Using V-REP's motion planning and object grasping scene, I simulated a Jaco arm performing manipulation tasks, making use of the simulator's dynamic particles, motion planning, and inverse kinematics features. The model is trained and run with an algorithm that I reproduced and that closely follows QT-Opt. Check it out!

Overview of all components involved in manipulation tasks!

Q-learning and CNNs

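Generalization requires diverse data, but collecting it is difficult with on-policy algorithms, which evaluate and improve the same policy that is used to select actions. An agent that only ever follows its current policy never explores, so it sees too little variety to generalize. Off-policy methods such as Q-learning avoid this because they can learn from data collected by any policy.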


QT-Opt is a reinforcement learning algorithm that allows robots to improve their grasping capability after watching hundreds of thousands of real-world grasping examples. At the core is a CNN which represents the robot’s grasping logic (its Q function).
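
To make the off-policy idea concrete, here is a minimal tabular Q-learning sketch. It is not QT-Opt itself, which uses a CNN as the Q-function and a much larger training setup. The toy chain environment, the names `step` and `q_learning`, and the values of `GAMMA` and `ALPHA` are all assumptions chosen for illustration. The data is collected by a purely random behaviour policy, yet the update still learns the value of acting greedily.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.5           # learning rate (assumed value)

def step(state, action):
    """Toy environment: reward 1 only when the goal state is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)      # behaviour policy: uniformly random
            next_state, reward, done = step(state, action)
            # Off-policy target: uses the best next action, not the action the
            # behaviour policy will actually take next.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

Even though the agent never acts greedily while collecting data, the greedy policy extracted at the end moves toward the goal from every state. That separation between the data-collecting policy and the learned policy is what lets an off-policy method reuse large, diverse datasets.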

Markov Decision Process (MDP)

This closed-loop vision-based control framework is based on a general formulation of robotic manipulation as a Markov Decision Process (MDP).

Reward function

A successful grasp results in a reward of 1, and a failed grasp a reward of 0. A grasp is considered successful if the robot holds an object above a certain height at the end of the episode.
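
As a rough sketch, the reward described above could be written as the function below. The function name, its arguments, and the 0.1 m default threshold are hypothetical, and returning 0 for steps before the end of the episode is my own simplifying assumption.

```python
def grasp_reward(object_height_m, gripper_holding, episode_done, height_threshold_m=0.1):
    """Binary sparse reward: 1.0 for a successful grasp at episode end, else 0.0.

    height_threshold_m is an assumed value; the text only says
    "above a certain height".
    """
    if not episode_done:
        return 0.0  # assumption: no reward is given before the episode ends
    success = gripper_holding and object_height_m > height_threshold_m
    return 1.0 if success else 0.0
```

For example, `grasp_reward(0.2, True, True)` counts as a success, while the same call with `gripper_holding=False` does not.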

Grasp execution and termination condition

Q-Function representation as CNN

The Q-function is represented by a large convolutional neural network, where the image is provided as an input into the bottom of the convolutional stack, and the action, gripper status, and distance to the floor are fed into the middle of the stack.
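
The data flow can be sketched in a few lines of plain Python. This is a toy under heavy assumptions, not the real network: it uses a single channel, one small kernel per layer, and hand-picked weights. Merging the vector inputs by adding them at every spatial location, and squashing the output with a sigmoid to match the 0/1 reward, are choices I made for this sketch. All names (`conv_relu`, `q_value`, `params`) are hypothetical.

```python
import math

def conv_relu(feature_map, kernel):
    """'Valid' 2D cross-correlation of one channel with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            max(0.0, sum(feature_map[i + di][j + dj] * kernel[di][dj]
                         for di in range(kh) for dj in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def q_value(image, action, gripper_open, height_above_floor, params):
    # Bottom of the stack: only the image passes through the first convolution.
    visual = conv_relu(image, params["conv_bottom"])
    # Side branch: action, gripper status and height go through a linear layer.
    vector_input = list(action) + [gripper_open, height_above_floor]
    embedding = params["bias_vec"] + sum(
        w * x for w, x in zip(params["w_vec"], vector_input))
    # Middle of the stack: add the embedding at every spatial location (assumed merge).
    merged = [[max(0.0, v + embedding) for v in row] for row in visual]
    # Top of the stack: another convolution, global average pooling, sigmoid.
    top = conv_relu(merged, params["conv_top"])
    cells = [v for row in top for v in row]
    pooled = sum(cells) / len(cells)
    logit = params["w_out"] * pooled + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical hand-picked parameters and a 4x4 "image", purely to exercise the code.
params = {
    "conv_bottom": [[0.5, -0.5], [0.25, 0.25]],
    "conv_top": [[1.0, 0.0], [0.0, 1.0]],
    "w_vec": [0.3, -0.2, 0.1, 0.4, 0.2],   # 3 action dims + gripper + height
    "bias_vec": 0.0,
    "w_out": 1.0,
    "b_out": -1.0,
}
image = [[0.0, 0.1, 0.2, 0.3],
         [0.1, 0.2, 0.3, 0.4],
         [0.2, 0.3, 0.4, 0.5],
         [0.3, 0.4, 0.5, 0.6]]
q = q_value(image, action=(0.1, 0.0, -0.1), gripper_open=1.0,
            height_above_floor=0.2, params=params)
```

The point of the structure is that the same image features can be scored against many candidate actions: only the side branch and the layers above the merge depend on the action.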

The architecture of grasping Q-Function

Data collection

The model had to be trained on a large and diverse set of objects. The dataset that I used for training was collected over multiple separate experiments, and each experiment reused the data from the previous one. The data-collection policy was initially randomized but biased toward reasonable grasps, and was later switched to the learned QT-Opt policy once that policy reached a high enough success rate.
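
The collection schedule can be sketched roughly as follows. Everything here is illustrative: the function names, the 0.5 success threshold, the 20-episode minimum, and the 2D grasp target are hypothetical values picked for the sketch, not figures from my experiments.

```python
import random

def scripted_grasp_action(rng, object_xy, noise=0.05):
    """Randomized policy biased toward reasonable grasps: aim near the object, with noise."""
    x, y = object_xy
    return (x + rng.uniform(-noise, noise), y + rng.uniform(-noise, noise))

def choose_collection_policy(recent_successes, threshold=0.5, min_episodes=20):
    """Pick 'learned' once the learned policy's recent success rate is high enough.

    recent_successes holds 0/1 outcomes from evaluating the learned policy.
    threshold and min_episodes are assumed values.
    """
    if len(recent_successes) < min_episodes:
        return "scripted"
    success_rate = sum(recent_successes) / len(recent_successes)
    return "learned" if success_rate >= threshold else "scripted"

def start_experiment(previous_data, new_episodes):
    """Each experiment starts from the data gathered by the previous one."""
    return list(previous_data) + list(new_episodes)
```

Because training is off-policy, the early scripted episodes stay useful after the switch: they remain in the dataset alongside the episodes collected by the learned policy.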

Quantile QT-Opt

Researchers have been working on a distributional enhancement of QT-Opt called Quantile QT-Opt (Q2-Opt). This approach achieved a very high grasping success rate, while also being more sample efficient. To learn more, you can read this blog.

Inverse Kinematics (IK)

Inverse kinematics is how a machine calculates the joint movements needed to reach a desired target. For grasping, this means working out how the arm must move to reach the grasp position proposed for an object.

Configuring the joint positions of a robot using forward or inverse kinematics.
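
To give a feel for the calculation, here is the textbook closed-form solution for a two-link planar arm. This is a deliberate simplification: the Jaco arm has more joints and moves in 3D, so the simulator has to solve a harder version of the same problem. The link lengths and function names are assumptions for the sketch.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) of a two-link planar arm given its joint angles."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Return one joint solution (non-negative elbow angle), or None if unreachable."""
    # Law of cosines gives the elbow angle from the distance to the target.
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0:
        return None  # target is outside the arm's reach
    theta2 = math.acos(cos_t2)
    # Shoulder angle: direction to the target minus the offset caused by the elbow bend.
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

angles = inverse_kinematics(1.2, 0.5)
reached = forward_kinematics(*angles)
```

Feeding the angles back through `forward_kinematics` recovers the original target, which is a handy sanity check for any IK routine.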

Dynamic Particles & Motion Planning

Motion planning for a robot sounds like a simple task, but it is one of the harder areas of research that people are still working on. A path planning task typically goes through the following steps:

  • Define the start and goal states. If the path planning object is a serial manipulator, an end-effector position is often provided instead of a goal state.
  • Create a new path planning task and select an algorithm.
  • Create the required state space and specify which entities the robot is not allowed to collide with.
  • Compute one or several paths, then destroy the path planning task once it is completed.
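
The same workflow can be mimicked with a toy planner. The `PlanningTask` class below is a hypothetical stand-in written for this post, not V-REP's API, and it plans on a small 2D grid with breadth-first search, whereas a planner for a real arm typically searches the arm's joint space.

```python
from collections import deque

class PlanningTask:
    """Hypothetical stand-in for a simulator's path planning task (not V-REP's API)."""

    def __init__(self, start, goal):
        # Step 1: define the start and goal states.
        self.start, self.goal = start, goal
        self.algorithm = None
        self.bounds = None
        self.obstacles = set()

    def set_algorithm(self, name):
        # Step 2: select an algorithm (only breadth-first search in this sketch).
        if name != "bfs":
            raise ValueError("this sketch only implements breadth-first search")
        self.algorithm = name

    def set_state_space(self, width, height, obstacles):
        # Step 3: define the state space and the cells that must not be entered.
        self.bounds = (width, height)
        self.obstacles = set(obstacles)

    def compute(self):
        # Step 4: compute a shortest collision-free path, or None if there is none.
        width, height = self.bounds
        parents = {self.start: None}
        queue = deque([self.start])
        while queue:
            cell = queue.popleft()
            if cell == self.goal:
                path = []
                while cell is not None:
                    path.append(cell)
                    cell = parents[cell]
                return path[::-1]
            x, y = cell
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
                if inside and nxt not in self.obstacles and nxt not in parents:
                    parents[nxt] = cell
                    queue.append(nxt)
        return None

task = PlanningTask(start=(0, 0), goal=(3, 0))
task.set_algorithm("bfs")
task.set_state_space(width=4, height=3, obstacles=[(1, 0), (1, 1)])
path = task.compute()
del task  # the task is discarded once planning is done
```

The obstacles force the path to detour around the blocked cells, which is the grid-world equivalent of the arm routing around objects it must not hit.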

Moving Forward: Testing

So far we have only tested these approaches in a simulation environment, and although they seem promising, our next goal is to test them in real life. We will be working on adaptations for the prosthetic arm and then running initial tests on it, first using a generative grasping CNN and then the Q-learning framework outlined here.

Alishba Imran

Machine learning developer working on accelerating automation/hardware and energy storage!