Closed-Loop Vision-Based Manipulation Technique

Manipulation of real-world objects remains one of the biggest limitations of machine intelligence.

As humans, we can easily grasp and pick up objects that we’ve never seen before, but even the most advanced robotic arms can’t manipulate objects that they weren’t trained to handle.

I’ve been working with the engineering department at San Jose State University (SJSU) to develop these techniques for robotics, along with new manufacturing and design approaches for building cheaper, easier-to-use medical devices.

Prosthetic Arm CAD Design

Specifically, we are focusing on applying our work to prosthetics, but these techniques can be scaled across other medical assistive devices that people use on a day-to-day basis.

Problem of Generalization

Resulting Demo

This approach was able to achieve a 78% grasp success rate on previously unseen objects and 84% on household objects that are moved during the grasp attempt.

I’ll be diving deeper into how the manipulation part of the algorithm works, with brief highlights on dynamic particles, inverse kinematics and motion planning!

Overview of all components involved in manipulation tasks!

Q-Learning and CNNs

Q-learning is an off-policy learner: it can learn from actions taken by any behavior (for example, random actions), without needing to follow the policy it is learning about. It seeks to find the best action to take given the current state, and in doing so learns a policy that maximizes the total reward.

This approach is specifically an off-policy framework built around a continuous generalization of Q-learning: you train only a Q-function, and induce the policy implicitly by maximizing that Q-function.
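To make that concrete, here’s a minimal sketch (my own illustration, not the paper’s code) of how a Q-function can be trained without an explicit policy. The Bellman target needs a max over actions, and because actions are continuous, that max is approximated by sampling candidate actions — the same spirit as the CEM-based maximization QT-Opt uses. The `q_fn` callable, the action bounds and the dimensions here are all placeholders.

```python
import numpy as np

def bellman_target(q_fn, reward, next_state, gamma=0.9, num_samples=64, action_dim=4):
    # Approximate max_a' Q(s', a') by sampling candidate actions and keeping
    # the best Q-value (a stand-in for QT-Opt's CEM optimizer).
    candidates = np.random.uniform(-1.0, 1.0, size=(num_samples, action_dim))
    best_q = max(q_fn(next_state, a) for a in candidates)
    return reward + gamma * best_q

def td_error(q_fn, state, action, reward, next_state, gamma=0.9):
    # The Q-function is regressed onto this target; the policy is implicit:
    # at execution time we again pick the sampled action with the highest Q.
    return bellman_target(q_fn, reward, next_state, gamma) - q_fn(state, action)
```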

QT-Opt

QT-Opt successfully trains a neural network in a closed-loop system, allowing the robot to learn useful techniques like regrasping in case of a failed grasp, pre-grasp manipulation of objects, grasping in clutter, and handling dynamic objects. This is the original paper for QT-Opt that is referenced throughout this article.

Markov Decision Process (MDP)

At each time step, the policy observes the image from the robot’s camera and chooses a gripper command. The grasping task is defined simply by providing a reward to the learner during data collection.

The state observation includes the robot’s current camera observation, in the form of an RGB image. The state also includes the status of the gripper, a binary indicator of whether the gripper is open or closed.

The action consists of a vector in Cartesian space indicating the desired change in the gripper position.
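As a rough illustration of that state/action structure (the field names are my own, and the rotation, gripper and termination fields are assumptions added for readability rather than something spelled out above):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspState:
    image: np.ndarray        # current RGB camera image (H x W x 3)
    gripper_closed: float    # binary indicator: 1.0 closed, 0.0 open

@dataclass
class GraspAction:
    translation: np.ndarray      # desired change in gripper position (dx, dy, dz)
    yaw_rotation: float = 0.0    # assumed extra: rotation about the vertical axis
    close_gripper: bool = False  # assumed extra: open/close command
    terminate: bool = False      # assumed extra: end the episode
```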

Reward function

Success is determined by using a background subtraction test after dropping the picked object. This can be seen in the following grasp execution and termination condition flow:
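One simple way to picture that background subtraction test (the thresholds below are made up for illustration): compare the scene image before and after the gripper drops whatever it is holding back into the bin, and call the grasp a success if enough pixels changed.

```python
import numpy as np

def grasp_success(image_before_drop, image_after_drop, pixel_delta=30, min_changed=500):
    # If enough pixels change after the gripper opens and drops its contents,
    # an object must have been held, so the grasp is labelled a success (1.0).
    diff = np.abs(image_after_drop.astype(np.int32) - image_before_drop.astype(np.int32))
    changed = (diff.max(axis=-1) > pixel_delta).sum()
    return 1.0 if changed > min_changed else 0.0
```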

Grasp execution and termination condition

The manipulator is controlled by closing a loop around visual input and gripper commands. At the end of an episode, grasp success is detected and the sequence of state-action pairs is returned to be fed back into the learning pipeline or stored offline. Depending on the executed policy, the termination action is either learned or decided heuristically.
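In rough Python, the closed loop over an episode looks something like this; `env` and `policy` are placeholders standing in for the robot interface and the learned (or scripted) policy, not real APIs from the project.

```python
def run_grasp_episode(env, policy, max_steps=20):
    # Closed loop around visual input and gripper commands: observe, act,
    # and stop when the policy terminates or the step budget runs out.
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)              # e.g. maximize the learned Q-function
        next_state, terminated = env.step(action)
        trajectory.append((state, action))
        state = next_state
        if terminated:                      # learned or heuristic termination
            break
    reward = env.grasp_success()            # e.g. the background subtraction test
    # The state-action pairs plus the final reward are fed back into the
    # learning pipeline or stored offline.
    return trajectory, reward
```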

Q-Function representation as CNN

The input image is processed through convolutional layers. The action and the vector-valued state variables (gripper status and height) are processed by several fully connected layers, then tiled over the width and height dimensions of the convolutional map and merged with it. The resulting convolutional map is further processed by a number of convolutional layers. Finally, the output is gated through a sigmoid so that the Q-values are always in the range [0, 1].
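Here’s a rough PyTorch sketch of that architecture. The layer sizes and the way the action embedding is tiled and added to the convolutional map are simplified guesses meant to show the structure, not the paper’s exact network.

```python
import torch
import torch.nn as nn

class GraspQNetwork(nn.Module):
    def __init__(self, action_dim=4, state_vec_dim=2):
        super().__init__()
        # Convolutional stack over the RGB image.
        self.image_conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
        )
        # Fully connected layers over the action + vector-valued state
        # (gripper status and height).
        self.action_fc = nn.Sequential(
            nn.Linear(action_dim + state_vec_dim, 32), nn.ReLU(),
        )
        # Further convolutions after the action features are merged in.
        self.merged_conv = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, image, action, state_vec):
        feat = self.image_conv(image)                               # (B, 32, h, w)
        emb = self.action_fc(torch.cat([action, state_vec], dim=1)) # (B, 32)
        # Tile the action embedding over the conv map's height and width,
        # add it, then run the remaining convolutions.
        feat = self.merged_conv(feat + emb[:, :, None, None])
        feat = feat.mean(dim=(2, 3))                                # global average pool
        # Sigmoid gate keeps the Q-values in [0, 1].
        return torch.sigmoid(self.head(feat)).squeeze(-1)
```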

The architecture of grasping Q-Function

Data collection

Quantile QT-Opt

Inverse Kinematics (IK)

Computing inverse kinematics traditionally involves fairly complex math and a lot of computational power, although today there are various packages, especially in ROS, that do these calculations automatically. As an example, VREP (the simulation software I use) has built-in kinematics functionality accessible via the GUI.

The gripper or tool at the grasping end of a robot arm is the end-effector. The robot configuration is a list of joint positions that are restricted by the joint limits of the robot model and by its degrees of freedom.

Given the desired end-effector pose for grasping an object, IK can determine an appropriate joint configuration that moves the end-effector to the target pose.

Configuring the joint positions of a robot using forward or inverse kinematics.

Once the robot’s joint angles are calculated using inverse kinematics, a motion profile can be generated using the Jacobian matrix to move the end-effector from the initial pose to the target pose. The Jacobian matrix defines the relationship between the robot’s joint velocities and the end-effector velocities.
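As a small, concrete example of how the Jacobian gets used: one common iterative IK update is damped least squares, sketched below in NumPy. `forward_kinematics` and `jacobian` are placeholders for whatever robot model you have (VREP, a URDF-based library, etc.).

```python
import numpy as np

def ik_step(q, target_pos, forward_kinematics, jacobian, damping=0.05, step=1.0):
    # One damped least-squares IK step: nudge the joint angles q so the
    # end-effector position moves toward target_pos.
    error = target_pos - forward_kinematics(q)   # 3D position error
    J = jacobian(q)                              # (3 x n) positional Jacobian
    # dq = J^T (J J^T + lambda^2 I)^-1 * error   (damped pseudo-inverse)
    JJt = J @ J.T + (damping ** 2) * np.eye(J.shape[0])
    dq = J.T @ np.linalg.solve(JJt, error)
    return q + step * dq
```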

Generally, robots have multiple solutions to inverse kinematics. If you’d like to dive deeper into this topic, check out this article I’ve written on how we can apply RL to solve IK problems.

Dynamic Particles & Motion Planning

A motion planner is an algorithm that automatically plans the route, trajectory or path that the robot will travel to get from Point A to Point B.

In real life, these algorithms speed up the programming process when the robot is in a complex environment (where there are lots of obstacles in its way). Instead of planning every single move ourselves, the motion planner can automatically create one or more good routes for the robot to follow to reach the desired place or goal.

VREP offers path/motion planning functionality via a plugin wrapping the OMPL library.

This is accomplished with these key steps:

  • Define the start and goal state; if the object being planned for is a serial manipulator, an end-effector pose is often provided instead of a goal state.
  • Create a new path planning task and select an algorithm.
  • Create the required state space and specify which entities are not allowed to collide with each other.
  • Compute one or several paths, then destroy the path planning task once it is completed (a minimal sketch of this flow follows below).

Often, path planning is used in combination with inverse kinematics.
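To make those steps concrete, here’s a minimal sketch using OMPL’s Python bindings directly (rather than VREP’s plugin wrapper), planning in joint space with an assumed collision-checking callback supplied by the caller:

```python
from ompl import base as ob
from ompl import geometric as og

def plan_joint_path(lower, upper, start_q, goal_q, is_state_valid, timeout=2.0):
    # State space: one real-valued dimension per joint, bounded by joint limits.
    n = len(start_q)
    space = ob.RealVectorStateSpace(n)
    bounds = ob.RealVectorBounds(n)
    for i in range(n):
        bounds.setLow(i, lower[i])
        bounds.setHigh(i, upper[i])
    space.setBounds(bounds)

    # Planning task: validity checker (e.g. collision checks) and an algorithm.
    ss = og.SimpleSetup(space)
    ss.setStateValidityChecker(ob.StateValidityCheckerFn(is_state_valid))
    ss.setPlanner(og.RRTConnect(ss.getSpaceInformation()))

    # Start and goal joint configurations.
    start, goal = ob.State(space), ob.State(space)
    for i in range(n):
        start[i] = start_q[i]
        goal[i] = goal_q[i]
    ss.setStartAndGoalStates(start, goal)

    # Compute a path within the time budget; None if nothing was found.
    return ss.getSolutionPath() if ss.solve(timeout) else None
```

In VREP itself the same flow runs through the OMPL plugin’s script functions instead of these bindings, but the steps (state space, validity checking, planner, solve) map over directly.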

Moving Forward: Testing

Our final results, including the mechanical design and improvements, will be published in a paper very soon. Stay tuned!

If you have questions, feedback or thoughts about anything, feel free to get in touch:

Email: alishbai734@gmail.com

Linkedin: Alishba Imran

Twitter: @alishbaimran_

I’m a developer & innovator who enjoys building products and researching ways we can use AI, Blockchain & robotics to solve problems in healthcare and energy!
