A team of researchers at the Massachusetts Institute of Technology (MIT) has developed a predictive artificial intelligence (AI) system that can learn to see by touching and to feel by seeing. While our sense of touch lets us feel the physical world, our eyes help us understand the full picture behind these tactile signals. Robots that have been programmed to see or feel, however, can't use these signals quite as interchangeably.
The new AI-based system can create realistic tactile signals from visual inputs, and can predict which object, and which part of it, is being touched directly from tactile inputs. In the future, this could foster a more harmonious relationship between vision and robotics, especially for object recognition, grasping, and scene understanding, and could support seamless human-robot integration in assistive or manufacturing settings.
"By looking at the scene, our model can imagine the feeling of touching a flat surface or a sharp edge," said Yunzhu Li, a PhD student and lead author from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). "By blindly touching around, our model can predict the interaction with the environment purely from tactile feelings," Li added.
The team used a KUKA robot arm fitted with a special tactile sensor called GelSight, designed by another group at MIT. Using a simple web camera, the team recorded nearly 200 objects, including tools, household products, and fabrics, being touched more than 12,000 times. Breaking those 12,000 video clips down into static frames, the team compiled "VisGel," a dataset of more than three million visual/tactile-paired images.
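The article does not say how the camera frames and tactile frames were matched up. Purely as an illustration, here is a minimal Python sketch of one plausible approach: pairing each visual frame with the tactile frame closest to it in time. The function name `pair_frames`, the timestamp lists, and the `tolerance` value are all hypothetical and are not taken from the VisGel work.

```python
from bisect import bisect_left


def pair_frames(visual_ts, tactile_ts, tolerance=0.02):
    """Pair each visual frame with the nearest-in-time tactile frame.

    Assumption (not from the article): both inputs are sorted lists of
    frame timestamps in seconds. Returns (visual_index, tactile_index)
    tuples; visual frames with no tactile frame within `tolerance`
    seconds are skipped.
    """
    pairs = []
    for i, t in enumerate(visual_ts):
        j = bisect_left(tactile_ts, t)
        # Candidates: the tactile frames just before and just after t.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(tactile_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(tactile_ts[k] - t))
        if abs(tactile_ts[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs


# Hypothetical timestamps for one short clip.
visual = [0.000, 0.033, 0.066, 0.500]
tactile = [0.001, 0.034, 0.070]
print(pair_frames(visual, tactile))
```

In this sketch the last visual frame is dropped because no tactile frame falls within the tolerance; a real pipeline might handle such gaps differently.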
"Bringing these two senses (vision and touch) together could empower the robot and reduce the data we might need for tasks involving manipulating and grasping objects," said Li. The current dataset only contains examples of interactions in a controlled environment. The team hopes to improve on this by collecting data in more unstructured settings, or by using a new MIT-designed tactile glove, to increase the size and diversity of the dataset.
"This is the first method that can convincingly translate between visual and touch signals," said Andrew Owens, a post-doc at the University of California at Berkeley. The team is set to present the findings next week at the Conference on Computer Vision and Pattern Recognition in Long Beach, California.