A team of researchers from NVIDIA and the University of Washington has released CLIPort, a machine learning framework that enables a robotic arm not only to manipulate specific objects precisely, but also to understand object abstractions expressed in natural language.
Recent research has shown that end-to-end networks can give AI the subtle manipulation skills that require spatial reasoning, but the researchers point out that with existing methods these skills generalize poorly: they cannot easily be applied to new tasks, nor can the same concept be transferred to other tasks. Conversely, large-scale pretraining has made significant progress in learning generalizable visual and language semantics, but such models lack the spatial understanding required for precise manipulation.
The research team developed CLIPort, a framework for language-conditioned, vision-based manipulation that combines the spatial precision of the Transporter Networks architecture with the CLIP architecture, which relates the meaning of images to language.
The idea of combining the two architectures to control a robotic arm was inspired by the two-streams hypothesis from visual neuroscience, which posits separate pathways for semantic ("what") and spatial ("where") processing.
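The fusion of the two streams can be illustrated with a minimal sketch (hypothetical function names and toy values, not the actual CLIPort code): a semantic stream scores each pixel by its relevance to the language goal, a spatial stream scores graspability, and the two maps are combined to choose where to act.

```python
import numpy as np

def combine_streams(semantic_map, spatial_map):
    """Fuse a semantic relevance map ('what') with a spatial
    affordance map ('where') into one pick heatmap.
    Fusion by element-wise product is an illustrative choice."""
    fused = semantic_map * spatial_map
    return np.unravel_index(np.argmax(fused), fused.shape)

# Toy 4x4 maps: the semantic stream highlights pixels matching the
# language goal, the spatial stream highlights graspable regions.
semantic = np.zeros((4, 4)); semantic[1, 2] = 0.9; semantic[3, 0] = 0.8
spatial = np.zeros((4, 4)); spatial[1, 2] = 0.7; spatial[3, 0] = 0.2

pick_pixel = combine_streams(semantic, spatial)
print(pick_pixel)  # the pixel that is both relevant and graspable
```

The pixel at (1, 2) wins because it scores well in both streams; a pixel that is semantically relevant but hard to grasp, or graspable but irrelevant to the instruction, is suppressed by the product.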
CLIPort performs a variety of tasks specified in natural language without explicit representations of object poses, instance segmentations, or syntactic structures. For the demonstration, the research team prepared nine tasks, including folding and unfolding fabric, placing scattered objects into a bowl, packing an indicated object into a box, moving chess pieces, sweeping up scattered beans, picking cherries and packing them into a box, reading letters on blocks and placing each block into the indicated box, and inserting and moving a rope as directed, using a dataset of only 179 demonstrations.
CLIPort succeeded in accurately placing a block into a bowl even in a test where the bowl's position was perturbed. In addition, CLIP, the image recognition model CLIPort uses to identify objects, can recognize not only objects it was trained on but also objects it sees for the first time. In the test of packing an indicated object into a box, when instructed to pack a blue whiteboard marker that CLIP had never seen before, the robot arm accurately grasped the marker. CLIPort was also able to move chess pieces as directed, place specific objects into designated boxes, sweep coffee beans, and complete a variety of other tasks.
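CLIP's ability to recognize never-before-seen objects comes from embedding both images and text prompts into a shared space and matching by similarity, so the category only needs to be describable in words. The sketch below illustrates the matching step with made-up low-dimensional embeddings standing in for CLIP's real outputs (not the actual CLIP model or API):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_match(image_emb, text_embs):
    """Return the index of the text prompt whose embedding is most
    similar to the image embedding (CLIP-style matching)."""
    return max(range(len(text_embs)),
               key=lambda i: cosine(image_emb, text_embs[i]))

# Made-up 3-d embeddings standing in for CLIP's high-dimensional ones.
prompts = ["a red block", "a blue whiteboard marker", "a wooden bowl"]
text_embs = [np.array([1.0, 0.1, 0.0]),
             np.array([0.0, 1.0, 0.2]),
             np.array([0.1, 0.0, 1.0])]
image_emb = np.array([0.1, 0.9, 0.3])  # a crop showing the marker

best = zero_shot_match(image_emb, text_embs)
print(prompts[best])
```

Because the match is computed against arbitrary text prompts at inference time, an object like the blue whiteboard marker can be identified without ever appearing in the robot's training data.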