OpenAI is observing machine learning agents evolving to use more and more complex tools by making simple hide and seek. AI repeats hide-and-seek and refines 6 strategies and countermeasures by itself. The fact that AI independently showed such a complex cooperative adaptation suggests the possibility of more complex and intelligent movements in the future.
In hide-and-seek, AI plays both the hidden role and the ghost role. The hidden role is shown in blue, and the ghost role is shown in red. The role of hiding is a movement that is invisible from the view of the ghost role, and the role of the ghost moves to understand what is hiding in Xia. No details on hide-and-seek are entered, and the area where the hide-and-seek is played repeatedly while hunting is hidden and some useful objects are placed, and the actual hide-and-seek environment is realistically reproduced.
The distance between the hiding role and the objects around the ghost role can be measured, and when the ghost role plays a hiding role, the gaze is displayed like a beam. If you try to hide-and-seek without input to the agent, it moves completely randomly at first, but when the number of hide-and-seek hits reaches 2.69 million, the hiding role is hidden in the shadow of the object, and the ghost role follows the hiding role. After 8.6 million episodes, you can learn how to operate the entity and anchor it at the entrance. In addition, objects can only be fixed in the hiding role, and once the fixed block remains in place until the hiding role is played.
If the object is fixed at the entrance, the ghost role cannot be found. However, from 8.62 million to 14.5 million, the ghost station moves the triangular columnar object during hide-and-seek, and uses the slope to discover hidden things in the room. The role of hiding between 14.5 million and 43.4 million times learns how to hide itself so that the ghost role does not use the slopes.
Learn to chase the role of hiding ghost roles up to 22 million times in an environment with few walls. Students learn how to form a shelter by using plates from 22 million to 88 million times. On the other hand, the method for the ghost station to use the slopes from 88 million to 115 million times, and the method of fixing the slopes, a new countermeasure, is squeezed from 115 million times to 382 million times.
The slope was fixed and I thought it was desperate, but the ghost role moves the cube to the side of the slope and rides on a cube on the slope. It moves while riding on a cube and discovers its role in hiding inside the shelter. The way to surf on this cube is to play hide-and-seek from 386 million to 458 million times, and the role of hiding fixes all objects and forms a shelter. When the role of hiding in this state finds a new way, the role of ghosts creates a strategy for this and repeats it to create advanced hide and seek.
The task of navigating a complex environment is quite difficult for agents to create sophisticated movements because there are so many things that humans have to design and set for AI behavior. However, it can be said that there is a possibility that AI can produce useful skills by assigning different roles to compete while training an AI model as an experiment.
Open AI says that the results give confidence that the method of using multiple agents in a more free and diverse environment is quite complex and can lead to human-related behavior. Related information can be found here .
Add comment