Highest score in history! Tencent's Juewu AI wins the Minecraft AI competition

Juewu AI has now taken on a far more complex open-world game environment.
Minecraft is the world's best-selling open-world 3D game. Its randomly generated open maps, free and flexible play styles, and long chains of interdependent tasks pose major challenges for AI research. Targeting this complex environment, the MineRL competition invites programmers around the world to train, on a single machine within four days, an AI that can find a diamond in the game.
On December 8, the main research track of the third MineRL competition announced its results, and AI's "diamond dream" took a big step forward: Tencent AI Lab's "Juewu" won the championship by a clear margin with 76.970 points. The research results have been published on arXiv, and the algorithm framework can be reused in other complex decision-making environments.
The MineRL competition is jointly organized by Carnegie Mellon University, Microsoft, DeepMind, and OpenAI, in conjunction with the top machine-learning conference NeurIPS. The highly challenging competition continues to attract developers worldwide: 59 teams and nearly 500 participants entered this year's event, including strong research teams from the world's top universities and research institutions. The research theme of the competition is training sample-efficient Minecraft AI agents.
Tencent AI Lab reached the competition goal through an innovative combination of hierarchical reinforcement learning, representation learning, self-imitation learning, ensemble behavior cloning, and other algorithms.
Juewu AI won with the highest score in the competition's history
Extremely diverse environments, maps generated entirely from random seeds, long decision sequences with complex skills to learn, and the wide range of strategic preferences enabled by the game's high degree of freedom all add to the difficulty of Minecraft AI research. For example, for the AI to find a diamond within 15 minutes, it must collect logs, craft planks, sticks, and pickaxes, mine iron ore, and only after a long chain of further processing steps finally dig up a diamond.
In addition, the organizers imposed strict rules: participants may not hand-write rules, the in-game inventory information and action space are obfuscated, pre-trained models are not allowed, at most 8 million interactions with the environment are permitted, and each team may train for only 4 days on a 6-core CPU and half an NVIDIA K80 GPU, a configuration that almost any university laboratory or individual researcher can afford.
The purpose of the competition is to advance sample-efficient game AI algorithms. Popular reinforcement learning algorithms currently need tens of millions of trial-and-error interactions to find the optimal procedure, which consumes enormous time and computing resources. Imitation learning algorithms that rely solely on human data are faster to train, but their performance is often unsatisfactory.
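To make the length of that task chain concrete, here is a minimal sketch of the sub-task dependency as an ordered list of milestones, loosely modelled on the MineRL ObtainDiamond reward milestones; the exact item names and ordering are illustrative assumptions, not the official task specification.

```python
# Hedged sketch: the long-chain dependency from raw logs to a diamond.
# Item names and order are illustrative assumptions.
DIAMOND_MILESTONES = [
    "log",             # punch trees for wood
    "planks",          # crafted from logs
    "stick",           # crafted from planks
    "crafting_table",  # needed for tool recipes
    "wooden_pickaxe",  # first tool: can mine stone
    "cobblestone",     # mined with the wooden pickaxe
    "stone_pickaxe",   # can mine iron ore
    "iron_ore",
    "furnace",         # smelt iron ore into ingots
    "iron_ingot",
    "iron_pickaxe",    # finally able to mine diamond
    "diamond",
]

def next_subgoal(inventory: dict) -> str:
    """Return the first milestone the agent has not yet obtained."""
    for item in DIAMOND_MILESTONES:
        if inventory.get(item, 0) == 0:
            return item
    return "done"
```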
Minecraft game screenshot
Juewu AI creatively proposes a sample-efficient solution based on hierarchical reinforcement learning. According to the team's data, the prediction accuracy of the high-level controller reaches 99.95%; in other words, the AI has learned an almost error-free set of macro strategies from human data and knows at every moment what its next step should be.
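Purely as an illustration of this idea, the sketch below shows a high-level controller as a simple classifier that maps encoded state features to the next macro sub-goal, trained by supervised learning on human trajectories; a separate low-level policy (not shown) would then execute the chosen sub-goal. The network layout, feature sizes, and training loop are assumptions for this example, not the team's actual implementation.

```python
import torch
import torch.nn as nn

NUM_SUBGOALS = 12   # e.g. the milestone chain sketched above (assumption)
STATE_DIM = 64      # assumed size of an encoded inventory/state vector

class HighLevelController(nn.Module):
    """Hedged sketch of a hierarchical-RL meta-controller: it predicts
    which macro sub-goal to pursue next from the current state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, NUM_SUBGOALS),
        )

    def forward(self, state_features):
        return self.net(state_features)  # logits over sub-goals

# Supervised training on human demonstrations: each state is labelled
# with the sub-goal the human player was working toward at that moment.
controller = HighLevelController()
optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(states, subgoal_labels):
    optimizer.zero_grad()
    loss = loss_fn(controller(states), subgoal_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```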
For representing the state space, the biggest challenge in Minecraft is understanding its complex open maps. The team first tried the representation learning methods that have become popular in recent years, but soon found that existing methods only work well in 2D scenes and perform poorly in the Minecraft environment. Tencent AI Lab therefore designed a novel action-aware representation learning algorithm that captures the effect of each action on the environment and forms an attention mechanism. Experiments show that the algorithm significantly improves the agent's ability and efficiency in gathering resources.
Visualizations for different actions show that the AI learned to focus on the key regions of the current image
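As a rough illustration of what "action-aware" attention could look like, the sketch below embeds each discrete action and lets that embedding attend over the spatial feature map of the observation, so locations the action is likely to affect receive higher weight. The architecture, action set size, and shapes are assumptions for this example, not the published algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 10   # assumed discrete action set size
EMB_DIM = 32

class ActionAwareEncoder(nn.Module):
    """Hedged sketch of action-aware representation learning: the action
    embedding attends over the spatial feature map of the current frame."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, EMB_DIM, 4, stride=2), nn.ReLU(),
        )
        self.action_emb = nn.Embedding(NUM_ACTIONS, EMB_DIM)

    def forward(self, obs, action):
        feat = self.conv(obs)                         # (B, C, H, W)
        b, c, h, w = feat.shape
        feat = feat.view(b, c, h * w)                 # flatten spatial dims
        query = self.action_emb(action).unsqueeze(1)  # (B, 1, C)
        attn = F.softmax(torch.bmm(query, feat), dim=-1)            # (B, 1, H*W)
        pooled = torch.bmm(feat, attn.transpose(1, 2)).squeeze(-1)  # (B, C)
        return pooled, attn.view(b, h, w)  # attention map for visualization
```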
As the game progresses, the agent's strategy diverges considerably from human play, and human data alone can no longer guide the AI. Juewu AI therefore draws on the idea of self-imitation learning and proposes a discriminator-based self-imitation algorithm: the AI learns from its own past successes and failures and actively corrects course when it detects that the current situation is deteriorating. Comparative experiments show that with the self-imitation strategy, the agent's exploratory behavior becomes more consistent and the probability of entering dangerous areas drops significantly.
The researchers also carefully optimized tasks that require long action sequences, such as crafting items. Through action-sequence consistency filtering and voting-based ensemble learning, the model's success rate in the crafting stage rose from 35% to 96%, turning the weakest link in the chain into the most reliable one.
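To make the discriminator idea concrete, the minimal sketch below trains a discriminator to separate the agent's own past successful transitions from its current rollouts and converts its output into a reward bonus, which is one common way such a self-imitation signal can be constructed. The feature sizes, network layout, and update rule are assumptions, not the team's exact method.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 64, 10  # assumed feature sizes

class Discriminator(nn.Module):
    """Hedged sketch: tells the agent's past *successful* transitions
    apart from its current behaviour; its output becomes a bonus that
    pulls the policy back toward states that previously led to success."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

disc = Discriminator()
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def update_discriminator(success_batch, rollout_batch):
    """Each batch is a (state, action) tuple of tensors."""
    good = disc(*success_batch)
    cur = disc(*rollout_batch)
    loss = bce(good, torch.ones_like(good)) + bce(cur, torch.zeros_like(cur))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def imitation_bonus(state, action):
    # Higher when current behaviour resembles past successes.
    return torch.sigmoid(disc(state, action)).detach()
```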
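As a simple illustration of how such filtering and voting could be combined, the sketch below lets several cloned policies propose a crafting action, discards proposals that are inconsistent with an assumed crafting order, and takes the majority vote. The crafting order and policy interface are hypothetical, not the team's actual pipeline.

```python
from collections import Counter

# Assumed crafting order for this illustration only.
CRAFT_ORDER = ["planks", "stick", "crafting_table", "wooden_pickaxe"]

def consistent(action: str, items_done: set) -> bool:
    """Consistency filter: allow a crafting action only once every item
    that precedes it in the assumed order has already been made."""
    if action not in CRAFT_ORDER:
        return True
    idx = CRAFT_ORDER.index(action)
    return all(prev in items_done for prev in CRAFT_ORDER[:idx])

def ensemble_vote(policies, obs, items_done):
    """Voting-based ensemble: each cloned policy proposes an action,
    inconsistent proposals are filtered out, and the majority wins."""
    votes = [policy(obs) for policy in policies]
    votes = [a for a in votes if consistent(a, items_done)] or votes
    return Counter(votes).most_common(1)[0][0]
```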

Using highly complex, customized game scenes as a training ground, Tencent AI Lab's deep reinforcement learning agents keep moving closer to reality. Its board- and card-game AI has gradually moved from the Go board to chess and mahjong, while its strategic, cooperative AI Juewu has moved from MOBA to FPS and RTS games, and now to today's 3D open-world Minecraft. Each new challenge brings AI one step closer to the goal of solving real-world problems and advancing science and technology.
As the integration of the virtual and real worlds gradually becomes reality, the experience, methods, and conclusions from this research will create even greater practical value in the real world.