The development of AI technology across Google’s top 10 research areas in 2020


Qjp, Xiaoyun
Source: Xinzhiyuan (ID: AI_era)
Jeff Dean wrote a long article of tens of thousands of words, reviewing Google’s achievements and breakthroughs across many fields over the past year and looking ahead to its goals for 2021.
“When I joined Google more than 20 years ago, I just wanted to figure out how to really start using computers to provide high-quality, comprehensive information search services on the Internet. Fast forward to today: though we face a much wider range of technical challenges, we still have the same overall goal, to organize the world’s information and make it universally accessible and useful.
In 2020, as the world was reshaped by the coronavirus, we saw technology help billions of people better communicate, understand the world, and get things done. I’m proud of what we’ve achieved and excited about the new possibilities ahead.”
The goal of Google Research is to tackle a range of long-term, major problems: from predicting the spread of COVID-19, to designing algorithms that automatically translate more and more languages, to reducing bias in machine learning models.
This article covers the key highlights of the year.
COVID-19 and health
The COVID-19 pandemic has taken a huge toll on people’s lives. Researchers and developers around the world have joined forces to build tools and technologies that help public health officials and policymakers understand and respond to the epidemic.
Apple and Google teamed up in 2020 to develop the Exposure Notifications System (ENS), a privacy-preserving, Bluetooth-based technology that notifies people if they have been exposed to someone who has tested positive.
ENS complements traditional contact tracing and has been deployed by public health authorities in more than 50 countries, states, and regions to help curb the spread of infection.
In the early days of the pandemic, public health officials said they needed more comprehensive data to fight the virus’s rapid spread. Our Community Mobility Reports, which provide anonymized insights into population mobility trends, not only help researchers understand the impact of policies such as stay-at-home orders and social distancing, but also help predict the economic impact.
Our own researchers have also explored using this anonymized data to forecast the spread of COVID-19, replacing traditional time-series models with graph neural networks.
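To give a sense of how a mobility graph can inform such forecasts, here is a minimal sketch (not Google’s actual model) of a single message-passing step over regions connected by anonymized mobility flows; the regions, weights, and data below are entirely hypothetical:

```python
import numpy as np

def gnn_forecast_step(cases, adjacency, w_self=0.8, w_neigh=0.2):
    """One message-passing step: each region's next-step case estimate mixes
    its own case history with mobility-weighted signals from its neighbors."""
    # Row-normalize the mobility matrix so incoming weights sum to 1.
    row_sums = adjacency.sum(axis=1, keepdims=True)
    norm_adj = adjacency / np.where(row_sums == 0, 1.0, row_sums)
    neighbor_signal = norm_adj @ cases        # aggregate neighbors' case counts
    return w_self * cases + w_neigh * neighbor_signal

# Three regions: 0 and 1 exchange heavy mobility, region 2 is isolated.
mobility = np.array([[0., 10., 0.],
                     [10., 0., 0.],
                     [0., 0., 0.]])
cases = np.array([100., 0., 50.])
next_step = gnn_forecast_step(cases, mobility)
```

A real model would stack several such steps with learned transformations; the point here is only that case signals propagate along mobility edges, which pure time-series models cannot capture.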
The COVID-19 Search Trends symptoms dataset lets researchers explore temporal relationships between searches for symptoms such as loss of smell, which is sometimes a symptom of the virus. To further support the broader research community, we launched the Google Health Studies app, giving the public a way to participate in research.
Figure: the COVID-19 Search Trends dataset is helping researchers study the link between disease transmission and symptom-related searches
Google’s team is providing tools and resources to the broader scientific community, which is working to address the health and economic impact of the virus.
Figure: a spatiotemporal map simulating the spread of new coronavirus
We are also working to help clinicians identify skin conditions, to help detect age-related macular degeneration (the leading cause of blindness in the United States and the United Kingdom, and the third leading cause worldwide), and to explore potential new non-invasive diagnostics (for example, detecting signs of anemia from retinal images).
Figure: a deep learning model quantifies hemoglobin levels, an indicator of anemia, from retinal images
This year also brought exciting demonstrations of how the same technology can peer into the human genome. Google’s open-source tool DeepVariant uses convolutional neural networks to identify genomic variants in sequencing data, and this year it won best-accuracy awards in three of the four categories of the FDA’s challenge. Using the same tool, a study led by the Dana-Farber Cancer Institute improved the diagnostic yield of genetic variants linked to prostate cancer and melanoma by 14% across 2,367 cancer patients.
Weather, environment and climate change
Machine learning can help us better understand the environment and help people make useful predictions in daily life and in disaster situations.
For weather and precipitation forecasting, models based on computational physics, such as NOAA’s HRRR, have long been dominant. However, we have been able to show that an ML-based forecasting system can predict current precipitation at better spatial resolution (“is it raining in Seattle’s local park?” rather than just “is it raining in Seattle?”) and can produce short-term forecasts up to 8 hours ahead that are considerably more accurate than HRRR’s, computed faster and at higher temporal and spatial resolution.
We also developed an improved technique called HydroNets, which uses neural networks to model real river systems, capturing more accurately how upstream water levels interact with downstream flooding and producing better water-level and flood forecasts. Using these techniques, we expanded flood-alert coverage in India and Bangladesh twentyfold, helping to better protect more than 200 million people across 250,000 square kilometers.
Machine learning continues to offer amazing opportunities to improve accessibility, because it can learn to transform one sensory input into another. For example, we launched Lookout, an Android app that helps visually impaired users identify packaged foods, whether in the grocery store or in their kitchen cabinet.
The machine learning system behind Lookout demonstrates a powerful but compact model that performs this task in real time on a phone, covering nearly 2 million products.

Similarly, video-conferencing systems are difficult for sign language users, because even when they are signing, audio-based speaker-detection systems cannot tell that they are actively communicating. To enable real-time automatic sign language detection in video conferences, we proposed a real-time sign language detection model and demonstrated how it can give video-conferencing systems a mechanism to identify signers as active speakers.
Application of machine learning in other fields
In 2020, in collaboration with the FlyEM team, we released the Drosophila hemibrain connectome, a large synaptic-resolution map of brain connectivity, reconstructed from high-resolution electron microscopy images of brain tissue using large-scale machine learning models. This connectome information will help neuroscientists in a wide variety of studies and help us better understand how the brain works.
Responsible AI
To better understand the behavior of language models, we developed the Language Interpretability Tool (LIT), a toolkit for interpreting language models that makes it possible to interactively explore and analyze their decisions.
We developed techniques for measuring gendered correlations in pre-trained language models, and scalable techniques for reducing gender bias in Google Translate.
To help non-experts interpret machine learning results, we extended the TCAV technique introduced in 2019 to now provide complete and sufficient sets of concepts. Not only can we say that “fur” and “long ears” are important concepts for predicting “rabbit”; through this work, we can also say that these two concepts are sufficient to fully explain the prediction, with no other concepts needed.
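The core of TCAV can be illustrated with a small sketch. Assuming we have layer activations for concept examples and for random counterexamples, a concept direction (CAV) is estimated and the TCAV score is the fraction of examples whose prediction increases along that direction. The data and weights below are invented, and the mean-difference shortcut replaces the linear classifier TCAV actually trains:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations at some layer: concept examples ("long ears")
# cluster along one direction, random counterexamples do not.
concept_acts = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(50, 2))
random_acts = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))

# Simplified CAV: the normalized difference of class means. (TCAV proper
# trains a linear classifier; this shortcut has the same intent.)
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav /= np.linalg.norm(cav)

# For a linear "rabbit" logit w.a the gradient is just w for every input;
# for a real model it would be the per-example gradient at that layer.
w_rabbit = np.array([1.5, -0.2])                 # hypothetical logit weights
grads = np.tile(w_rabbit, (100, 1))
# TCAV score: fraction of examples whose prediction increases along the CAV.
tcav_score = float((grads @ cav > 0).mean())
```

A score near 1 means the concept consistently pushes the prediction up; a score near 0.5 means the concept direction is irrelevant to that class.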
Concept bottleneck models are a technique for making models more interpretable: the model is trained so that one of its layers is aligned with predefined expert concepts (for example, “bone spurs present” or “wing color”) before making the final task prediction, so we can not only interpret those concepts but also dynamically turn them on or off.
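A concept bottleneck can be sketched in a few lines: the input is first mapped to interpretable concept scores, and only those scores feed the task prediction, so an expert can intervene on a concept directly. The weights and concept names here are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical trained weights: inputs -> concept logits -> task logit.
W_concepts = np.array([[1.0, 0.0],    # concept 1: "wing color is dark"
                       [0.0, 1.0]])   # concept 2: "bone spurs present"
W_task = np.array([2.0, -1.0])

def predict(x, concept_override=None):
    concepts = sigmoid(W_concepts @ x)       # interpretable bottleneck layer
    if concept_override is not None:
        concepts = concept_override          # an expert can edit concepts
    return concepts, sigmoid(W_task @ concepts)

x = np.array([3.0, -3.0])
c, y = predict(x)
# Intervention: force the second concept on and watch the prediction drop.
c2, y2 = predict(x, concept_override=np.array([c[0], 1.0]))
```

Because the task head sees only the concept scores, flipping a concept has a direct, explainable effect on the output.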
Natural language understanding
Better language understanding is an area where we saw considerable progress this year. Most work in this field at Google and elsewhere now relies on the Transformer, a class of neural network models originally developed for language problems (though there is growing evidence that they are also useful for images, video, speech, protein folding, and many other domains).
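The heart of the Transformer is scaled dot-product attention, which can be sketched in NumPy as follows (a bare-bones version without multiple heads, masking, or learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: every query attends to all keys and
    returns a weighted average of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V, weights

# Self-attention over three tokens with 4-dimensional representations.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))
out, attn = scaled_dot_product_attention(X, X, X)
```

Each row of `attn` is a probability distribution over the input tokens, which is what makes the same mechanism applicable to text, image patches, audio frames, or amino-acid sequences.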
In 2020, we described Meena, a chatbot that can chat about anything.
Machine learning algorithms
Google continues to invest heavily in unsupervised learning. For example, SimCLR, developed in 2020, advanced the state of self-supervised and semi-supervised learning techniques.
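SimCLR’s central idea, pulling together two augmented views of the same image while pushing apart all other images in the batch, is captured by its contrastive (NT-Xent) loss. Below is a simplified NumPy sketch with random stand-in embeddings rather than real image representations:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Simplified NT-Xent (normalized temperature-scaled cross-entropy) loss:
    two augmented views of the same image are a positive pair, and every
    other image in the batch serves as a negative."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine-similarity space
    sim = z @ z.T / temperature
    n = len(z1)
    losses = []
    for i in range(2 * n):
        j = (i + n) % (2 * n)                         # index of i's positive pair
        logits = np.delete(sim[i], i)                 # exclude self-similarity
        target = j if j < i else j - 1                # account for the deletion
        losses.append(np.log(np.exp(logits).sum()) - logits[target])
    return float(np.mean(losses))

rng = np.random.default_rng(0)
views = rng.normal(size=(4, 8))
# Matched views (tiny perturbation) should score far lower than random pairs.
aligned = nt_xent_loss(views, views + 0.01 * rng.normal(size=(4, 8)))
random_pairs = nt_xent_loss(views, rng.normal(size=(4, 8)))
```

Minimizing this loss forces the encoder to produce representations that are invariant to the augmentations, which is what makes the learned features useful for downstream classifiers.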
Figure: ImageNet top-1 accuracy of linear classifiers trained on representations learned with different self-supervised methods (pre-trained on ImageNet). The gray cross indicates supervised ResNet-50.
Reinforcement learning
By learning from other agents and improving exploration, Google has improved the efficiency of RL algorithms.
A major focus this year was offline RL, which relies only on fixed, previously collected datasets (for example, from earlier experiments or human demonstrations), extending RL to applications where training data cannot be collected on the fly. Researchers introduced duality-based methods for RL, developed improved algorithms for off-policy evaluation, and are working with the broader community on these problems by releasing open-source benchmarks and the DQN Replay dataset for Atari.
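The defining constraint of offline RL, learning only from a fixed logged dataset, can be illustrated with tabular Q-learning on a toy chain environment. The dataset and environment below are invented for illustration:

```python
import numpy as np

def offline_q_learning(dataset, n_states, n_actions, gamma=0.9, lr=0.5, epochs=200):
    """Tabular Q-learning driven purely by a fixed, previously collected set
    of (state, action, reward, next_state, done) transitions: no new
    environment interaction, the defining constraint of offline RL."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, r, s2, done in dataset:
            target = r if done else r + gamma * Q[s2].max()
            Q[s, a] += lr * (target - Q[s, a])
    return Q

# A tiny 3-state chain logged by some earlier behavior policy:
# action 1 moves right (reward 1 at the goal), action 0 stays put.
dataset = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (0, 0, 0.0, 0, False),
    (1, 0, 0.0, 1, False),
]
Q = offline_q_learning(dataset, n_states=3, n_actions=2)
```

Even from this small fixed log, value propagates backward so that moving right is preferred in both states; real offline RL methods must additionally guard against actions the dataset never covers.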
Figure: offline RL on Atari games using the DQN Replay dataset
Another research direction is learning from other agents through apprenticeship learning to improve sample efficiency.
It should be noted that extending RL to complex practical problems is an important challenge.
Figure: an overview of the data-processing flow in AttentionAgent. Top: input transformation, where a sliding window splits the input image into smaller patches, which are then “flattened” for later processing. Middle: patch election, where a modified self-attention module votes among patches to generate a patch importance vector. Bottom: action generation, where AttentionAgent selects the most important patches, extracts the corresponding features, and makes decisions based on them.
There is no doubt that this is a very active and exciting research area.
In AutoML-Zero, we took another approach to learned code: providing an evolutionary algorithm with a search space of very primitive operations (such as addition, subtraction, variable assignment, and matrix multiplication) to see whether modern ML algorithms can be evolved from scratch.
One challenge is that the space of useful algorithms is extremely sparse. Even so, the system rediscovered many of the most important ML techniques of the past 30 years, such as linear models, gradient descent, rectified linear units, effective learning-rate settings and weight initializations, and gradient normalization.
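The flavor of this search can be conveyed with a toy sketch: a hill-climbing loop mutates short programs built from primitive register operations, scored by how well they fit a target function. The operation set and register layout here are illustrative, far simpler than AutoML-Zero’s actual search space:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy register machine with primitive ops, in the spirit of (but far
# simpler than) AutoML-Zero's search space. Names here are illustrative.
OPS = ["add", "sub", "mul_weight", "assign"]

def run_program(program, x, w):
    regs = [x, 0.0, 0.0]                      # r0 = input, r1 = output, r2 = scratch
    for op, dst, src in program:
        if op == "add":
            regs[dst] = regs[dst] + regs[src]
        elif op == "sub":
            regs[dst] = regs[dst] - regs[src]
        elif op == "mul_weight":
            regs[dst] = regs[src] * w         # multiply by a "weight" parameter
        elif op == "assign":
            regs[dst] = regs[src]
    return regs[1]

def fitness(program, xs, ys, w):
    preds = np.array([run_program(program, x, w) for x in xs])
    return -np.mean((preds - ys) ** 2)        # higher is better

def random_instruction():
    return (str(rng.choice(OPS)), int(rng.integers(3)), int(rng.integers(3)))

# Target task: y = 2x. Evolution here is plain hill climbing: mutate one
# instruction of the best program so far and keep the child if it improves.
xs = np.linspace(-1.0, 1.0, 8)
ys = 2.0 * xs
w = 2.0
best = [random_instruction() for _ in range(3)]
best_fit = init_fit = fitness(best, xs, ys, w)
for _ in range(500):
    child = list(best)
    child[rng.integers(len(child))] = random_instruction()
    f = fitness(child, xs, ys, w)
    if f > best_fit:
        best, best_fit = child, f
```

A run like this can rediscover the linear model (output = weight times input); AutoML-Zero operates at vastly larger scale, evolving setup, predict, and learn functions jointly over populations of programs.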
Better understanding of ML algorithms and models
When neural networks are made wider and deeper, they tend to train faster and generalize better. This is one of the core mysteries of deep learning, because classical learning theory suggests that larger networks should overfit more.
In the infinite-width limit, neural networks take a surprisingly simple form described by the neural network Gaussian process (NNGP) or the neural tangent kernel (NTK). Google researchers have studied this phenomenon theoretically and experimentally, and released Neural Tangents, an open-source software library built on JAX that lets researchers build and train infinite-width neural networks.
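For a fully connected ReLU network, the infinite-width NNGP kernel can even be written in closed form via the arc-cosine kernel recursion, which the following NumPy sketch computes (the open-source library mentioned above handles this, and far more general architectures, automatically):

```python
import numpy as np

def nngp_relu_kernel(X, depth=3, sigma_w=np.sqrt(2.0)):
    """Analytic NNGP kernel of an infinitely wide fully connected ReLU
    network: the output covariance is computed in closed form layer by
    layer via the arc-cosine kernel recursion, with no network instantiated."""
    K = X @ X.T / X.shape[1]                          # input covariance
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        cos_theta = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
        theta = np.arccos(cos_theta)
        K = (sigma_w**2 / (2 * np.pi)) * np.outer(diag, diag) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta))
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))                          # 5 inputs, 10 features
K = nngp_relu_kernel(X)                               # 5x5 output covariance
```

With the weight variance set to 2 (a standard ReLU initialization choice), each input’s output variance is preserved exactly across layers, so the recursion stays numerically stable at any depth.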

Left: a schematic showing how deep neural networks induce simple input/output maps as they become infinitely wide. Right: as the width of a neural network increases, the output distribution across different random instantiations of the network approaches a Gaussian.
Machine perception
The perception of the world around us – the understanding, modeling, and acting of visual, auditory, and multimodal inputs – remains an area of great potential for research that benefits our daily lives.
In 2020, deep learning combined 3D computer vision and computer graphics ever more closely. CvxNet, deep implicit functions for 3D shapes, neural voxel rendering, and CoreNet are examples of this direction. In addition, the research on representing scenes as neural radiance fields (NeRF; see also this blog post) is a good example of how Google Research’s academic collaborations spurred rapid progress in neural volume rendering.
In “Learning to Factorize and Relight a City,” a collaboration with UC Berkeley, Google proposed a learning-based framework for decomposing outdoor scenes into spatiotemporally varying illumination and permanent scene factors. This makes it possible to change the lighting and scene geometry of any Street View panorama, or even turn it into a full-day time-lapse video.
In 2020, Google continued to expand the use of neural networks for media compression, achieving good results not only in learned image compression but also in deep video compression, volumetric compression, and distortion-agnostic deep image watermarking.
Figure: first row: the cover image with no embedded message. Second row: the encoded image from the HiDDeN combined-distortion model. Third row: the encoded image from our model. Fourth row: the normalized difference between the encoded image and the cover image for the HiDDeN model. Fifth row: the same normalized difference for our model.
Interacting with the broader research community through open-source solutions and datasets is another important aspect of our work. In 2020, Google open-sourced a variety of new perception and reasoning capabilities in MediaPipe, such as on-device face, hand, and pose prediction; real-time body pose tracking; real-time iris tracking and depth estimation; and real-time 3D object detection.
“Finally, looking ahead to this year, I’m particularly excited about the possibility of building more general-purpose machine learning models that can handle a variety of modalities and automatically learn to accomplish new tasks from very few training examples.
Progress in this area will bring people more powerful products, with better translation, speech recognition, language understanding, and creative tools for billions of people around the world.”