From concept to technology, international standards and an open source community: federated learning took only two years.


Leiphone AI Technology Review note: On August 16, the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019) closed in Macao.
This year marks the 50th anniversary of IJCAI, and the organizers arranged a series of themed activities. In addition to the regular program of papers, tutorials, workshops, demos and exhibitions, there were panel sessions on the 50th anniversary of IJCAI, AI in China, user data privacy and other topics. Among all of this, "federated learning" was undoubtedly one of the most noteworthy subjects.
On Workshop Day, August 12, the first International Workshop on Federated Learning, organized by WeBank and IBM, was the most popular workshop of the day. The room was full before the session started, many participants crowded outside the door to listen, and the enthusiasm of the audience exceeded the organizers' expectations.
Leiphone has learned that the first International Workshop on Federated Learning was sponsored by WeBank and IBM Research, and supported by Elsevier, Sinovation Ventures, Squirrel AI, the China Artificial Intelligence Open Source Software Development Alliance (AIOSS) and IEEE.
During the IJCAI conference, the third meeting of the IEEE P3652.1 (Federated Learning Infrastructure and Applications) standards working group was also held. In addition, WeBank presented on federated learning at the "AI Security" symposium and the "AI and User Privacy" roundtable. This is probably the first time since the concept was proposed in 2017 that federated learning researchers have made themselves heard so prominently at a top international AI conference. The participation of many enterprises, led by WeBank, also shows that federated learning is gradually moving from basic research to practical deployment. The workshop marked the formal establishment of an international federated learning community, and federated learning has entered a new stage.
Why has federated learning become a hot topic in industry?
Since 2006, with the introduction of deep neural networks, improvements in algorithms and computing power, and the wide application of big data, artificial intelligence has entered a new boom. In 2016, AlphaGo's victory over Go world champion Lee Sedol not only demonstrated the great potential of AI driven by big data, but also made people look forward to a new era in which AI is deployed in every industry.
However, reality falls well short of the ideal. In practice, most application fields suffer from limited and poor-quality data. In some highly specialized areas (such as medical diagnosis), it is difficult to obtain enough labeled data to support an AI implementation. At the same time, the barriers between different data sources are hard to break, and "big data" is often just a collective name for a growing number of "data islands".
Meanwhile, as big data has developed, attaching importance to data privacy and security has become a worldwide trend. A series of regulations, such as the European Union's General Data Protection Regulation (GDPR), have made data acquisition harder still, which poses unprecedented challenges to deploying AI in practice.
Federated learning is a new attempt to solve the data dilemma faced by traditional machine learning methods. It is a foundational AI technology that enables efficient machine learning among multiple participants or computing nodes while protecting data privacy and meeting legal compliance requirements. Federated learning has the following characteristics:
Under the federated learning framework, all participants have equal status and can cooperate fairly.
Data stays local, which avoids data leakage and meets the needs of user privacy protection and data security.
Participants exchange information and model parameters in encrypted form while remaining independent, and all of them improve together.
The modeling performance is close to that of traditional deep learning algorithms. In federated transfer learning in particular, "lossless" performance can be achieved, avoiding the negative transfer that can occur in transfer learning.
Federated learning is a closed-loop learning mechanism. The effectiveness of the model depends on what each data provider contributes to itself and to others, which helps motivate more organizations to join the data federation.
These characteristics of federated learning are of great significance for breaking down data islands and bringing AI to more industries. To serve users better, the need to integrate data in AI applications has become more urgent than ever.
But if data cannot be exchanged among companies, then apart from a few giants with massive user bases and strong products and services, most companies can hardly bridge the AI data gap in a reasonable and legal way, or must pay a huge cost to do so.
Federated learning uses technical means to build a virtual shared model that achieves the same effect as the optimal model built by pooling the data, given that existing mechanisms and processes cannot be changed.
It is worth noting that this kind of data aggregation does not simply merge all parties' data. Instead, each party's data stays local, and the parties exchange information under an encryption mechanism so that a high-quality model is established at each participant (for example, enterprise A builds a classification model while enterprise B builds a prediction model). Compared with the traditional situation in which each data owner keeps its private data to itself, the word "federation" carries the idea of uniting all parties on an equal footing, in the spirit of the saying "gentlemen seek harmony, not uniformity".
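The idea of exchanging only protected model information, never raw data, can be illustrated with a small sketch. The article does not say which encryption mechanism is used, so the code below substitutes pairwise additive masking, a simple stand-in chosen purely for illustration. The function name `pairwise_masks` and the toy update vectors are hypothetical. Each pair of parties shares a random mask that one adds and the other subtracts, so an aggregator sees only masked vectors, yet their sum equals the sum of the true updates.

```python
import random

def pairwise_masks(num_parties, dim, seed=0):
    """Build one mask vector per party such that all masks sum to zero.

    Illustrative assumption: in a real system each pair of parties would
    derive its shared mask from a secret agreed between them; here a
    single seeded generator stands in for that step.
    """
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_parties)]
    for i in range(num_parties):
        for j in range(i + 1, num_parties):
            shared = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]  # party i adds the shared mask
                masks[j][k] -= shared[k]  # party j subtracts the same mask
    return masks

# Toy model updates held privately by three parties (hypothetical values).
updates = [[0.2, -1.0], [0.4, 0.5], [-0.1, 1.5]]

masks = pairwise_masks(len(updates), dim=2)
masked = [[u + m for u, m in zip(upd, msk)] for upd, msk in zip(updates, masks)]

# The aggregator only ever sees `masked`; the masks cancel in the sum.
aggregate = [sum(column) for column in zip(*masked)]
true_sum = [sum(column) for column in zip(*updates)]
```

Because floating-point masks cancel only up to rounding error, `aggregate` matches `true_sum` to within a tiny tolerance rather than exactly. A production scheme would typically work over integers or use a cryptographic protocol.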

There is also a story behind the name "Federated Learning". In the early years the term was mostly translated into Chinese as "joint learning", whereas today it is commonly translated with the word for "federation". The difference is this: if the users are individuals, they are indeed being "joined" together to train a model, but if the users are data owners such as enterprises, banks and hospitals, the technology is more like uniting many "city states", and the word "federation" is more accurate. The change of name also reflects the shift in federated learning research from theory to practice.
The Evolution of Federated Learning
In 2017, Google proposed a new joint modeling scheme to address the data security and massive data transmission problems that arise when training models for Android users' personal devices (such as the word suggestion model of an input method). The scheme lets model parameters be updated locally while a user uses an Android phone and then uploaded to the cloud, so that data holders with the same feature dimensions can jointly build a model. It suits data sets whose features overlap heavily but whose samples overlap little. This joint modeling scheme is called horizontal federated learning, and it is the earliest form of federated learning.
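The local-update-then-upload loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation. It assumes the server combines client parameters by a sample-size-weighted average (the approach usually called federated averaging), uses a toy linear model y = w*x + b, and all function names and data are hypothetical.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Client side: a few gradient-descent steps on the client's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Server side: average parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(n, rng):
    """Synthetic local data drawn from y = 2x + 1 plus small noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.01)))
    return data

rng = random.Random(0)
# Three clients with the same features but different samples (horizontal setting).
clients = [make_client_data(n, rng) for n in (20, 50, 30)]

global_weights = (0.0, 0.0)
for _ in range(50):
    # Each client trains locally; only parameters are sent to the server.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])
```

After the training rounds, `global_weights` ends up close to the underlying slope 2 and intercept 1, even though no client ever shared its raw `(x, y)` pairs.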
Classification of Federated Learning
Depending on how the data samples are distributed, there are two further approaches besides horizontal federated learning: vertical federated learning and federated transfer learning.
The former addresses the case in which the samples overlap heavily but the features overlap little, so that the data must be split vertically. The latter addresses the case in which both the samples and the features overlap little or not at all.
Professor Yang Qiang, Chair Professor at the Hong Kong University of Science and Technology and Chief AI Officer of WeBank, led WeBank's AI team in combining transfer learning with federated learning to propose federated transfer learning, which relies on transfer learning rather than on partitioning the data.
For example, scenarios with the same type of business in different regions (such as two regional banks in different areas) suit horizontal federated learning. Scenarios in the same region with different types of business (such as a bank and a supermarket in Shenzhen) suit vertical federated learning. For institutions that differ in both region and business (for example, an American supermarket and a Chinese bank), federated transfer learning can be introduced to address insufficient data and too few labeled samples.
It can also be seen that the setting addressed by the federated transfer learning proposed by WeBank's AI team is more general, and more in line with the future needs of big data, multi-enterprise and cross-industry applications.
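The three settings can be told apart by how much two parties' samples and features overlap. The sketch below turns that rule of thumb into code. The 0.5 threshold, the function names and the toy parties are hypothetical choices made for illustration, not part of any standard or of the scheme described in the article.

```python
def overlap_ratio(a, b):
    """Share of the smaller set that also appears in the other set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def suggest_setting(ids_a, feats_a, ids_b, feats_b, threshold=0.5):
    """Suggest a federated learning setting from sample and feature overlap.

    The threshold is an illustrative assumption; real projects would decide
    this from a proper analysis of the data.
    """
    sample_overlap = overlap_ratio(ids_a, ids_b)
    feature_overlap = overlap_ratio(feats_a, feats_b)
    if feature_overlap >= threshold and sample_overlap < threshold:
        return "horizontal"  # same features, different samples
    if sample_overlap >= threshold and feature_overlap < threshold:
        return "vertical"    # same samples, different features
    if sample_overlap < threshold and feature_overlap < threshold:
        return "transfer"    # little overlap in either dimension
    return "either"          # both overlap heavily

# Toy parties: (user IDs, feature names). All values are made up.
bank_north = ({"u1", "u2", "u3"}, {"age", "income", "balance"})
bank_south = ({"u7", "u8", "u9"}, {"age", "income", "balance"})
local_supermarket = ({"u1", "u2", "u3", "u4"}, {"basket_size", "visits"})
overseas_supermarket = ({"x1", "x2"}, {"basket_size", "visits"})

two_banks = suggest_setting(*bank_north, *bank_south)
bank_and_shop = suggest_setting(*bank_north, *local_supermarket)
cross_border = suggest_setting(*bank_north, *overseas_supermarket)
```

With these toy inputs the two regional banks map to the horizontal setting, the bank and the local supermarket to the vertical setting, and the bank and the overseas supermarket to federated transfer learning, mirroring the examples in the text.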
Professor Yang Qiang's AI team at WeBank has become the main promoter of federated learning in China and even internationally.
Since 2018, WeBank's AI team has not only presented its federated learning results at academic events such as CCAI, AAAI, the CCF Youth Elite Conference and IJCAI, but has also held many seminars with professional organizations such as CCF and IEEE to explore breakthroughs in federated learning together with industry. The team's concrete methods for supervised learning, reinforcement learning and decision trees, including secure federated transfer learning, federated reinforcement learning and the SecureBoost secure tree model, have attracted the attention of researchers and industry.
On the deployment side, WeBank applies federated learning to its own financial business processes, such as credit risk control and customer equity pricing. At the same time, WeBank has signed cooperation agreements with Peng Cheng Laboratory, Swiss Re, Extreme Vision and other enterprises and institutions to promote the application of federated learning in more fields.
WeBank's AI team is also committed to promoting the standardization of federated learning, which is a necessary step for a technology to mature and be deployed. If federated learning is to be truly deployed, it must establish a common language for dialogue between enterprises and be backed by a framework of international laws and regulations.
Last October, WeBank's AI team submitted a proposal to the IEEE Standards Association to establish a federated learning standard, the "Guide for Architectural Framework and Application of Federated Machine Learning", which was approved in December 2018.
Subsequently, the IEEE P3652.1 (Federated Learning Infrastructure and Application) standards working group was set up under the leadership of Professor Yang Qiang. The working group held its first and second meetings in February and June this year. Members sorted out typical federated learning use cases in their respective fields, discussed the specific form and content of the standard, and put forward constructive suggestions for the draft.
At this IJCAI conference, WeBank organized the third meeting of the IEEE P3652.1 (Federated Learning Infrastructure and Application) standards working group together with more than 20 enterprises and institutions from China and abroad. The meeting focused on how to quantify evaluation metrics for federated learning, how the standard should reflect the compliance of federated learning technology, and how to classify and summarize federated learning use cases.
Entering the international standards process means that enterprises joining the federated learning alliance can talk within the same framework. New enterprises or institutions that want to join must adopt the same framework in accordance with the standard, which in turn helps the federated learning ecosystem expand. The standard can be said to lay a cornerstone for building the entire federated learning ecosystem, which is of great significance.
Beyond Finance: Building an AI Big Data Ecosystem with an Open Source Platform

The financial industry has long been one of the industries with the greatest potential for big data and artificial intelligence. It is data-intensive: financial data places high demands on timeliness, security and stability, a large proportion of it is structured, and it has a wide range of application scenarios. At the same time, the financial industry also relies on third-party data to serve customers better, and the nature of its business makes data security and personal privacy protection very difficult. The data island problem is severe, which is why federated learning first took root and bore fruit at an innovative financial company like WeBank.
But the scenarios for federated learning are not limited to finance. Data islands are widespread in other industries too. How can federated learning be used to "learn how to learn"? After learning experience has been accumulated in many fields, the examples of transfer between fields can themselves be turned into a training set, so that AI can plan how to transfer across different fields, which offers guidance for deploying AI. In this process, the more fields are covered, the more the available training set (that is, the examples of transfer between different fields) grows, and it grows exponentially, so building a federated learning ecosystem is essential.
With this in mind, WeBank's AI team launched a project named FedAI Ecosystem, which aims to develop and promote AI technology and its applications while protecting security and user privacy. On the premise of ensuring data security and user privacy, the project builds an AI technology ecosystem based on federated learning, enabling industries to realize the full value of their data and promoting deployments in vertical domains.
Another way to advance the technology is open source.
In June this year, WeBank open-sourced Federated AI Technology Enabler (FATE), an industrial-grade federated learning framework. It is called "industrial-grade" because it addresses three common problems in industrial applications: a parallel computing architecture, auditable information exchange, and clear, extensible interfaces.
The FATE project not only provides a series of out-of-the-box federated learning algorithms, such as LR, GBDT and CNN, but also gives developers a reference for implementing federated learning algorithms and systems. Most traditional algorithms can be adapted to the federated learning framework with some modification. Through the open source project, relevant organizations can be empowered with AI and improve their own modeling technology and capabilities. The project offers practitioners a simple and effective solution for rapid development and application, and supports multi-scenario development and deployment through joint construction, platform services and other approaches.
Although federated learning offers a feasible approach for many scenarios in which AI was previously hard to deploy, different industries still face a range of different problems. For example, at the first International Workshop on Federated Learning, a technician from Huawei told Leiphone (WeChat official account: Leiphone) that he attended in order to resolve two questions from his own practical work with related technologies. The first was how, in a smart city setting, labeled data can be used through federated learning to help learn from the unlabeled data on local cameras. The second was whether, in medical scenarios, it is possible to predict in advance, before hospitals exchange models with one another, whether the exchange will improve performance. To push federated learning further, more people need to take part in building the ecosystem.
It is gratifying that the workshop received many excellent papers from universities, institutions and enterprises, and that the federated learning community continues to grow. Meanwhile, on the second day of the workshop, WeBank upgraded FATE again, introducing FATEBoard, the first visualization tool for federated learning, and FATEFlow, a tool for scheduling federated learning modeling pipelines and managing their life cycle. It also significantly upgraded FederatedML and updated its algorithms. The new version of FATE adds features that partially support multiple parties, and in future versions WeBank's AI team will further strengthen multi-party support.
The first International Workshop on Federated Learning, held at IJCAI, is an important milestone for federated learning.
Before it, although there were many papers, talks and news reports about federated learning, outsiders had hardly any chance to see the whole picture, and federated learning researchers rarely had the opportunity to gather and take stock of the field's overall progress. The workshop at IJCAI was the federated learning community's first collective appearance, and it attracted the attention of a large number of people from all walks of life. At the machine learning conference NeurIPS in Vancouver in December this year, WeBank will hold another federated learning workshop to share more research progress and practical experience with the public.
Outlook
Since the third wave of artificial intelligence took off in 2012, the initial novelty has faded, and our relationship with artificial intelligence has reached the weariness of a "seven-year itch".
Despite sustained progress in the field, in the eyes of the public most of the promises of artificial intelligence have not been fulfilled. Researchers have realized that breakthroughs in artificial intelligence rely heavily on annotated data, and open, high-quality data sets such as ImageNet have become a driving force for innovation.
The challenge for AI in the future still lies in data. With the progress of the Internet and 5G and the wider use of cheap sensors, future data will be both massive and fragmented. Approaches that lower the requirements on training data sets, including generative adversarial networks, reinforcement learning, transfer learning and federated learning, are the direction on which researchers place great hopes.
So does federated learning have a promising future?

The packed room at this workshop is a strong signal. To cope with data that is insufficient, fragmented and small, AI solutions must address security, compliance and privacy protection while improving model performance. In the current era of AI technology, user privacy protection will become a hard social constraint. More and more people and enterprises are beginning to realize the seriousness of "data islands" and the urgency of data sharing. Federated learning can address both problems at once (privacy protection and sharing). It provides solid technical support for building a cross-enterprise, cross-data and cross-domain big data AI ecosystem, and a federated learning ecosystem that links more industries and application scenarios will become a powerful tool for standing out.