The performance of modern machine learning systems, which depends on the optimization of parameters, weights, and biases, relies at least in part on large volumes of training data. Like any other competitive asset, that data is typically dispersed, distributed, or maintained by various R&D and business data owners rather than stored by a single central entity. Collaboratively training a machine learning (ML) model on such distributed data, an approach known as federated learning (FL), can result in a more accurate and robust model than any participant could train in isolation.
FL, also known as collaborative learning, trains an algorithm across multiple decentralized edge devices (e.g., devices providing an entry point into enterprise or service provider core networks) or servers holding local data samples, without exchanging those samples among the devices. The appeal of FL stems from its ability to provide near-real-time access to large amounts of data without requiring the transfer of that data between remote devices; in a sense, the data is not “distributed” but rather “federated” across the devices. This may sound similar to distributed computing, which uses multiple devices (a computer, a smartphone, or any other edge device) to perform a task. In FL, however, the data is never shared between devices: each device holds its own data and computes its own model update. Such collaborative training is usually implemented by a coordinator/aggregator that oversees the participants, and it can yield more robust and accurate ML models than any single participant could hope to train in isolation. Yet data owners are often unwilling (e.g., limited trust), unable (e.g., limited connectivity or communication resources), and/or legally prohibited (e.g., by privacy laws such as HIPAA, GDPR, CCPA, and local state laws) from openly sharing all or part of their individual data sources with each other. Because raw edge device data is never shared with the server or across separate organizations, FL differs from traditional distributed optimization: it operates under the orchestration of a central server while also having to contend with heterogeneous data.
Hence, FL typically uses a star topology, in which one central server coordinates initialization, communication, and the aggregation of model updates. In this design the local nodes place some degree of trust in the central server, yet they remain independent, retain control over whether they participate, and keep ownership of their local data; the central server never has access to the original local data.
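To make the star topology concrete, here is a minimal sketch of a federated averaging (FedAvg-style) training loop in Python/NumPy. The three clients, their synthetic linear-regression data, and the helper names (`local_update`, `aggregate`) are hypothetical stand-ins for whatever local training and aggregation a real deployment would run; only model weights, never raw data, travel between clients and server.

```python
import numpy as np

# Hypothetical local datasets held by three participants (never shared).
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
    clients.append((X, y))

def local_update(weights, X, y, lr=0.05, epochs=5):
    """One participant trains locally and returns only its updated weights."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient on local data
        w -= lr * grad
    return w, len(y)

def aggregate(updates):
    """Central server: sample-count-weighted average of client weights."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Star topology: server broadcasts weights, clients train, server aggregates.
global_w = np.zeros(5)
for _ in range(10):
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = aggregate(updates)

print("learned weights:", np.round(global_w, 2))
```

Weighting each client's contribution by its sample count mirrors the common FedAvg choice; other aggregation rules (median, trimmed mean) would slot into `aggregate` without changing the loop.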
There are two types of FL: horizontal and vertical. Horizontal FL involves collaborative training on horizontally partitioned datasets (i.e., the participants' datasets have common, similar, and/or overlapping feature spaces but uncommon, dissimilar, and/or non-overlapping sample spaces). For instance, two competing banks might have different clients (different sample spaces) while holding similar types of information about their clients, such as age, occupation, and credit score (similar feature spaces). Vertical FL, on the other hand, involves collaborative training on vertically partitioned datasets (i.e., the participants' datasets have common, similar, and/or overlapping sample spaces but uncommon, dissimilar, and/or non-overlapping feature spaces). For instance, a bank and an online retailer might serve the same clients (similar sample spaces) while holding different types of information about those clients (different feature spaces).
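As a toy illustration of the two partitioning schemes, the snippet below splits one hypothetical client table horizontally (same columns, different clients) and vertically (same clients, different columns); the table, column names, and client IDs are invented for illustration.

```python
import pandas as pd

# Hypothetical combined view that no single party actually holds.
full = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "age": [34, 51, 42, 29],
    "credit_score": [710, 640, 780, 695],
    "purchases_per_month": [4, 9, 2, 7],
})

# Horizontal partition: two banks share the feature space (age, credit_score)
# but hold different clients (disjoint sample spaces).
bank_a = full.loc[full.client_id.isin([1, 2]), ["client_id", "age", "credit_score"]]
bank_b = full.loc[full.client_id.isin([3, 4]), ["client_id", "age", "credit_score"]]

# Vertical partition: a bank and a retailer serve the same clients (shared
# sample space) but record different features (disjoint feature spaces).
bank = full[["client_id", "age", "credit_score"]]
retailer = full[["client_id", "purchases_per_month"]]

print(bank_a, bank_b, bank, retailer, sep="\n\n")
```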
Growing concerns about, and restrictions on, data sharing and privacy, such as Europe's GDPR and China's Cyber Security Law, have made it difficult, if not impossible, to transfer, merge, and fuse data obtained from different data owners. With FL, an edge device can send potentially de-identified updates to a model instead of sharing its raw data in order for the model to be updated. As a result, FL greatly reduces privacy concerns: the data never leaves the devices; only an encrypted, perturbed gradient does. Such a framework can be useful for many types of organizations, from companies that do not want to disclose proprietary data to the public, to developers who want to build privacy-preserving AI applications, such as chatbots.
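As a rough sketch of what a “perturbed gradient” can look like in practice, the snippet below clips a local gradient and adds Gaussian noise before it leaves the device, in the spirit of differentially private SGD; the clipping norm and noise scale are illustrative values, not a calibrated privacy guarantee.

```python
import numpy as np

def privatize_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip the gradient's L2 norm, then add Gaussian noise before upload.

    The server only ever sees the noisy, clipped vector, never the raw
    gradient. clip_norm and noise_std are illustrative; a real deployment
    would calibrate the noise to a target (epsilon, delta) privacy budget.
    """
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(scale=noise_std, size=grad.shape)

local_grad = np.array([0.8, -2.4, 1.1])
print(privatize_gradient(local_grad))
```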
One of the earlier applications of FL was mobile keyboard (next word) prediction: the details of what an individual has typed remain on the device and are not shared with the cloud-based machine learning provider. The provider can see securely aggregated summaries of what has been typed and corrected across many devices, but it cannot see the contents of what any one user has typed. This protects individual privacy while improving predictions for everyone. The approach is also compatible with additional personalized learning that occurs on device.
While FL can be adopted to build models locally and may boost model performance by widening the amount of available training data, its reliance on global synchrony and frequent exchange of updates leaves it unclear whether the technique can be deployed at scale across multiple platforms in real-world applications (particularly if the devices or servers in the system are highly secured). The main challenge is that federated learning relies heavily on the secure execution of decentralized computation across many iterations of training and a large number of devices. Because communication travels over the network and can be several orders of magnitude slower than local computation, the system must reduce both the total number of communication rounds and the size of the transmitted messages. Further, attaining high performance requires support for both system heterogeneity (devices with highly dynamic and heterogeneous networks, hardware, connections, and power availability) and data heterogeneity (data generated by different users on different devices, and therefore potentially drawn from different statistical distributions, i.e., non-IID).

Classical statistics also poses theoretical challenges to FL: because data is collected and trained on by user devices, any guarantee or assumption that the training data is independent and identically distributed (IID) breaks down. This is a distinguishing feature of FL. Losing that strong statistical guarantee makes it harder for a high-dimensional system to draw inferences about a wider population from the training samples collected by edge devices. Lastly, the algorithms used in federated learning are fundamentally different from those used in other decentralized computing systems, such as blockchains. If the devices in a federated learning system do not share the same privacy-preservation or security models as those in traditional computing environments, the system will likely perform poorly or not function at all.

Because even the aggregated information in the form of model updates may contain privacy-sensitive information, an optional additional layer can be added, such as Secure Multi-Party Computation (SMC), Differential Privacy, or Homomorphic Encryption. Handling privacy-sensitive information is one of the main motivations behind the use of homomorphic encryption in federated learning systems. Homomorphic encryption allows mathematical operations to be performed on encrypted data without revealing the private key, or “secret key,” used to encrypt it. Thus, homomorphic encryption can be used to process encrypted model updates without revealing the raw data to the device that executes the computation; that device can at most learn the parameters of the model and cannot decrypt the data. Without learning the plaintext model parameters, the server is unable to mount attacks, such as side-channel attacks, against the model. Functional encryption techniques, however, can be far more computationally efficient than homomorphic encryption techniques. Functional encryption can involve a public key that encrypts confidential data and a functional secret key that, when applied to the encrypted confidential data, yields a functional output based on the confidential data without decrypting or revealing it.
This can result in much faster FL than existing systems/techniques can facilitate (e.g., mere seconds to train via hybrid functional encryption versus hours to train via homomorphic encryption).
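The cryptographic layers above are easier to picture with a small example. The sketch below shows one SMC-style secure aggregation idea, pairwise additive masking, in which the server learns only the sum of client updates while each individual update stays hidden; it is not the hybrid functional encryption scheme referenced above, and a real protocol would also need key agreement and dropout handling.

```python
import numpy as np

def pairwise_masks(n_clients, dim, seed=42):
    """Each pair (i, j), i < j, agrees on a random mask; client i adds it,
    client j subtracts it, so every mask cancels in the server-side sum."""
    rng = np.random.default_rng(seed)
    return {(i, j): rng.normal(size=dim)
            for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(idx, update, masks, n_clients):
    """Client idx hides its true update behind the agreed pairwise masks."""
    out = update.copy()
    for j in range(n_clients):
        if idx < j:
            out += masks[(idx, j)]
        elif j < idx:
            out -= masks[(j, idx)]
    return out

# Hypothetical raw updates that must stay private.
updates = [np.array([1.0, 2.0]), np.array([-0.5, 0.5]), np.array([2.0, -1.0])]
masks = pairwise_masks(len(updates), dim=2)

# The server only sees masked vectors, yet their sum equals the true sum.
masked = [masked_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]
print("true sum:  ", sum(updates))
print("masked sum:", sum(masked))   # the pairwise masks cancel out
```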
Drivers & opportunities
Fintech. Collectively, the amount of financial data (structured and unstructured) generated and processed worldwide by current banking systems and other financial service providers is incalculable. As such, the ability to extract value from data in the fintech sector while protecting privacy and complying with regulations is of great interest to both government and industry. The increased availability of large-scale, high-quality data, along with a growing desire for privacy in the wake of numerous data breaches, has driven the adoption of FL in the fintech sector. Today, FL in fintech is used to extract value from data in a way that preserves privacy while complying with regulations, but these applications are still in their infancy and many challenges abound. One of the main challenges is the difficulty of obtaining permission from end users to process their data. Even once permission has been obtained, it is difficult to guarantee that all data is processed correctly: the data may be inconsistent and sometimes includes errors, so it is hard to estimate the accuracy of the model after data is aggregated across multiple devices. The process may also be biased by individual differences among a large number of devices, since some devices may be unable to complete a round due to a lack of resources (power, storage, memory, etc.). All of these challenges call for a solution design that lets the aggregation process function effectively, coupled with encryption of the collected data both in transit from device to server and at the server, to protect user privacy.
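One way to keep the aggregation process functioning when some devices cannot finish a round, as described above, is for the server to average only the updates that actually arrive, weighted by each reporting client's sample count. The fragment below is an illustrative sketch; the client IDs, sample counts, and `aggregate_reported` helper are hypothetical.

```python
import numpy as np

def aggregate_reported(updates):
    """Weighted average over only the clients that completed the round.

    `updates` maps client_id -> (weights, n_samples); clients that dropped
    out (low battery, lost connectivity, insufficient memory) simply never
    appear in the dict and are ignored for this round.
    """
    if not updates:
        raise ValueError("no client completed this round")
    total = sum(n for _, n in updates.values())
    return sum(w * (n / total) for w, n in updates.values())

# Example round in which client "c2" dropped out before uploading.
reported = {
    "c1": (np.array([0.2, -0.1]), 1200),
    "c3": (np.array([0.4,  0.0]),  300),
}
print(aggregate_reported(reported))
```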
Healthcare. The majority of healthcare data collection today is accomplished by paper forms, which are prone to errors and often result in under-reporting of adverse events. About 30% of all globally stored data resides in healthcare, and it is fueling the development of, and funding for, AI algorithms. By moving medical data out of data silos and improving the quality and accuracy of medical records, the use of FL in healthcare could improve patient safety and reduce the costs associated with information collection and review (e.g., clinical trials aimed at evaluating a medical, surgical, or behavioral intervention). In some circumstances, an individual may understand the medical research value of sharing information but not trust the organization they are being asked to share it with; the individual may wonder which third parties could gain access to their data. On the B2B side, intellectual property (IP) issues thwart companies that want to collaborate but cannot share their raw data for IP reasons, as do internal data policies that prevent even intra-company, cross-division sharing of data. In the context of clinical trials, data collection is centralized: one sponsor (principal investigator) centrally produces the protocol and uses several sites where many end users go for physical exams and laboratory tests. This procedure is time consuming and expensive, as it requires considerable planning and effort and is mostly outsourced to Contract Research Organizations (CROs). With FL, a global protocol can be shared by one central authority with many end users, who collect information on their edge devices (e.g., smartphones), label it, and compute locally, after which the outcome tensors (generalizations of vectors and matrices) are sent to the sponsor's central FL aggregator. The central authority aggregates all the tensors and then reports the updated, averaged tensors back to each of the end users. This one-to-many exchange of tensors can therefore be configured to conduct distributed clinical trials. Further, administrators can control training and its frequency behind the scenes, and it is the algorithms that adapt, rather than humans at a CRO. Trials become more streamlined and parallelized; trial speed improves significantly, even if that sometimes means failing fast; feedback loops are much faster; and sponsors or CROs get a much better idea early on of whether the trial is even working correctly.
Industrial IoT (IIoT). Integrating FL into IIoT ensures that no local sensitive data is exchanged, as learning models, rather than raw data, are distributed over the edge devices. With the extensive deployment of Industry 4.0, FL could play a critical role in manufacturing optimization and product life cycle management (PLCM) improvement: sensors can gather data about the local environment, which can then be used to train models for a specific machine, piece of equipment, or process in a specific location. That data in turn can expand the set of parameters that can be optimized, such as the temperature of a given process, the amount of oil used in a given machine, the type of material used in a particular tooling, or the amount of electricity used for a given process, further increasing automation capabilities, all while protecting privacy-sensitive information. Beyond the expected benefits of FL for large-scale manufacturing, critical-mass opportunities for FL in small and medium-scale manufacturing might be just as appealing for startups. That industry is currently experiencing a shortage of skilled labor, which has led to an increase in the use of automation; however, automation there is often limited by the level and quality of data that can be collected and by the ability to learn from that data. With FL, the availability of an on-premises learning model can help increase the efficiency of the manufacturing site and enhance product quality through predictive maintenance, while maintaining user privacy and without the need for user consent or supervision. Further, if the model is performing too slowly, or its accuracy is too low (due to concept drift and/or model decay), the machine can be brought into a maintenance mode based on its predicted, profiled needs. This avoids taking the machine completely offline, which would increase both the cost and the time associated with maintenance. With FL, manufacturers can gather and process data from a larger number of edge devices to improve the accuracy of their processes, making them more competitive in the market.
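As a rough sketch of the maintenance-mode trigger described above, the snippet below tracks a rolling window of prediction error and flags a machine for maintenance when the average error degrades past a threshold; the window size, threshold, machine ID, and `enter_maintenance_mode` hook are all illustrative assumptions.

```python
from collections import deque

class DriftMonitor:
    """Flags a machine for maintenance when rolling prediction error degrades.

    window and threshold are illustrative; in practice they would be tuned
    per machine or process from historical error profiles.
    """
    def __init__(self, window=200, threshold=0.15):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def needs_maintenance(self):
        if len(self.errors) < self.errors.maxlen:
            return False                      # not enough evidence yet
        return sum(self.errors) / len(self.errors) > self.threshold

def enter_maintenance_mode(machine_id):
    # Hypothetical hook into the plant's scheduling system.
    print(f"scheduling maintenance window for {machine_id}")

monitor = DriftMonitor(window=5, threshold=0.2)
for pred, actual in [(1.0, 1.1), (0.9, 1.3), (1.2, 0.7), (1.0, 1.5), (0.8, 1.2)]:
    monitor.record(pred, actual)
if monitor.needs_maintenance():
    enter_maintenance_mode("press_07")
```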