Ph.D. Research Projects
1. Federated learning for privacy-preserving distributed deep learning
Machine learning context: Federated learning (FL) is a distributed machine learning paradigm where several clients collaboratively train a model under the orchestration of a central server, while keeping the training data decentralized and private. Conventionally, a global model is constructed by pooling raw data from different clients into a central database for model training. FL enables clients to obtain a globally optimized model without uploading data to a central database or sharing proprietary data with other clients, thus maintaining data privacy without compromising performance.
The Federated Averaging algorithm (FedAvg) enables FL on decentralized data. It works as follows: First, the central server initializes a global model and shares it with all participating clients. Each client downloads this model and improves it by training on its local data for a few epochs. The parameter updates of the local models (think weights of a neural network) are then uploaded to the central server, where they are aggregated and incorporated into the global model. A minimal sketch of one such round appears below.
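To make the mechanics concrete, here is a minimal sketch of one FedAvg round on a toy least-squares problem (NumPy; the linear model, client data, and hyperparameters are illustrative stand-ins, not those from our experiments):

    import numpy as np

    rng = np.random.default_rng(0)

    def local_train(w, X, y, epochs=5, lr=0.1):
        # Local gradient descent on a least-squares objective
        # (a stand-in for training a neural network on private data).
        w = w.copy()
        for _ in range(epochs):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        return w

    def fedavg_round(w_global, clients):
        # One round: broadcast the global weights, train locally on each
        # client, then average the updates weighted by local data size.
        updates = [local_train(w_global, X, y) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        return sum(u * s for u, s in zip(updates, sizes / sizes.sum()))

    # Toy federation: three clients hold private samples of y = 3*x1 - 2*x2.
    w_true = np.array([3.0, -2.0])
    clients = []
    for n in (20, 50, 30):
        X = rng.normal(size=(n, 2))
        clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

    w = np.zeros(2)
    for _ in range(25):
        w = fedavg_round(w, clients)
    print(w)  # approaches [3, -2] without pooling any raw data

Note how the server only ever sees model weights: the raw (X, y) pairs never leave their clients.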
Manufacturing application 1: Semantic segmentation for pixel-scale defect detection from images
Semantic segmentation is an ML-based computer vision technique which enables fine-scale defect detection in additive manufacturing (AM). Most existing segmentation methods utilize CNN architectures that require large quantities of training data. However, obtaining sufficient data — both in quality and quantity — is costly in AM. This limits the deployment of semantic segmentation in a data-scarce production environment. Similar data may be readily available at other AM sites but cannot be pooled for centralized learning (CL) due to its proprietary nature.
We use FL to simultaneously address data scarcity and data privacy. We train a U-Net architecture under the FL framework and observe excellent defect detection performance even when individual manufacturers have only one or two images! FL achieves defect detection performance comparable to CL (which shares data across clients and does not preserve data privacy) and significantly outperforms individual learning (where each client trains on its own data only). We thus show that FL enables the best of both worlds.
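For a rough picture of how a segmentation model plugs into the FedAvg loop sketched above, here is the client-side step in PyTorch. The tiny encoder-decoder below is an illustrative stand-in for the actual U-Net, and the loss and hyperparameters are assumptions, not our experimental settings:

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        # Minimal encoder-decoder (a stand-in for a full U-Net).
        def __init__(self, n_classes=2):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                                     nn.ReLU(), nn.MaxPool2d(2))
            self.dec = nn.Sequential(nn.Upsample(scale_factor=2),
                                     nn.Conv2d(8, n_classes, 3, padding=1))
        def forward(self, x):
            return self.dec(self.enc(x))

    def local_segmentation_update(global_state, images, masks, epochs=2):
        # Client-side FedAvg step: load the global weights, fine-tune on
        # local images with a per-pixel cross-entropy (defect vs. no
        # defect), and return the updated weights to the server.
        model = TinySegNet()
        model.load_state_dict(global_state)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(images), masks)
            loss.backward()
            opt.step()
        return model.state_dict()

Here images is a float tensor of shape (N, 1, H, W) and masks a long tensor of shape (N, H, W) holding per-pixel class labels.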
Read our open-source paper here. Our code is available here. This work was featured in MechSE News.
Manufacturing application 2: Mixed fault classification in rotating machinery
Rotating machinery is ubiquitous in modern industrial systems. Ensuring optimal operating conditions for rotating machinery is essential to satisfy stringent requirements on safety, efficiency, and reliability. State-of-the-art performance for fault detection has been achieved using deep learning-based methods which generally require large quantities of high-quality data. We develop an FL framework for the diagnosis of mixed faults from multiple factories. We construct a novel duplet classifier to separate the mixed fault classification task into parallel networks (each network responsible for one component). Experimental results show that FL yields excellent mixed fault classification accuracy for all participating factories even under highly unbalanced and heterogeneous distributions of fault labels. See our slides here.
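One plausible reading of the parallel-network idea is sketched below: a shared feature extractor feeds two parallel heads, one per component, so a mixed fault decomposes into independent single-component predictions. The component names, layer sizes, and class counts are illustrative assumptions; see the paper and slides for the actual duplet design:

    import torch
    import torch.nn as nn

    class DupletClassifier(nn.Module):
        # Shared backbone with two parallel heads, one per component
        # (e.g., bearing and gear); sizes here are illustrative.
        def __init__(self, n_features=1024, n_bearing=4, n_gear=4):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(n_features, 128),
                                          nn.ReLU())
            self.bearing_head = nn.Linear(128, n_bearing)
            self.gear_head = nn.Linear(128, n_gear)
        def forward(self, x):
            z = self.backbone(x)
            return self.bearing_head(z), self.gear_head(z)

    def mixed_fault_loss(model, x, y_bearing, y_gear):
        # Each head gets its own cross-entropy; summing the two trains
        # both component classifiers in parallel.
        logits_b, logits_g = model(x)
        ce = nn.functional.cross_entropy
        return ce(logits_b, y_bearing) + ce(logits_g, y_gear)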
2. Addressing statistical heterogeneity in federated learning using client clustering
Machine learning context: Many recent studies have shown that the global model learned by FL may not be suitable for all clients. While FL can handle some statistical heterogeneity in client data, it yields suboptimal results when the distributions diverge significantly. To tackle this, three main lines of research are prevalent. The first focuses on improving the single global model learned by FedAvg to accommodate non-IID data. The second learns an individually optimized model for each client (personalized FL). The third is clustered FL, which assumes that groups of clients share one data distribution; here, a personalized model is learned for each group rather than for each individual client.
The focus of our work is clustered FL. We develop a state-of-the-art method, Federated Learning via Agglomerative Client Clustering (FLACC), which uses the gradient updates from FedAvg to cluster similar clients together. For some cool math, read our open-source paper here. This work was featured in MechSE News.
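As a rough illustration of the clustering step (not the exact criterion from the paper), one can flatten each client's FedAvg update into a vector and run agglomerative clustering on pairwise cosine distances; the linkage type and cut threshold below are assumptions:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    def cluster_clients(client_updates, threshold=0.5):
        # client_updates: (n_clients, n_params) array of flattened
        # gradient updates. Clients whose updates point in similar
        # directions end up in the same cluster.
        dists = pdist(client_updates, metric="cosine")  # pairwise dissimilarity
        tree = linkage(dists, method="average")         # agglomerative merges
        return fcluster(tree, t=threshold, criterion="distance")

    # Toy example: two groups of clients with opposing update directions.
    rng = np.random.default_rng(1)
    g1 = rng.normal(loc=+1.0, scale=0.3, size=(5, 10))
    g2 = rng.normal(loc=-1.0, scale=0.3, size=(5, 10))
    print(cluster_clients(np.vstack([g1, g2])))  # two cluster labels emerge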
Manufacturing application: We test FLACC on a realistic industrial scenario where 48 factories participate in FL and each factory produces only one or two types of faults. Under such extreme statistical heterogeneity, vanilla FL is not well suited, as shown by the FedAvg accuracy in the figure. In contrast, FLACC identifies and clusters similar entities solely based on their gradients, without looking at their signals. FLACC separates the factories into independent federations, leading to a significant increase in classification accuracy.
3. Active learning for multi-task learning with Gaussian processes
Machine learning context: Gaussian processes (GPs) are a powerful machine learning tool useful for both classification and regression. A GP is a collection of random variables, any finite number of which are jointly Gaussian. GPs are very flexible in fitting highly non-linear functions to relatively small datasets using covariance functions called kernels. Using this covariance, GPs provide an analytical expression for prediction uncertainty, thereby enabling sequential sampling in a target domain (active learning). A beautiful visual introduction to GPs can be found here.
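A minimal single-task illustration of this uncertainty-driven sampling, using scikit-learn (the kernel, noise level, and toy function are chosen only for illustration):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Fit a GP to a handful of noisy samples of an unknown 1D function.
    rng = np.random.default_rng(2)
    X = rng.uniform(0, 10, size=(6, 1))
    y = np.sin(X).ravel() + 0.05 * rng.normal(size=6)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(X, y)

    # The kernel yields an analytical predictive mean and standard
    # deviation; active learning samples next where uncertainty is largest.
    X_grid = np.linspace(0, 10, 200).reshape(-1, 1)
    mean, std = gp.predict(X_grid, return_std=True)
    x_next = X_grid[np.argmax(std)]
    print("next measurement location:", x_next)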
Multi-task learning (MTL) refers to learning many similar-but-not-identical tasks together while exploiting commonalities and differences across tasks. MTL improves learning efficiency and accuracy compared to training each task separately. There has been significant research on MTL with GPs and on active learning with GPs, but active learning for MTL with GPs has not been studied.
Manufacturing application: GPs are popular for response surface modeling in manufacturing, where the task is to map a set of process inputs to known outputs. We consider a simple 2D example of a machined engine head. The task here is to reconstruct the entire surface using only a few strategic measurements. Since a factory produces several such surfaces using similar process parameters, MTL can be used to reconstruct many engine surfaces together.
The problem we solve is: to obtain the least possible reconstruction error with as few measurements as possible, which engine surface should we measure, and where should we measure it? A sketch of this acquisition rule appears below. If you are interested in the math, please refer to our paper here.
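In spirit, the acquisition rule picks the (surface, location) pair with the largest predictive uncertainty. The sketch below uses independent per-task GPs as a stand-in for our multi-task model (an assumption; the paper's criterion accounts for cross-task correlations):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def next_measurement(task_gps, X_grid):
        # Pick which surface (task) to measure and where: the pair with
        # the largest predictive standard deviation across all tasks.
        stds = np.stack([gp.predict(X_grid, return_std=True)[1]
                         for gp in task_gps])      # (n_tasks, n_grid)
        t, i = np.unravel_index(np.argmax(stds), stds.shape)
        return t, X_grid[i]

    rng = np.random.default_rng(3)
    X_grid = np.linspace(0, 1, 100).reshape(-1, 1)
    gps = []
    for shift in (0.0, 0.2):  # two similar-but-not-identical surfaces
        X = rng.uniform(0, 1, size=(5, 1))
        y = np.sin(6 * (X + shift)).ravel()
        gps.append(GaussianProcessRegressor(kernel=RBF(0.2),
                                            alpha=1e-4).fit(X, y))
    task, x = next_measurement(gps, X_grid)
    print(f"measure surface {task} at location {x}")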
4. Multi-task response surface modeling with multi-resolution data
Machine learning context: There are several methods in the literature for multi-task learning with Gaussian processes (GPs). Most of these methods learn the GP mean and covariance functions for each task from the training data, including parameters that represent the input uncertainty of the system. In practice, however, the input uncertainty is an inherent, known property of the system and should not be estimated from data. To address this, we develop a hierarchical Bayesian framework for multi-task response surface modeling using GPs and derive an expectation-maximization (EM) algorithm for it. Our method explicitly encodes the known input uncertainty rather than fitting it to data, making it more practical for real systems.
Manufacturing application: In surface metrology, it is not uncommon to use gauges having multiple resolutions for measurement. For instance, a coordinate measurement machine can be used along with a profilometer or a laser holographic interferometer. Each gauge has a different repeatability which is known from prior experiments. Using our multi-task multi-resolution framework, we can encode the gauge uncertainties explicitly while making surface predictions, thus obtaining highly accurate surface reconstruction from limited measurements on engine surfaces.
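A minimal single-task sketch of the "encode the noise, don't fit it" idea (scikit-learn; the gauge names, repeatability values, and toy surface are illustrative assumptions, and our actual framework handles multiple tasks hierarchically):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Known repeatability (noise std) of each gauge from prior studies.
    GAUGE_STD = {"cmm": 0.05, "profilometer": 0.005}

    rng = np.random.default_rng(4)
    surface = lambda x: np.sin(4 * x).ravel()  # toy "surface height"

    X, y, noise_var = [], [], []
    for gauge, n in (("cmm", 15), ("profilometer", 5)):
        Xi = rng.uniform(0, 1, size=(n, 1))
        X.append(Xi)
        y.append(surface(Xi) + GAUGE_STD[gauge] * rng.normal(size=n))
        noise_var += [GAUGE_STD[gauge] ** 2] * n
    X, y = np.vstack(X), np.concatenate(y)

    # alpha accepts a per-measurement noise variance, so the known gauge
    # uncertainty enters the model explicitly instead of being fitted.
    gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=np.array(noise_var))
    gp.fit(X, y)
    mean, std = gp.predict(np.linspace(0, 1, 50).reshape(-1, 1),
                           return_std=True)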