Federated Learning
Federated learning trains ML models across multiple organizations or devices without centralizing the data. Instead of moving data to the model, federated learning moves the model to the data — each participant trains locally and shares only model updates. This approach preserves data privacy while enabling collaborative intelligence.
The motivation is practical. Five hospitals want to build a chest X-ray classifier, but none can share patient images due to HIPAA regulations. A consortium of banks wants to detect emerging fraud patterns, but sharing customer transaction data across institutions is prohibited. Federated learning makes both scenarios possible by keeping the data where it is and sharing only what the model learned from it.
The trade-off is real, though: federated training adds coordination complexity and typically yields somewhat lower accuracy than training on pooled, centralized data.
How Federated Learning Works
The process follows a repeated cycle between a central coordinator and distributed participants.
Step 1: The central server initializes a model and distributes it to all participants. Each hospital, bank, or device receives the same starting model weights.
Step 2: Each participant trains the model on their local data. Hospital A trains on its chest X-ray images. Hospital B trains on its own. Neither sends images anywhere.
Step 3: Participants send their model updates (gradients or weight changes) — not their data — back to the central server.
Step 4: The server aggregates updates from all participants. The most common method is federated averaging: computing the weighted average of all participants' model updates, where the weight is proportional to each participant's dataset size.
Step 5: The updated global model is redistributed to all participants.
Step 6: Repeat until the model converges — typically 10 to several hundred rounds, depending on the task complexity and data distribution.
The result is a model that has learned from all participants' data without any participant's data leaving their premises. The aggregated model is usually better than any individual participant's model because it has seen a broader range of examples.
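The aggregation at the heart of this cycle, federated averaging (FedAvg), is simple to state in code. The sketch below simulates one round with NumPy weight vectors standing in for real model parameters; the three "hospitals" and their dataset sizes are illustrative:

```python
import numpy as np

def fedavg(client_updates, client_sizes):
    """Weighted average of client weight vectors, weights proportional to dataset size."""
    total = sum(client_sizes)
    stacked = np.stack(client_updates)           # shape: (n_clients, n_params)
    weights = np.array(client_sizes) / total     # shape: (n_clients,)
    return weights @ stacked                     # weighted average per parameter

# One federated round with three toy participants.
global_model = np.zeros(4)
local_models = [global_model + np.array([1.0, 0, 0, 0]),   # hospital A's update
                global_model + np.array([0, 2.0, 0, 0]),   # hospital B's update
                global_model + np.array([0, 0, 3.0, 0])]   # hospital C's update
sizes = [100, 200, 100]                                    # local dataset sizes
global_model = fedavg(local_models, sizes)
print(global_model)   # hospital B, with twice the data, counts twice as much
```

Because hospital B holds half the total data, its update carries half the weight in the average; this is the "weighted by dataset size" rule from Step 4.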
Types of Federation
Federated learning takes different forms depending on how data is distributed across participants.
Horizontal federation
Participants have the same features but different samples. Google's Gboard keyboard is the canonical example: hundreds of millions of Android phones each have the same data schema (typed words, selected suggestions, context) but different user-specific samples. The model learns typing patterns across all users without any text leaving any device.
Vertical federation
Participants have different features for overlapping entities. A bank and a retailer both serve the same customers but hold different data — the bank has income and credit history, the retailer has purchase behavior and browsing patterns. Vertical federation combines these complementary views without either party seeing the other's data. This requires secure techniques for entity alignment — matching customers across datasets without revealing identities.
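A toy version of entity alignment can be sketched with keyed hashing: both parties blind their customer IDs with a jointly held secret key and compare only the blinded values. This is an illustration of the idea, not a production protocol; it leaks the intersection size, and real deployments use private set intersection (PSI) protocols instead. The key and customer names are invented for the example:

```python
import hmac, hashlib

def blind(customer_ids, shared_key):
    """Map raw IDs to keyed hashes; without the key, the hashes reveal nothing."""
    return {hmac.new(shared_key, cid.encode(), hashlib.sha256).hexdigest(): cid
            for cid in customer_ids}

# Bank and retailer blind their IDs with a key agreed out of band (assumption),
# then compare only the blinded tokens to find overlapping customers.
key = b"key-agreed-out-of-band"
bank = blind({"alice", "bob", "carol"}, key)
retail = blind({"bob", "carol", "dave"}, key)
overlap_tokens = bank.keys() & retail.keys()
print(sorted(bank[t] for t in overlap_tokens))   # each side learns only the overlap
```

Each party learns which of its own customers are shared, but nothing about the other party's non-overlapping customers.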
Transfer federation
Participants have different features and different samples but work on related tasks. A model trained on English medical records at US hospitals can transfer knowledge to a French hospital with French records. The data schemas and patient populations are different, but the underlying medical patterns are related. Transfer federation combines federated learning with transfer learning to bridge these gaps.
Google's Gboard keyboard uses federated learning to improve next-word prediction across hundreds of millions of Android devices. The model trains on user typing patterns without any text leaving the device — a privacy guarantee that would be impossible with centralized data collection.
— McMahan et al., Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017
Privacy Mechanisms
Sharing model updates instead of raw data is a strong baseline, but it is not sufficient on its own. Research has shown that model gradients can sometimes be reverse-engineered to reconstruct training examples. Three additional mechanisms provide stronger privacy guarantees — each with a cost.
Differential privacy
Adds calibrated noise to model updates before they leave each participant. The noise is mathematically tuned so that the presence or absence of any single data point cannot be detected in the aggregated model. The trade-off: noise reduces model accuracy. A privacy budget (epsilon) controls this balance — lower epsilon means more privacy and more noise.
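The standard recipe is to clip each participant's update to a maximum L2 norm (bounding any one client's influence) and then add Gaussian noise scaled to that clip. The sketch below shows the mechanics; the noise multiplier would in practice be derived from the (epsilon, delta) privacy budget, and the values here are illustrative, not a calibrated budget:

```python
import numpy as np

def privatize(update, clip_norm=1.0, noise_multiplier=1.1,
              rng=np.random.default_rng(0)):
    """Clip an update's L2 norm, then add Gaussian noise calibrated to the clip."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm)   # bound this client's influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

update = np.array([3.0, 4.0])        # L2 norm 5.0, so it gets scaled down to 1.0
noisy = privatize(update)
print(noisy)                         # clipped update plus calibrated noise
```

Lowering epsilon tightens the guarantee by raising the noise multiplier, which is exactly the accuracy cost described above.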
Secure aggregation
Cryptographic protocols that let the coordinator combine model updates from all participants without seeing any individual participant's update. The coordinator only ever sees the aggregate. This prevents a compromised or curious coordinator from learning what any single participant contributed. The cost: secure aggregation adds communication overhead and limits the types of aggregation operations possible.
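The core trick in secure aggregation is pairwise additive masking: every pair of participants agrees on a random mask, one adds it and the other subtracts it, so each masked update looks random on its own but the masks cancel exactly in the sum. A minimal simulation (in real protocols the masks are derived from pairwise shared secrets, not a central random generator):

```python
import numpy as np

def masked_updates(updates, rng=np.random.default_rng(42)):
    """Each pair (i, j) shares a random mask; i adds it, j subtracts it."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)  # stand-in for a shared secret
            masked[i] += mask
            masked[j] -= mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = masked_updates(updates)
# The coordinator sums the masked values; the masks cancel, so only the
# aggregate is revealed -- no individual masked update equals its original.
print(sum(masked))
```

This also shows why dropouts are painful for secure aggregation: if a participant vanishes after masks are agreed, its unmatched masks no longer cancel, which is why real protocols add secret-sharing machinery to recover them.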
Homomorphic encryption
Allows the coordinator to perform mathematical operations on encrypted model updates without decrypting them. The aggregated result is decrypted only by the participants who hold the decryption key. This provides the strongest theoretical guarantee but adds significant computational latency — operations on encrypted data are orders of magnitude slower than on plaintext.
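The classic additively homomorphic scheme used in federated settings is Paillier: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a coordinator can total encrypted updates it cannot read. The sketch below uses deliberately tiny primes to make the arithmetic visible; real systems use 2048-bit-plus moduli generated by a vetted cryptography library:

```python
from math import gcd, lcm

# Toy Paillier keypair -- illustrative only.
p, q = 11, 13
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # valid because g = n + 1

def encrypt(m, r):
    """Enc(m) = g^m * r^n mod n^2, with r coprime to n."""
    assert gcd(r, n) == 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts.
c = (encrypt(5, r=7) * encrypt(9, r=17)) % n2
print(decrypt(c))   # 14 -- the coordinator summed updates it could not read
```

The latency cost mentioned above is visible even here: each encryption is a pair of modular exponentiations, versus a single addition on plaintext.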
Where Federated Learning Is Used
Google Gboard. The most widely deployed federated learning system. Hundreds of millions of Android devices train a next-word prediction model on local typing data. The model learns that after "I'm running" users often type "late" — without Google ever seeing what anyone typed. This is horizontal federation at massive scale, processing thousands of training rounds per day.
NVIDIA Clara. Medical imaging across hospitals. NVIDIA's Clara federated learning framework enables hospitals to train radiology models collaboratively — tumor segmentation, organ detection, pathology classification — without moving patient data outside hospital networks. Secure aggregation ensures no single hospital's patient information is exposed during the training process.
WeBank and Swiss Re. Cross-institutional credit risk modeling using vertical federation. WeBank (a Chinese digital bank) and Swiss Re (a global reinsurer) combined their complementary data — banking behavior and insurance claims — to build better risk models without sharing raw customer data. This is vertical federation: different features for overlapping customer populations.
Apple Siri. Apple uses on-device federated learning to improve Siri voice recognition and language understanding. Voice recordings stay on the user's device. Only model improvements — not audio — are sent to Apple's servers. This architectural choice is a deliberate privacy position: Apple cannot access the voice data even if compelled to, because it never leaves the device.
The HealthChain project demonstrated that federated learning models for breast density classification achieved 96.4% of the accuracy of centralized models, while keeping patient data at each hospital. The 3.6% accuracy gap was the cost of privacy preservation.
— Roth et al., Federated Learning for Breast Density Classification, Nature Medicine 2022
Technical Challenges
Non-IID data
Participants have different data distributions. A rural hospital sees more agricultural injuries and fewer cardiac cases than an urban trauma center. A phone used by a teenager generates different typing patterns than one used by a corporate executive. This data, which is not independent and identically distributed (non-IID), causes the global model to converge slowly and can bias it toward over-represented populations. Solutions include personalized federated learning (each participant fine-tunes the global model locally) and robust aggregation methods that weight participants based on data diversity rather than just size.
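Federated learning research commonly simulates this label skew with a Dirichlet split: each class's samples are divided across clients according to Dirichlet-distributed proportions, where a small concentration parameter alpha produces highly non-IID shards and a large alpha approaches an IID split. A NumPy sketch of that standard partitioning trick:

```python
import numpy as np

def dirichlet_split(labels, n_clients, alpha, rng=np.random.default_rng(0)):
    """Partition sample indices across clients with Dirichlet label skew.

    Small alpha -> each client sees mostly a few labels (highly non-IID);
    large alpha -> near-uniform label mix (close to IID).
    """
    clients = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in zip(clients, np.split(idx, cuts)):
            client.extend(shard)
    return clients

labels = np.repeat([0, 1, 2], 100)                   # 300 samples, 3 classes
clients = dirichlet_split(labels, n_clients=5, alpha=0.1)
print([len(c) for c in clients])                     # very uneven at alpha=0.1
```

Running FedAvg on shards produced at alpha=0.1 versus alpha=100 is the usual way to measure how badly skew slows convergence.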
Communication overhead
A large language model has billions of parameters. Sending full model updates from thousands of participants each round requires enormous bandwidth. Gradient compression reduces update size by quantizing values or sending only the most significant changes. Federated distillation shares model outputs rather than weights. These techniques reduce communication by 10-100x but add engineering complexity and may slow convergence.
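Top-k sparsification, one of the compression techniques above, keeps only the largest-magnitude entries of an update and transmits them as (index, value) pairs. A minimal sketch:

```python
import numpy as np

def topk_sparsify(update, k):
    """Keep only the k largest-magnitude entries; send (indices, values).

    In practice the dropped residual is accumulated locally and added to
    the next round's update, so small gradients are delayed, not lost.
    """
    idx = np.argsort(np.abs(update))[-k:]    # indices of the k largest entries
    return idx, update[idx]

def densify(idx, values, size):
    """Rebuild a full-size update on the coordinator side."""
    out = np.zeros(size)
    out[idx] = values
    return out

update = np.array([0.01, -3.0, 0.02, 4.0, -0.05, 0.5])
idx, vals = topk_sparsify(update, k=2)
restored = densify(idx, vals, update.size)
print(restored)   # only the two largest entries survive: -3.0 and 4.0
```

Here a 6-element update shrinks to 2 index/value pairs; at billions of parameters the same ratio is where the 10-100x savings comes from.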
Stragglers and dropouts
Not all participants are equal. A mobile phone on a weak connection takes longer to train and upload than a hospital data center with dedicated bandwidth. Slow participants (stragglers) delay entire training rounds. Participants that disconnect mid-round (dropouts) waste the coordinator's round. Asynchronous protocols allow the coordinator to proceed without waiting for every participant. Participant selection strategies dynamically choose participants based on availability and connection quality.
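One simple coordinator-side policy is a deadline: over-select participants, aggregate whichever updates arrive in time, and abandon the round if too few do. The sketch below simulates that policy with invented upload times; real systems layer retries and availability-aware selection on top:

```python
def run_round(arrival_times, deadline, min_participants):
    """Aggregate only updates that arrive before the deadline.

    Returns indices of included participants, or None if too few arrived
    (the round is abandoned and retried with a fresh selection).
    """
    arrived = [i for i, t in enumerate(arrival_times) if t <= deadline]
    return arrived if len(arrived) >= min_participants else None

# Simulated upload-completion times in seconds: participant 3 is a straggler
# and participant 4 dropped out (never finishes).
times = [2.1, 3.4, 1.8, 45.0, float("inf")]
included = run_round(times, deadline=10.0, min_participants=3)
print(included)   # the round proceeds without waiting for the straggler
```

Selecting more participants than the aggregation needs (over-selection) makes it likely that enough fast ones finish, so one slow phone cannot stall the whole federation.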
Metadata Requirements for Federation
Federated learning requires participants to agree on data formats, feature definitions, and quality standards — without being able to inspect each other's data. This is a data governance problem operating across organizational boundaries.
Consider a federation of banks building a loan default model. Every participant must define "default" the same way. Is it 90 days past due? 180 days? Does restructured debt count? If one bank uses a 90-day definition and another uses 180 days, the model learns from inconsistent labels and produces unreliable predictions — and no participant can audit the other's data to catch the discrepancy.
This alignment requires shared metadata standards: a common business glossary that defines terms used in the federation, agreed-upon data schemas, and quality thresholds that each participant must meet before their data enters a training round. Data catalogs provide the shared vocabulary that federation requires. Feature stores ensure that the features computed from each participant's local data follow the same definitions and transformations.
How Dawiso Supports Federated Learning
Federated learning participants need shared metadata standards without sharing data. Dawiso's business glossary provides the common definitions that federation requires — what "active customer" means, how "default" is defined, which data quality thresholds apply.
The data catalog documents each participant's local data assets and their mapping to federated feature definitions. When a new participant joins a federation, the catalog makes it clear which local tables map to which federated features and where alignment gaps exist.
Through the Model Context Protocol (MCP), federated coordinators can verify that participant datasets meet schema and quality requirements before initiating training rounds. An automated pre-training check can confirm that each participant's "loan_default" column uses the agreed-upon definition, that data freshness meets the SLA, and that required columns are present — all without accessing the raw data.
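A pre-training check of this kind might look like the following sketch. Everything here is a hypothetical illustration: the glossary payload, field names, and thresholds are invented for the example and do not represent Dawiso's actual MCP schema or API:

```python
# Hypothetical pre-training metadata check -- the glossary payload and field
# names below are illustrative assumptions, not a real MCP schema.
from datetime import datetime, timedelta, timezone

GLOSSARY = {                                   # agreed federation-wide definitions
    "loan_default": {"definition": "90_days_past_due", "required": True},
    "customer_id": {"definition": "hashed_national_id", "required": True},
}
MAX_STALENESS = timedelta(days=1)              # freshness SLA

def precheck(participant_metadata):
    """Return a list of violations; empty means the participant may join the round."""
    issues = []
    for column, spec in GLOSSARY.items():
        col = participant_metadata["columns"].get(column)
        if col is None:
            if spec["required"]:
                issues.append(f"missing required column: {column}")
        elif col["definition"] != spec["definition"]:
            issues.append(f"{column} uses definition {col['definition']!r}, "
                          f"expected {spec['definition']!r}")
    age = datetime.now(timezone.utc) - participant_metadata["last_refreshed"]
    if age > MAX_STALENESS:
        issues.append(f"data is stale: refreshed {age} ago")
    return issues

bank_a = {"columns": {"loan_default": {"definition": "180_days_past_due"},
                      "customer_id": {"definition": "hashed_national_id"}},
          "last_refreshed": datetime.now(timezone.utc)}
print(precheck(bank_a))   # flags the mismatched default definition
```

The check catches exactly the 90-day versus 180-day mismatch described in the loan default example, before the inconsistent labels ever reach a training round.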
Conclusion
Federated learning makes collaboration possible where data sharing is not. The technology is proven at scale — Google, Apple, and NVIDIA run production systems serving hundreds of millions of users. The technical challenges (non-IID data, communication costs, privacy mechanisms) are well-understood and have practical solutions. The harder problem is metadata governance: ensuring that distributed participants agree on what their data means, so the model they collaboratively build is learning from consistent, well-defined inputs.