
Federated XGBoost Made Practical and Productive with NVIDIA FLARE


XGBoost is a highly effective and scalable machine learning algorithm widely employed for regression, classification, and ranking tasks. Building on the principles of gradient boosting, it combines the predictions of multiple weak learners, typically decision trees, to produce a robust overall model. 

XGBoost excels with large datasets and complex data structures, thanks to its efficient implementation and advanced features such as regularization, parallel processing, and handling missing values. Its versatility and high performance have made it a popular choice in data science competitions and practical applications across various industries.
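As a point of reference before moving to the federated setting, a minimal centralized XGBoost training run looks like the following sketch (synthetic data and illustrative parameters only):

import numpy as np
import xgboost as xgb

# Synthetic binary-classification data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",  # histogram-based tree construction
    "max_depth": 6,
    "eta": 0.1,
    "lambda": 1.0,  # L2 regularization
}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dvalid, "valid")], verbose_eval=False)

The federated features described next distribute this same style of training across institutions without moving the data.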

The XGBoost 1.7.0 release introduced Federated XGBoost, which enables multiple institutions to jointly train XGBoost models without needing to move data. In the XGBoost 2.0.0 release, this capability was further enhanced to support vertical federated learning. OSS Federated XGBoost provides Python APIs for simulations of XGBoost-based federated training. 

Since 2023, NVIDIA Federated Learning Application Runtime Environment (FLARE) has introduced built-in integration with Federated XGBoost features: horizontal histogram-based and tree-based XGBoost, as well as vertical XGBoost. We have also added support for Private Set Intersection (PSI) for sample alignment as a preprocessing step before vertical training.

With these integrations, you can prepare and run Federated XGBoost jobs in production or simulation without writing training code. You provide only the dataset location, the training parameters, the NVIDIA FLARE job configurations, and a data loading function.
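For example, the data loading function can be a small class that reads each site's local files and returns XGBoost DMatrix objects. The following is a hedged sketch: it assumes the XGBDataLoader base class from nvflare.app_opt.xgboost.data_loader and its load_data(client_id) contract, and the CSVDataLoader name, file layout, and label column are illustrative, so check the /NVIDIA/NVFlare repo for the exact interface in your FLARE version.

import pandas as pd
import xgboost as xgb

from nvflare.app_opt.xgboost.data_loader import XGBDataLoader  # assumed base class


class CSVDataLoader(XGBDataLoader):  # hypothetical example loader
    def __init__(self, data_root: str):
        self.data_root = data_root  # per-site dataset location

    def load_data(self, client_id: str):
        # Each client reads only its own local files; no data leaves the site
        train_df = pd.read_csv(f"{self.data_root}/{client_id}/train.csv")
        valid_df = pd.read_csv(f"{self.data_root}/{client_id}/valid.csv")
        dtrain = xgb.DMatrix(train_df.iloc[:, 1:], label=train_df.iloc[:, 0])
        dvalid = xgb.DMatrix(valid_df.iloc[:, 1:], label=valid_df.iloc[:, 0])
        return dtrain, dvalid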

In this post, we highlight the key features of NVIDIA FLARE 2.4.1 for real-world federated XGBoost learning. To conduct real-world federated training productively, you must be able to do the following:

  • Run multiple concurrent XGBoost training experiments with different training parameters, feature combinations, or datasets.
  • Handle potential experiment failures due to unreliable network conditions or interruptions.
  • Monitor experiment progress through tracking systems such as MLflow or Weights & Biases.

Running multiple experiments concurrently

Data scientists often have to assess the impact of various hyperparameters or features on models. They experiment with different features or combinations of features using the same model. 

NVIDIA FLARE's parallel job execution capabilities enable you to conduct these experiments concurrently, significantly reducing the overall training time. NVIDIA FLARE manages communication multiplexing on your behalf and does not require opening new ports (typically done by IT support) for each job.
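For example, two experiments can be submitted to the same running FLARE system from a single script. The following is a minimal sketch that assumes the FLARE API calls new_secure_session and submit_job available in recent FLARE releases; the admin user name, startup kit path, and job folder names are placeholders.

from nvflare.fuel.flare_api.flare_api import new_secure_session  # assumed FLARE API

# Placeholders: replace with your admin user and provisioned startup kit location
sess = new_secure_session("admin@example.com", "/path/to/admin/startup_kit")
try:
    # The FLARE server schedules both jobs; no extra ports are opened per job
    job_id_a = sess.submit_job("/path/to/jobs/xgb_feature_set_a")
    job_id_b = sess.submit_job("/path/to/jobs/xgb_feature_set_b")
    print("Submitted jobs:", job_id_a, job_id_b)
finally:
    sess.close()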

Figure 1 shows the execution of two Federated XGBoost jobs in NVIDIA FLARE.

Figure 1. Two concurrent XGBoost jobs, each with a unique set of features. Each job has two clients, shown as two curves

Fault-tolerant XGBoost training 

In cross-region or cross-border training, the network can be less reliable than a corporate network and prone to periodic interruptions. These interruptions can cause communication failures that abort the job and force a restart, either from the beginning or from a saved snapshot.

The reliability features of the NVIDIA FLARE communicator for XGBoost automatically retry messages during network interruptions, ensuring resilience and maintaining learning continuity and data integrity (Figure 2).

NVIDIA FLARE integrates seamlessly to facilitate communication between different XGBoost federated servers and clients, providing a robust and efficient solution for federated learning.

Figure 2. XGBoost communication is routed through the NVIDIA FLARE Communicator layer

For more information and an end-to-end example, see the /NVIDIA/NVFlare GitHub repo.

Video 1. Federated XGBoost with NVIDIA FLARE

Federated experiment tracking 

When you’re conducting machine learning training, especially in distributed settings like federated learning, it’s crucial to monitor training and evaluation metrics closely. 

NVIDIA FLARE provides built-in integration with experiment tracking systems—MLflow, Weights & Biases, and TensorBoard—to facilitate comprehensive monitoring of these metrics.

Figure 3. Metrics streamed either through the FL server (centralized) or directly from each client (decentralized) and delivered to the configured experiment tracking system

With NVIDIA FLARE, you can choose between decentralized and centralized tracking configurations:

  • Decentralized tracking: Each client manages its own metrics and experiment tracking server locally, maintaining training metric privacy. However, this setup limits the ability to compare data across different sites.
  • Centralized tracking: All metrics are streamed to a central FL server, which then pushes the data to a selected tracking system. This setup supports effective cross-site metric comparisons.

The NVIDIA FLARE job configuration enables you to choose the tracking scenario and system that best fit your needs. If you need to migrate from one experiment tracking system to another, you only modify the job configuration; the experiment tracking code does not need to be rewritten.

Adding MLflow, Weights & Biases, or TensorBoard logging to stream metrics efficiently to the respective server requires just three lines of code:

from nvflare.client.tracking import MLflowWriter

mlflow = MLflowWriter()
mlflow.log_metric("loss", running_loss / 2000, global_step)

The nvflare.client.tracking API enables you to flexibly redirect your logged metrics to any destination. Which syntax you use (MLflow, Weights & Biases, or TensorBoard) doesn't matter, because the collected metrics can be streamed to any supported experiment tracking system. Choose MLflowWriter, WandBWriter, or TBWriter based on your existing code and requirements, as shown in the sketch after the following list.

  • MLflowWriter uses the MLflow API operation log_metric.
  • TBWriter uses the TensorBoard SummaryWriter operation add_scalar.
  • WandBWriter uses the API operation log.
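For example, the same loss value can be logged with any of the three writers. This is a minimal sketch; the argument order mirrors the corresponding native APIs, and the exact signatures and import names should be verified against the NVIDIA FLARE API documentation for your version.

from nvflare.client.tracking import MLflowWriter, TBWriter, WandBWriter

# Illustrative values, as if taken from a training loop
step, loss = 100, 0.25

MLflowWriter().log_metric("loss", loss, step)  # MLflow-style log_metric
TBWriter().add_scalar("loss", loss, step)      # TensorBoard-style add_scalar
WandBWriter().log({"loss": loss}, step)        # Weights & Biases-style log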

Depending on your existing code or familiarity with these systems, you can choose any writer. After you’ve modified the training code, you can use the NVIDIA FLARE job configuration to configure the system to stream the logs appropriately.

For more information, see the FedAvg with SAG workflow with MLflow tracking tutorial.

Summary

In this post, we described the concurrent job execution, fault-tolerant communication, and experiment tracking features that NVIDIA FLARE 2.4.x provides for federated XGBoost. For more information, see the /NVIDIA/NVFlare 2.4 branch on GitHub and the NVIDIA FLARE 2.4 documentation.

Any questions or comments? Reach out to us at [email protected].
