Using the platform SDK and CLI to run a deep learning application
A more detailed guide to the deep learning showcase is available here.
Assumed Knowledge
This guide assumes you're already familiar with the basic concepts of federated learning. If not, read the background docs on Coding an Analysis and the Core SDK.
Summary
This tutorial shows how to train a deep-learning-based image classifier on FLAME using external frameworks such as PyTorch and the provided reference script Image_classifier_training.py. The workflow is a demonstration of the platform's capabilities for machine learning applications.
Download
Download the full reference script: Image_classifier_training.py
Goal
Train a neural-network-based image classifier for a few epochs per round inside a multi-round federated analysis: handle the exchange and aggregation of model weights, enforce a convergence criterion based on the change in the loss function, and compute basic metrics across nodes without moving raw image datasets between nodes.
By the end of this tutorial you will know how to use the Star pattern and how to collect the results of federated training.
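To make the multi-round idea concrete, here is a minimal sketch of a star-pattern run simulated in plain Python. It is not the reference script and does not use the FLAME SDK: the functions `local_update`, `aggregate`, and `run_federated`, the one-parameter model `y = w * x`, the sample-count weighting, and the tolerance are all assumptions chosen for illustration. Each simulated node only ever returns a weight, a loss, and a sample count, so the raw samples stay where they are.

```python
# Toy simulation of a star-pattern federated run (hypothetical names, no FLAME SDK).

def local_update(w, data, lr=0.1, epochs=5):
    """Analyzer side: a few epochs of gradient descent on y = w * x with local data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return {"weight": w, "loss": loss, "n": len(data)}


def aggregate(results):
    """Aggregator side: sample-weighted average of the returned weights and losses."""
    total = sum(r["n"] for r in results)
    weight = sum(r["weight"] * r["n"] for r in results) / total
    loss = sum(r["loss"] * r["n"] for r in results) / total
    return weight, loss


def run_federated(node_data, tol=1e-6, max_rounds=50):
    """Alternate local training and aggregation until the loss change is below tol."""
    w, loss, prev_loss = 0.0, None, None
    round_no = 0
    for round_no in range(1, max_rounds + 1):
        # Only summaries leave each node; the (x, y) samples are never shared.
        results = [local_update(w, data) for data in node_data]
        w, loss = aggregate(results)
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return w, loss, round_no


# Three simulated nodes, each holding private samples of y = 3x.
NODE_DATA = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    [(3.0, 9.0)],
]
final_w, final_loss, rounds = run_federated(NODE_DATA)
```

In a real analysis the scalar `w` would be replaced by the model's weights and `local_update` by a PyTorch training loop; the control flow of the rounds stays the same.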
We use Python because its ecosystem, including frameworks such as PyTorch, is well suited to this kind of application.
What does the analysis code do?
Brief overview:
- The analyzer trains the network for a specified number of epochs, then returns a dictionary with the updated weights and the loss value
- The aggregator subclass computes the federated average of the model weights, loss, and metrics received from the analyzer nodes, and checks the convergence criterion each round
- Once training has finished, the results are serialized and saved as a pickle file in the Hub storage
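The aggregation and serialization steps above can be sketched as follows. This is an illustration under stated assumptions, not the code from Image_classifier_training.py: weights are represented as plain lists of floats instead of `torch.Tensor` objects, the average is weighted by a hypothetical `num_samples` field, and the names `federated_average` and `has_converged` and the tolerance value are made up for this example. The pickle round trip happens in memory; writing to Hub storage is left to the platform.

```python
import pickle


def federated_average(node_results):
    """Average every weight entry and the loss across nodes, weighted by sample count."""
    total = sum(r["num_samples"] for r in node_results)
    first = node_results[0]["weights"]
    avg_weights = {
        name: [
            sum(r["weights"][name][i] * r["num_samples"] for r in node_results) / total
            for i in range(len(values))
        ]
        for name, values in first.items()
    }
    avg_loss = sum(r["loss"] * r["num_samples"] for r in node_results) / total
    return {"weights": avg_weights, "loss": avg_loss}


def has_converged(prev_loss, loss, tol=1e-3):
    """Convergence check on the change in loss between two consecutive rounds."""
    return prev_loss is not None and abs(prev_loss - loss) < tol


# Hypothetical per-node results, shaped like the dictionaries an analyzer might return.
node_results = [
    {"weights": {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}, "loss": 0.50, "num_samples": 100},
    {"weights": {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}, "loss": 0.30, "num_samples": 300},
]
aggregated = federated_average(node_results)

# Serialize the aggregated result and read it back, as a stand-in for the saved pickle file.
payload = pickle.dumps(aggregated)
restored = pickle.loads(payload)
```

Weighting by sample count lets nodes with more data contribute proportionally more; an unweighted mean is the special case where every node reports the same `num_samples`.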
Prerequisites
- Properly prepared S3 buckets in the MinIO object store that contain the tarball archive of the dataset
- Datastores configured on each participating node that refer to the respective S3 buckets
- A deep learning master image with all necessary dependencies available
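As a rough check of the first prerequisite, the sketch below lists the image files inside a tarball held in memory, using only the Python standard library. It assumes the archive arrives as raw bytes; how the bytes are actually fetched from the datastore is not shown here. The helper names `list_image_members` and `_make_demo_archive`, the file extensions, and the demo file names are hypothetical.

```python
import io
import tarfile


def list_image_members(archive_bytes, extensions=(".png", ".jpg", ".jpeg")):
    """Return the sorted names of image files inside an in-memory tar archive."""
    # mode="r:*" lets tarfile detect the compression (none, gzip, bz2, xz) itself.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(extensions)
        )


def _make_demo_archive():
    """Build a tiny gzip-compressed tarball that stands in for a real dataset archive."""
    buffer = io.BytesIO()
    files = [("cats/001.png", b"fake"), ("dogs/002.jpg", b"fake"), ("README.txt", b"notes")]
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


demo_members = list_image_members(_make_demo_archive())
```

Listing members before extracting is a cheap way to confirm that each node's archive has the expected layout before any training starts.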
Output Structure
The expected output is a serialized dictionary with the following keys and value types:
{
    'model': torch.Tensor,
    'loss': float,
    'num_classes': int,
    'prediction_scores': List[float],
    'accuracy': float,
    'avg_f1': float,
    'avg_precision': float,
    'avg_recall': float
}
Common issues
Additional references
Author: Gherman Sergey