Open source project: nnUNet
Repository: https://gitee.com/RitchieAlpha/nnUNet

[2020_10_21] Update: We now have documentation for common questions and common issues. We now also provide reference epoch times for several datasets and tips on how to identify bottlenecks. Please read these documents before opening a new issue!

# nnU-Net

In 3D biomedical image segmentation, dataset properties like imaging modality, image sizes, voxel spacings, class ratios etc. vary drastically. For example, images in the Liver and Liver Tumor Segmentation Challenge dataset are computed tomography (CT) scans, about 512x512x512 voxels large, with isotropic voxel spacings and quantitative intensity values (Hounsfield Units). The Automated Cardiac Diagnosis Challenge (ACDC) dataset, on the other hand, shows cardiac structures in cine MRI with a typical image shape of 10x320x320 voxels, highly anisotropic voxel spacings and qualitative intensity values. In addition, the ACDC dataset suffers from slice misalignments and a heterogeneity of out-of-plane spacings, which can cause severe interpolation artifacts if not handled properly.

In current research practice, segmentation pipelines are designed manually and with one specific dataset in mind. Many pipeline settings depend directly or indirectly on the properties of the dataset and display a complex co-dependence: image size, for example, affects the patch size, which in turn affects the required receptive field of the network, a factor that itself influences several other hyperparameters in the pipeline. As a result, pipelines developed on one (type of) dataset are inherently incompatible with other datasets in the domain.

nnU-Net is the first segmentation method designed to deal with the dataset diversity found in the domain. It condenses and automates the key decisions for designing a successful segmentation pipeline for any given dataset, and makes several key contributions to the field.
For more information about nnU-Net, please read the following paper:

Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2020). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 1-9.

Please also cite this paper if you are using nnU-Net for your research!
## Installation

nnU-Net has been tested on Linux (Ubuntu 16, 18 and 20; CentOS, RHEL). We do not provide support for other operating systems.

nnU-Net requires a GPU! For inference, the GPU should have at least 4 GB of VRAM. For training nnU-Net models, the GPU should have at least 10 GB (popular non-datacenter options are the RTX 2080ti, RTX 3080 or RTX 3090). Due to the use of automated mixed precision, the fastest training times are achieved on the Volta architecture (Titan V, V100 GPUs) when installing pytorch the easy way. Since pytorch comes with cuDNN 7.6.5, and tensor core acceleration on Turing GPUs is not supported for 3D convolutions in this version, you will not get the best training speeds on Turing GPUs. You can remedy that by compiling pytorch from source using cuDNN 8.0.2 or newer. This will unlock Turing GPUs (RTX 2080ti, RTX 6000) for automated mixed precision training with 3D convolutions and make training blisteringly fast as well. Note that future versions of pytorch may include cuDNN 8.0.2 or newer by default, in which case compiling from source will not be necessary. We don't know the speed of Ampere GPUs with vanilla vs. self-compiled pytorch yet; this section will be updated as soon as we know.

For training, we recommend a strong CPU to go along with the GPU. At least 6 CPU cores (12 threads) are recommended. CPU requirements are mostly related to data augmentation and scale with the number of input channels. They are thus higher for datasets like BraTS, which uses 4 image modalities, and lower for datasets like LiTS, which only uses CT images.

We very strongly recommend you install nnU-Net in a virtual environment. If you choose to compile pytorch from source, you will need to use conda instead of pip. In that case, please set the environment variable OMP_NUM_THREADS=1 (preferably in your .bashrc).

Python 2 is deprecated and not supported. Please make sure you are using Python 3.
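nnU-Net locates its data and results folders (such as the RESULTS_FOLDER referenced later in this README) via environment variables. A minimal sketch, assuming the variable names used by nnU-Net v1; the paths are placeholders you should adapt to your system:

```shell
# Sketch: tell nnU-Net where to find raw data, preprocessed data and
# trained models. Paths are placeholders; put these in your .bashrc.
export nnUNet_raw_data_base="/data/nnUNet_raw_data_base"
export nnUNet_preprocessed="/data/nnUNet_preprocessed"
export RESULTS_FOLDER="/data/nnUNet_trained_models"
```

Setting these once in your .bashrc avoids having to re-export them in every new terminal session.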
Installing nnU-Net will add several new commands to your terminal. These commands are used to run the entire nnU-Net pipeline and can be executed from any location on your system. All nnU-Net commands have the prefix `nnUNet_`. Note that these commands simply execute python scripts; if you installed nnU-Net in a virtual environment, this environment must be activated when executing the commands. All nnU-Net commands have a `-h` option which gives information on how to use them.

A typical installation of nnU-Net can be completed in less than 5 minutes. If pytorch needs to be compiled from source (which is what we currently recommend when using Turing GPUs), this can extend to more than an hour.

## Usage

To familiarize yourself with nnU-Net, we recommend you have a look at the Examples below before you start with your own dataset.

### How to run nnU-Net on a new dataset

Given some dataset, nnU-Net fully automatically configures an entire segmentation pipeline that matches its properties. nnU-Net covers the entire pipeline, from preprocessing to model configuration, model training and postprocessing, all the way to ensembling. After running nnU-Net, the trained model(s) can be applied to the test cases for inference.

#### Dataset conversion

nnU-Net expects datasets in a structured format. This format closely (but not entirely) follows the data structure of the Medical Segmentation Decathlon.

#### Experiment planning and preprocessing

As a first step, nnU-Net extracts a dataset fingerprint (a set of dataset-specific properties such as image sizes, voxel spacings, intensity information etc.). This information is used to create three U-Net configurations: a 2D U-Net, a 3D U-Net that operates on full resolution images, and a 3D U-Net cascade in which the first U-Net creates a coarse segmentation map on downsampled images which is then refined by the second U-Net.
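The structured dataset format mentioned above can be sketched as follows. This is a minimal illustration, assuming the MSD-style folder convention; `TaskXXX_MYTASK` is a placeholder, and the `_0000` suffix indexes the input modality:

```shell
# Sketch: MSD-style raw dataset layout expected by nnU-Net (assumed
# convention; TaskXXX_MYTASK is a placeholder name).
mkdir -p TaskXXX_MYTASK/imagesTr TaskXXX_MYTASK/labelsTr TaskXXX_MYTASK/imagesTs
# training images go to imagesTr/  (e.g. la_007_0000.nii.gz, one file per modality)
# training labels go to labelsTr/  (e.g. la_007.nii.gz)
# test images go to imagesTs/
# a dataset.json in the task folder describes modalities and labels
touch TaskXXX_MYTASK/dataset.json
ls TaskXXX_MYTASK
```

See the dataset conversion documentation linked from the repository for the full specification.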
Provided that the requested raw dataset is located in the correct folder, you can run this step with the following command:

```
nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity
```
Extraction of the dataset fingerprint can take from a couple of seconds to several minutes, depending on the properties of the segmentation task. Pipeline configuration given the extracted fingerprint is nearly instantaneous (a couple of seconds). Preprocessing depends on image size and how powerful the CPU is; it can take between seconds and several tens of minutes.

#### Model training

nnU-Net trains all U-Net configurations in a 5-fold cross-validation. This enables nnU-Net to determine the postprocessing and ensembling (see next step) on the training dataset. By default, all U-Net configurations need to be run on a given dataset. There are, however, situations in which only some configurations (and maybe even without running the cross-validation) are desired. See the FAQ for more information.

Note that not all U-Net configurations are created for all datasets. In datasets with small image sizes, the U-Net cascade is omitted because the patch size of the full resolution U-Net already covers a large part of the input images.

Training models is done with the `nnUNet_train` command:

```
nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD --npz (additional options)
```

CONFIGURATION is a string that identifies the requested U-Net configuration. TRAINER_CLASS_NAME is the name of the model trainer; if you implement custom trainers (nnU-Net as a framework), you can specify your custom trainer here. TASK_NAME_OR_ID specifies what dataset should be trained on and FOLD specifies which fold of the 5-fold cross-validation is trained.

nnU-Net stores a checkpoint every 50 epochs. If you need to continue a previous training, add the `-c` flag to the training command.

IMPORTANT: `--npz` makes the models save the softmax outputs during the final validation; these are needed for the subsequent steps. If you forgot to set it, run

```
nnUNet_train CONFIGURATION TRAINER_CLASS_NAME TASK_NAME_OR_ID FOLD -val --npz
```

to generate them. This will only rerun the validation, not the training.
##### 2D U-Net

For FOLD in [0, 1, 2, 3, 4], run:

```
nnUNet_train 2d nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
```

##### 3D full resolution U-Net

For FOLD in [0, 1, 2, 3, 4], run:

```
nnUNet_train 3d_fullres nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
```

##### 3D U-Net cascade

###### 3D low resolution U-Net

For FOLD in [0, 1, 2, 3, 4], run:

```
nnUNet_train 3d_lowres nnUNetTrainerV2 TaskXXX_MYTASK FOLD --npz
```

###### 3D full resolution U-Net

For FOLD in [0, 1, 2, 3, 4], run:

```
nnUNet_train 3d_cascade_fullres nnUNetTrainerV2CascadeFullRes TaskXXX_MYTASK FOLD --npz
```

Note that the 3D full resolution U-Net of the cascade requires the five folds of the low resolution U-Net to be completed beforehand!

The trained models will be written to the RESULTS_FOLDER/nnUNet folder. Each training obtains an automatically generated output folder name:

```
RESULTS_FOLDER/nnUNet/CONFIGURATION/TaskXXX_MYTASKNAME/TRAINER_CLASS_NAME__PLANS_FILE_NAME/FOLD
```

For Task002_Heart (from the MSD), for example, this looks like this:

```
RESULTS_FOLDER/nnUNet/
├── 2d
│   └── Task02_Heart
│       └── nnUNetTrainerV2__nnUNetPlansv2.1
│           ├── fold_0
│           ├── fold_1
│           ├── fold_2
│           ├── fold_3
│           └── fold_4
├── 3d_cascade_fullres
├── 3d_fullres
│   └── Task02_Heart
│       └── nnUNetTrainerV2__nnUNetPlansv2.1
│           ├── fold_0
│           │   ├── debug.json
│           │   ├── model_best.model
│           │   ├── model_best.model.pkl
│           │   ├── model_final_checkpoint.model
│           │   ├── model_final_checkpoint.model.pkl
│           │   ├── network_architecture.pdf
│           │   ├── progress.png
│           │   └── validation_raw
│           │       ├── la_007.nii.gz
│           │       ├── la_007.pkl
│           │       ├── la_016.nii.gz
│           │       ├── la_016.pkl
│           │       ├── la_021.nii.gz
│           │       ├── la_021.pkl
│           │       ├── la_024.nii.gz
│           │       ├── la_024.pkl
│           │       ├── summary.json
│           │       └── validation_args.json
│           ├── fold_1
│           ├── fold_2
│           ├── fold_3
│           └── fold_4
└── 3d_lowres
```

Note that 3d_lowres and 3d_cascade_fullres are not populated because this dataset did not trigger the cascade. In each model training output folder (each of the fold_x folders, 10 in total here), the files shown above for fold_0 will be created (only shown for one folder for brevity).
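The per-fold training runs listed above can be scripted as a simple loop. A sketch, with the commands echoed rather than executed so the loop itself can be checked without a GPU; `TaskXXX_MYTASK` is a placeholder:

```shell
# Sketch: run one configuration over all five cross-validation folds
# (echoed only; replace echo with the real call on a training machine).
for FOLD in 0 1 2 3 4; do
  echo "nnUNet_train 3d_fullres nnUNetTrainerV2 TaskXXX_MYTASK $FOLD --npz"
done
```

The same loop works for the other configurations by swapping the CONFIGURATION and trainer arguments.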
During training it is often useful to watch the progress. We therefore recommend that you have a look at the generated progress.png when running the first training. It will be updated after each epoch.

Training times largely depend on the GPU. The smallest GPU we recommend for training is the Nvidia RTX 2080ti. With this GPU (and pytorch compiled with cuDNN 8.0.2), all network trainings take less than 2 days.

#### Multi GPU training

Multi GPU training is experimental and NOT RECOMMENDED!

nnU-Net supports two different multi-GPU implementations: DataParallel (DP) and Distributed Data Parallel (DDP) (but currently only on one host!). DDP is faster than DP and should be preferred if possible. However, DDP is only available if you installed nnU-Net as a framework (i.e. cloned the repository and installed from there) rather than via pip.

Distributed training currently only works for the basic trainers (2D, 3D full resolution and 3D low resolution) and not for the second, high resolution U-Net of the cascade. The reason for this is that distributed training requires some changes to the network and loss function, requiring a new nnU-Net trainer class. This is, as of now, simply not implemented for the cascade, but may be added in the future.

To run distributed training (DP), use the following command:

```
CUDA_VISIBLE_DEVICES=0,1,2... nnUNet_train_DP CONFIGURATION nnUNetTrainerV2_DP TASK_NAME_OR_ID FOLD -gpus GPUS --dbs
```

Note that nnUNetTrainerV2 was replaced with nnUNetTrainerV2_DP. Just like before, CONFIGURATION can be 2d, 3d_lowres or 3d_fullres. TASK_NAME_OR_ID refers to the task you would like to train and FOLD is the fold of the cross-validation. GPUS (integer value) specifies the number of GPUs you wish to train on. To specify which GPUs you want to use, make use of the CUDA_VISIBLE_DEVICES environment variable (specify as many ids as you configure with -gpus GPUS). --dbs, if set, will distribute the batch size across GPUs: if nnU-Net configures a batch size of 2 and you run on 2 GPUs, each GPU will run with a batch size of 1.
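The --dbs batch split described above is simple integer arithmetic; a sketch with hypothetical values matching the example in the text:

```shell
# Sketch: how --dbs divides the configured batch across GPUs
# (values are hypothetical, matching the example above).
BATCH_SIZE=2   # batch size configured by nnU-Net for the task
GPUS=2         # number of GPUs passed via -gpus
PER_GPU=$((BATCH_SIZE / GPUS))
echo "per-GPU batch size with --dbs: $PER_GPU"
```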
If you omit --dbs, each GPU will run with the full batch size (2 for each GPU in this example, for a total batch size of 4).

To run the DDP training you must have nnU-Net installed as a framework. Your current working directory must be the nnunet folder (the one that has the dataset_conversion, evaluation, experiment_planning, ... subfolders!). You can then run the DDP training with the following command:

```
CUDA_VISIBLE_DEVICES=0,1,2... python -m torch.distributed.launch --master_port=XXXX --nproc_per_node=Y run/run_training_DDP.py CONFIGURATION nnUNetTrainerV2_DDP TASK_NAME_OR_ID FOLD --dbs
```

XXXX must be an open port for process-to-process communication (something like 4321 will do on most systems). Y is the number of GPUs you wish to use. Remember that we do not (yet) support distributed training across compute nodes; this all happens on the same system. Again, you can use CUDA_VISIBLE_DEVICES=0,1,2 to control which GPUs are used. If you run more than one DDP training on the same system (say you have 4 GPUs and you run two trainings with 2 GPUs each), you need to specify a different --master_port for each training!

IMPORTANT! Multi-GPU training results in models that cannot be used for inference easily (as said above, all of this is experimental ;-) ). After finishing the training of all folds, the resulting checkpoints must first be converted before they can be used with the regular inference commands.

#### Identifying the best U-Net configuration

Once all models are trained, use the following command to automatically determine what U-Net configuration(s) to use for test set prediction:

```
nnUNet_find_best_configuration -m 2d 3d_fullres 3d_lowres 3d_cascade_fullres -t XXX --strict
```

(all 5 folds need to be completed for all specified configurations!) On datasets for which the cascade was not configured, omit 3d_lowres and 3d_cascade_fullres from the -m argument.

#### Run inference

Remember that the data located in the input folder must adhere to the format described in the dataset conversion section.
If you wish to manually specify the configuration(s) used for inference, run the following command for each desired configuration:

```
nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t TASK_NAME_OR_ID -m CONFIGURATION --save_npz
```

Only specify --save_npz if you intend to run ensembling afterwards; it makes nnU-Net save the softmax probabilities alongside the predicted segmentations. Please select a separate OUTPUT_FOLDER for each configuration!

If you wish to run ensembling, you can ensemble the predictions from several configurations with the following command:

```
nnUNet_ensemble -f FOLDER1 FOLDER2 ... -o OUTPUT_FOLDER -pp POSTPROCESSING_FILE
```

You can specify an arbitrary number of folders, but remember that each folder needs to contain npz files that were generated by nnUNet_predict with --save_npz.

Note that by default, inference will be done with all available folds. We very strongly recommend you use all 5 folds; thus, all 5 folds must have been trained prior to running inference. The list of available folds nnU-Net found will be printed at the start of the inference.

#### How to run inference with pretrained models

Trained models for all challenges we participated in are publicly available. They can be downloaded and installed directly with nnU-Net. Note that downloading a pretrained model will overwrite other models that were trained with exactly the same configuration (2d, 3d_fullres, ...), trainer (nnUNetTrainerV2) and plans.

To obtain a list of available models, as well as a short description, run

```
nnUNet_print_available_pretrained_models
```

You can then download models by specifying their task name. For the Liver and Liver Tumor Segmentation Challenge, for example, this would be:

```
nnUNet_download_pretrained_model Task029_LiTS
```

After downloading is complete, you can use this model to run inference. Keep in mind that each of these models has specific data requirements (Task029_LiTS runs on abdominal CT scans, others require several image modalities as input in a specific order).

When using the pretrained models you must adhere to the license of the dataset they are trained on!

## Examples

To get you started, we compiled two easy-to-follow examples.
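As a compact recap, the main commands from the sections above chain together in the following order. This is a sketch with echoed commands only; XXX, INPUT_FOLDER and OUTPUT_FOLDER are placeholders, and in a real run the training step is repeated for all five folds and all configurations:

```shell
# Sketch: the end-to-end nnU-Net workflow (echoed only; placeholders
# throughout, training shown for fold 0 of one configuration).
for STEP in \
  "nnUNet_plan_and_preprocess -t XXX --verify_dataset_integrity" \
  "nnUNet_train 3d_fullres nnUNetTrainerV2 TaskXXX_MYTASK 0 --npz" \
  "nnUNet_find_best_configuration -m 2d 3d_fullres 3d_lowres 3d_cascade_fullres -t XXX" \
  "nnUNet_predict -i INPUT_FOLDER -o OUTPUT_FOLDER -t XXX -m 3d_fullres"
do
  echo "$STEP"
done
```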
Usability not good enough? Let us know!

## Extending or Changing nnU-Net

Please refer to the corresponding guide in the repository.

## Information on run time and potential performance bottlenecks

We have compiled a list of expected epoch times on standardized datasets across many different GPUs. You can use them to verify that your system is performing as expected. There are also tips on how to identify bottlenecks and what to do about them.

## Common questions and issues

We have collected solutions to common questions and problems. Please consult these documents before you open a new issue.

nnU-Net is developed and maintained by the Applied Computer Vision Lab (ACVL) of the Helmholtz Imaging Platform.