To conduct arbitrary-modal semantic segmentation, we create the DeLiVER benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. It includes four severe weather conditions and five sensor failure cases, designed to exploit modal complementarity and resolve partial outages. We also present CMNeXt, an arbitrary cross-modal segmentation model that scales from 1 to 81 modalities on the DeLiVER, KITTI-360, MFNet, NYU Depth V2, UrbanLF, and MCubeS datasets.
For more details, please check our arXiv paper.
The DeLiVER multimodal dataset: (a) four adverse conditions out of five conditions (cloudy, foggy, night-time, rainy, and sunny). Apart from normal cases, each condition has five corner cases (MB: Motion Blur; OE: Over-Exposure; UE: Under-Exposure; LJ: LiDAR-Jitter; and EL: Event Low-resolution). Each sample has six views, and each view has four modalities and two labels (semantic and instance). (b) Data statistics. (c) Data distribution of the 25 semantic classes.
Download the DELIVER dataset from **GoogleDrive** (~12.2 GB).
The `data/DELIVER` folder is structured as:
DELIVER
├── depth
│ ├── cloud
│ │ ├── test
│ │ │ ├── MAP_10_point102
│ │ │ │ ├── 045050_depth_front.png
│ │ │ │ ├── ...
│ │ ├── train
│ │ └── val
│ ├── fog
│ ├── night
│ ├── rain
│ └── sun
├── event
├── hha
├── img
├── lidar
└── semantic
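
The layout above nests each file as modality, condition, split, scene, and file name. As a minimal sketch, the following splits a relative path into those fields. It assumes every file follows the `<frame>_<tag>_<view>.png` naming seen in the single example above; `DeliverSample` and `parse_sample_path` are hypothetical names, not part of the repository.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DeliverSample(NamedTuple):
    """Fields recovered from a path relative to the DELIVER root (hypothetical helper type)."""
    modality: str   # e.g. "depth"
    condition: str  # e.g. "cloud"
    split: str      # "train", "val" or "test"
    scene: str      # e.g. "MAP_10_point102"
    frame: str      # e.g. "045050"
    view: str       # e.g. "front"


def parse_sample_path(relative_path: str) -> DeliverSample:
    """Split '<modality>/<condition>/<split>/<scene>/<frame>_<tag>_<view>.png' into fields."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 5:
        raise ValueError(f"expected 5 path components, got {len(parts)}: {relative_path}")
    modality, condition, split, scene, filename = parts
    # Assumption: the stem has the form '<frame>_<tag>_<view>', as in '045050_depth_front'.
    stem_fields = PurePosixPath(filename).stem.split("_", 2)
    if len(stem_fields) != 3:
        raise ValueError(f"unexpected file name: {filename}")
    frame, _tag, view = stem_fields
    return DeliverSample(modality, condition, split, scene, frame, view)


sample = parse_sample_path("depth/cloud/test/MAP_10_point102/045050_depth_front.png")
print(sample.condition, sample.split, sample.scene, sample.frame, sample.view)
```
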
The CMNeXt architecture follows the Hub2Fuse paradigm with asymmetric branches, e.g., Multi-Head Self-Attention (MHSA) blocks in the RGB branch and our Parallel Pooling Mixer (PPX) blocks in the accompanying branch. At the hub step, the Self-Query Hub selects informative features from the supplementary modalities. At the fusion step, the feature rectification module (FRM) and the feature fusion module (FFM) fuse the features. Between stages, the features of each modality are restored by adding the fused feature. The fused features from all four stages are forwarded to the segmentation head for the final prediction.
conda env create -f environment.yml
conda activate cmnext
# Optional: install apex by following https://github.com/NVIDIA/apex
Prepare six datasets:
Then, all datasets are structured as:
data/
├── DELIVER
│ ├── img
│ ├── hha
│ ├── event
│ ├── lidar
│ └── semantic
├── KITTI-360
│ ├── data_2d_raw
│ ├── data_2d_hha
│ ├── data_2d_event
│ ├── data_2d_lidar
│ └── data_2d_semantics
├── NYUDepthv2
│ ├── RGB
│ ├── HHA
│ └── Label
├── MFNet
│ ├── rgb
│ ├── ther
│ └── labels
├── UrbanLF
│ ├── Syn
│ └── real
├── MCubeS
│ ├── polL_color
│ ├── polL_aolp
│ ├── polL_dolp
│ ├── NIR_warped
│ └── SS
For RGB-Depth, the HHA format is generated from the depth images.
Model-Modal | #Params(M) | GFLOPs | mIoU | weight |
---|---|---|---|---|
CMNeXt-RGB | 25.79 | 38.93 | 57.20 | GoogleDrive |
CMNeXt-RGB-E | 58.69 | 62.94 | 57.48 | GoogleDrive |
CMNeXt-RGB-L | 58.69 | 62.94 | 58.04 | GoogleDrive |
CMNeXt-RGB-D | 58.69 | 62.94 | 63.58 | GoogleDrive |
CMNeXt-RGB-D-E | 58.72 | 64.19 | 64.44 | GoogleDrive |
CMNeXt-RGB-D-L | 58.72 | 64.19 | 65.50 | GoogleDrive |
CMNeXt-RGB-D-E-L | 58.73 | 65.42 | 66.30 | GoogleDrive |
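
The tables report mIoU (mean Intersection over Union, in percent). For reference, here is a minimal sketch of the standard definition computed from a confusion matrix. It is not the repository's evaluation code, and skipping classes that never occur is an assumption of this sketch; the function name `mean_iou` is hypothetical.

```python
from typing import List, Sequence


def mean_iou(confusion: Sequence[Sequence[int]]) -> float:
    """Return mIoU in percent.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    ious: List[float] = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union == 0:
            # Assumption: a class absent from both prediction and ground truth is skipped.
            continue
        ious.append(true_pos / union)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return 100.0 * sum(ious) / len(ious)


# Toy two-class example: IoU is 3/5 for class 0 and 5/7 for class 1.
toy_confusion = [[3, 1],
                 [1, 5]]
print(f"{mean_iou(toy_confusion):.2f}")  # prints 65.71
```
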
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB | 67.04 | GoogleDrive |
CMNeXt-RGB-E | 66.13 | GoogleDrive |
CMNeXt-RGB-L | 65.26 | GoogleDrive |
CMNeXt-RGB-D | 65.09 | GoogleDrive |
CMNeXt-RGB-D-E | 67.73 | GoogleDrive |
CMNeXt-RGB-D-L | 66.55 | GoogleDrive |
CMNeXt-RGB-D-E-L | 67.84 | GoogleDrive |
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB-D (MiT-B4) | 56.9 | GoogleDrive |

Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB-D (MiT-B4) | 59.9 | GoogleDrive |
UrbanLF has real and synthetic subsets.
Model-Modal | Real | weight | Syn | weight |
---|---|---|---|---|
CMNeXt-RGB | 82.20 | GoogleDrive | 78.53 | GoogleDrive |
CMNeXt-RGB-LF8 | 83.22 | GoogleDrive | 80.74 | GoogleDrive |
CMNeXt-RGB-LF33 | 82.62 | GoogleDrive | 80.98 | GoogleDrive |
CMNeXt-RGB-LF80 | 83.11 | GoogleDrive | 81.02 | GoogleDrive |
Model-Modal | mIoU | weight |
---|---|---|
CMNeXt-RGB | 48.16 | GoogleDrive |
CMNeXt-RGB-A | 48.42 | GoogleDrive |
CMNeXt-RGB-A-D | 49.48 | GoogleDrive |
CMNeXt-RGB-A-D-N | 51.54 | GoogleDrive |
Before training, please download the pre-trained SegFormer weights, such as `checkpoints/pretrained/segformer/mit_b2.pth`.
checkpoints/pretrained/segformer
├── mit_b2.pth
└── mit_b4.pth
To train a CMNeXt model, change the YAML file passed to `--cfg`. Several training examples using 4 A100 GPUs are:
cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/deliver_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/kitti360_rgbdel.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/nyu_rgbd.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mfnet_rgbt.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/mcubes_rgbadn.yaml
python -m torch.distributed.launch --nproc_per_node=4 --use_env tools/train_mm.py --cfg configs/urbanlf.yaml
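
The training commands above differ only in the config file. As a small sketch, the helper below assembles the same command line for any config and GPU count. `build_train_command` is a hypothetical name; the flags are copied from the commands above, and whether a config needs adjusting for a different GPU count is not covered here.

```python
import shlex
from typing import List

# Config files taken from the training examples above.
CONFIGS = [
    "configs/deliver_rgbdel.yaml",
    "configs/kitti360_rgbdel.yaml",
    "configs/nyu_rgbd.yaml",
    "configs/mfnet_rgbt.yaml",
    "configs/mcubes_rgbadn.yaml",
    "configs/urbanlf.yaml",
]


def build_train_command(cfg_path: str, num_gpus: int = 4) -> List[str]:
    """Return the argv list for one distributed training run (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "--use_env",
        "tools/train_mm.py",
        "--cfg", cfg_path,
    ]


# Print the commands instead of running them; pass the list to subprocess.run to execute.
for cfg in CONFIGS:
    print(shlex.join(build_train_command(cfg)))
```
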
To evaluate CMNeXt models, please download the respective model weights (GoogleDrive) and place them as follows:
output/
├── DELIVER
│ ├── cmnext_b2_deliver_rgb.pth
│ ├── cmnext_b2_deliver_rgbd.pth
│ ├── cmnext_b2_deliver_rgbde.pth
│ ├── cmnext_b2_deliver_rgbdel.pth
│ ├── cmnext_b2_deliver_rgbdl.pth
│ ├── cmnext_b2_deliver_rgbe.pth
│ └── cmnext_b2_deliver_rgbl.pth
├── KITTI360
│ ├── cmnext_b2_kitti360_rgb.pth
│ ├── cmnext_b2_kitti360_rgbd.pth
│ ├── cmnext_b2_kitti360_rgbde.pth
│ ├── cmnext_b2_kitti360_rgbdel.pth
│ ├── cmnext_b2_kitti360_rgbdl.pth
│ ├── cmnext_b2_kitti360_rgbe.pth
│ └── cmnext_b2_kitti360_rgbl.pth
├── MCubeS
│ ├── cmnext_b2_mcubes_rgb.pth
│ ├── cmnext_b2_mcubes_rgba.pth
│ ├── cmnext_b2_mcubes_rgbad.pth
│ └── cmnext_b2_mcubes_rgbadn.pth
├── MFNet
│ └── cmnext_b4_mfnet_rgbt.pth
├── NYU_Depth_V2
│ └── cmnext_b4_nyu_rgbd.pth
├── UrbanLF
│ ├── cmnext_b4_urbanlf_real_rgblf1.pth
│ ├── cmnext_b4_urbanlf_real_rgblf33.pth
│ ├── cmnext_b4_urbanlf_real_rgblf8.pth
│ ├── cmnext_b4_urbanlf_real_rgblf80.pth
│ ├── cmnext_b4_urbanlf_syn_rgblf1.pth
│ ├── cmnext_b4_urbanlf_syn_rgblf33.pth
│ ├── cmnext_b4_urbanlf_syn_rgblf8.pth
│ └── cmnext_b4_urbanlf_syn_rgblf80.pth
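
Before running an evaluation, it can help to confirm that the downloaded weights sit where the layout above expects them. The following is a minimal sketch; `missing_weights` is a hypothetical helper, and the list covers only a few of the files shown above.

```python
from pathlib import Path
from typing import Iterable, List

# A subset of the weight files from the layout above, relative to output/.
EXPECTED_WEIGHTS = [
    "DELIVER/cmnext_b2_deliver_rgbdel.pth",
    "KITTI360/cmnext_b2_kitti360_rgbdel.pth",
    "MCubeS/cmnext_b2_mcubes_rgbadn.pth",
    "MFNet/cmnext_b4_mfnet_rgbt.pth",
    "NYU_Depth_V2/cmnext_b4_nyu_rgbd.pth",
]


def missing_weights(output_root: str, expected: Iterable[str] = EXPECTED_WEIGHTS) -> List[str]:
    """Return the expected weight files that are not present under output_root."""
    root = Path(output_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_weights("output"):
        print(f"missing: output/{rel}")
```
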
Then, set `--cfg` to the respective config file, and run:
cd path/to/DELIVER
conda activate cmnext
export PYTHONPATH="path/to/DELIVER"
CUDA_VISIBLE_DEVICES=0 python tools/val_mm.py --cfg configs/deliver_rgbdel.yaml
The DeLiVER dataset has validation and test sets. Please check val_mm.py to switch the dataset between the validation and test sets.
To evaluate the different cases (adverse weather conditions, sensor failures), modify the `cases` list in val_mm.py, as shown below:
# cases = ['cloud', 'fog', 'night', 'rain', 'sun']
# cases = ['motionblur', 'overexposure', 'underexposure', 'lidarjitter', 'eventlowres']
cases = [None] # all
Note that the default value is `[None]`, which evaluates all cases.
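
If you switch between these lists often, one option is to keep them in a small lookup and select by name. This is only a sketch: the group names `weather`, `sensor`, and `all` and the `select_cases` helper are made up for illustration, and val_mm.py itself only needs the resulting `cases` list.

```python
from typing import Dict, List, Optional

# The three lists shown above, keyed by hypothetical group names.
CASE_GROUPS: Dict[str, List[Optional[str]]] = {
    "weather": ["cloud", "fog", "night", "rain", "sun"],
    "sensor": ["motionblur", "overexposure", "underexposure", "lidarjitter", "eventlowres"],
    "all": [None],  # default: evaluate all cases together
}


def select_cases(group: str = "all") -> List[Optional[str]]:
    """Return a copy of the cases list for the given group name."""
    if group not in CASE_GROUPS:
        raise KeyError(f"unknown group {group!r}; choose from {sorted(CASE_GROUPS)}")
    return list(CASE_GROUPS[group])


cases = select_cases("weather")
print(cases)
```
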
Visualization results on the DELIVER dataset. From left to right: cloudy, foggy, night, and rainy scenes.
Thanks for the public repositories:
This repository is under the Apache-2.0 license. For commercial use, please contact the authors.
If you use the DeLiVER dataset and the CMNeXt model, please cite the following works:
@inproceedings{zhang2023delivering,
title={Delivering Arbitrary-Modal Semantic Segmentation},
author={Zhang, Jiaming and Liu, Ruiping and Shi, Hao and Yang, Kailun and Rei{\ss}, Simon and Peng, Kunyu and Fu, Haodong and Wang, Kaiwei and Stiefelhagen, Rainer},
booktitle={CVPR},
year={2023}
}
@article{zhang2023cmx,
title={CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers},
author={Zhang, Jiaming and Liu, Huayao and Yang, Kailun and Hu, Xinxin and Liu, Ruiping and Stiefelhagen, Rainer},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2023}
}