DenseFusion
News
We have released the code and arXiv preprint for our new project 6-PACK, which is based on this work and targets category-level 6D pose tracking.
Table of Contents
- Overview
- Requirements
- Code Structure
- Datasets
- Training
- Evaluation
- Results
- Trained Checkpoints
- Tips for your own dataset
- Citations
- License
Overview
This repository contains the implementation of the paper "DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion" (arXiv, Project, Video) by Wang et al. at the Stanford Vision and Learning Lab and the Stanford People, AI & Robots Group. The model takes an RGB-D image as input and predicts the 6D pose of each object in the frame. The network is implemented in PyTorch and the rest of the framework is in Python. Since this project focuses on the 6D pose estimation process, we do not restrict the choice of segmentation model; you can use your preferred semantic-segmentation or instance-segmentation method according to your needs. In this repo, we provide our full implementation of the DenseFusion model, the iterative refinement model and a vanilla SegNet semantic-segmentation model used in our real-robot grasping experiment. The ROS code of the real-robot grasping experiment is not included.
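For readers new to the term, a 6D pose is a rigid transform with three rotational and three translational degrees of freedom. The following is a minimal pure-Python sketch, not code from this repository, of how such a pose maps object-model points into the camera frame; the helper names `rotation_z` and `apply_pose` and the sample values are illustrative assumptions.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_pose(points, rotation, translation):
    """Map model-frame points into the camera frame: p' = R p + t."""
    transformed = []
    for p in points:
        transformed.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return transformed

# Hypothetical object points (metres) and a hypothetical pose.
model_points = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
R = rotation_z(math.pi / 2)   # 90 degree rotation about z
t = (0.0, 0.0, 0.5)           # object placed 0.5 m in front of the camera
camera_points = apply_pose(model_points, R, t)
```

A pose estimator outputs `R` and `t` (often with the rotation encoded as a quaternion); comparing the transformed model points against the ground-truth ones is the usual basis for pose losses and metrics.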
Requirements
- Python 2.7/3.5/3.6 (to run this repo with Python 2.7, please rebuild lib/knn/ with PyTorch 0.4.1)
- PyTorch 0.4.1 (PyTorch 1.0 branch)
- PIL
- scipy
- numpy
- pyyaml
- logging
- matplotlib
- CUDA 7.5/8.0/9.0 (required; CPU-only training is extremely slow because of the loss calculation for symmetric objects, a pixel-wise nearest-neighbour loss)
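To illustrate why the nearest-neighbour loss for symmetric objects is expensive, here is a minimal pure-Python sketch. It is not the repository's CUDA implementation; the function names and the toy point sets are assumptions made for illustration. Each predicted point is matched to its closest target point, so the cost grows with the product of the two point counts.

```python
import math

def _dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def index_matched_loss(pred_points, target_points):
    """Mean distance between points that share the same index."""
    return sum(_dist(p, q) for p, q in zip(pred_points, target_points)) / len(pred_points)

def nearest_neighbour_loss(pred_points, target_points):
    """Mean distance from each predicted point to its closest target point.

    Brute force: every predicted point is compared with every target point.
    """
    total = 0.0
    for p in pred_points:
        total += min(_dist(p, q) for q in target_points)
    return total / len(pred_points)

# Toy symmetric object: four corners of a square in the z = 0 plane.
target = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
# The same square rotated by 90 degrees: identical shape, different point order.
pred = [(0, 1, 0), (-1, 0, 0), (0, -1, 0), (1, 0, 0)]

matched = index_matched_loss(pred, target)      # penalises the equivalent pose
nearest = nearest_neighbour_loss(pred, target)  # treats the poses as identical
```

In this toy case the index-matched loss is nonzero even though the rotated square is indistinguishable from the original, while the nearest-neighbour loss is zero. Running the all-pairs search for every sample is what makes a GPU nearest-neighbour library such as lib/knn/ worthwhile.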
Code Structure
- datasets
  - datasets/ycb
    - datasets/ycb/dataset.py: Data loader for YCB_Video dataset.
    - datasets/ycb/dataset_config
      - datasets/ycb/dataset_config/classes.txt: Object list of YCB_Video dataset.
      - datasets/ycb/dataset_config/train_data_list.txt: Training set of YCB_Video dataset.
      - datasets/ycb/dataset_config/test_data_list.txt: Testing set of YCB_Video dataset.
  - datasets/linemod
    - datasets/linemod/dataset.py: Data loader for LineMOD dataset.
    - datasets/linemod/dataset_config
      - datasets/linemod/dataset_config/models_info.yml: Object model info of LineMOD dataset.
- replace_ycb_toolbox: Replacement code for the evaluation with YCB_Video_toolbox.
- trained_models
  - trained_models/ycb: Checkpoints of YCB_Video dataset.
  - trained_models/linemod: Checkpoints of LineMOD dataset.
- lib
  - lib/loss.py: Loss calculation for DenseFusion model.
  - lib/loss_refiner.py: Loss calculation for iterative refinement model.
  - lib/transformations.py: Transformation function library.
  - lib/network.py: Network architecture.
  - lib/extractors.py: Encoder network architecture adapted from pspnet-pytorch.
  - lib/pspnet.py: Decoder network architecture.
  - lib/utils.py: Logger code.
  - lib/knn/: CUDA K-nearest neighbours library adapted from pytorch_knn_cuda.
- tools
  - tools/_init_paths.py: Add local path.
  - tools/eval_ycb.py: Evaluation code for YCB_Video dataset.
  - tools/eval_linemod.py: Evaluation code for LineMOD dataset.
  - tools/train.py: Training code for YCB_Video dataset and LineMOD dataset.
- experiments
  - experiments/eval_result
    - experiments/eval_result/ycb
      - experiments/eval_result/ycb/Densefusion_wo_refine_result: Evaluation result on YCB_Video dataset without refinement.
      - experiments/eval_result/ycb/Densefusion_iterative_result: Evaluation result on YCB_Video dataset with iterative refinement.
    - experiments/eval_result/linemod: Evaluation results on LineMOD dataset with iterative refinement.
  - experiments/logs/: Training log files.
  - experiments/scripts
    - experiments/scripts/train_ycb.sh: Training script on the YCB_Video dataset.
    - experiments/scripts/train_linemod.sh: Training script on the LineMOD dataset.
    - experiments/scripts/eval_ycb.sh: Evaluation script on the YCB_Video dataset.
    - experiments/scripts/eval_linemod.sh: Evaluation script on the LineMOD dataset.
- download.sh: Script for downloading YCB_Video Dataset, preprocessed LineMOD dataset and the trained checkpoints.
Datasets
This work is tested on two 6D object pose estimation datasets:
- YCB_Video Dataset: Training and testing sets follow PoseCNN. The training set includes 80 training videos (0000-0047 and 0060-0091, with one frame taken every 7 frames in our training) and synthetic data 000000-079999. The testing set includes 2949 keyframes from the 12 testing videos 0048-0059.
- LineMOD: Download the preprocessed LineMOD dataset (including the testing results output by the trained vanilla SegNet used for evaluation).
Download the YCB_Video Dataset, the preprocessed LineMOD dataset and the trained checkpoints (you can modify this script according to your needs):
./download.sh
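The YCB_Video split described above can be sketched in a few lines of Python. This is only an illustration of the ID ranges and the 7-frame gap; the helper names are hypothetical, the starting frame of the subsampling is an assumption, and the actual lists used by the code are the ones in datasets/ycb/dataset_config. Note that the inclusive range 0048-0059 contains 12 video IDs.

```python
def video_ids(first, last):
    """Zero-padded four-digit video IDs for an inclusive range."""
    return ["{:04d}".format(i) for i in range(first, last + 1)]

def subsample(frame_indices, gap=7):
    """Keep one frame index out of every `gap` (starting at the first one)."""
    return list(frame_indices)[::gap]

# Real-video split quoted in the dataset description.
train_videos = video_ids(0, 47) + video_ids(60, 91)
test_videos = video_ids(48, 59)

print(len(train_videos), len(test_videos))  # 80 12

# Hypothetical video with frames numbered 1..21, subsampled with a gap of 7.
kept_frames = subsample(range(1, 22))       # [1, 8, 15]
```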
Training
- YCB_Video Dataset: After you have downloaded and unzipped YCB_Video_Dataset.zip and installed all the dependencies, please run:
./experiments/scripts/train_ycb.sh