
FruitBin

The early release of the dataset is accessible at the following link: https://datasets.liris.cnrs.fr/fruitbin-version1

The datasets for the 8 scenarios are currently available as compressed zip files under the "scenarios" section. The large-scale dataset, which encompasses a broader range of data, will be released at the same location as soon as possible, within a few weeks. Given the substantial size of the dataset, users will be able to selectively download specific data subsets according to their requirements, such as RGB, depth, and other features, allowing for more efficient and targeted data retrieval.

Detailed information about the dataset generation process can be found at the following links: https://gitlab.liris.cnrs.fr/gduret/PickSim for insights into the PickSim generation, and this repository for the processing code and procedures used to create FruitBin, as well as the scenarios designed for training purposes. These resources provide documentation and code references for understanding the dataset's origins and the steps taken to prepare it for various applications.

To facilitate the reproduction of the benchmark results, we provide the code for PVNet, DenseFusion and GDRNPP along with detailed instructions on how to apply them to the FruitBin dataset. The code is available at the following links: https://gitlab.liris.cnrs.fr/gduret/pvnet_fruitbin, https://gitlab.liris.cnrs.fr/gduret/DenseFusion and https://gitlab.liris.cnrs.fr/gduret/gdrnpp_bop2022. These resources guide you through the implementation of PVNet, DenseFusion and GDRNPP on FruitBin, enabling you to replicate the benchmark and further explore the capabilities of the dataset.

Getting started

The expected usage is the following command:

python main.py --World_begin="$id_begin" --Nb_world="$Nb" --dataset_id="$id_dataset" --occlusion_target_min=$occlusion_min --occlusion_target_max=$occlusion_max --rearrange=$rearrange --compute=$compute

For example:

python main.py --World_begin=1 --Nb_world=10000 --dataset_id=1 --occlusion_target_min=0.7 --occlusion_target_max=1.0 --rearrange=yes --compute=yes

The following table presents the different input parameters:

| Parameter | Description |
| --- | --- |
| World_begin | The ID of the first scene to be processed. |
| Nb_world | The number of scenes to be processed. |
| dataset_id | The ID of the dataset to be processed. It is used to determine the path of the data to be processed. |
| rearrange | Determines whether the script should rearrange the data generated by PickSim. |
| compute | Determines whether the script should perform the post-processing for a specific scenario. |
| occlusion_target_min | For scenario pre-processing, the lower bound of the desired visibility rate used for filtering. |
| occlusion_target_max | For scenario pre-processing, the upper bound of the desired visibility rate used for filtering. |
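As an illustration of how these flags could be wired up, the sketch below uses Python's argparse; the exact option handling inside main.py may differ.

```python
# Minimal sketch of the command-line interface described above (illustrative only;
# the real main.py may parse or validate these options differently).
import argparse

parser = argparse.ArgumentParser(description="FruitBin rearrange / compute pipeline")
parser.add_argument("--World_begin", type=int, required=True, help="ID of the first scene to process")
parser.add_argument("--Nb_world", type=int, required=True, help="Number of scenes to process")
parser.add_argument("--dataset_id", type=int, required=True, help="Dataset ID, used to build data paths")
parser.add_argument("--occlusion_target_min", type=float, default=0.7, help="Lower bound of the visibility rate")
parser.add_argument("--occlusion_target_max", type=float, default=1.0, help="Upper bound of the visibility rate")
parser.add_argument("--rearrange", choices=["yes", "no"], default="yes", help="Rearrange the PickSim data")
parser.add_argument("--compute", choices=["yes", "no"], default="yes", help="Run scenario post-processing")
args = parser.parse_args()

# Scenes are then processed from World_begin to World_begin + Nb_world - 1.
scene_ids = range(args.World_begin, args.World_begin + args.Nb_world)
```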

Furthermore, additional parameters can be modified in the main.py file, such as:

| Parameter | Description |
| --- | --- |
| Nb_camera | The number of cameras in the dataset. |
| dataset_src | The path to the data generated by PickSim that needs to be processed. |
| dataset_path | The destination path for the rearranged dataset. |
| choice | Determines whether the script performs post-processing for depth sensor data or RGB sensor data. |
| list_categories | The list of categories to be considered in the dataset. |
| new_size | The desired size for resizing features for future training. |
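For reference, these settings might look like the following inside main.py. The values below are hypothetical placeholders; check the script itself for the actual defaults.

```python
# Hypothetical example of the in-script settings listed above; the names follow the
# table, but the values are placeholders, not the real defaults of main.py.
Nb_camera = 15                            # number of cameras per scene
dataset_src = "/path/to/PickSim_output"   # raw data generated by PickSim
dataset_path = "/path/to/FruitBin"        # destination of the rearranged dataset
choice = "depth"                          # post-process depth-sensor or RGB-sensor data
list_categories = ["apple", "banana"]     # placeholder subset of the fruit categories
new_size = (640, 480)                     # resize target for the training features
```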

Rearrange step

The rearrange step moves the data from the original PickSim organization:

├── [Scene Id]
|   ├── Meta.json
│   ├── [Camera ID]
|   |    ├── color
|   |    │   └── image
|   |    ├── depth
|   |    │   ├── depth_map
|   |    │   ├── image
|   |    │   ├── normals_map
|   |    │   ├── pointcloud
|   |    │   └── reflectance_map
|   |    ├── ground_truth_depth
|   |    │   ├── 2d_detection
|   |    │   ├── 2d_detection_loose
|   |    │   ├── 3d_detection
|   |    │   ├── 3d_pose
|   |    │   ├── id_map
|   |    │   ├── instance_map
|   |    │   ├── occlusion
|   |    │   └── semantic_map
|   |    ├── ground_truth_rgb
|   |    │   ├── 2d_detection
|   |    │   ├── 2d_detection_loose
|   |    │   ├── 3d_detection
|   |    │   ├── 3d_pose
|   |    │   ├── id_map
|   |    │   ├── instance_map
|   |    │   ├── occlusion
|   |    │   └── semantic_map
|   |    ├── infra1
|   |    │   └── image
|   |    └── infra2
|   |        └── image

to the following structured organization:

├── Bbox_2d
├── Bbox_2d_loose
├── Bbox_3d
├── Depth
├── Instance_Segmentation
├── Meta
├── Occlusion
├── Pose
├── RGB
├── Semantic_Segmentation
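
As a rough illustration of this reorganization, the sketch below copies a subset of the features from the PickSim layout into the feature-oriented layout. It is a simplified assumption of what main.py does; the exact mapping, file naming, and the handling of the RGB/depth choice in the real script may differ.

```python
# Simplified sketch of the rearrange step (illustrative; the real script handles
# more features, naming conventions, and error cases).
import shutil
from pathlib import Path

# Mapping from PickSim sub-paths (per scene and camera) to feature-oriented folders.
FEATURE_MAP = {
    "color/image": "RGB",
    "depth/image": "Depth",
    "ground_truth_depth/2d_detection": "Bbox_2d",
    "ground_truth_depth/2d_detection_loose": "Bbox_2d_loose",
    "ground_truth_depth/3d_detection": "Bbox_3d",
    "ground_truth_depth/3d_pose": "Pose",
    "ground_truth_depth/id_map": "Instance_Segmentation",
    "ground_truth_depth/semantic_map": "Semantic_Segmentation",
    "ground_truth_depth/occlusion": "Occlusion",
}

def rearrange_scene(src_root: Path, dst_root: Path, scene_id: int, nb_camera: int) -> None:
    """Copy the files of one scene from the PickSim layout to the feature-oriented layout."""
    for camera_id in range(1, nb_camera + 1):
        for sub_path, feature in FEATURE_MAP.items():
            src_dir = src_root / str(scene_id) / str(camera_id) / sub_path
            dst_dir = dst_root / feature
            dst_dir.mkdir(parents=True, exist_ok=True)
            for f in src_dir.glob("*"):
                # Prefix files with scene and camera IDs so all scenes share one folder.
                shutil.copy(f, dst_dir / f"{scene_id}_{camera_id}_{f.name}")
```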

The table below describes the folders produced by the PickSim generation. They correspond to the different sensors of the Gazebo simulation, matching the sensors of the RealSense D415 camera.

| Folder / file | Description |
| --- | --- |
| Meta.json | Scene-oriented meta file that enumerates all the recorded data and provides the list of categories and their corresponding instance IDs. |
| [Scene Id] | The ID of the scene for which the data was generated. |
| [Camera ID] | The ID of the camera for which the data was generated. |
| color | Sensor recording RGB images with a resolution of 1920x1080, matching the data of the RealSense D415 camera. |
| depth | Sensor recording depth data and point clouds with a resolution of 1280x720, matching the data of the RealSense D415 camera. |
| ground_truth_rgb | Features recorded by a new vision plugin attached to the color sensor, with a resolution of 1920x1080. |
| ground_truth_depth | Features recorded by a new vision plugin attached to the depth sensor, with a resolution of 1280x720. |
| infra1 | Black and white infrared channel 1 with a resolution of 1280x720. |
| infra2 | Black and white infrared channel 2 with a resolution of 1280x720. |

The resulting data is organized based on the type of features. The following table presents information about the resulting data and its corresponding raw data from PickSim:

| Folder | Equivalent in PickSim | Description |
| --- | --- | --- |
| Meta | [Scene Id]/Meta.json | Scene-oriented metadata file that enumerates all recorded data and provides the list of categories and instance IDs. |
| Bbox_2d | [Scene Id]/[Camera ID]/ground_truth_depth/2d_detection | 2D bounding boxes of objects in the scene. |
| Bbox_2d_loose | [Scene Id]/[Camera ID]/ground_truth_depth/2d_detection_loose | Loose 2D bounding boxes of objects in the scene. |
| Bbox_3d | [Scene Id]/[Camera ID]/ground_truth_depth/3d_detection | 3D bounding boxes of objects in the scene. |
| Depth | [Scene Id]/[Camera ID]/depth/image | Depth map captured by the depth sensor. |
| Instance_Segmentation | [Scene Id]/[Camera ID]/ground_truth_depth/id_map | Instance segmentation mask of objects in the scene. |
| Semantic_Segmentation | [Scene Id]/[Camera ID]/ground_truth_depth/semantic_map | Semantic segmentation mask of objects in the scene. |
| Occlusion | [Scene Id]/[Camera ID]/ground_truth_depth/occlusion | Occlusion rate of each instance in the scene. |
| Pose | [Scene Id]/[Camera ID]/ground_truth_depth/3d_pose | 6D pose (position and orientation) of each instance in the scene. |
| RGB | [Scene Id]/[Camera ID]/color/image | RGB image captured by the color sensor. |

The ground truth annotations exclusively use the ground_truth_depth data, which ensures consistency with the RGB-D data of the same resolution. However, all features are included in the dataset to cater to the potential needs of the community. The dataset, including these features, is available at: https://datasets.liris.cnrs.fr/fruitbin-version1
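The Occlusion feature listed above is what the scenario filtering (the occlusion_target_min / occlusion_target_max parameters) operates on. Below is a minimal sketch of such a filter; the file format is an assumption (one JSON file per frame mapping instance IDs to visibility rates), so the actual reading logic in main.py may differ.

```python
# Hedged sketch of scenario filtering by visibility rate (assumed file format).
import json
from pathlib import Path

def keep_instance(occlusion_file: Path, instance_id: str,
                  occlusion_target_min: float = 0.7,
                  occlusion_target_max: float = 1.0) -> bool:
    """Return True if the instance's visibility rate lies within the requested range."""
    visibility = json.loads(occlusion_file.read_text())[instance_id]
    return occlusion_target_min <= visibility <= occlusion_target_max
```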

Compute step for PVNet and the scenarios

The compute step takes the rearranged data as input and processes it for future training. It formats the data into the structure required for training the PVNet 6D pose estimation model. Additionally, for training purposes, all the data is organized per fruit category, as illustrated in the following architecture:

├── Fruit_i
│   ├── Bbox
│   ├── Bbox_3d_Gen
│   ├── Depth_Gen
│   ├── Depth_resized
│   ├── FPS
│   ├── FPS_resized
│   ├── Instance_Mask
│   ├── Instance_Mask_resized
│   ├── Labels
│   ├── Meta_Gen
│   ├── Models
│   ├── Pose_transformed
│   ├── RGB_Gen
│   ├── RGB_resized
│   └── Splitting

The table below describes the various generated features. This processed data takes into account the filtering parameters specified in main.py, such as the desired level of occlusion. For FruitBin, the scene scenarios are split into 6000 scenes for training, 2000 scenes for evaluation, and 10000 scenes for testing. In addition, 9 cameras are allocated for training and 3 cameras each for evaluation and testing, for a total of 15 cameras.

| Folder | Description |
| --- | --- |
| Fruit_i | The fruit category being considered. |
| Meta_Gen | Metadata describing fruit-specific information such as the scene ID, the camera ID, the list of instance IDs related to the fruit, and the associated occlusion rates. |
| Bbox | 2D bounding boxes of the fruit instances. |
| Bbox_3d_Gen | 3D bounding boxes of the fruit instances. |
| Depth_Gen | Depth map data with a resolution of 1280x720. |
| Depth_resized | Depth map data resized to 640x480 for training. |
| FPS | Farthest Point Sampling (FPS) keypoints for the 1280x720 images, used by PVNet. |
| FPS_resized | FPS keypoints for the 640x480 resolution, used for PVNet training. |
| Instance_Mask | Instance mask data with a resolution of 1280x720. |
| Instance_Mask_resized | Instance mask data resized to 640x480 for training. |
| Labels | Instance masks in the YOLOv8 format (generated with the compute_label.py script described below). |
| Models | Meshes of the 8 fruits in a common PLY format. |
| Pose_transformed | 6D pose annotations in the PVNet format. |
| RGB_Gen | RGB image data with a resolution of 1280x720. |
| RGB_resized | RGB image data resized to 640x480 for training. |
| Splitting | Only available when the dataset is downloaded online. Contains the .txt splitting files of the different scenarios, describing the train/eval/test split. |
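For reference, the FPS folders above contain keypoints obtained by farthest point sampling. The snippet below is a minimal, generic farthest point sampling over mesh vertices, assuming a NumPy array of 3D points; the seed point and number of keypoints used for the actual PVNet preprocessing may differ.

```python
# Generic farthest point sampling (FPS) sketch; illustrative, not the exact FruitBin script.
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_keypoints: int = 8) -> np.ndarray:
    """Select n_keypoints from an (N, 3) array of vertices, each as far as possible
    from the points already selected."""
    selected = [points.mean(axis=0)]   # seed with the centroid (one common choice)
    distances = np.full(len(points), np.inf)
    keypoints = []
    for _ in range(n_keypoints):
        # Distance of every vertex to its nearest already-selected point.
        distances = np.minimum(distances, np.linalg.norm(points - selected[-1], axis=1))
        idx = int(np.argmax(distances))    # farthest vertex from the current set
        keypoints.append(points[idx])
        selected.append(points[idx])
    return np.stack(keypoints)
```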

This step fully prepares the data for PVNet training. For more detailed information, please refer to the following link: https://gitlab.liris.cnrs.fr/gduret/pvnet_fruitbin

Compute step for DenseFusion

The final step involves preparing the data for DenseFusion training. To accomplish this, you can use the following command as an example:

python3 compute_label.py --path_dataset=/gpfsscratch/rech/uli/ubn15wo/FruitBin1/FruitBin_low_1_0.7_1.0/ --target_folder=Generated_Cameras --path_DF_data=/gpfsscratch/rech/uli/ubn15wo/DenseFusion01_Cameras/datasets/linemod/Linemod_preprocessed/data --occ_data=""


The following table presents the different input parameters:

| Parameter | Description |
| --- | --- |
| path_dataset | The path to the preprocessed dataset generated during the previous compute step. |
| target_folder | The specific scenario to be considered for DenseFusion training. |
| path_DF_data | The path to the DenseFusion folder where the training data will be placed. |
| occ_data | An additional parameter that modifies the name of the resulting .txt DenseFusion splitting file when multiple scenarios are considered within the same folder. |
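As a rough idea of what the splitting files look like in the LINEMOD-preprocessed layout that DenseFusion expects, the snippet below writes a train/test split as plain .txt lists of frame IDs. The folder names, the zero-padded frame IDs, and the use of the occ_data value as a filename suffix are assumptions; refer to compute_label.py for the exact behaviour.

```python
# Hedged sketch: write DenseFusion-style split files (one frame ID per line).
from pathlib import Path

def write_split(obj_dir: Path, train_ids, test_ids, occ_suffix: str = "") -> None:
    """Write train/test .txt files inside a Linemod_preprocessed/data/<obj_id> folder."""
    obj_dir.mkdir(parents=True, exist_ok=True)
    (obj_dir / f"train{occ_suffix}.txt").write_text("\n".join(f"{i:04d}" for i in train_ids))
    (obj_dir / f"test{occ_suffix}.txt").write_text("\n".join(f"{i:04d}" for i in test_ids))

# Hypothetical usage (paths and IDs are placeholders):
# write_split(Path("Linemod_preprocessed/data/01"), range(0, 600), range(600, 800), "_0.7_1.0")
```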

The preprocessing for DenseFusion is now complete. For more detailed information, please refer to the following link: https://gitlab.liris.cnrs.fr/gduret/DenseFusion .

Compute step for GDRNPP

The preprocessing steps to obtain the BOP format of FruitBin are detailed in our GDRNPP repository: https://gitlab.liris.cnrs.fr/gduret/gdrnpp_bop2022
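For orientation, the BOP format stores per-scene ground truth in JSON files such as scene_gt.json. The snippet below shows the general shape of one entry; the object ID and pose values are placeholders, and the conversion scripts in the GDRNPP repository are the reference for the actual export.

```python
# Hedged sketch of writing a BOP-style scene_gt.json entry (placeholder values).
import json

scene_gt = {
    "0": [  # image ID within the scene
        {
            "obj_id": 1,                               # placeholder object ID
            "cam_R_m2c": [1, 0, 0, 0, 1, 0, 0, 0, 1],  # 3x3 rotation, row-major
            "cam_t_m2c": [0.0, 0.0, 500.0],            # translation in millimetres
        }
    ]
}

with open("scene_gt.json", "w") as f:
    json.dump(scene_gt, f, indent=2)
```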

Authors and acknowledgment

License

The dataset is licensed under the CC BY-NC-SA license.