FruitBin
The early-release dataset is accessible at the following link: https://datasets.liris.cnrs.fr/fruitbin-version1
The datasets for the 8 scenarios are currently available as compressed zip files under the "scenarios" section. The large-scale dataset, which covers a broader range of data, will be released at the same location within a few weeks. Given the substantial size of the dataset, users will be able to selectively download specific subsets, such as RGB, depth, and other features, for more efficient and targeted data retrieval.
Detailed information about the dataset generation process is available through the following links: "https://gitlab.liris.cnrs.fr/gduret/PickSim" for insights into the PickSim generation, and this repository for the processing code and procedures used to create FruitBin, as well as the scenarios designed for training. These resources provide documentation and code references for understanding the dataset's origins and the steps taken to prepare it for various applications.
To facilitate the reproduction of the benchmark results, we provide the code for PVNet, DenseFusion and GDRNPP along with detailed instructions on how to apply them to the FruitBin dataset. The code is available at the following links: "https://gitlab.liris.cnrs.fr/gduret/pvnet_fruitbin", "https://gitlab.liris.cnrs.fr/gduret/DenseFusion" and "https://gitlab.liris.cnrs.fr/gduret/gdrnpp_bop2022". These resources guide you through the implementation of PVNet, DenseFusion and GDRNPP on the FruitBin dataset, enabling you to replicate the benchmark and further explore the capabilities of the dataset.
Getting started
The expected usage is the following command:
python main.py --World_begin="$id_begin" --Nb_world="$Nb" --dataset_id="$id_dataset" --occlusion_target_min=$occlusion_min --occlusion_target_max=$occlusion_max --rearrange=$rearrange --compute=$compute
For example:
python main.py --World_begin=1 --Nb_world=10000 --dataset_id=1 --occlusion_target_min=0.7 --occlusion_target_max=1.0 --rearrange=yes --compute=yes
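In this example, only object instances whose visibility rate lies between 0.7 and 1.0 are kept for the scenario. As a minimal sketch of what this filtering implies (the `keep_instance` helper below is hypothetical and not part of the repository):

```python
# Illustrative sketch of the visibility-rate filtering implied by
# --occlusion_target_min / --occlusion_target_max.
# The helper below is hypothetical, not the actual FruitBin code.
def keep_instance(visibility_rate: float,
                  occlusion_target_min: float = 0.7,
                  occlusion_target_max: float = 1.0) -> bool:
    """Return True if the instance is visible enough for the scenario."""
    return occlusion_target_min <= visibility_rate <= occlusion_target_max

# Example: an instance that is 85% visible is kept, one at 40% is filtered out.
print(keep_instance(0.85))  # True
print(keep_instance(0.40))  # False
```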
The following table presents the different input parameters:
Parameter | Description |
---|---|
World_begin | The ID of the first scene to be processed. |
Nb_world | The number of scenes to be processed. |
dataset_id | The ID of the dataset to be processed. It is used to determine the path of the data to be processed. |
rearrange | Determines whether the script should perform data rearrangement from the data generated by PickSim. |
compute | Determines whether the script should perform post-processing for a specific scenario. |
occlusion_target_min | In the case of scenario pre-processing, specifies the lower bound of the desired visibility rate for filtering. |
occlusion_target_max | In the case of scenario pre-processing, specifies the upper bound of the desired visibility rate for filtering. |
Furthermore, additional parameters can be modified directly in the main.py file, as sketched after the following table:
Parameters | Description |
---|---|
Nb_camera | The number of cameras in the dataset |
dataset_src | The path to the data generated by PickSim that needs to be processed |
dataset_path | The destination path for the rearranged dataset |
choice | Determines whether the script performs post-processing for depth sensor data or RGB sensor data |
list_categories | The list of categories to be considered in the dataset |
new_size | The desired size for resizing features for future training |
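As a rough illustration only, these module-level settings might be declared near the top of main.py along the lines below; the names come from the table above, but the values shown are placeholders rather than the repository defaults.

```python
# Hypothetical configuration block mirroring the parameters listed above.
# Paths and values are placeholders, not the repository defaults.
Nb_camera = 15                           # number of cameras per scene
dataset_src = "/path/to/PickSim_output"  # raw data generated by PickSim
dataset_path = "/path/to/FruitBin"       # destination of the rearranged dataset
choice = "depth"                         # post-process "depth" or "rgb" sensor data
list_categories = ["apple", "banana"]    # placeholder names; FruitBin has 8 fruit classes
new_size = (640, 480)                    # target size for resized training features
```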
Rearrange step
The rearrange step involves moving data from the original organization of PickSim, such as:
├── [Scene Id]
│   ├── Meta.json
│   ├── [Camera ID]
│   │   ├── color
│   │   │   └── image
│   │   ├── depth
│   │   │   ├── depth_map
│   │   │   ├── image
│   │   │   ├── normals_map
│   │   │   ├── pointcloud
│   │   │   └── reflectance_map
│   │   ├── ground_truth_depth
│   │   │   ├── 2d_detection
│   │   │   ├── 2d_detection_loose
│   │   │   ├── 3d_detection
│   │   │   ├── 3d_pose
│   │   │   ├── id_map
│   │   │   ├── instance_map
│   │   │   ├── occlusion
│   │   │   └── semantic_map
│   │   ├── ground_truth_rgb
│   │   │   ├── 2d_detection
│   │   │   ├── 2d_detection_loose
│   │   │   ├── 3d_detection
│   │   │   ├── 3d_pose
│   │   │   ├── id_map
│   │   │   ├── instance_map
│   │   │   ├── occlusion
│   │   │   └── semantic_map
│   │   ├── infra1
│   │   │   └── image
│   │   └── infra2
│   │       └── image
to the following structured organization:
├── Bbox_2d
├── Bbox_2d_loose
├── Bbox_3d
├── Depth
├── Instance_Segmentation
├── Meta
├── Occlusion
├── Pose
├── RGB
└── Semantic_Segmentation
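The mapping between the two layouts follows the correspondence table further below (e.g. ground_truth_depth/2d_detection becomes Bbox_2d). A simplified sketch of such a rearrangement is given here; it assumes flat per-camera folders and a file-naming convention chosen for the example, and is not the exact logic of main.py.

```python
import shutil
from pathlib import Path

# Illustrative mapping from PickSim sub-folders to the feature-oriented layout.
# This mirrors the correspondence table below; the real rearrangement in
# main.py may differ in naming and details.
FEATURE_MAP = {
    "ground_truth_depth/2d_detection": "Bbox_2d",
    "ground_truth_depth/2d_detection_loose": "Bbox_2d_loose",
    "ground_truth_depth/3d_detection": "Bbox_3d",
    "ground_truth_depth/id_map": "Instance_Segmentation",
    "ground_truth_depth/semantic_map": "Semantic_Segmentation",
    "ground_truth_depth/occlusion": "Occlusion",
    "ground_truth_depth/3d_pose": "Pose",
    "depth/depth_map": "Depth",
    "depth/image": "RGB",
}

def rearrange_scene(scene_dir: Path, camera_id: str, out_dir: Path) -> None:
    """Copy one camera's PickSim outputs into the feature-oriented layout."""
    for src_rel, feature in FEATURE_MAP.items():
        src = scene_dir / camera_id / src_rel
        dst = out_dir / feature
        dst.mkdir(parents=True, exist_ok=True)
        for f in src.glob("*"):
            if f.is_file():
                # Prefix with scene and camera IDs so files stay unique (assumed naming).
                shutil.copy(f, dst / f"{scene_dir.name}_{camera_id}_{f.name}")
```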
The table below describes the folders produced by the PickSim generation; they correspond to the different sensors in the Gazebo simulation, matching the various sensors of the RealSense D415 camera.
Folder / File | Description |
---|---|
Meta.json | Scene-oriented meta file that enumerates all the recorded data and provides the list of categories and their corresponding instance IDs. |
Scene ID | The ID of the scene for which the data was generated. |
Camera ID | The ID of the camera for which the data was generated. |
color | The name of the sensor recording RGB images with a resolution of 1920x1080, matching the data from the RealSense camera D415. |
depth | The name of the sensor recording depth data or point cloud with a resolution of 1280x720, matching the data from the RealSense camera D415. |
ground_truth_rgb | Recorded features from a new vision plugin for the color sensor with a resolution of 1920x1080. |
ground_truth_depth | Recorded features from a new vision plugin for the depth sensor with a resolution of 1280x720. |
infra1 | Black-and-white infrared channel 1 with a resolution of 1280x720. |
infra2 | Black-and-white infrared channel 2 with a resolution of 1280x720. |
The resulting data is organized based on the type of features. The following table presents information about the resulting data and its corresponding raw data from PickSim:
Folder | PickSim equivalent | Description |
---|---|---|
Meta | [Scene Id]/Meta.json | Scene-oriented metadata file that enumerates all recorded data and provides a list of categories and instance IDs. |
Bbox_2d | [Scene Id]/[camera_i]/ground_truth_depth/2d_detection | 2D bounding boxes of objects in the scene. |
Bbox_2d_loose | [Scene Id]/[camera_i]/ground_truth_depth/2d_detection_loose | Loose 2D bounding boxes of objects in the scene. |
Bbox_3d | [Scene Id]/[camera_i]/ground_truth_depth/3d_detection | 3D bounding boxes of objects in the scene. |
Depth | [Scene Id]/[camera_i]/depth/depth_map | Depth map captured by the depth sensor. |
Instance_Segmentation | [Scene Id]/[camera_i]/ground_truth_depth/id_map | Instance segmentation mask of objects in the scene. |
Semantic_Segmentation | [Scene Id]/[camera_i]/ground_truth_depth/semantic_map | Semantic segmentation mask of objects in the scene. |
Occlusion | [Scene Id]/[camera_i]/ground_truth_depth/occlusion | Occlusion rate of each instance in the scene. |
Pose | [Scene Id]/[camera_i]/ground_truth_depth/3d_pose | 6D pose (position and orientation) of each instance in the scene. |
RGB | [Scene Id]/[camera_i]/depth/image | RGB image captured by the depth sensor. |
The ground truth annotation exclusively uses the ground_truth_depth data, which ensures consistency with RGB-D data of the same resolution. However, all features will be included in the dataset to cater to the potential needs of the community. The dataset, including these features, is available at the following link: https://datasets.liris.cnrs.fr/fruitbin-version1
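For orientation, iterating over the rearranged layout might look like the sketch below. It only assumes that the files belonging to one view share a common scene/camera stem across feature folders; the stem convention and file extensions are assumptions for the example, not documented behaviour.

```python
from pathlib import Path

# Illustrative traversal of the rearranged dataset. The exact file naming
# and formats inside each feature folder are assumptions for this sketch.
def iter_samples(dataset_root: str):
    root = Path(dataset_root)
    for rgb_file in sorted((root / "RGB").glob("*")):
        stem = rgb_file.stem  # assumed to encode scene and camera IDs
        yield {
            "rgb": rgb_file,
            "depth": root / "Depth" / rgb_file.name,          # assumed same file name
            "pose": root / "Pose" / f"{stem}.txt",            # assumed extension
            "occlusion": root / "Occlusion" / f"{stem}.txt",  # assumed extension
        }

for sample in iter_samples("/path/to/FruitBin"):
    print(sample["rgb"], sample["pose"])
    break
```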
Compute step for PVNet and the scenarios
The compute step takes the rearranged data as input and processes it for future training. It formats the data into the structure required for training the PVNet 6D pose estimation model. Additionally, for training purposes, all the data is reorganized per fruit category, as illustrated in the following structure:
├── Fruit_i
│ ├── Bbox
│ ├── Bbox_3d_Gen
│ ├── Depth_Gen
│ ├── Depth_resized
│ ├── FPS
│ ├── FPS_resized
│ ├── Instance_Mask
│ ├── Instance_Mask_resized
│ ├── Labels
│ ├── Meta_Gen
│ ├── Models
│ ├── Pose_transformed
│ ├── RGB_Gen
│ ├── RGB_resized
│ └── Splitting
The table below provides information about the various generated features. This processed data also takes into account the filtering parameters specified in the main.py script, such as the desired level of occlusion. For FruitBin, the scene scenarios are divided into 6000 scenes for training, 2000 scenes for evaluation, and 10000 scenes for testing. In addition, 9 cameras are allocated for training and 3 cameras each for evaluation and testing, for a total of 15 cameras; the sketch after the table illustrates this split.
Folder | Description |
---|---|
Fruit_i | The fruit category being considered |
Meta_Gen | Metadata describing fruit-specific information such as Scene ID, Camera ID, a list of instance IDs related to the fruit, and associated occlusion rates |
Bbox | 2D bounding boxes |
Bbox_3d_Gen | 3D bounding boxes |
Depth_Gen | Depth map data with a resolution of 1280x720 |
Depth_resized | Resized depth map data with a resolution of 640x480 for training |
FPS | Farthest Point Sampling (FPS) key points for the 1280x720 image used in PVNet |
FPS_resized | Resized FPS data with a resolution of 640x480 for training in PVNet |
Instance_Mask | Instance mask data with a resolution of 1280x720 |
Instance_Mask_resized | Resized instance mask data with a resolution of 640x480 for training |
Labels | Instance mask in the YOLOv8 format (generated using the 'compute label' script explained below) |
Models | Meshes of the 8 fruits in a common PLY format |
Pose_transformed | 6D pose annotations in the PVNet format |
RGB_Gen | RGB image data with a resolution of 1280x720 |
RGB_resized | Resized RGB image data with a resolution of 640x480 for training |
Splitting | This folder is only available when the dataset is downloaded online. It contains a list of .txt splitting files for different scenarios, describing the train/eval/test split. |
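The scene/camera split described before the table could be expressed as in the sketch below. The actual assignments are defined by the .txt files in the Splitting folder; the exact scene and camera ID ranges used here are assumptions made for the example.

```python
# Illustrative reconstruction of the scene/camera split described above.
# The real assignments come from the .txt files in the Splitting folder;
# the ID ranges used here are assumptions for the example.
train_scenes = range(1, 6001)        # 6000 scenes for training
eval_scenes = range(6001, 8001)      # 2000 scenes for evaluation
test_scenes = range(8001, 18001)     # 10000 scenes for testing

train_cameras = list(range(9))       # 9 cameras for training
eval_cameras = list(range(9, 12))    # 3 cameras for evaluation
test_cameras = list(range(12, 15))   # 3 cameras for testing

def split_of(scene_id: int, camera_id: int) -> str:
    """Return the split a (scene, camera) pair would belong to in this sketch."""
    if scene_id in train_scenes and camera_id in train_cameras:
        return "train"
    if scene_id in eval_scenes and camera_id in eval_cameras:
        return "eval"
    if scene_id in test_scenes and camera_id in test_cameras:
        return "test"
    return "unused"
```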
This step fully prepares the data for PVNet training. For more detailed information, please refer to the following link: https://gitlab.liris.cnrs.fr/gduret/pvnet_fruitbin
Compute step for Densefusion
The final step involves preparing the data for DenseFusion training. To accomplish this, you can use the following command as an example:
python3 compute_label.py --path_dataset=/gpfsscratch/rech/uli/ubn15wo/FruitBin1/FruitBin_low_1_0.7_1.0/ --target_folder=Generated_Cameras --path_DF_data=/gpfsscratch/rech/uli/ubn15wo/DenseFusion01_Cameras/datasets/linemod/Linemod_preprocessed/data --occ_data=""
The following table presents the different input parameters:
Parameters | Description |
---|---|
path_dataset | The path to the preprocessed dataset generated during the previous compute step. |
target_folder | The specific scenario to be considered for DenseFusion training. |
path_DF_data | The path to the DenseFusion folder where the training data will be placed. |
occ_data | An additional parameter that modifies the name of the resulting .txt DenseFusion splitting file when multiple scenarios are considered within the same folder (see the sketch after this table). |
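When several scenarios are preprocessed into the same DenseFusion folder, occ_data simply keeps their split files apart. For example, two runs could be chained as in the sketch below; the scenario names, paths and occ_data suffix are placeholders, not values from the repository.

```python
import subprocess

# Hypothetical example: preprocess two scenarios into the same DenseFusion
# folder, using occ_data to distinguish their split files.
# Scenario names, paths and the suffix are placeholders.
for scenario, suffix in [("Generated_Cameras", ""), ("Generated_Scenes", "_occ_0.7_1.0")]:
    subprocess.run([
        "python3", "compute_label.py",
        "--path_dataset=/path/to/FruitBin_low_1_0.7_1.0/",
        f"--target_folder={scenario}",
        "--path_DF_data=/path/to/DenseFusion/datasets/linemod/Linemod_preprocessed/data",
        f"--occ_data={suffix}",
    ], check=True)
```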
The preprocessing for DenseFusion is now complete. For more detailed information, please refer to the following link: https://gitlab.liris.cnrs.fr/gduret/DenseFusion .
Compute step for GDRNPP
The preprocessing steps to obtain the BOP format of FruitBin are detailed in our GDRNPP repository: "https://gitlab.liris.cnrs.fr/gduret/gdrnpp_bop2022"
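For reference, BOP-format ground truth stores per-image pose annotations in scene_gt.json files roughly as in the generic example below (rotation as a flattened 3x3 matrix, translation in millimetres); this is a standard BOP illustration, not an excerpt from FruitBin.

```python
import json

# Generic example of a BOP-format scene_gt.json entry; not an excerpt from FruitBin.
scene_gt = {
    "0": [  # image ID
        {
            "cam_R_m2c": [1.0, 0.0, 0.0,
                          0.0, 1.0, 0.0,
                          0.0, 0.0, 1.0],  # rotation matrix, row-wise
            "cam_t_m2c": [0.0, 0.0, 500.0],  # translation in mm
            "obj_id": 1,
        }
    ]
}
print(json.dumps(scene_gt, indent=2))
```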
Authors and acknowledgment
License
The dataset is licensed under the CC BY-NC-SA license.