TRADE: Transparent and Deformable Objects Dataset for Shape and Pose Estimation
Description
Context and methodology
This dataset was created to support experiments on depth estimation and pose estimation of transparent objects. Diverse objects with varying materials, shapes, sizes, and liquid fill levels are recorded in scenes with varied backgrounds, lighting conditions, and object arrangements. The aim of this dataset is to highlight the strengths and weaknesses of different approaches and to empirically investigate the reliability of different methods in realistic scenarios. By using glass and plastic objects, both filled with liquid and empty, properties such as opacity and index of refraction are varied. The selected objects also vary significantly in shape and size, with a mix of transparent and non-transparent materials. Additionally, scene properties, including viewing angle, object arrangement, support-plane texture, and lighting, are varied to create diverse evaluation scenarios.
Technical details
The dataset is collected by moving a camera attached to a robot arm around a scene. The same viewpoints are used for every scene, and the camera poses are obtained from the kinematics of the robot arm. We use 3D-DAT (https://github.com/markus-suchi/3D-DAT) for annotation, placing object models in the virtual 3D scene and manually correcting their poses based on their reprojection error in the different RGB views.
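To illustrate the reprojection check used during manual pose correction, the following sketch projects model points into a given RGB view. The function name and the convention that poses map local frames to the world frame are our own assumptions for this sketch, not part of 3D-DAT.

```python
# Minimal sketch: project 3D model points into an RGB view so a candidate
# object pose can be overlaid on the image and checked visually.
import numpy as np

def project_model(points_obj, T_world_obj, T_world_cam, K):
    # Model points (N, 3) -> world frame.
    pts_w = T_world_obj[:3, :3] @ points_obj.T + T_world_obj[:3, 3:4]
    # World frame -> camera frame.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_c = T_cam_world[:3, :3] @ pts_w + T_cam_world[:3, 3:4]
    # Perspective projection with 3x3 intrinsics K.
    uv = K @ pts_c
    return (uv[:2] / uv[2]).T  # (N, 2) pixel coordinates
```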
To obtain 3D object models, the physical objects are coated with a matte spray paint after the scenes have been collected. A high-quality depth sensor (Photoneo MotionCam-3D scanner, https://www.photoneo.com/) is used to reconstruct them. The set of 17 objects used in our experiments includes plastic and glass objects, filled or empty, with a variety of shapes and sizes. The 3D models of the deformable objects are obtained by first manually estimating the object masks using SegmentAnything2, then using them in a voxel carving procedure, followed by a marching cubes step to obtain the final mesh model.
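For reference, a minimal sketch of a mask-based voxel carving and marching cubes step is shown below. The function name, grid resolution, and the world-to-camera pose convention are illustrative assumptions, not the exact reconstruction pipeline used for the dataset.

```python
# Sketch: carve a voxel grid with per-view object masks, then extract a mesh.
import numpy as np
from skimage import measure

def carve_voxels(masks, K, poses, bounds, res=128):
    # Regular voxel grid over the scene bounds (pair of 3D corner points).
    xs = np.linspace(bounds[0][0], bounds[1][0], res)
    ys = np.linspace(bounds[0][1], bounds[1][1], res)
    zs = np.linspace(bounds[0][2], bounds[1][2], res)
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1).reshape(-1, 3)
    occupied = np.ones(len(grid), dtype=bool)
    for mask, T_wc in zip(masks, poses):
        # T_wc: 4x4 world -> camera transform (assumed convention).
        pts = T_wc[:3, :3] @ grid.T + T_wc[:3, 3:4]
        uv = K @ pts
        u = (uv[0] / uv[2]).round().astype(int)
        v = (uv[1] / uv[2]).round().astype(int)
        h, w = mask.shape
        visible = (pts[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(len(grid), dtype=bool)
        inside[visible] = mask[v[visible], u[visible]]
        # Keep only voxels whose projection lies inside the mask in every view.
        occupied &= inside
    vol = occupied.reshape(res, res, res).astype(np.float32)
    # Vertices are returned in voxel-index units; rescale with the grid
    # spacing to recover world coordinates.
    verts, faces, _, _ = measure.marching_cubes(vol, level=0.5)
    return verts, faces
```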
A total of 34 scenes are collected using an Intel RealSense D435 (https://www.intelrealsense.com/depth-camera-d435i/), saving both the RGB image and the depth image at a resolution of 1280 × 720 pixels. The robot arm performs a circular motion around the scene with the camera oriented toward the scene center, placing the camera at four different heights and corresponding polar angles (68°, 60°, 48°, and 33°). For each circle, either 16 or 26 views are collected, resulting in a total of 64 or 104 views per scene. The lighting is uniform and comes from above the scene. For seven scenes, we add a strong light projector to the side of the scene, producing caustics and other refraction and reflection effects at the interfaces of transparent objects. Six scenes also have a textured background instead of a uniform one, and the number of distractors in the scene is varied.
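To make the capture pattern concrete, the toy sketch below enumerates camera positions on circles at the four polar angles around the scene center. The radius value and the placement of the scene center at the origin are illustrative assumptions, not dataset parameters.

```python
# Toy sketch of the viewpoint pattern: cameras on circles at four polar
# angles, all looking toward the scene center (origin here, z pointing up).
import numpy as np

def viewpoints(radius=0.6, polar_deg=(68, 60, 48, 33), views_per_circle=16):
    pts = []
    for polar in np.deg2rad(polar_deg):
        for az in np.linspace(0.0, 2 * np.pi, views_per_circle, endpoint=False):
            # Spherical-to-Cartesian conversion.
            pts.append(radius * np.array([
                np.sin(polar) * np.cos(az),
                np.sin(polar) * np.sin(az),
                np.cos(polar),
            ]))
    return np.array(pts)  # (4 * views_per_circle, 3) camera positions
```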
For each scene in the "scenes/" folder, the structure is as follows:
- rgb/ contains the color images
- depth/ contains the depth images obtained with the RealSense D435 camera
- groundtruth_handeye.txt contains the camera pose of each viewpoint; each line contains a pose in TUM format (id, tx, ty, tz, rx, ry, rz, rw), where id is the view index, (tx, ty, tz) the translation, and (rx, ry, rz, rw) the rotation as a quaternion (see the parsing sketch after this list)
- poses.yaml contains the object pose annotations for the scene, expressed in the same world reference frame as the camera poses, as well as the liquid fill level, given as a percentage of the object's extent along the gravity vector of the scene
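A minimal sketch for parsing groundtruth_handeye.txt into 4×4 pose matrices, assuming the field order above with whitespace-separated values; the helper name and the interpretation of each pose as a camera-to-world transform are our assumptions.

```python
# Load per-view camera poses from groundtruth_handeye.txt as 4x4 matrices.
import numpy as np
from scipy.spatial.transform import Rotation

def load_camera_poses(path):
    poses = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 8:
                continue  # skip blank or malformed lines
            view_id = fields[0]
            tx, ty, tz, rx, ry, rz, rw = map(float, fields[1:])
            T = np.eye(4)
            # scipy expects scalar-last quaternions, matching (rx, ry, rz, rw).
            T[:3, :3] = Rotation.from_quat([rx, ry, rz, rw]).as_matrix()
            T[:3, 3] = [tx, ty, tz]
            poses[view_id] = T
    return poses
```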
We complement this dataset with a rendered training set generated using BlenderProc, consisting of 148k images featuring high-quality models of the four most represented transparent objects in the real dataset, to support training methods as needed. The rendered dataset includes various distractor objects, changing liquid fill levels, and various liquid properties. As is standard with BlenderProc, the rendered dataset uses the BOP format (detailed at https://github.com/thodan/bop_toolkit/blob/master/docs/bop_datasets_format.md). The models used in the renderings are also provided under the models/ folder.
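As an example of consuming the BOP-format annotations, the sketch below loads per-image object poses from a rendered scene's scene_gt.json. The helper name is our own; the field names and the millimetre convention for cam_t_m2c follow the BOP specification.

```python
# Read ground-truth object poses from a BOP-format scene directory.
import json
import numpy as np

def load_bop_scene_gt(scene_dir):
    with open(f"{scene_dir}/scene_gt.json") as f:
        scene_gt = json.load(f)
    poses = {}
    for im_id, anns in scene_gt.items():
        per_image = []
        for ann in anns:
            T = np.eye(4)
            # cam_R_m2c: row-major 3x3 model-to-camera rotation;
            # cam_t_m2c: model-to-camera translation in millimetres.
            T[:3, :3] = np.array(ann["cam_R_m2c"]).reshape(3, 3)
            T[:3, 3] = np.array(ann["cam_t_m2c"]) / 1000.0  # mm -> m
            per_image.append((ann["obj_id"], T))
        poses[int(im_id)] = per_image
    return poses
```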