PAPER_TITLE

FIRST_AUTHOR_LAST, FIRST_AUTHOR_FIRST; SECOND_AUTHOR_LAST, SECOND_AUTHOR_FIRST

TransDex: Pre-training Visuo-Tactile Policy with Point Cloud Reconstruction for Dexterous Manipulation of Transparent Objects

Fengguan Li, Yifan Ma, Chen Qian, Wentao Rao, Weiwei Shang

University of Science and Technology of China

Corresponding Author

PDF arXiv Code

TransDex: a 3D visuo-tactile fusion motor policy for transparent objects based on point cloud reconstruction pre-training.

Abstract

Dexterous manipulation is a crucial skill that enables robots to complete complex tasks. Multi-fingered dexterous hands are typically employed to achieve this capability, yet they often face significant perceptual challenges during manipulation due to visual self-occlusion. These challenges are exacerbated when transparent objects are manipulated, as complex properties of light refraction and reflection cause a loss of depth information and introduce depth noise. To address these challenges, this paper proposes TransDex, a 3D visuo-tactile fusion motor policy based on point cloud reconstruction pre-training. Specifically, we first propose a self-supervised point cloud reconstruction pre-training approach based on Transformer. This method accurately recovers the 3D structure of an object from interactive point clouds of dexterous hands, even when random noise and large-scale masking are added. Building on this, TransDex is constructed in which perceptual encoding adopts a fine-grained hierarchical scheme and multi-round attention mechanisms adaptively fuse features of the robotic arm and dexterous hand to enable differentiated motor prediction. Results from transparent object manipulation experiments conducted on a real robotic system demonstrate that TransDex outperforms existing baseline methods. Further analysis validated the generalization capabilities of TransDex and the effectiveness of each of its components.

Method Overview

TransDex operates in two phases: (1) The pre-training framework encodes randomly masked and noisy hand object interaction point cloud, and reconstructs the complete object point cloud via a decoder. This process endows the pre-trained network with robust object comprehension against incomplete perceptual scenarios. (2) Building upon the pre-trained network, the motor policy accomplishes the extraction and multi-modal integration of fine-grained visuo-tactile information, ultimately generating differentiated motion predictions for both the robotic arm and dexterous hand.

Experiments

Robotic System Setup: ① a 16-DOF dexterous hand, with array tactile sensors equipped on the fingertips and finger pads; ② a 7-DOF robotic arm; ③ depth cameras; ④ experimental items; ⑤ a data glove; ⑥ a motion capture camera.

TransDex demonstrates strong capabilities in dexterous manipulation tasks of transparent objects, while also exhibiting strong generalisation and robustness against unseen and complex scenarios.

Visualization of point cloud reconstruction performance in pre-training tasks and real-world transfer reconstruction outcomes.

Real-World Demonstrations(3×)

Pouring

Shaking

Rotating

BibTeX

@misc{li2026transdex,
      title={TransDex: Pre-training Visuo-Tactile Policy with Point Cloud Reconstruction for Dexterous Manipulation of Transparent Objects}, 
      author={Fengguan Li and Yifan Ma and Chen Qian and Wentao Rao and Weiwei Shang},
      year={2026},
      eprint={2603.13869},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.13869}, 
}