Research

Our research goal is to develop robust machine vision algorithms for robotic automation and intelligence in challenging unstructured environments. To this end, we conduct research on Visual Servoing, Autonomous Driving, Soft Robot, Unmanned Aerial Vehicles (UAVs), Medical Robot, Reinforcement-learning Control, Multi-robot Control and Large-scale Scheduling, and Machine Vision Projects.

Visual Servoing

We concentrate on visual servoing control in unstructured environments. We propose a depth-independent interaction matrix that decouples the nonlinearity introduced by depth and, based on it, an adaptive law that estimates the unknown camera parameters online. The controller can consequently be designed using an uncalibrated monocular camera while achieving both regulation and tracking. The proposed visual servoing techniques have been employed to control diverse robot platforms, including industrial robots, soft robots, flexible robots, and mobile robots.
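
As a rough illustration of why such a matrix is called depth-independent, the sketch below assumes a standard perspective camera described by a hypothetical 3x4 projection matrix `M` (the numbers are illustrative, not taken from our systems). For a 3-D point `x` with image point `y` and depth `z`, differentiating the projection gives `z * dy/dt = D(y) * dx/dt` with `D(y) = P - y * p3^T`, so the depth appears only as a scalar factor. The function names are assumptions made for this example, and the adaptive estimation of the unknown parameters is not shown.

```python
# Sketch under assumptions: a perspective camera with a hypothetical 3x4 matrix M.

def project(M, x):
    """Return (image point y, depth z) of the 3-D point x under projection M."""
    xh = list(x) + [1.0]
    rows = [sum(m * v for m, v in zip(row, xh)) for row in M]
    z = rows[2]
    return [rows[0] / z, rows[1] / z], z


def depth_independent_matrix(M, y):
    """D(y) = P - y * p3^T, built from the left 3x3 block of M.

    D(y) depends on the image point and camera parameters, but not on depth.
    """
    return [[M[i][j] - y[i] * M[2][j] for j in range(3)] for i in range(2)]


def image_velocity(M, x, v):
    """Image-plane velocity of a point at x moving with 3-D velocity v."""
    y, z = project(M, x)
    D = depth_independent_matrix(M, y)
    return [sum(D[i][j] * v[j] for j in range(3)) / z for i in range(2)]
```

A quick way to check the relation is to compare `image_velocity` with a finite difference of `project` over a small time step.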

Visual Servoing of Robot Manipulator
In vision-based robotic manipulation, the object is usually required to present a desired shape or to be viewed at a specific angle to facilitate subsequent operations such as object grasping and component inspection. To implement this kind of visual servoing task, a novel image feature based on the parameters of Bezier and nonuniform rational B-spline (NURBS) curves is designed. An adaptive depth-independent controller is designed to estimate the unknown curve parameters as well as the depth information online (TMECH 2018). In addition, based on the theory of visual servoing, we have developed an application, interaction with bottle-like objects, which can be used for deburring, polishing, and welding the inner surface of a bottle-like mold. Based on the geometry of the object, a new generalized constraint called the bottleneck (BN) constraint is proposed, which ensures that the tool passes through a fixed 3-D region and avoids collisions with the boundary of the region. A novel dynamic controller is designed to realize hybrid vision/force control under the BN constraint (ICRA 2021).
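
To make the idea of a curve-parameter feature concrete, here is a minimal sketch that evaluates a Bezier curve from its control points with de Casteljau's algorithm and stacks the control points into a feature error. Treating the stacked control points as the feature vector is an assumption made for illustration; the helper names are hypothetical and the NURBS case is not covered.

```python
# Sketch under assumptions: Bezier control points used directly as image features.

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] using de Casteljau's algorithm."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeated linear interpolation between neighbouring points.
        pts = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def feature_error(current_ctrl, desired_ctrl):
    """Stack the per-coordinate differences between two sets of control points."""
    return [c - d for p, q in zip(current_ctrl, desired_ctrl) for c, d in zip(p, q)]
```

A servo loop built on such a feature would drive `feature_error` toward zero, so that the observed contour takes the desired shape in the image.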

Visual Servoing of Flexible Manipulator
We propose a novel image-based visual servoing scheme for a flexible aerial refueling boom with an eye-in-hand camera. The dynamic model of the flexible refueling boom is decomposed into a slow subsystem and a fast subsystem using the singular perturbation approach. For the slow subsystem, image feedback is used to control the boom so that the projection of the point marker on the back of the receiver converges to the desired position. For the fast subsystem, a linear quadratic regulator (LQR) is applied to suppress the vibration of the boom. The asymptotic convergence of the image error to zero is established using Lyapunov theory, and simulations demonstrate the effectiveness of the proposed method. (TMSC 2020)
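
The role of the LQR in the fast subsystem can be sketched with a single lightly damped vibration mode. Everything below is an assumption for illustration: the two-state discrete-time model, the weights, and the function names are hypothetical and are not the boom model from the paper. The gain is obtained by iterating the discrete Riccati recursion for a single-input system.

```python
# Sketch under assumptions: one vibration mode, single input, discrete-time LQR.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dlqr_single_input(A, B, Q, r, iters=3000):
    """Gain K for x+ = A x + B u minimising the sum of x'Qx + r*u^2."""
    n = len(A)
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    K = [[0.0] * n]
    for _ in range(iters):
        BtP = matmul(Bt, P)                       # 1 x n
        s = r + matmul(BtP, B)[0][0]              # scalar r + B'PB
        K = [[v / s for v in matmul(BtP, A)[0]]]  # 1 x n feedback gain
        AtP = matmul(At, P)
        AtPA = matmul(AtP, A)
        corr = matmul(matmul(AtP, B), K)          # A'PB (r + B'PB)^-1 B'PA
        P = [[Q[i][j] + AtPA[i][j] - corr[i][j] for j in range(n)] for i in range(n)]
    return K


def simulate(A, B, K, x0, steps):
    """Roll out the closed loop with u = -K x and return the final state."""
    x = list(x0)
    for _ in range(steps):
        u = -sum(k * xi for k, xi in zip(K[0], x))
        x = [sum(a * xi for a, xi in zip(row, x)) + b[0] * u for row, b in zip(A, B)]
    return x
```

For example, with `A = [[1.0, 0.01], [-1.0, 0.998]]` and `B = [[0.0], [0.01]]` (an Euler-discretised oscillator), the closed loop with the returned gain damps out an initial deflection.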

Visual Servoing of Wheeled Robot—Regulation
We propose novel image-based visual servoing schemes for the pose stabilization and position control problems of mobile robots with an overhead fixed camera. A new image-based kinematic model is introduced that removes the camera intrinsic and extrinsic parameters from the image Jacobian matrix, making the design of camera-parameter-independent image-based controllers possible. In the proposed schemes, neither accurate nor approximate knowledge of the camera intrinsic and extrinsic parameters is required, and the completely uncalibrated camera can be mounted on the ceiling with an arbitrary pose, which makes the controller implementation very simple and flexible. (TRO 2019, TAC 2018)

Visual Servoing of Wheeled Robot—Trajectory Tracking
In real applications, it is very important to control the mobile robot to move along a desired trajectory to a desired position, in order to avoid obstacles or to keep the mobile robot in the camera field of view during the control process, either of which can be key to successful task execution. To this end, we propose a new calibration-free image-based trajectory tracking control scheme for nonholonomic mobile robots with a truly uncalibrated fixed camera. By developing a novel camera-parameter-independent kinematic model, both offline and online camera calibration can be avoided in the proposed scheme, and no knowledge of the camera is needed in the controller design. The proposed trajectory tracking control scheme guarantees exponential convergence of the image position and velocity tracking errors. (TASE 2020)

Visual Servoing of Wheeled Robot—Formation Control
In many existing formation control approaches for nonholonomic mobile robots, the leader velocity must be measured and transmitted to the follower for controller design. To make formation control applicable to environments where providing the robots with global positioning capability is difficult or impossible, formation controllers are needed that do not rely on measurements of the robots' global positions. To this end, we develop novel continuous formation controllers for mobile robots that do not require measurement of the leader velocity, so that communication between the mobile robots is not required. To address the unavailability of the leader velocity, observers based on adaptive control techniques are proposed to estimate the leader velocity from the follower's onboard sensors. The effect of the velocity estimation error on closed-loop stability is considered in the stability analysis based on Lyapunov stability theory, and it is shown that the developed approaches ensure global stability of the combined observer–controller closed-loop system. (TMECH 2020, TRO 2018)

Human-Robot Shared Visual Servoing Based on Game Theory
We design a human-robot shared visual servoing system in which the human and the robot coordinate during visual servoing, combining the precise control ability of the robot with the decision-making ability of the human. Game theory is used to model the behaviors of the human and the robot. From the observed human-robot physical interaction force, the human intention is adaptively estimated using a radial basis function neural network (RBFNN), and the robot control objective is dynamically adjusted to realize human-robot coordination. In this framework, when the human does not participate in the visual servoing, the robot works in autonomous control mode and holds all control authority; when the human participates, the robot works in shared control mode and hands over part of the control authority to the human, and the stronger the control instructions issued by the human, the more control authority the robot cedes. When the robot cedes control completely, it works in human remote control mode. Lyapunov theory is used to prove the stability of the system, and the effectiveness of the proposed method is verified by experiments.
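
The handover of control authority between the three modes can be pictured with a deliberately simplified linear blending rule. This is not the game-theoretic formulation or the RBFNN intention estimator described above; the function name, the saturation force `f_max`, and the linear weight are all assumptions made for illustration.

```python
# Sketch under assumptions: linear authority blending driven by interaction force.

def blend_commands(u_robot, u_human, human_force, f_max=10.0):
    """Blend robot and human commands according to the interaction force.

    alpha = 0 -> autonomous mode (robot holds all authority)
    0 < alpha < 1 -> shared mode
    alpha = 1 -> human remote control mode
    """
    alpha = min(1.0, abs(human_force) / f_max)
    u = [(1.0 - alpha) * ur + alpha * uh for ur, uh in zip(u_robot, u_human)]
    return u, alpha
```

With this rule, a larger interaction force shifts the blended command toward the human's command until the force reaches `f_max`.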

Planning based Visual Servoing
In the context of robot manipulations employing visual servoing techniques, instances may arise where the target features exhibit a considerable discrepancy from their initial setup. Such scenarios can lead to challenges in achieving convergence for the visual servo controller due to a set of inevitable constraints, including but not limited to field-of-view, kinematic limitations, visibility considerations, and the non-singularity requirements of the image Jacobian matrix. To surmount these issues, a novel local planner (controller), grounded in quadratic programming optimization, has been proposed. This planner is designed to handle the field of view and joint (positions, velocities, and torques) limitations throughout the servoing process, while effectively mitigating the occurrence of singularities. This approach is further augmented by integrating it with a sampling-based global planning framework, thereby enabling the system to efficiently tackle a broad spectrum of visual servoing tasks that are subject to constraints (T-MECH 2024 Under Review).

Autonomous Driving

For autonomous driving, our research mainly focuses on perception and localization through learning-based algorithms. Specifically, we aim to develop robust AI-based Simultaneous Localization and Mapping (SLAM) systems for autonomous driving and mobile robotics in challenging environments based on multi-sensor perception. Our work includes learning-based odometry, large-scale mapping, long-term loop closure and relocalization.

Outdoor Visual Odometry and Registration
Noisy pixels caused by dynamic objects in outdoor scenes heavily degrade visual odometry. We propose a confidence-based unsupervised visual odometry model that fully leverages the similarity and consistency of corresponding pixels, improving robustness to dynamic objects (IROS 2019, TITS 2021). For the occlusion problem, we propose a new unsupervised learning method for depth and ego-motion that uses multiple masks (ICRA 2019, TITS 2020). For large motion, we propose a novel unsupervised training framework for depth and pose with 3D hierarchical refinement and augmentation using explicit 3D geometry (T-CSVT 2022). For robust visual registration, we model RANSAC sampling consensus as a reinforcement learning process, achieving fully end-to-end learning of robust sampling-consensus estimation (ICCV 2023).

Outdoor LiDAR Odometry and Registration
For LiDAR-based odometry, we introduce a novel 3D point cloud learning model, named PWCLO-Net, using hierarchical embedding mask optimization. It outperforms all recent learning-based methods and also outperforms the geometry-based approach, LOAM with mapping optimization, on most sequences of the KITTI odometry dataset (CVPR 2021). Furthermore, we propose a new efficient 3D point cloud learning method specially designed for the frame-by-frame processing required by real-time robot perception and localization. It accelerates the deep LiDAR odometry of our previous CVPR work to real time while improving the accuracy (T-PAMI 2022). We also designed a highly efficient LiDAR odometry framework that projects points onto a 2D surface and then feeds them into a local transformer with linear complexity (AAAI 2023). To capture long-range dependencies for modeling large motion in point cloud registration, we propose an efficient end-to-end registration method for point clouds on the scale of 100,000 points (ICCV 2023).
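
Projecting an unordered point cloud onto a 2D surface gives it an image-like layout in which neighbours can be found by index rather than by search. The sketch below uses a spherical (range-image) projection, which is a common choice and is assumed here for illustration; the actual projection, resolution, and field of view used in our frameworks are not specified on this page, and all parameter values are hypothetical.

```python
import math

# Sketch under assumptions: spherical projection with illustrative parameters.

def spherical_projection(points, width=64, height=16,
                         fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project 3-D points into a height x width range image (None = empty cell)."""
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)
        pitch = math.asin(z / r)
        if not (fov_down <= pitch <= fov_up):
            continue  # outside the vertical field of view
        col = int(0.5 * (1.0 - yaw / math.pi) * width) % width
        row = min(height - 1, int((fov_up - pitch) / fov * height))
        # Keep the closest return when several points fall into one cell.
        if image[row][col] is None or r < image[row][col]:
            image[row][col] = r
    return image
```

Each cell then stores the range of the nearest point in that viewing direction, and a 2D network can process the grid with fixed-size local windows.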

2D visual-3D point cloud registration
We propose Fusion-Net, an online and end-to-end solution that can automatically detect and correct the extrinsic calibration matrix between LiDAR and a monocular RGB camera without any specially designed targets or environments (ICRA 2021). Furthermore, we present I2PNet, a novel end-to-end 2D-3D registration network. I2PNet directly registers the raw 3D point cloud with the 2D RGB image using differential modules with a unique target. The 2D-3D cost volume module for differential 2D-3D association is proposed to bridge feature extraction and pose regression. The results demonstrate that I2PNet outperforms the SOTA by a large margin. Furthermore, to achieve cross-modal localization, we propose a monocular visual localization pipeline named LHMap-loc, which compresses the point cloud map offline and carries out monocular pose regression online. The proposed LHMap-loc performs better in terms of precision and efficiency than the SOTA methods on KITTI, Argoverse, and self-collected datasets (ICRA 2024).

Long-term Loop Closure and Relocalization
Changing environments pose great challenges to long-term SLAM algorithms, especially for loop closure and relocalization. A self-supervised representation learning method is proposed to extract domain-invariant features through multi-domain image translation by introducing a feature consistency loss. In addition, a novel gradient-weighted similarity activation mapping loss is incorporated for high-precision localization (JAS 2021, IROS 2019). To leverage high-quality virtual ground truths without any human effort, we propose a novel multi-task architecture that fuses geometric and semantic information into the latent embedding representation through syn-to-real domain adaptation (TIP 2020). For large-scale point clouds from LiDAR, we propose LPD-Net, a novel discriminative and generalizable global descriptor that represents large-scale outdoor scenes and reveals the continuous latent embedding feature space for place recognition and loop closure. Based on LPD-Net, point cloud registration is implemented for 6-DoF pose regression and relocalization after loop closure (ICCV 2019, IROS 2019, IROS 2020).

Unmanned Delivery Robot
In cooperation with Vipshop, we proposed a multi-sensor-fusion-based unmanned system with autonomous navigation, localization, planning and control algorithms to solve the last-mile delivery problem in logistics parks. Our unmanned system integrates multi-sensor-fusion-based SLAM, multi-modal perception, dynamic path planning, obstacle avoidance and motion control algorithms, achieving high-precision mapping, localization, perception, navigation and obstacle avoidance in complex environments. The system has been validated on multiple platforms, and accurate vision-based navigation, localization and fixed-point parking tasks have been accomplished in large-scale challenging outdoor environments. It has been successfully applied in the logistics and distribution industry, and trial operation on the SJTU campus and at the Vipshop headquarters has been completed. A single delivery path is more than one kilometer long, operation has been satisfactory, and thousands of express items have been delivered in total.

Robot Perception

For robot perception, our research mainly focuses on image-based and point-cloud-based perception through learning-based algorithms. Specifically, we aim to develop robust AI-based perception systems for mobile robotics in challenging environments based on multi-sensor perception. Our work includes learning-based optical/scene flow estimation, object segmentation/detection/tracking, NeRF-based SLAM and hand/human pose estimation.

Optical/Scene flow estimation in images and point clouds
For unsupervised optical flow estimation, we introduce a novel unsupervised learning method of optical flow by considering the constraints in non-occlusion regions with geometry analysis (T-ITS 2022). We introduce a novel hierarchical neural network with double attention for learning the correlation of point features in adjacent frames and refining scene flow from coarse to fine, layer by layer. The proposed network achieves SOTA performance in 3D scene flow estimation (TIP 2021). Furthermore, we introduce a novel flow embedding layer with an all-to-all mechanism and a reverse verification mechanism. We also investigate and compare several design choices in key components of the 3D scene flow network and achieve SOTA performance (ECCV 2022). To combine 2D and 3D information, we propose an efficient and high-precision scene flow learning method for large-scale point clouds, achieving the efficiency of the 2D method and the high accuracy of the 3D method (ICCV 2023). To achieve robust scene flow estimation, we propose a novel uncertainty-aware scene flow estimation network with the diffusion probabilistic model. Iterative diffusion-based refinement is designed to enhance the correlation robustness and resilience to challenging cases, e.g., dynamics, noisy inputs, and repetitive patterns (CVPR 2024). In addition, we propose a 3D scene flow pseudo-auto-labelling framework. Given point clouds and initial bounding boxes, both global and local motion parameters are iteratively optimized. Diverse motion patterns are augmented by randomly adjusting these motion parameters, thereby creating a diverse and realistic set of motion labels for training 3D scene flow estimation models (CVPR 2024).

Object segmentation/detection/tracking
For segmentation of unstructured point clouds, we introduce a spherical interpolated convolution operator to replace the traditional grid-shaped 3D convolution operator. It improves the accuracy and reduces the number of network parameters (T-CYB 2021). Then, for segmentation of point cloud sequences, we introduce an Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) to process dynamic 3D point cloud sequences. It makes better use of the structured information within the local region and learns spatial-temporal embedding features from dynamic 3D point cloud sequences (TIM 2021). To foresee the future states of general obstacles, we further propose a novel 4D occupancy forecasting network as well as a new benchmark to support future work on general movable/static object segmentation (CVPR 2024). For object detection, we propose an efficient feature fusion framework with projection awareness for 3D object detection. For object tracking, we propose an interactive feature fusion between multi-scale features of images and point clouds. In addition, we explore the effectiveness of pre-training on each single modality and fine-tuning the fusion-based model (T-ITS 2023).

NeRF and NeRF based SLAM
Neural SLAM works have fully demonstrated the advantages of Neural Radiance Fields (NeRF) in SLAM: density, consistency, and flexibility. We propose a semantic SLAM system utilizing neural implicit representation to achieve high-quality dense semantic mapping and robust tracking. In this system, we introduce hierarchical semantic representation to allow multi-level semantic comprehension for top-down structured semantic mapping of the scene. In addition, to fully utilize the correlation between multiple attributes of the environment, we integrate appearance, geometry, and semantic features through cross-attention for feature collaboration. Then, we design an internal fusion-based decoder to obtain semantic, RGB, and Truncated Signed Distance Field (TSDF) values from multi-level features for accurate decoding. Furthermore, we propose a feature loss to update the scene representation at the feature level. Through the proposed strategies, the performance of dense semantic SLAM is improved, as demonstrated on public indoor datasets (CVPR 2024).

Vision-Based Hand/Human Pose Estimation
Currently, hand pose estimation faces challenges such as occlusion, changes in lighting conditions, and estimation biases caused by similar joint appearances. Additionally, integrating 2D and 3D inputs efficiently poses difficulties. To address these issues, we propose a hand pose estimation algorithm based on joint 2D/3D estimation and dynamic recurrent optimization. To fully exploit the characteristics of the 2D and 3D modalities and maximize their complementarity, our algorithm incorporates multiple bidirectional fusion connections at specific layers in the 2D and 3D branches. These connections efficiently merge semantic information from 2D depth maps and 3D point cloud data. To tackle occlusion, our algorithm employs dynamic recurrent units. Through iterative recurrence, it samples the neighborhood of hand joints and utilizes dynamic graph convolutions to facilitate dynamic information interaction within the joint neighborhood. This process enhances the semantic distinctiveness of joint neighborhoods and continuously refines the positions of hand joints. Our algorithm achieves real-time, high-precision estimation of hand poses in video streams. For human pose estimation, we introduce a novel unsupervised learning method for 3D human pose that considers the loop constraints from real/virtual bones and the joint motion constraints in consecutive frames (T-CSD 2022).

Hand/Object Interaction Prediction
Understanding human behavior during hand-object interaction is crucial for a wide range of applications, such as service robot manipulation and extended reality environments. By accurately predicting how humans interact with objects, robots can better anticipate user needs, and extended reality systems can provide more immersive, intuitive experiences. To address this challenge, our recent research [arXiv] has focused on forecasting hand trajectories and object affordances using egocentric videos that capture the human perspective. This dual prediction task offers a more holistic representation of future hand-object interactions in 2D space, capturing not only the potential motion of the hand but also the perceived functional possibilities of the objects involved. We also propose a novel hand trajectory prediction method [Project Page] that overcomes the challenges of camera egomotion interference and the absence of affordance labels to explicitly guide the optimization of the hand waypoint distribution.

Soft Robot

For soft robots, our research mainly focuses on design, modeling and control for soft robots that are made up of silicone rubber and electroactive polymers. To be specific, we have been working on a soft manipulator that is dedicated to surgery in a minimally invasive fashion and a soft gripper that is able to grasp fragile objects. We have also developed new designs for soft robots capable of multi-modal locomotion for the exploration of unknown environments.

Design of Soft Manipulator and Soft Gripper
A cable-driven soft manipulator system is designed for cardiac ablation in a minimally invasive manner. The system is totally made of soft materials and has no rigid structures inside. A shape sensor network based on Fiber Bragg Gratings (FBGs) is embedded inside the soft manipulator to obtain the robotic shape in real-time. A proximity sensor, which consists of a 4-point sensor array, is affixed at the end of the soft manipulator for tracking a beating heart. Shape memory polymer (SMP) has been successfully integrated with soft robotics for resolving compliance complexities associated with poor stiffness or low payload capability. However, heating of thermally responsive SMP has always been challenging for robotics applications. To overcome this challenge, we fabricated a soft robotic finger and introduced the concept of artificial joints by combining SMP substrate with attached heaters to a soft pneumatic finger and achieved different bending motions by activating different sets of joints. For soft gripper sensing, we present a novel design and fabrication procedure of a soft robotic actuator that has the attributes of pressure and curvature sensing. (ROBIO 2018, CYBER 2018)

Kinematics and Dynamics for Soft Robots
Because the modelling of soft robots differs from that of rigid robots, a new framework covering kinematics, statics and dynamics for soft robots is needed. To this end, we propose three-dimensional dynamics by combining geometrically exact Cosserat rod theory and the Kelvin model. Both curvature and strain are taken into account on the basis of the piecewise constant curvature model. Based on this, underwater dynamics with a friction model are proposed, adopting the Coulomb friction model to compensate for the loss of actuation force in the transmission process. The dynamics presented can adapt to variable environments and serve as the platform for controller design (TMECH 2018). Based on the solved system model, we have developed a model-based force and collision detection algorithm (ICRA 2021).
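
The piecewise constant curvature assumption mentioned above can be illustrated with a planar forward-kinematics sketch: each segment is a circular arc described by a curvature and an arc length, and segments are chained by composing their tip transforms. This is a simplified kinematic illustration only; it does not include the Cosserat rod dynamics, the Kelvin model, or friction, and the function names are hypothetical. A change in segment length could stand in for axial strain.

```python
import math

# Sketch under assumptions: planar arcs, each segment starting along its local x-axis.

def segment_tip(kappa, length):
    """Tip position and heading change of one constant-curvature arc."""
    theta = kappa * length
    if abs(kappa) < 1e-9:
        return length, 0.0, 0.0  # straight segment
    return math.sin(theta) / kappa, (1.0 - math.cos(theta)) / kappa, theta


def pcc_forward_kinematics(segments):
    """Chain (curvature, length) segments and return the tip pose (x, y, heading)."""
    x = y = heading = 0.0
    for kappa, length in segments:
        dx, dy, dtheta = segment_tip(kappa, length)
        # Rotate the local tip offset into the base frame and accumulate.
        x += math.cos(heading) * dx - math.sin(heading) * dy
        y += math.sin(heading) * dx + math.cos(heading) * dy
        heading += dtheta
    return x, y, heading
```

Two identical quarter-circle segments, for instance, chain into a semicircle whose tip lies directly above the base at a distance of twice the bending radius.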

Soft Robot Control in the Free Space
Owing to the modelling differences between rigid robots and soft robots, control algorithms for rigid robots cannot be directly applied to soft robots. Therefore, we have dedicated ourselves to controller design specific to soft robots. We proposed a series of visual servo controllers: an adaptive visual servoing controller that accounts for special optical conditions and environmental disturbances (TIE 2019, TMECH 2019); shape control that leverages shape features to solve the feature correspondence problem of the continuum robot (RA-L/ICRA 2021); and fault-tolerant control that relies merely on a monocular camera, meticulously designing signals injected into the dynamic process to trigger divergent performance (TIE 2020). We also use carefully designed shape features to solve the three-dimensional shape control problem of continuum robots without global attitude/position information (TMECH 2022).

Soft Robot Control in the Constrained Environments
To improve the controllability of the soft robot in constrained environments, we have conducted a range of research into interaction with the environment. We proposed a hybrid vision/force controller based on a deformation model, leveraging real-time deflection model updating techniques and realizing accurate force interaction performance (TCST 2019). In safety-critical applications such as robot-assisted surgery, the controller must sometimes satisfy the dual requirements of accurate positioning of the end effector and non-collision with the organs in the body. To this end, we proposed a hybrid controller aiming at simultaneous obstacle avoidance and visual servoing of the robot extremity (TMECH 2020). To fully harness the natural compliance of soft robots and to soften the rigid avoidance constraints typically found in conventional obstacle avoidance algorithms, we have developed a novel compliant obstacle avoidance algorithm. This algorithm is integrated with a safe interaction control system that evaluates contact forces, thereby facilitating accurate positioning and secure interaction within previously unknown confined spaces (TMECH 2024).

Soft Robot Optimized Operation Control
In response to the need for soft robots to directly interact with objects and perform specific operating tasks in practical applications, we propose a task-oriented active force adjustment mechanism that aims to make the most of the limited payload of the soft robot so that it can complete certain operational tasks. We propose a two-stage visual servo controller that achieves the dual control objectives of pushing the object to the target position through the visual control algorithm and improving the pushing effect through force planning. The proposed controller has been verified on a physical prototype in an object-pushing task, in terms of both image error convergence and improved work efficiency. (Soft Robotics 2021)

Design of Small-scale Multi-modal Locomotion Soft Robot
Despite the adaptability potential of small-scale soft robots in unknown terrains, their performance often gets hindered due to inherent restrictions in soft actuators and compact bodies. To tackle this issue, we have designed a rapid-moving soft robot that is powered by electroactive materials. This robot merges the benefits of dielectric elastomer actuators (DEAs) and shape memory alloy (SMA) spring actuators, which paves the way for its high-performance multi-modal locomotion. We have developed nonlinear models for both DEAs and SMAs to analyze the robot's performance. The design parameters of the robot were fine-tuned based on these models to enhance its running and jumping capabilities. The performance of the designed robot was experimentally tested, which showed its excellent performance in running speed and jumping height. Meanwhile, the robot's turning motion and jumping angle can be controlled by the coordinated actuation of different actuators.

Unmanned Aerial Vehicles (UAVs)

For unmanned aerial vehicles, our research mainly focuses on morphing UAV design, image-based control and trajectory planning. Because GPS can be unavailable indoors or in cluttered urban areas and unreliable at low altitudes, we aim to control the UAV to perform servoing or tracking tasks using only the visual information provided by a monocular camera. We are also interested in generating safe and dynamically feasible trajectories for UAVs in obstacle-cluttered environments. Moreover, we have designed a morphing quadrotor with the ability to perform aerial dynamic grasping.

Visual Servoing of UAVs
To cope with the underactuation and nonlinearity of the quadrotor dynamics, we use properly defined image features to design an IBVS controller for a quadrotor UAV. By using the image features in the virtual image plane, a velocity controller is derived (TMECH 2017). One of the biggest challenges in IBVS is estimating the depth online. We propose a nonlinear observer to simultaneously estimate the depth of the point features and the velocity of the quadrotor using visual feedback. Experimental tests, including a comparison with an extended-Kalman-filter-based observer, are conducted to verify the validity of the observer (TMECH 2018). Loss of visibility may lead to a failure of visual servoing for UAVs. To guarantee visibility, we define a visibility constraint based on a control barrier function. The control inputs are minimally modified to satisfy the visibility constraint, thus preserving visibility (TMECH 2019).
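
The idea of minimally modifying a control input so that a barrier constraint holds can be sketched for the simplest case of a single constraint that is affine in the input, `a . u + b >= 0`. In that case the quadratic program "minimise the squared distance to the nominal input subject to the constraint" has a closed-form solution, namely a projection onto the constraint boundary. The function name and the toy barrier in the usage note are assumptions for illustration, not the visibility constraint from the paper.

```python
# Sketch under assumptions: one affine barrier constraint, closed-form projection.

def cbf_filter(u_nom, a, b):
    """Return the input closest to u_nom that satisfies a . u + b >= 0."""
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if slack >= 0.0:
        return list(u_nom)  # nominal input is already safe
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint cannot be satisfied by any input")
    # Project onto the boundary a . u + b = 0.
    return [ui - slack / norm_sq * ai for ui, ai in zip(u_nom, a)]
```

As a toy usage example, take an image coordinate `x` with `dx/dt = u` and the barrier `h = x_max**2 - x**2`; the condition `dh/dt + alpha * h >= 0` gives `a = [-2 * x]` and `b = alpha * h`, so the filter only slows the feature down as it approaches the image border.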

Visual Tracking of UAVs
The image-based visual tracking (IBVT) problem for quadrotors is challenging because, for such systems, the relationship between the control inputs and the motion of the image features is often complex. We propose a nonlinear controller using features defined in the virtual plane for the quadrotor to track a moving target. The target is assumed to move with unknown, time-varying and bounded linear velocity, linear acceleration, and yaw angular velocity and acceleration. The closed-loop tracking error is proved to be uniformly ultimately bounded (UUB) by means of Lyapunov analysis (ASCC 2017). By adopting the virtual camera approach and choosing image moments, we design the trajectories of the image features in image space to perform the image-based visual control task of the quadrotor. A feature trajectory tracking controller is proposed to track the designed trajectories. The stability of the proposed tracking controller is analyzed and proved by means of Lyapunov analysis (TIE 2018).

Real‐time UAV Trajectory Planning
The trajectory planning algorithm is the core of autonomous navigation and can greatly enhance flight safety. Because quadrotors need safe and dynamically feasible trajectories in unknown environments, we proposed a trajectory planning framework based on B-splines and kinodynamic search. This framework can be used on a quadrotor with limited sensing, and flight along the resulting trajectories is safe and effective. First, a B-spline-based nonuniform kinodynamic (BNUK) search algorithm is proposed to generate dynamically feasible trajectories efficiently. The nonuniform search makes the generated trajectories safe and gives them a reasonable time allocation. Then, a trajectory optimization method based on control point optimization is proposed. Multiple outdoor flight experiments show the effectiveness of the proposed framework. (Journal of Field Robotics 2020)
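
To show how a trajectory can be represented by control points, the sketch below evaluates a uniform cubic B-spline. Note the simplification: the BNUK search described above is nonuniform, whereas this example assumes uniform knots for brevity, and it shows only curve evaluation, not the kinodynamic search or the control point optimization. Function names are hypothetical.

```python
# Sketch under assumptions: uniform cubic B-spline, evaluation only.

def cubic_bspline_point(ctrl, i, t):
    """Point on the segment governed by ctrl[i:i+4] at local parameter t in [0, 1]."""
    p0, p1, p2, p3 = ctrl[i:i + 4]
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0
    b3 = t ** 3 / 6.0
    # The basis weights are non-negative and sum to one, so each point is a
    # convex combination of its four control points.
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_trajectory(ctrl, samples_per_segment=10):
    """Sample every segment of the spline, including the final end point."""
    pts = []
    for i in range(len(ctrl) - 3):
        for k in range(samples_per_segment):
            pts.append(cubic_bspline_point(ctrl, i, k / samples_per_segment))
    pts.append(cubic_bspline_point(ctrl, len(ctrl) - 4, 1.0))
    return pts
```

Because every point on a segment lies inside the convex hull of its four control points, keeping those control points inside obstacle-free regions keeps that part of the trajectory collision-free, which is one reason planners often optimize control points directly.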

Contact-based aerial manipulation
Contact-based aerial interaction control in environments without a positioning system is a challenging issue. We propose image-based impedance control strategies for force tracking of an unmanned aerial manipulator (TASE 2022, TAES 2024). To achieve force tracking under visual guidance, we design an adaptive visual impedance control method that adjusts the target stiffness according to the force tracking error and the visual feature error. The closed-loop system is proved asymptotically stable by means of Lyapunov analysis. In addition, we propose a vision-guided approach for the impedance control of an aerial manipulator based on line features, with the goal of physical interaction with unknown environments. To this end, a nonlinear observer is proposed to estimate the 3-D parameters of the environment online. These parameters are used to estimate the interaction matrix related to the image features. By planning the image-space trajectory and the distance, the desired interaction behavior can be uniquely specified without relying on any Cartesian information of the system. (TASE 2022, TIE 2023)

Human-Robot Co-Transportation With a Tethered Aerial Vehicle
Physical human-robot interaction (pHRI) in the field of aerial vehicles has received growing research attention in recent years. In this work, a visual impedance control strategy for human-aerial robot cooperative transportation with a tethered vehicle is presented. Without a positioning system, the aerial vehicle is controlled to follow the human partner by using the cable force and visual features of the object as feedback. Furthermore, awareness of human motion is important for improving the efficiency and smoothness of the cooperation. Without measuring the velocities of the aerial vehicle and the human, we propose to estimate their relative velocity directly with a vision-based velocity observer. This estimated velocity is then integrated into a visual impedance scheme. The stability of the system is rigorously proved by Lyapunov analysis and passivity analysis. Indoor experiments are conducted in which a human participant transports a long bar with a tethered aerial vehicle. (TII 2023)

Biomimetic Morphing Quadrotor
We propose a novel biomimetic morphing quadrotor design inspired by the morphology of an eagle claw during prey capture. The arms of the quadrotor can fold vertically to enable dynamic grasping, mimicking the transition of the eagle claw from an open to a closed state. This transition is achieved through the rotation of a central servomotor and the associated movement of 20 links. Thanks to the closed-loop multi-link structure of the frame, the propellers of the quadrotor remain in a fixed orientation when the arms are folded, allowing the system to be stabilized at any arm rotation angle. The geometric properties of the whole frame are analyzed to determine the relationships and constraints of the links, which is important for fabricating the experimental vehicle. To handle possible changes in physical properties and external disturbances during grasping, adaptive sliding mode controllers are applied. To deal with objects of unknown size in grasping tasks, an admittance filter is proposed for adaptive morphology. In flight, the proposed morphing quadrotor can transition smoothly, either rapidly or continuously, to any configuration within its range. Experimental results show the ability of the quadrotor to dynamically grasp various unknown objects at 0.4 m/s without additional tools, as well as its versatility in traversing narrow spaces and perching. (TRO 2024)
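
To make the idea of an admittance filter concrete, the following is a minimal sketch of a generic second-order admittance law, M·a + D·v + K·x = f, integrated with a semi-implicit Euler step. It is an assumption-laden illustration, not the filter from the paper: the class name, the scalar state `x` (which could stand for an arm-angle offset), and the gains are all hypothetical. A measured contact force drives a compliant offset, so the grasp can settle around an object whose size is not known in advance.

```python
from dataclasses import dataclass


@dataclass
class AdmittanceFilter:
    """Second-order admittance: mass * a + damping * v + stiffness * x = force."""
    mass: float
    damping: float
    stiffness: float
    x: float = 0.0  # compliant offset (hypothetical: arm folding angle offset)
    v: float = 0.0  # rate of change of the offset

    def step(self, force: float, dt: float) -> float:
        """Advance the filter by dt seconds and return the updated offset."""
        a = (force - self.damping * self.v - self.stiffness * self.x) / self.mass
        self.v += a * dt          # update velocity first (semi-implicit Euler)
        self.x += self.v * dt     # then position, using the new velocity
        return self.x
```

Under a constant force the offset settles at force / stiffness; the damping value sets how quickly and how smoothly it gets there.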

Medical Robot

For medical robots, our research mainly focuses on optimizing specific surgical operations and aims to raise the level of automation in surgery. Specifically, on the perception side, we study the optimization of 3D tooth segmentation; on the control side, we develop automatic manipulation of soft tissue in robot-assisted surgery (RAS), including deformation trajectory control, cutting control, etc.

3D Tooth Segmentation
Due to the variability and complexity of geometric feature distribution on dental meshes, traditional geometry-based segmentation methods often fail. We improve the region growing algorithm so that multiple parameters jointly evaluate region similarity, which makes the algorithm better suited to practical application requirements. In addition, we design a parameter-adaptive method to raise efficiency and provide a multi-level label optimization algorithm for segmentation refinement (RCAR 2021). To improve labeling accuracy and robustness under difficult conditions such as tooth crowding, we establish a large-scale 3D dental mesh data set and propose a deep neural network called VFENet for 3D tooth segmentation and labeling (TMI submission).
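
For readers unfamiliar with region growing, a minimal sketch on a mesh adjacency graph follows. It uses a single scalar feature per face (for instance a curvature value) and one tolerance, whereas the improved algorithm described above combines multiple parameters; the function name, the data layout, and the similarity test are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, List, Set


def region_grow(adjacency: Dict[int, List[int]],
                feature: Dict[int, float],
                seed: int,
                tol: float) -> Set[int]:
    """Grow a region from `seed`, adding neighbors whose feature is within `tol`."""
    region = {seed}
    queue = deque([seed])
    while queue:
        face = queue.popleft()
        for nb in adjacency[face]:
            # Compare against the face we are expanding from, so the region
            # can follow gradual changes but stops at sharp feature jumps.
            if nb not in region and abs(feature[nb] - feature[face]) <= tol:
                region.add(nb)
                queue.append(nb)
    return region
```

On a strip of five faces where the feature jumps between the third and fourth face, a seed on either side recovers only its own side of the jump.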

Automatic Cutting Control of Deformable Objects
Automatic cutting is an essential task in the field of robot-assisted surgery (RAS). Because of the high-dimensional deformation and time-varying topology caused by the interaction between the object and the cutting tool, research on this problem is still at a preliminary stage. We therefore propose a cutting control method based on vision and force feedback that cuts along a pre-designed trajectory automatically. Through force feedback, the resistance on the cutting tool can be reduced by adopting a pressing and slicing approach (WCICA 2018). To achieve the more precise automatic cutting control that RAS requires, we developed an automatic cutting control algorithm for deformable objects based on surface tracking. A dynamic controller based on the combined features is designed. Compared with a cutting controller based on point features, this method can prevent failure of the servo task caused by partial occlusion and invisible feature points (TMECH 2020).

Visual Tracking Control of Deformable Objects
Automatic manipulation of deformable objects is a challenging problem. Improper operations, such as excessive stretching or collision, can easily damage the deformable object. Thus, not only the final configuration but also the trajectory of the deformation should be controlled during the interaction. In this work, a model-free method to control the trajectory of the deformation in an unknown environment is proposed. We design an adaptive dynamic controller that estimates the deformation Jacobian matrix (DJM) online based on function approximation techniques (FAT), which approximate nonlinear functions with an arbitrarily small error and avoid the need to model compliant objects. In addition, we introduce a virtual force to improve the manipulability of the method. The stability of the proposed adaptive algorithm and the boundedness of internal signals are proved by Lyapunov theory, whether or not the approximation error is negligible. Experimental results validate the efficiency of the proposed algorithm. (TIE 2021)
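
The idea of estimating a deformation Jacobian online, without a model of the object, can be illustrated with Broyden's rank-one update. This is a deliberately swapped-in classical technique, not the FAT-based adaptive law described above; the function name and the `gain` parameter are assumptions for this sketch. Given a small robot motion `dq` and the observed feature change `ds`, the estimate is corrected so that it explains the latest observation.

```python
from typing import List

Matrix = List[List[float]]


def broyden_update(J: Matrix, dq: List[float], ds: List[float],
                   gain: float = 1.0) -> Matrix:
    """Rank-one correction of the Jacobian estimate J so that J @ dq moves toward ds."""
    denom = sum(v * v for v in dq)
    if denom < 1e-12:
        # No meaningful motion: keep the previous estimate unchanged.
        return [row[:] for row in J]
    # Feature change predicted by the current estimate.
    pred = [sum(J[i][j] * dq[j] for j in range(len(dq))) for i in range(len(J))]
    return [[J[i][j] + gain * (ds[i] - pred[i]) * dq[j] / denom
             for j in range(len(dq))]
            for i in range(len(J))]
```

With `gain=1.0` the updated estimate reproduces the latest observation exactly (the secant condition); a smaller gain trades that for robustness to measurement noise.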

Soft Surgical Robot
We have independently developed a soft surgical robot system for cardiothoracic surgery. The prototypes are cast from flexible materials to ensure safe interaction with the heart, lungs, and other vital organs during the operation. To meet surgical needs, medical image feedback is integrated, and the surgeon steers the robot interactively with a teleoperation lever, commanding motions such as advancing and steering inside the body cavity. The operational performance of the prototype was verified in an organ model, and 5 live animal surgery experiments were successfully carried out to test the snake-shaped surgical robot in practice. (Surgical Endoscopy and other Interventional Techniques, 2016)

The reconstruction of the cortical surfaces from magnetic resonance imaging
The reconstruction of cortical surfaces from magnetic resonance imaging (MRI) is of great value for the study of neurodegenerative diseases and for surgical planning. However, these reconstruction tasks still face numerous challenges: they are time-consuming and rely on manual parameter fine-tuning, and they suffer from structural errors such as self-intersecting surfaces and local reconstruction errors such as protrusions or depressions. We propose DiffCSR, a diffusion-based implicit generative approach for cortical surface reconstruction. Applying a generative approach to the implicit reconstruction of MRI is a novel attempt. The ratio of self-intersecting surfaces is 0 in DiffCSR, since the meshes are extracted by marching cubes and a lightweight topological correction algorithm is introduced to ensure spherical topology. Moreover, a smoothing method ensures that our network produces sufficiently smooth results even at low resolutions.

Deformable Soft Tissue Reconstruction with Uncertainty-Guided Depth Supervision and Local Information Integration
Reconstructing deformable soft tissues from endoscopic videos is a critical yet challenging task. Leveraging depth priors, deformable implicit neural representations have seen significant advancements in this field. However, depth priors from pre-trained depth estimation models are often coarse, and inaccurate depth supervision can severely impair the performance of these neural networks. Moreover, existing methods overlook local similarities in input sequences, which restricts their effectiveness in capturing local details and tissue deformations. In this paper, we introduce UW-DNeRF, a novel approach utilizing neural radiance fields for high-quality reconstruction of deformable tissues. We propose an uncertainty-guided depth supervision strategy to mitigate the impact of inaccurate depth information. This strategy relaxes hard depth constraints and unlocks the potential of neural implicit representations. In addition, we design a local window-based information sharing scheme. This scheme employs local window and keyframe deformation networks to construct deformations with local awareness. By effectively utilizing local correlation information, it enhances the model’s ability to capture fine details. We validate our framework through extensive experiments on synthetic, EndoNeRF, and Hamlyn datasets, demonstrating its ability to produce high-quality representations of deformable soft tissues. A series of ablation studies further confirm the effectiveness and necessity of our proposed components.

Novel Robotic Uterine Manipulator
Uterine manipulation is essential in minimally invasive gynecological surgery to adjust the position and tension of the uterus. However, uterine manipulation is tedious, laborious, and long-lasting. The performance of manual manipulation decreases over time due to the fatigue of the assistant at the vaginal end. To address this issue, we propose a novel uterine manipulation robot that consists of a 3-DoF remote center of motion (RCM) mechanism and a 3-DoF manipulation rod. This allows for tireless, stable, and safer manipulation in place of a human assistant. For the RCM mechanism, we propose a single-motor bilinear-guided mechanism that can achieve a wide range of pitch motion while maintaining a compact structure. The robot is equipped with a manipulation rod that has a tip diameter of only 6 mm and provides intrauterine distal pitch and roll motions to enhance its flexibility. Additionally, the tip of the rod can be opened into a T-shape to minimize damage to the uterus. Feasibility has been demonstrated through ex-vivo and cadaver tests, as well as clinical trials. (TBME 2023)

Reinforcement-learning Control

For reinforcement learning technology, our primary focus lies in the autonomous control and robust task capabilities of unmanned systems in complex and dynamic task environments. We emphasize addressing the autonomous navigation challenges of diverse unmanned carriers and the precise control requirements of manipulator arms in high-accuracy operations. The former aims to tackle the issues of strategic adaptation of unmanned ground vehicles, drones, unmanned surface vehicles, and other unmanned platforms in open and dynamic environments, as well as the generalization of strategies for diverse task demands. The latter aims to enhance the mobility-operation coupled control capabilities of mobile manipulator arms and improve high-precision operational capabilities in complex tasks.

Reinforcement Learning-based End-to-End Robust Estimation
Traditional sampling consensus-based robust estimation algorithms cannot effectively exploit data features and historical information, cannot explore local features, and are not differentiable. To address these problems, we design RLSAC, a novel reinforcement learning-enhanced sampling consensus framework for end-to-end robust estimation. First, a graph neural network uses data and memory features to guide the exploration direction for the next minimal set, making effective use of data features. Second, to address error accumulation and policy degradation when modules are coupled, the feedback of downstream tasks is used as a reward for unsupervised training, which avoids differentiating the sampling process and achieves end-to-end robust estimation. Third, a state transition module is integrated to encode data and memory features, improving the exploration and use of diverse data features in dynamic local regions. RLSAC achieves the best performance in the fundamental matrix estimation and 2D line fitting tasks. It can be used in various noisy scenarios to provide high-precision position estimation for positioning-dependent applications such as autonomous driving and robot navigation. It can also serve the robust estimation needs of diverse tasks, improving the robustness and anti-interference performance of the overall system. This work was presented at ICCV 2023.
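
For context, the classical sampling consensus baseline that RLSAC builds on can be sketched for the 2D line fitting task as follows. This is plain RANSAC with uniformly random minimal sets, not RLSAC itself: there is no graph neural network, memory, or learned policy here, and the function names and parameters are assumptions for this example. RLSAC replaces the random choice of the minimal set with a learned, feedback-driven one.

```python
import math
import random
from typing import List, Optional, Tuple

Point2 = Tuple[float, float]
Line = Tuple[float, float, float]  # a*x + b*y + c = 0 with a^2 + b^2 = 1


def line_through(p: Point2, q: Point2) -> Line:
    """Normalized line through two distinct points."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("points coincide")
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])


def ransac_line(points: List[Point2], threshold: float, iterations: int,
                rng: random.Random) -> Tuple[Optional[Line], List[Point2]]:
    """Return the line with the most inliers over `iterations` random minimal sets."""
    best_line: Optional[Line] = None
    best_inliers: List[Point2] = []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal set for a 2D line
        if p == q:
            continue
        a, b, c = line_through(p, q)
        # With a unit normal, |a*x + b*y + c| is the point-to-line distance.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing against a learned sampler.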

Safe Reinforcement Learning-based Unmanned Aerial Vehicle Control
To meet the autonomy and generalization requirements of task decision-making for unmanned aerial vehicle (UAV) systems, we have established a UAV control framework based on safe reinforcement learning. In real UAV control, inappropriate policies used by the reinforcement learning agent during the learning cycle lead to high trial-and-error costs. To avoid these costs, we first introduce a system dynamics regression technique based on deep learning and meta-learning, which improves the adaptability of the policy to dynamic environments and disturbances. In addition, using a Lyapunov stability criterion, we numerically analyze the stability of the UAV reinforcement learning policies and generate action constraints, establishing a safe mapping between "system state" and "action". As a result, the method ensures safe operation of the UAV in real environments while keeping its policy versatile across diverse tasks and environments. The representative results are published in IROS 2023.

Robot Visual Navigation Based on Reinforcement Learning
To address target-driven visual navigation of intelligent robots in complex and unknown scenarios, we propose multiple learning-based cognitive navigation frameworks. First, to alleviate catastrophic forgetting in learning-based navigation and enhance the robot's understanding of the scene, we introduce an online updated topological map as the memory structure of the environment, enabling the robot to use a wider range of temporal and spatial data to make more comprehensive decisions. Second, we integrate NeRF into the navigation framework, which, by estimating rendering uncertainty, makes the robot more aware of actively exploring and collecting target clues in unknown environments. Third, we incorporate methods such as domain adaptation, domain randomization, and meta-learning to enhance the robot's generalization to unknown scenes and its adaptability to the real world. Finally, by combining imitation learning with reinforcement learning, we significantly enhance the robot's scene perception and task understanding abilities. We achieved highly competitive performance on multiple photorealistic navigation datasets such as Gibson and MP3D and, after fine-tuning with only a small amount of data, realized precise obstacle avoidance and image-goal visual navigation on real TurtleBot robots in complex indoor scenarios. Representative results can be found in IEEE JAS 2024 and IEEE TII 2024.

Autonomous Navigation of Robots for Large Range Unknown Dynamic Scenes
To address the navigation problem of mobile robots in large-scale unknown environments, we propose a reinforcement learning-based robot navigation model guided by global information. We design a dense reward function that frees the reinforcement learning model from the constraints of the global path during local exploration. This encourages the robot to follow the desired global path while also leveraging reinforcement learning to autonomously explore optimal local motion coordination strategies. The approach enables small-scale learning models to navigate autonomously in environments of arbitrarily large scale. To tackle navigation in dynamic environments with complex interaction relationships, we establish a multi-layer relational learning network based on graph neural networks to infer potential interaction relationships between dynamic objects and robots. Additionally, we design a spatiotemporal aggregation network to further enhance the robot's understanding of dynamic scenes, ultimately achieving an understanding of interaction relationships and efficient collaborative navigation in dynamic environments. Related works can be found in IEEE RAL 2020, IEEE TITS 2023, among others.

Autonomous Navigation of Unmanned Surface Vehicles Based on Reinforcement Learning
For diverse marine mission requirements, we propose a reinforcement learning framework for autonomous control of unmanned surface vehicles (USVs) that integrates ship dynamics. To handle the high inertia, underactuation, and lack of braking of USVs in dynamic marine environments, we introduce a state prediction method based on a probabilistic graph model to capture the current dynamics of the USVs, providing crucial prior information for dynamic policy generation. By incorporating meta-reinforcement learning techniques, we identify common navigation strategies across different maneuvering states of USVs, effectively enhancing the generalization and transferability of reinforcement learning policies to diverse dynamic characteristics. In addition, we integrate graph neural networks into a communication-constrained multi-USV interaction method and propose a pattern-switching multi-vessel formation coordination approach. This enables dynamic formation obstacle avoidance and self-recovery in heterogeneous USV formations within complex scenarios. The method can provide technical support for multi-agent adversarial game tasks, including underwater acoustic countermeasures and collaborative electromagnetic interference involving multiple entities. Related works can be found in Ocean Engineering 2022 and Ocean Engineering 2023.

Reinforcement Learning-Based Control of Mobile Robotic Arms
To address the accuracy of end-effector control and whole-body coordination of mobile manipulators during continuous operation tasks, as well as the low sample efficiency of reinforcement learning (RL) algorithms themselves, we propose an RL-based method for 6-DOF end-effector trajectory tracking with hindsight experience replay. To improve sample efficiency, we first model the trajectory tracking task as a multi-trajectory reinforcement learning framework. After each episode, we use gradient descent to rewrite low-reward experiences into high-reward regions. In addition, we correct the distribution shift caused by experience replay, using the f-divergence function as a density ratio estimator to estimate the shift. Finally, we conducted extensive experiments on a mobile manipulator in simulation and in the real world, demonstrating that this method achieves accurate end-effector tracking under model-free conditions and high sample efficiency in RL training.
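
The basic mechanism behind hindsight experience replay can be sketched as follows. This is the classical "final" relabeling strategy, in which the goal of a failed episode is replaced by the state it actually reached; it is not the gradient-based rewriting or the density ratio correction described above. The `Transition` layout, the sparse reward, and the tolerance are assumptions for this example.

```python
from typing import List, NamedTuple, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    state: Vector
    action: Vector
    next_state: Vector
    goal: Vector
    reward: float


def sparse_reward(achieved: Vector, goal: Vector, tol: float = 0.05) -> float:
    """0 when the goal is reached within `tol`, otherwise -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Pretend the last achieved state was the goal and recompute rewards."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [t._replace(goal=new_goal,
                       reward=sparse_reward(t.next_state, new_goal))
            for t in episode]
```

An episode that never reaches its commanded goal still yields at least one rewarded transition after relabeling, which is the source of the sample-efficiency gain.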

Multi-robot Control and Large-scale Scheduling

For multi-robot system (MRS), we mainly focus on two aspects: efficient planning for large-scale multi-robot systems and robust formation control for multiple robots. The former aims to address task allocation, path planning, and local motion coordination challenges in large-scale MRS composed of thousands of robots, ensuring effectiveness and real-time performance under uncertainties. The latter aims to achieve coordinated motion of multiple robots in formation, while ensuring robustness and safety under constraints such as limited communication, time-varying network topologies, and individual robot failures.

Large-scale Multi-robots Planning and Scheduling
To address the scheduling challenges of large-scale robot networks under uncertainty, we propose an integrated optimization framework that combines environment awareness, dynamic task allocation, and real-time trajectory coordination. First, we employ a data-driven approach to adaptively adjust the environmental roadmap based on historical and predictive data, accommodating real-time changes in task flows and robot traffic flows. Then, we introduce a theory for identifying key tasks and critical individuals in large-scale systems, facilitating optimal resource distribution for scheduling. Additionally, we develop an integrated model for task allocation and robot path planning that accounts for uncertainties in the planning phase, which reduces their impact during task execution. To solve large-scale optimization problems in real time, we design a greedy solving method that achieves sub-second dynamic planning in a warehousing scenario with over 2000 robots, ensuring safe, collision-free operation under motion and communication uncertainties. At the individual robot planning level, we introduce accompanying behaviors to enable synchronous execution of multiple tasks. Our work has been published in IEEE TITS 2016, IEEE TASE 2020, and IEEE TASE 2021, and has received awards including the Best Paper Finalist at IEEE RCAR 2017, the Best Paper Award at IEEE RCAR 2020, and the Best Paper Award at IEEE ROBIO 2023.
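
To show why greedy methods are attractive at this scale, here is a minimal sketch of greedy task allocation on a grid: candidate robot-task pairs are sorted by Manhattan distance and accepted in order whenever both the robot and the task are still free. It is a generic heuristic written for illustration, not the solver from the cited papers; the names, the grid model, and the cost function are assumptions, and path planning and uncertainty handling are left out.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_assign(robots: Dict[str, Cell], tasks: Dict[str, Cell]) -> Dict[str, str]:
    """Assign at most one task per robot, always taking the cheapest free pair."""
    candidates = sorted(
        (manhattan(r_pos, t_pos), robot, task)
        for robot, r_pos in robots.items()
        for task, t_pos in tasks.items()
    )
    assignment: Dict[str, str] = {}
    used_tasks: Set[str] = set()
    for _cost, robot, task in candidates:
        if robot not in assignment and task not in used_tasks:
            assignment[robot] = task
            used_tasks.add(task)
    return assignment
```

The result is not guaranteed to be optimal, but the cost is dominated by one sort over the candidate pairs, which is what makes frequent replanning affordable.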

Robust Formation Control of Multi-robots
In multi-robot formation control and coordination, we address the control of multi-robot formations over communication networks. First, we transform discrete sampled data with time delays into equivalent continuous systems under time-varying delays. Then we construct synchronization control methods and a global stability analysis theory by introducing a cross-coupling model. Building on this foundation, we further address challenges arising from dynamic communication-coupled topologies and individual robot failures, which change the overall formation and reduce motion synchronization. Specifically, we develop repair criteria based on recursive switching topologies and distributed self-repair algorithms, achieving distributed, energy-optimal self-repair of motion synchronization systems with thousands of robots under partial node failures. By incorporating methods such as vehicle-level control compensation, slip parameter estimation, and visual servo control compensation, we extend robust formation control theories to fleet control tasks of industrial heavy-duty autonomous driving systems. This extension enables robust formation control and adaptive formation switching for industrial heavy-duty vehicles ranging from 50 to 200 tons. Our work has been published in IEEE TCST 2016, IEEE TSMC 2020, and IEEE/ASME TMECH 2022, and has been recognized with the Best Paper Finalist Award at IEEE RAL 2019 and the Best Paper Award at IEEE CYBER 2022.
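
As a simplified picture of synchronization-based formation control, the sketch below runs a discrete-time consensus update on the formation error: each robot compares its own position, minus its desired offset, with that of its neighbors and moves to reduce the difference. It is an illustration under strong assumptions (no time delays, fixed undirected topology, single-integrator robots) and is not the cross-coupling controller from the cited papers; the function name and the gain are hypothetical.

```python
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]


def formation_step(pos: List[Vec2], offsets: List[Vec2],
                   neighbors: Dict[int, List[int]], gain: float) -> List[Vec2]:
    """One synchronous consensus step on the offset-corrected positions."""
    new_pos: List[Vec2] = []
    for i, (p, d) in enumerate(zip(pos, offsets)):
        # Sum of disagreements with neighbors, measured after removing offsets.
        ex = sum((pos[j][0] - offsets[j][0]) - (p[0] - d[0]) for j in neighbors[i])
        ey = sum((pos[j][1] - offsets[j][1]) - (p[1] - d[1]) for j in neighbors[i])
        new_pos.append((p[0] + gain * ex, p[1] + gain * ey))
    return new_pos
```

On a connected undirected graph, and with a gain below the reciprocal of the largest neighbor count, the relative positions converge to the differences between the desired offsets.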

Large-scale Robot Scheduling System for Unmanned Warehousing Scenarios
To address the large-scale scheduling demands in robot warehousing scenarios, our team collaborated with multiple companies to develop intelligent scheduling systems for warehousing robots. We also participated in the National Key Research and Development Project "Unmanned Flexible Warehousing Logistics Robot System and Application Demonstration for E-commerce" and collaborated closely with leading domestic robot warehousing enterprises. Through these applications, we gained comprehensive project experience covering AGVs, four-way shuttle vehicles, and automated forklifts, and established a hierarchical decentralized planning system, multi-level safety response mechanisms, and real-time rolling online optimization algorithms. Additionally, we developed software for iterative lifelong learning from warehousing robot data and for analyzing task feasibility and the limits of robot fleet size in warehousing environments. Our team has made breakthroughs in task allocation, path optimization, and distributed motion coordination technologies for over 2000 warehousing robots in practical application projects. We achieved a transition from offline planning at the hour level to real-time planning at the second level in large-scale warehousing scenarios. Under 3% motion uncertainty and 2% communication loss rates, we realized safe collision-free planning for thousands of robots and completely avoided local congestion and deadlocks. Moreover, our team holds multiple related patents.

Humanoid Robot

Planning for Humanoid Dual Arm Grasping
General grasping in a 6-degree-of-freedom workspace is more challenging than planar grasping. The robot must avoid obstacles, generate the most stable grasping posture for the dexterous hand based on the type and shape of the object, and adapt the grasping force to keep the grasp stable. To this end, we have developed a method that integrates visual guidance with multi-sensor information to generate reference grasping poses for dexterous hands. The method also uses force sensing information to adapt to force changes during grasping and ensure a stable grasp.

Robot Teleoperation & Imitation Learning
To solve complex long-horizon dual-arm manipulation tasks, we built a dual-arm direct joint space teleoperation system to collect demonstration data and used a generative imitation learning algorithm to learn complex dual-arm manipulation strategies. The teleoperation system uses isomorphic master-puppet six-axis robotic arms. Additionally, we employed a sliding rail structure in the gripper actuator to reduce friction, thereby improving object gripping. Based on the collected data, we trained a generative imitation learning model for behavior cloning. From the observation-action pairs obtained through human demonstrations, the generative model learns the action sampling probabilities conditioned on different observations. During inference, this allows the model to generate reasonable actions from the probabilistic model under new observations and complete the task. Experiments show that an imitation learning model trained with 50 demonstrations can autonomously perform the task of placing a laser pointer in its case.

VLMs

VLM-Based Robotic Manipulation
We propose KeyManip, a robotic manipulation method based on Vision-Language Models (VLMs) and spatially unified dense geometric primitives. Our approach adopts a code2action paradigm, eliminating the need for training data and enabling strong zero-shot capabilities for long-horizon tasks. By leveraging the reasoning abilities of VLMs, complex tasks are decomposed and executed through dense geometric primitives defined over both the internal and external spaces of objects, enabling generalized manipulation. To address execution failures and disturbances, we construct and continuously update a spatial topology graph, tracking primitives through spatial transformations. This allows real-time understanding and replanning, achieving closed-loop manipulation.

VLM-based Temporal Interaction Localization
We pioneered a new research direction in egocentric temporal interaction localization (TIL), which localizes the moments of hand-object contact and separation to capture high-precision temporal state transitions for downstream tasks such as human-robot skill transfer and behavior analysis. Traditional temporal action localization (TAL) methods suffer from domain bias, poor generalization, and coarse-grained estimation within open-loop frameworks, making them unsuitable for TIL. To address this, we propose EgoLoc, the first zero-shot TIL framework integrating 3D perception and closed-loop feedback. Our approach includes hand-motion-based adaptive sampling, VLM reasoning, and closed-loop correction. EgoLoc shows strong TIL performance on public datasets and our DeskTIL benchmark, reducing VLM-induced uncertainty by ~60% and proving effective in robot pick-and-place tasks. This work has been accepted at IROS 2025.