Speaker: Wei-Lun (Harry) Chao
Time: Wed 11/6, 4pm-5pm
Location: Dreese Lab 480
Title: “Pseudo-LiDAR for Image-based 3D Object Detection in Autonomous Driving”
Abstract: Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving. Recent techniques excel with highly accurate detection rates, provided that the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies — a gap that is commonly attributed to poor image-based depth estimation.
In this talk, I will show that it is not the quality of the data but its representation that accounts for the majority of the performance gap. Taking the inner workings of ConvNets into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations — essentially mimicking the LiDAR signal. With this representation, we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach significantly outperforms the existing state of the art in image-based performance — leading to a 300% relative improvement and halving the gap between image-based and LiDAR-based systems.
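As a rough illustration of the representation change described above, the sketch below back-projects a depth map into a 3D point cloud using a standard pinhole camera model. This is an assumption for illustration only: the abstract does not spell out the exact conversion, and the names `Intrinsics`, `depth_map_to_points`, and `max_depth`, as well as the toy values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Hypothetical pinhole camera parameters, all in pixels."""
    fx: float  # focal length along the image x-axis
    fy: float  # focal length along the image y-axis
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def depth_map_to_points(
    depth: List[List[float]],
    cam: Intrinsics,
    max_depth: float = 80.0,  # assumed cutoff in meters, not from the talk
) -> List[Point]:
    """Back-project each valid pixel (u, v) with depth z into camera-frame (x, y, z)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Skip pixels with missing (non-positive) or implausibly large depth.
            if z <= 0.0 or z > max_depth:
                continue
            x = (u - cam.cx) * z / cam.fx
            y = (v - cam.cy) * z / cam.fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map in meters; 0.0 marks a pixel without a depth estimate.
depth = [
    [10.0, 10.0, 0.0],
    [20.0, 20.0, 20.0],
]
cam = Intrinsics(fx=2.0, fy=2.0, cx=1.0, cy=0.5)
cloud = depth_map_to_points(depth, cam)
print(len(cloud), cloud[0])
```

The resulting list of points plays the role of a LiDAR sweep, so in principle it could be passed to a detector that expects point-cloud input. A real pipeline would also need calibrated intrinsics and a transform into the LiDAR coordinate frame, both of which are omitted here.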
I will then present methods to advance the pseudo-LiDAR framework through improvements in image-based depth estimation. Concretely, we adapt the network architecture and loss function to be more aligned with accurate depth estimation. Further, we explore the idea of leveraging cheaper but extremely sparse LiDAR sensors, which alone provide insufficient information for 3D detection, to de-bias our depth estimation. On the KITTI benchmark, our combined approach yields substantial improvements, leading to a 40% relative improvement for far-away objects and achieving performance comparable to LiDAR-based systems for nearby objects. I will conclude the talk with research challenges and opportunities in robust perception for autonomous driving.
Bio: Wei-Lun (Harry) Chao is an Assistant Professor in Computer Science and Engineering at The Ohio State University. His research interests are in machine learning and its applications to computer vision, natural language processing, artificial intelligence, and healthcare. His recent work has focused on robust robotic perception and large-scale visual understanding in the wild. Prior to joining OSU, he was a Postdoctoral Associate in Computer Science at Cornell University, working with Prof. Kilian Q. Weinberger and Prof. Mark Campbell. He received a Ph.D. in Computer Science from the University of Southern California, supervised by Prof. Fei Sha.