Yihan Wang | EPFL

About Me

I am a Master’s student at École polytechnique fédérale de Lausanne (EPFL), with a focus on computer vision and machine learning. I’m currently a research student in the CVG group, working on parametric distortion-free Structure-from-Motion under the supervision of Prof. Dr. Marc Pollefeys. Previously, I spent time in the VITA lab working on pose-estimation-based lane marker detection, supervised by Prof. Dr. Alexandre Alahi. Before joining EPFL, I completed my bachelor’s degree at Zhejiang University, where my deep-seated interest in human-centric technology led me to research the spread mechanisms of respiratory diseases under the supervision of Prof. Jianjian Wei.

Research Interests

My research interests center on the application of deep learning to computer vision tasks such as image recognition, 3D localization, and 3D reconstruction. I am particularly interested in exploring these areas in the context of autonomous vehicle perception. I am also keen to expand my expertise and gain practical experience in other domains, including augmented reality.

Selected Projects

Beyond Structure-from-Motion with Camera Intrinsics

We developed a parametric distortion-free incremental Structure-from-Motion (SfM) pipeline that incorporates a 1D radial camera model and an implicit distortion model. On severely distorted datasets, it achieves significantly better 3D point cloud quality and camera pose estimation quality than the state-of-the-art COLMAP pipeline, which highlights our pipeline's robustness to radial distortion.
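The idea behind a 1D radial camera model can be illustrated with a minimal sketch. It assumes the common formulation in which an observed image point, measured from the distortion center, must be parallel to the first two rows of the camera transform applied to the 3D point. The function names, the example numbers, and the one-parameter polynomial distortion are hypothetical and are not taken from the pipeline described above.

```python
def transform(R, t, X):
    """Apply the rigid transform R * X + t to a 3D point (plain lists)."""
    return [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]


def radial_residual(R, t, X, x_obs, center=(0.0, 0.0)):
    """2D cross product between the projected radial direction and the
    observed point relative to the distortion center.

    Only the first two rows of the transform are used, so neither the
    depth nor any distortion parameter enters the constraint.
    """
    px, py, _ = transform(R, t, X)
    dx = x_obs[0] - center[0]
    dy = x_obs[1] - center[1]
    return px * dy - py * dx


def distort(x, k):
    """Illustrative one-parameter radial distortion: x * (1 + k * r^2)."""
    scale = 1.0 + k * (x[0] ** 2 + x[1] ** 2)
    return (scale * x[0], scale * x[1])


# Hypothetical camera pose and 3D point.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
X = [1.0, 2.0, 3.0]

px, py, pz = transform(R, t, X)
x_ideal = (px / pz, py / pz)            # undistorted pinhole projection
x_distorted = distort(x_ideal, k=-0.3)  # same point after radial distortion

res_ideal = radial_residual(R, t, X, x_ideal)
res_distorted = radial_residual(R, t, X, x_distorted)
res_wrong = radial_residual(R, t, X, (0.25, 0.125))  # point off the radial line
```

Because radial distortion only moves a point along the line through the distortion center, the residual stays at zero for both the ideal and the distorted observation, while a point off that line produces a nonzero residual.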

Pose Estimation Based Monocular 3D Lane Detection

We adapted OpenPifPaf, a deep neural network originally designed for human pose estimation, to detect lane markers with a keypoint-based approach. The sparse keypoint representation is efficient, which makes the adaptation well suited to real-time applications. To repurpose the network for lane detection, we developed new dataloaders and plugins so that it could be retrained on the OpenLane dataset. We also investigated a keypoint downsampling strategy to facilitate lane pose learning. Our modified 2D model matched the performance of leading 2D lane detection methods on the OpenLane benchmark, as reported by PersFormer. Building on this, we extended the task to the 3D domain by incorporating a monocular dense depth estimation method, which we fine-tuned with limited sparse depth annotations. This enabled the prediction of lane markers at distances beyond the initial 80-meter limit. Such improvements boost the model’s utility in outdoor autonomous driving scenarios.

Improving Neural Networks Performance with Zeroth-order and First-order Hybrid Optimization Methods

This project examines the limitations of first-order (FO) gradient-based optimization in neural networks and explores gradient-free zeroth-order (ZO) alternatives such as ZO-SGD and ZO-signSGD. Using a PyTorch-based framework, we implemented and compared ZO and FO methods across various network configurations and hyperparameters, assessing time efficiency, convergence rate, stability, and overall performance on a handwritten digit classification task. The study reveals that ZO methods are more stable and less sensitive to hyperparameters, which we attribute to the regularizing effect of the perturbations used in gradient estimation. However, ZO methods were found to be more reliant on model architecture and scaling. A hybrid FO-ZO approach emerged as a balanced solution, optimizing both gradient computation costs and efficiency.
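As a rough illustration of how ZO methods estimate gradients from function values alone, here is a minimal sketch in plain Python. It is not the PyTorch implementation used in the project: the two-point Gaussian-direction estimator, the function names, the toy quadratic objective, and all hyperparameters are assumptions chosen for the example.

```python
import random


def zo_gradient(f, x, rng, mu=1e-3, num_dirs=20):
    """Two-point random-direction estimate of the gradient of f at x.

    Uses only function evaluations: f is probed at x + mu*u and x - mu*u
    for random Gaussian directions u, and the results are averaged.
    """
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += slope * u[i] / num_dirs
    return grad


def zo_step(f, x, rng, lr=0.05, use_sign=False, mu=1e-3, num_dirs=20):
    """One ZO-SGD update, or a ZO-signSGD update when use_sign is True."""
    g = zo_gradient(f, x, rng, mu=mu, num_dirs=num_dirs)
    if use_sign:
        # Keep only the sign of each estimated gradient coordinate.
        g = [(gi > 0) - (gi < 0) for gi in g]
    return [xi - lr * gi for xi, gi in zip(x, g)]


def quadratic(x):
    """Toy objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)


rng = random.Random(0)
x = [1.0, -2.0, 0.5]
initial_loss = quadratic(x)
for _ in range(200):
    x = zo_step(quadratic, x, rng, lr=0.05)
final_loss = quadratic(x)
```

Each update costs 2 * num_dirs function evaluations and no backward pass, which is the trade-off a hybrid FO-ZO scheme tries to balance.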

Crafting a Personalized Beer Landscape: Analyzing User Preferences and Naming Impact for Guiding Targeted Recommendation

This project takes a deep dive into beer popularity and user taste preferences using review datasets from BeerAdvocate and RateBeer, which contain both categorical ratings and textual reviews. Combining statistical methods with natural language processing techniques such as sentiment analysis and word2vec, we uncovered a multi-faceted view of the beer domain, including the popularity of beers, the influence of naming on ratings, and the similarities between styles. Our findings provide tailored recommendations for users, considering not just popularity but also qualitative attributes and regional taste variations. An interactive webpage was also deployed; I invite you to click on the image and take a look at our data story.

Training a Chatbot for Commonsense Persona-grounded Dialogue Generation

We participated in the Commonsense Persona-grounded Dialogue Challenge organized by Sony and EPFL, which aims to generate dialogue responses that are both persona-consistent and contextually coherent. We enhanced a BART-based model with a knowledge-graph-based data augmentation technique and evaluated model performance under different augmentation settings and persona constraints. Our best-performing model achieved a word F1 score of 17.27, surpassing the baseline score of 17.001 set by GPT-3.5 Turbo using a simple prompt.
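For readers unfamiliar with the metric, a word-level F1 score can be sketched as follows. This is a minimal illustration, not the challenge's official scorer: the lowercasing, whitespace tokenization, and example sentences are assumptions, and the function returns a value in [0, 1], whereas the scores above would correspond to a 0 to 100 scale if the challenge reports percentages.

```python
from collections import Counter


def word_f1(prediction, reference):
    """Harmonic mean of word-level precision and recall.

    Overlap is counted with multiset intersection, so a repeated word
    is only credited as many times as it appears in both strings.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty scores zero.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Three shared words ("i", "love", "hiking"):
# precision = 3/5, recall = 3/4, F1 = 2/3.
score = word_f1("i love hiking on weekends", "i really love hiking")
```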

Deep Learning Specialization Projects

After starting with practical projects in the Deep Learning for Autonomous Vehicles course at EPFL, I opted for a structured learning path to thoroughly understand deep learning as it applies to computer vision and enrolled in the Deep Learning Specialization on Coursera. This program gave me a robust theoretical foundation and hands-on experience in constructing efficient neural network architectures from the ground up, including ResNet, LSTM, and Transformers. I also applied these architectures to a range of computer vision tasks, such as vehicle detection using the YOLO framework and image segmentation using the U-Net architecture.

Publications

Tracing the origin of large respiratory droplets by their deposition characteristics inside the respiratory tract during speech

Yihan Wang, Jianjian Wei, Caroline X. Gao, Li Liu, Building Simulation, 16, 781–794 (2023), orally presented at IEHB (2021).

During the COVID-19 pandemic, the prevalence of asymptomatic and pre-symptomatic transmission posed a continuous threat to public health, and the significance of airborne versus droplet-spray transmission was a subject of debate. To offer insights into non-pharmaceutical infection control measures, we carried out a computational fluid dynamics study, using a realistic human airway model, to investigate the threshold at which differently sized droplets produced at primary SARS-CoV-2 replication sites can escape during speech. In conjunction with published medical data, our study emphasized the substantial risk posed by small droplets and highlighted the previously underestimated importance of the airborne transmission pathway.