COMPUTER VISION & VR GROUP
INTELLIGENT TRANSPORTATION SYSTEM
ASSISTIVE ALARMING SYSTEM
PEDESTRIAN & VEHICLE DETECTION
Real-Time Semantic Segmentation with Edge Information for Autonomous Vehicles
In Advanced Driver Assistance Systems (ADAS), one basic function is image segmentation for recognizing the drivable area and guiding the vehicle forward. Unlike traditional image segmentation methods, semantic segmentation based on deep learning architectures handles irregularly shaped road areas more accurately, guiding an autonomous vehicle through more complex road environments. In this research, we first analyze the output of state-of-the-art real-time semantic segmentation networks. The results show that most misclassified pixels lie on the boundary between two adjacent classes. Based on this observation, we propose a novel real-time semantic segmentation network that contains a class-aware edge loss module and a channel-wise attention mechanism, aiming to improve accuracy without harming inference speed.
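The core idea of an edge loss can be sketched as follows: derive a boundary mask from the label map (pixels whose neighbor belongs to a different class) and up-weight the per-pixel cross-entropy there. This is an illustrative NumPy sketch, not the group's actual module; the function names and the weighting scheme are assumptions.

```python
import numpy as np

def boundary_mask(labels):
    """Mark pixels whose 4-neighbourhood contains a different class.

    labels: (H, W) integer class-id map.
    """
    mask = np.zeros_like(labels, dtype=bool)
    mask[:-1, :] |= labels[:-1, :] != labels[1:, :]   # compare with pixel below
    mask[1:, :]  |= labels[1:, :]  != labels[:-1, :]  # compare with pixel above
    mask[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # compare with pixel right
    mask[:, 1:]  |= labels[:, 1:]  != labels[:, :-1]  # compare with pixel left
    return mask

def edge_weighted_ce(probs, labels, edge_weight=2.0):
    """Per-pixel cross-entropy, up-weighted on class boundaries.

    probs:  (H, W, C) softmax outputs.
    labels: (H, W) integer class ids.
    edge_weight: extra weight applied to boundary pixels (assumed value).
    """
    h, w = labels.shape
    # probability assigned to the true class at every pixel
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    ce = -np.log(np.clip(p_true, 1e-8, None))
    weights = np.where(boundary_mask(labels), edge_weight, 1.0)
    return (weights * ce).mean()
```

In a real training loop this weighting would be applied inside the network's loss function, so boundary pixels contribute more gradient without any extra inference-time cost.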
Deep Learning-based Robust Real-time Road Marking Detection System
In recent years, Autonomous Driving Systems (ADS) have become increasingly popular and reliable. Road markings help drivers and advanced driver assistance systems better understand the road environment. However, road marking detection suffers greatly from varying illumination, weather conditions, and viewing angles, and most traditional methods use a fixed threshold, which is not robust enough to handle the variety of real-world situations. Deep learning-based real-time detection frameworks such as the Single Shot Detector (SSD) and You Only Look Once (YOLO) are suitable for this task. However, these methods are data-driven, and no public road marking dataset exists. Moreover, such frameworks usually struggle with distorted road markings and with balancing precision against recall. We propose a two-stage YOLOv2-based network that tackles distorted road marking detection while balancing precision and recall. Our network runs at 58 FPS on a single GTX 1070 under diverse circumstances. In addition, we propose a dataset for public use in road marking detection tasks. The dataset consists of 11,800 high-resolution images captured at distinct times under different weather conditions, manually labeled into 13 classes with bounding boxes. We empirically demonstrate both the mAP and the detection speed of our system against several baseline models.
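The precision/recall trade-off mentioned above comes down to where the confidence threshold is set: a high threshold keeps only confident detections (high precision, lower recall), a low one keeps more (higher recall, more false positives). A minimal sketch of that computation, with assumed inputs (detections already matched against ground truth):

```python
import numpy as np

def precision_recall(scores, is_tp, num_gt, threshold):
    """Precision and recall of the detections kept above a confidence threshold.

    scores: (N,) detection confidences.
    is_tp:  (N,) bool, True if the detection matched a ground-truth box.
    num_gt: total number of ground-truth road markings.
    """
    keep = scores >= threshold
    tp = int(np.sum(is_tp & keep))   # correct detections kept
    fp = int(np.sum(~is_tp & keep))  # false alarms kept
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / num_gt if num_gt > 0 else 0.0
    return precision, recall
```

Sweeping the threshold over all score values yields the precision-recall curve from which mAP is computed.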
Early Fusion Based on Multiple Sensors
Features are first extracted from each sensor's data by its own deep learning branch. The per-sensor features are then projected into a common coordinate frame and concatenated along the channel axis to obtain a fused feature. Through this early fusion, the sensors complement one another's strengths and weaknesses, yielding better features. We have developed a series of computer-vision-based techniques and apply them to obstacle detection (e.g., pedestrian and vehicle detection).
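The two steps above (project to a shared frame, then concatenate channels) can be sketched in NumPy. This is only an illustration under assumed conventions: per-point features (e.g., from a LiDAR branch) are scattered onto a bird's-eye-view grid, which is then fused with an already-aligned camera feature map; grid size, extent, and max-pooling per cell are all assumptions.

```python
import numpy as np

def points_to_bev(points, feats, grid=(8, 8), extent=4.0):
    """Scatter per-point features onto a shared bird's-eye-view grid.

    points: (N, 2) x/y positions in metres, assumed in [0, extent).
    feats:  (N, C) per-point feature vectors.
    Cells with multiple points keep the element-wise maximum (assumed pooling).
    """
    h, w = grid
    c = feats.shape[1]
    bev = np.zeros((h, w, c))
    ix = np.clip(((points[:, 0] / extent) * h).astype(int), 0, h - 1)
    iy = np.clip(((points[:, 1] / extent) * w).astype(int), 0, w - 1)
    for i, j, f in zip(ix, iy, feats):
        bev[i, j] = np.maximum(bev[i, j], f)
    return bev

def early_fuse(feature_maps):
    """Channel-wise concatenation of per-sensor maps in the same frame.

    Each map: (H, W, C_i) with identical H and W.
    """
    return np.concatenate(feature_maps, axis=-1)
```

The fused tensor is what a downstream detection head would consume, letting it see both sensors' evidence at every grid cell.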
User-Specific Motion Recognition
This research focuses on developing a reliable and general action-recognition (AR) system; the main difficulty is adapting to the varied appearances and action styles of different users. Many future applications rely on this technique, such as real-time Human-Computer Interaction (HCI) systems, because identifying the input signal of a certain motion pattern is essential. The technique can also serve as a foundation for extensions of AR and enhance the convenience of everyday life.
This research currently focuses on hand pose estimation and human-computer interaction, and we propose a method that provides a natural way to interact with the computer and the virtual environment. The difficulties include hand pose variation, self-occlusion, and broken regions in the depth image. Our application pipeline is as follows: we first acquire a depth image from a depth camera mounted on the helmet, then estimate the 3D coordinates of the hand joints with a deep learning model, and finally project the joint coordinates into the virtual environment to interact with objects there.
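The geometric part of this pipeline (the deep network aside) can be sketched with two standard transforms: pinhole back-projection of a depth pixel into camera space, and a rigid transform from camera space into the virtual environment's world frame. A minimal sketch, assuming a pinhole camera model and known extrinsics; all parameter values are illustrative.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project a depth pixel (u, v) to a 3D point in camera coordinates.

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def camera_to_world(joints_cam, R, t):
    """Map estimated 3D hand joints from camera space into the virtual
    environment's world frame: x_world = R @ x_cam + t.

    joints_cam: (N, 3) joint positions; R: (3, 3) rotation; t: (3,) translation.
    """
    return joints_cam @ R.T + t
```

In the actual system the deep network would produce `joints_cam` directly from the depth image; these helpers only illustrate how such estimates end up in the virtual scene.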