Speaker: Chng Eng Siong, Associate Professor, Speech and Language Processing Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore
Host: Professor Xie Lei
Title: A LEARNING-BASED APPROACH TO DIRECTION OF ARRIVAL ESTIMATION IN NOISY AND REVERBERANT ENVIRONMENTS
We present a learning-based approach to direction of arrival (DOA) estimation from microphone array input. Traditional signal processing methods, such as the classic least squares (LS) method, rely on strong assumptions about the signal model and on accurate estimates of the time delay of arrival (TDOA). They work well only in relatively clean conditions and degrade under noise and reverberation. We propose a learning-based approach that learns from a large amount of simulated noisy and reverberant microphone array input to achieve robust DOA estimation. Specifically, we extract features from the generalized cross correlation (GCC) vectors and use a multilayer perceptron neural network to learn the nonlinear mapping from these features to the DOA. One advantage of the learning-based method is that DOA estimation becomes more accurate as more training data becomes available. Experimental results on simulated data show that the proposed learning-based method produces much better results than the state-of-the-art LS method. Tests on real data recorded in meeting rooms show a lower root-mean-square error (RMSE) than the LS method.
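To make the feature extraction step concrete, here is a minimal sketch of how a GCC vector for one microphone pair might be computed. It assumes the PHAT weighting, which is one common GCC variant; the abstract does not say which weighting the talk uses. The function name `gcc_phat`, the `max_lag` parameter, and the toy signals are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC values for lags -max_lag..max_lag of x2 relative to x1.

    PHAT weighting is an assumption of this sketch, not a detail
    taken from the talk.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT: keep the phase, drop the magnitude
    cc = np.fft.irfft(cross, n=n)
    # Negative lags sit at the end of the inverse FFT output; move them first.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


# Toy signals: the second microphone hears the source 3 samples later.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
true_delay = 3
mic1 = source
mic2 = np.concatenate((np.zeros(true_delay), source[:-true_delay]))

features = gcc_phat(mic1, mic2, max_lag=8)      # one feature vector per mic pair
estimated_delay = int(np.argmax(features)) - 8  # peak position gives the TDOA in samples
```

A classical estimator keeps only the peak location (the TDOA) from each pair. A learning-based approach of the kind described above would instead feed the GCC vectors, for example concatenated over all microphone pairs, to the neural network, so that information around the peak is not discarded.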
Chng Eng Siong is currently an Associate Professor in the School of Computer Engineering, Nanyang Technological University (NTU), Singapore. He received his BEng (Hons) in Electrical and Electronics Engineering from the University of Edinburgh, U.K., in 1991, and his PhD from the same university in 1996. Before joining NTU, he worked at RIKEN (Japan), the Institute of Systems Science (Singapore), Lernout and Hauspie, and Knowles Electronics. He currently leads the speech and language technology program in the Emerging Research Lab at the School of Computer Engineering. His research interests are in pattern recognition and in signal, speech, and video processing.