On May 7, 2024, at the turn of spring and summer, the seventh @ World "Touching Academic Frontiers" International Exchange Activity of the SCS was successfully held on campus. For this event, we were honored to invite four student representatives who have achieved outstanding results in their respective research fields. They shared the valuable experience they have gained in scientific research and presented the latest advances in their fields.
Mu Bingshen's report focused on the far-field speech recognition task of the CHiME-7 challenge and on a multi-channel speech recognition model for multiple array topologies. He first presented the systems submitted to the CHiME-7 far-field speech recognition challenge, which handled complex scenarios involving background noise, reverberation, and overlapping speech. The paper describing the system design was accepted at INTERSPEECH 2023. Mr. Mu then presented a multi-channel speech recognition model for multiple array topologies that uses automatic channel selection and spatial feature fusion. Experiments with the proposed model were conducted on the CHiME-7 corpus with speaker information given. The results showed that the model achieved a relative 40.1% reduction in DA-WER (Diarization Attributed WER) on the Eval set compared to the baseline model. This paper was accepted at ICASSP 2024 (International Conference on Acoustics, Speech and Signal Processing), the world's largest and most comprehensive top-level conference on signal processing and its applications.
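For readers unfamiliar with how a "relative" reduction is computed, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `relative_reduction` and the two error rates below are hypothetical values chosen to make the calculation concrete, and they are not figures from the paper.

```python
def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative error-rate reduction of a system over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer * 100.0


# Hypothetical error rates (in percent), used only to illustrate the arithmetic.
# They are NOT the results reported in the paper.
baseline = 50.0
system = 29.95

print(f"{relative_reduction(baseline, system):.1f}% relative reduction")
```

A relative reduction is measured against the baseline's own error rate, so it is not the same as the absolute difference in percentage points between the two systems.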
Zhu Xinfa shared his observations and experiences from ICASSP 2023 and introduced a method for expressive speech synthesis based on multi-factor decoupling. The method decomposes speech into multiple representations, including content, speaker, emotion, and style, so that speaker, emotion, and style can be freely recombined to produce expressive synthesized speech. This technology has broad application prospects, and its influence can already be felt in everyday life.
Wei Kun shared his impressions of ICASSP 2023 and introduced the latest research on end-to-end speech-to-speech translation. Speech-to-speech translation (S2ST) is a technology that converts speech in one language into speech in another language. It is becoming increasingly important as globalization advances, particularly because it makes communication more direct in cross-border exchanges, tourism, business, and similar scenarios. Speech-to-speech translation is one of the important research directions of the Audio, Speech and Language Processing Group at NPU (ASLP@NPU).
Finally, Xue Xizhe presented her research in the field of open-world instance segmentation. To address the challenges that instance segmentation faces in open scenarios, she proposed a consistency-based method called TOIS (Transformer-based Open-World Instance Segmentation). The method achieves a significant performance improvement, and its cross-task consistency constraint can also be extended to the semi-supervised learning setting, where it combines labeled and unlabeled data more effectively and further improves instance segmentation in open scenes.
This activity not only gave students a platform to showcase their research findings, but also deepened their academic exchanges and offered them more opportunities to engage with the forefront of international academia. By continuing to host the @ World "Touching Academic Frontiers" international exchange series, the SCS will keep inspiring students' innovative spirit and cultivating them into future leaders in technology, and it looks forward to their even more outstanding achievements on the global computer science stage.