Latest News


AI lip-reading app turns lip movements into speech: voices lost to illness can now be restored

2024.12.02

An AI app that can help people regain voices lost to laryngeal cancer, hypopharyngeal cancer, or other conditions has been developed. A research group comprising Specially Appointed Associate Professor Yoshihiro Midoh and Professor Noriyuki Miura at the Graduate School of Information Science and Technology and Professor Hidenori Inohara at the Graduate School of Medicine at Osaka University developed "Lip2ja: Lip-Reading-Based Japanese Speech System" and implemented it as a smartphone application. In addition, by combining the system with the speech platform "CoeFont," which can synthesize a personal voice from a pre-registered short speech recording, the research group made it possible to output the recognized speech in the person's own voice. The results were presented at the 75th Annual Meeting of the Japan Broncho-Esophagological Society.

Alternative speech methods are available for people who have lost their voice to laryngeal cancer or other conditions. However, these methods require a special machine worn around the neck or a hole in the neck to produce sound, which places a great physical strain on the person and produces a voice far from the original. Machine lip-reading applications, which recognize speech from video of the mouth captured by a camera, have achieved high accuracy in English, which has a large number of vowels (approximately 20). In Japanese, however, which has only five vowels, lip reading was considered difficult because, for example, the mouth takes an almost identical shape when pronouncing "ka" and "a."

Professor Tsuyoshi Miyazaki at the Faculty of Information Technology at Kanagawa Institute of Technology and his colleagues published the mouth shape code in 2009. Miyazaki's group found patterns in how mouth shapes change when speaking Japanese, focusing not only on the vowels of the spoken characters but also on the context of the spoken words, and succeeded in encoding at a high level the mouth shapes that change continuously with that context. The mouth shape code consists of 16 patterns and can associate mouth shapes with spoken words in more detail, going beyond the five Japanese vowels. The research group developed a unique AI lip-reading application built as a two-stage process: an AI algorithm that estimates mouth shape codes from video images, followed by an AI algorithm that converts the estimated codes into Japanese words.
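The two-stage process described above can be sketched roughly as follows. Everything in this sketch is an illustrative stand-in: the 16-code inventory, the per-frame classifier stub, the duplicate-collapsing step, and the tiny code-to-word lexicon are hypothetical placeholders, not the actual Lip2ja models or Miyazaki's published mouth shape code.

```python
# Hypothetical sketch of a two-stage lip-reading pipeline:
# Stage 1 classifies each video frame into one of 16 mouth-shape codes;
# Stage 2 decodes the code sequence into Japanese text.
from typing import List

MOUTH_SHAPE_CODES = [f"S{i:02d}" for i in range(16)]  # 16 abstract patterns

def estimate_codes(frame_features: List[int]) -> List[str]:
    """Stage 1 stub: a real system would run a trained model on each
    frame; here each frame is reduced to a single integer feature that
    is mapped directly to one of the 16 codes."""
    return [MOUTH_SHAPE_CODES[f % 16] for f in frame_features]

def collapse(codes: List[str]) -> List[str]:
    """Merge consecutive duplicates, since one mouth shape typically
    persists across several video frames."""
    out: List[str] = []
    for c in codes:
        if not out or out[-1] != c:
            out.append(c)
    return out

# Stage 2 stub: a real system would use a context-aware language model
# to map code sequences to Japanese words; this toy lexicon only
# illustrates the code-sequence-to-word lookup.
TOY_LEXICON = {
    ("S03", "S07"): "こんにちは",  # hypothetical mapping
    ("S01",): "あ",               # hypothetical mapping
}

def decode(codes: List[str]) -> str:
    return TOY_LEXICON.get(tuple(codes), "<unknown>")

if __name__ == "__main__":
    frames = [3, 3, 3, 7, 7]  # dummy per-frame features
    print(decode(collapse(estimate_codes(frames))))
```

The point of the two stages is that the hard perceptual problem (what shape is the mouth making?) is separated from the linguistic problem (which words fit this shape sequence in context?), which is where the natural language processing models come in.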

By applying natural language processing AI models to the accurately estimated mouth shape codes, the researchers were able to translate mouth shapes that are indecipherable to ordinary Japanese speakers into natural Japanese. Furthermore, by combining the application with the speech platform "CoeFont," which can synthesize a personal voice from a short pre-registered speech recording, it is now possible to output an individual's recognized speech in his or her own voice. By registering their voices in advance, patients who may lose their voices to surgery or other causes can reproduce their lost voices from the movements of their mouth alone.

Voice reproduction improves the quality of life not only of the patient but also of the patient's family members. The research group successfully developed this system with the help of CoeFont Co., Ltd., which provided the application free of charge.

Midoh said, "In this joint medical-engineering research project led by Osaka University, we aimed to make speech-based communication barrier-free. Even if a person unfortunately finds it difficult to speak naturally, we are researching and developing a system that can support communication in a manner as close as possible to how they currently speak. When you suddenly recall old memories with family and friends, vivid voices often come back to you along with the images. We have now succeeded in developing a lip-reading speech system in Japanese, which was technically difficult to achieve. We hope that this system will not only improve the quality of everyday life but also add color to memories of loved ones through the sound of their voices. During a panel discussion at the 125th General Meeting and Academic Lecture of the Japanese Society of Otolaryngology - Head and Neck Surgery held in May 2024, we were impressed by the high quality of CoeFont's AI voice. Since then, the company has firmly supported our research, which led to a major improvement that allows users to speak in their own voice through lip reading."

This article has been translated by JST with permission from The Science News Ltd. (https://sci-news.co.jp/). Unauthorized reproduction of the article and photographs is prohibited.
