A research team led by Research Associate Shinnosuke Takamichi of the Graduate School of Information Science and Technology, the University of Tokyo, has developed technology to artificially reproduce the singing voice of Yumi Matsutoya as it was when the singer debuted 50 years ago, as part of research commissioned by the Toei Zukun Laboratory. The reproduced voice, credited as Yumi Arai (Ms. Matsutoya's artist name at her debut), sang a duet with Ms. Matsutoya's current voice. The duet was released on YouTube on October 1 as the music video for "Call me back."
The research group, which includes members of Parakeet Inc., a company founded by Research Associate Takamichi and his colleagues that specializes in audio information processing and machine learning, has focused on synthesis and conversion technologies that reproduce speaking and singing voices with a high degree of accuracy, as well as on data design technologies for speaking and singing voices that support those synthesis and conversion technologies.
For this study, they worked to artificially reproduce the singing voice of Ms. Matsutoya as it was 50 years ago. Ms. Matsutoya is a musician who has remained at the forefront from the New Music era (which began in the 80's) to the present golden age of J-POP, but the audio materials from her debut are limited. Moreover, because these recordings were not made with reproduction in mind, reproducing the voice with conventional information engineering technology was difficult.
Against this background, the researchers proposed a method that combines a multi-stage learning algorithm mixing synthesis and conversion tasks with a data editing and cutting method that can be applied to real-world data (a way of semi-automatically deleting any data that is unnecessary for machine learning).
First, the research group designed a way to compensate for the lack of audio materials from the period by using multiple speaking and singing voice databases, including JSUT and JVS, which were presented in 2020. They applied machine learning to text-to-speech synthesis using the large speaking voice database to learn "the epitome of a reading voice." Likewise, they applied machine learning to singing voice synthesis using the singing voice database to learn "the epitome of a singing voice." Finally, they used singing voice conversion to reproduce the singing voice from that period: taking the voice of a singer from the database as a baseline, they generated a singing voice with the tone and rhythm Ms. Matsutoya had at the time.
The intonational fluctuations of a singing voice can be expressions unique to a particular singer, and they can be difficult to handle with machine learning. The research group had the option of "using machine learning for the fluctuations," an approach that had already been suggested, but for this research they instead proposed a method that suppresses the fluctuations in the data, and they edited and cut the audio materials in accordance with the strengths and weaknesses of machine learning.
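The article does not say how the fluctuations were suppressed, so the following is only a minimal sketch of the general idea, not the research group's method. It assumes the pitch of a recording is available as a list of per-frame fundamental frequency values in Hz, with 0.0 marking unvoiced frames, and it swaps in a simple centered moving median as the smoothing step. The function names, the window size, and the sample contour are all hypothetical.

```python
from statistics import median


def suppress_f0_fluctuation(f0_hz, window=5):
    """Smooth a pitch contour with a centered moving median.

    Assumption: f0_hz holds one F0 value per frame in Hz, and 0.0 marks
    an unvoiced frame. Unvoiced frames are kept as 0.0 and are ignored
    when smoothing their voiced neighbours.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i, value in enumerate(f0_hz):
        if value <= 0.0:
            smoothed.append(0.0)  # leave unvoiced frames untouched
            continue
        lo = max(0, i - half)
        hi = min(len(f0_hz), i + half + 1)
        voiced = [v for v in f0_hz[lo:hi] if v > 0.0]
        smoothed.append(median(voiced))
    return smoothed


def fluctuation_range(f0_hz):
    """Return max minus min over the voiced frames (0.0 if none)."""
    voiced = [v for v in f0_hz if v > 0.0]
    return max(voiced) - min(voiced) if voiced else 0.0


# Hypothetical contour: a wobbling note near 220 Hz, a short unvoiced
# gap, then a wobbling note near 330 Hz.
contour = [220.0, 224.0, 216.0, 225.0, 215.0, 223.0,
           0.0, 0.0, 330.0, 334.0, 326.0, 333.0]
flattened = suppress_f0_fluctuation(contour, window=5)
```

In this sketch the wobble within each note shrinks while the unvoiced gap and the overall note levels are preserved. A real system would work on F0 extracted from audio and would likely need a more careful treatment of note boundaries and intentional expression such as vibrato.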
The singing voice, a reproduction of that of Yumi Arai (her name when she made her debut), has been used for one of Ms. Matsutoya's music videos and on an album for her 50th anniversary under the artist name of "Yumi Matsutoya with Yumi Arai."
This article has been translated by JST with permission from The Science News Ltd. (https://sci-news.co.jp/). Unauthorized reproduction of the article and photographs is prohibited.