A Look Around Innovation, Part 12: Practical Application of "Multiple Sound Spot Synthesis" for Delivering Different Languages in Multiple Areas

In the series 'A Look Around Innovation,' we introduce research and development (R&D) sites whose work has led to social implementation. In this 12th installment, we introduce the efforts of Takuma Okamoto, Research Manager at the Universal Communication Research Institute of the National Institute of Information and Communications Technology (NICT). He is working on the practical application of "Multiple Sound Spot Synthesis Technology," which uses a large number of loudspeakers to cancel out unwanted sound and deliver different voices to multiple areas at will.

Takuma Okamoto.
Advanced Speech Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, Universal Communication Research Institute, National Institute of Information and Communications Technology. Research Manager. A-STEP Principal Researcher (2022-24).

Simultaneous narration in four languages — Fusion of sound field control and speech synthesis

The Universal Communication Research Institute of the National Institute of Information and Communications Technology (NICT) is located in Kansai Science City, which straddles Kyoto, Osaka, and Nara prefectures, amid lush green hills. On entering a room at the institute, we heard Japanese narration coming from a circular device about 18 cm in diameter placed in the center of the room. As we walked clockwise around the device, the narration switched from Japanese to English at a certain point, then to Korean, then to Chinese, and finally back to Japanese after one full circuit.

In other words, the device plays narration in four languages simultaneously, yet which language you hear depends on where you stand, and each one is clear, with no mixing between languages. A closer look at the device reveals 16 small loudspeakers, each about 3 cm in diameter, arranged in a circle. "This is called multiple sound spot synthesis technology. It superimposes sound waves so that they cancel each other out, letting you hear sound where you want it to be audible and not where you don't want it to be heard," explains Takuma Okamoto, Research Manager at NICT's Universal Communication Research Institute.

Since his student days, Okamoto has been researching "sound field control," which controls sound in a space three-dimensionally, much like a three-dimensional image. After moving to NICT, he expanded his research into "speech synthesis technology" that produces natural, smooth speech, as used in multilingual speech translation apps for smartphones, and devised "Multiple Sound Spot Synthesis Technology," which combines sound field control and speech synthesis. Conventional technologies for delivering sound to specific locations include directional loudspeakers that exploit the directivity of ultrasonic waves. Compared with these, the newly developed technology offers high sound quality and is gentle on the ears.

A change in arrangement from straight line to circle — It can be divided into up to eight areas

While sound reproduced by an ordinary loudspeaker propagates in all directions, multiple sound spot synthesis technology cancels out sound in every direction except the desired one, using the same principle as noise-canceling headphones (Figure 1). "This makes it possible to hear a different sound in each direction at the same time," Okamoto explains.
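To illustrate the cancellation principle, the following Python sketch models two loudspeakers as ideal point sources in free field at a single frequency. The 5 cm spacing, 1 kHz tone, listening positions, and 343 m/s speed of sound are hypothetical values chosen for illustration, and this is a textbook interference example, not NICT's actual signal calculation. The second speaker's complex gain is chosen so that its wave arrives at a "quiet" point with equal amplitude and opposite phase to the first speaker's wave.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def greens(src, pt, k):
    """Free-field point-source transfer function e^{-jkr} / (4*pi*r)."""
    r = math.dist(src, pt)
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


freq = 1000.0                                  # hypothetical test tone, Hz
k = 2 * math.pi * freq / SPEED_OF_SOUND        # wavenumber, rad/m
spk1, spk2 = (0.0, 0.0), (0.05, 0.0)           # hypothetical: two drivers 5 cm apart
quiet, bright = (-1.0, 0.0), (1.0, 0.0)        # points 1 m behind and 1 m in front

# Speaker 1 is driven with unit gain; speaker 2's gain makes its wave arrive at
# the quiet point with the same amplitude and opposite phase (anti-noise).
w2 = -greens(spk1, quiet, k) / greens(spk2, quiet, k)

p_quiet = greens(spk1, quiet, k) + w2 * greens(spk2, quiet, k)
p_bright = greens(spk1, bright, k) + w2 * greens(spk2, bright, k)
p_single = greens(spk1, bright, k)             # speaker 1 alone, for comparison

print(f"quiet point:  {abs(p_quiet):.2e}")
print(f"bright point: {abs(p_bright):.3f} (speaker 1 alone: {abs(p_single):.3f})")
```

At the quiet point the two waves cancel, while at the point in front the pressure remains, and with this geometry it is even larger than with speaker 1 alone. One gain at one frequency silences only a single point; an array like the 16-loudspeaker device would need to control many points across the whole speech band.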

Figure 1: Mechanism of multiple sound spot synthesis technology

Normally, sound waves spread in all directions from the sound source; by canceling out the unwanted parts, only the sound in front of the loudspeaker is heard clearly. Currently, sound can be divided into up to eight areas.
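Dividing sound into several areas can be sketched with pressure matching, a standard sound field control technique; this is an illustrative assumption, since the article does not describe the calculation NICT actually uses. The sketch places 16 point sources on a ring of assumed 9 cm radius (the article's device is about 18 cm across), puts one hypothetical listening point 1 m away in each of four directions, and, at a single frequency, computes for each area the minimum-norm loudspeaker gains that give unit pressure in that area and zero pressure in the others. Summing the four gain sets, each weighted by its own language's signal, lets each listener hear only their language.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def greens(src, pt, k):
    """Free-field point-source transfer function e^{-jkr} / (4*pi*r)."""
    r = math.dist(src, pt)
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


def circle(n, radius):
    """n points evenly spaced on a circle of the given radius (metres)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def solve(a, b):
    """Solve the square complex system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def zone_filters(speakers, listeners, k):
    """Minimum-norm pressure matching: for each listener, loudspeaker gains giving
    unit pressure at that listener and zero pressure at every other listener."""
    g = [[greens(s, p, k) for s in speakers] for p in listeners]  # transfer matrix G
    n, ns = len(listeners), len(speakers)
    gram = [[sum(g[i][m] * g[j][m].conjugate() for m in range(ns))  # G G^H
             for j in range(n)] for i in range(n)]
    filters = []
    for zone in range(n):
        y = solve(gram, [1.0 if i == zone else 0.0 for i in range(n)])
        filters.append([sum(g[i][m].conjugate() * y[i] for i in range(n))  # w = G^H y
                        for m in range(ns)])
    return filters, g


k = 2 * math.pi * 1000.0 / SPEED_OF_SOUND  # hypothetical 1 kHz component
speakers = circle(16, 0.09)   # 16 drivers on an assumed 9 cm radius ring
listeners = circle(4, 1.0)    # hypothetical: one listener 1 m away in each of 4 directions
filters, g = zone_filters(speakers, listeners, k)

# One complex amplitude per language at this frequency (hypothetical values).
signals = [1.0, 0.5j, -0.8, 0.3 + 0.3j]
drive = [sum(s * w[m] for s, w in zip(signals, filters)) for m in range(len(speakers))]
heard = [sum(g[i][m] * drive[m] for m in range(len(speakers))) for i in range(len(listeners))]
for i, p in enumerate(heard):
    print(f"listener {i}: {p:.3f}")
```

Free-field point sources, one control point per area, and a single frequency are simplifying assumptions. A practical system would typically control many points per area, add regularization to keep the gains bounded, and repeat the calculation across the speech band.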

Believing that the loudspeakers he had developed could be used in many ways to meet the needs of society, Okamoto took part in practical training under the JST "Program Manager (PM) Training and Activity Promotion Program." Keita Hikita, project leader of the NICT IDI Co-Creation Design Project, also attended the program, and in 2021 the pair began a range of development work aimed at practical use. The prototype originally developed in 2014 consisted of 64 loudspeakers, each 5 cm in diameter, arranged in a straight line. Mounted on the walls or ceilings of exhibition rooms, it let visitors hear either Japanese or English commentary depending on where they stood; however, installation options were limited and the system was difficult to carry around. To resolve these issues, he decided to arrange the loudspeakers in a circle.

"I thought that it will be difficult to cancel out the sound leakage from the backside of the loudspeakers, but after some trial and error, I was able to cancel out leakage cleanly," Okamoto recalled. In 2022, with the cooperation of the National Museum of Emerging Science and Innovation (Miraikan), he presented a demonstration test of multiple sound spot synthesis using two types of loudspeakers, a circular array of 16 loudspeakers and a linear array of 64 loudspeakers, which were developed jointly with Kitanihon Onkyo (Sakata, Yamagata Prefecture). These were well received by visitors, and there were several inquiries from companies.

Inquiries from leading domestic and international companies: "Listening is believing"

To accelerate development toward practical use based on the results of the first prototype, Okamoto applied for the A-STEP tryout program in 2022. After the project was accepted, he worked on three goals: reproducing high-quality, stable voices, including low male voices; achieving enough volume for large event venues; and improving operability and portability. Okamoto developed a new method for calculating the loudspeaker playback signals to improve speech intelligibility, and also jointly developed a high-quality loudspeaker array with Kitanihon Onkyo, completing the second prototype in March 2023.

The second prototype was unveiled at the open house of the NICT headquarters in June of the same year. It is now installed in the center of a round table there, providing simultaneous interpretation in four languages: Japanese, English, Chinese, and Korean (Figure 2). "The system smoothly translated casual conversations, allowing the general public to experience the power of this technology," he said with a smile. Since then, he has exhibited at events such as the Internet Governance Forum organized by the United Nations, CEATEC organized by the Japan Electronics and Information Technology Industries Association, and various international workshops. In January 2024, his invention was featured on a TV program, and inquiries followed from leading domestic companies. Then, in February 2024, a more compact third prototype was developed (Figure 3).

Figure 2: Exhibition at the open house of the NICT headquarters, held in June 2023

The simultaneous four-language interpretation system (Japanese, English, Chinese, and Korean) and multiple sound spot synthesis technology work in tandem to achieve natural conversations with no time lag.

Figure 3: Development of circular loudspeaker array demonstrators

From left to right: first, second, and third prototypes. The second and third prototypes resolve the volume and sound-quality issues of the first prototype and are almost functionally identical; the third is a smaller version designed with practical application in mind.

In the future, Okamoto aims to build a version of the device with wireless connectivity and greater loudness, and to develop applications for it. The system has many possible uses. For instance, it could deliver sound tailored to each person in a living room where a family spends time together, or in a car while driving. It could also be used for emergency traffic information at train stations, or for disaster-prevention radio broadcasts tailored to local terrain such as coastlines and cliffs. "I believe there is still a lot of scope for commercializing sound. I would also like to produce loudspeaker systems whose volume changes depending on where you sit," he says of his aspirations. The multiple sound spot synthesis system developed by Okamoto and his team is on permanent display in the exhibition room at the NICT headquarters (Koganei, Tokyo). "Listening is believing." Why not go and hear it for yourself?

(Article: Shinji Moribe, Photography: Hideki Ishihara)
