On December 9, IBM announced an optical packaging technology that can dramatically speed up the training and inference of generative artificial intelligence (AI) models in data centers. The company has developed a new process for next-generation co-packaged optics (CPO), a packaging technology that integrates optical components with semiconductors, allowing optics to complement the short-range electrical wiring now used inside data centers. By designing and assembling the world's first successful polymer waveguide (PWG), the company showed that CPO could redefine high-bandwidth transmission between chips, circuit boards, and servers in the computing industry.

Today's optical fibers carry data over long distances at high speed and support traffic in commerce, telecommunications, and other industries. Data centers already use fiber-optic communication for their external networks, but communication between racks inside the data center still runs over electrical wiring. A graphics processing unit (GPU) accelerator attached to such wiring sits idle, performing no computation, for more than half of its operating time as it waits for signals from other devices. During large-scale distributed training, this idle time wastes enormous amounts of money and energy.
To address this, IBM has demonstrated an approach that brings the speed and capacity of optics inside the data center. An IBM technical paper on the work introduces a new CPO prototype module for high-speed optical communication. According to the paper, the module can greatly increase communication bandwidth within the data center and boost AI processing capacity while minimizing GPU idle time, reducing the cost of scaling generative AI. Cable runs in the data center could be extended from one meter to hundreds of meters while consuming less than one-fifth of the power of mid-range electrical interconnects. The technology can also accelerate AI model training, allowing large language models (LLMs) to be trained up to five times faster than over conventional electrical wiring: training a standard LLM would take about three weeks instead of three months, with further gains for larger models running on more GPUs. It can also markedly improve data center power efficiency; for each AI model trained, the energy saved is equivalent to the annual electricity consumption of 5,000 U.S. households.
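As a rough consistency check of those figures (an interpretation of the reported numbers, not a calculation from the paper): three months is roughly 13 weeks, and a five-fold speedup gives 13 ÷ 5 ≈ 2.6 weeks, in line with the claimed reduction to about three weeks.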
The developed CPO technology lets chipmakers add optical interconnects between accelerators that exceed the performance limits of electrical interconnects and support far greater interconnect density. The paper shows that combining these new high-density optical structures with wavelength-division multiplexing (WDM) can increase chip-to-chip communication bandwidth by up to 80 times compared with electrical wiring.
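As a purely illustrative decomposition, not one stated in the paper: six times as many fibers at the chip edge (see the density figure below), each carrying on the order of a dozen wavelengths via WDM, would multiply out to roughly the reported 80-fold bandwidth gain.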
This innovation allows chipmakers to route six times as many optical fibers on the end-face of a silicon photonics chip as today's state-of-the-art CPO technology. According to the company, this is achieved by coupling 50-µm-pitch high-density PWGs to silicon photonics waveguides with a standard assembly and packaging process that has passed the stress tests required for mass production. The components have withstood high-humidity environmental tests at temperatures ranging from −40°C to 125°C, and the optical interconnects have passed mechanical strength tests without physical damage or data loss. The company also demonstrated higher-density PWGs with an 18-µm pitch; stacking four of these PWGs yields connections of up to 128 channels.
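Read together, those figures imply roughly 128 ÷ 4 = 32 channels per 18-µm-pitch PWG; this is an illustrative reading of the reported numbers rather than a breakdown given by the company.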
This article has been translated by JST with permission from The Science News Ltd. (https://sci-news.co.jp/). Unauthorized reproduction of the article and photographs is prohibited.