A few days ago, MIT researchers unveiled a new "photonic" chip that uses light instead of electricity and consumes relatively little power in the process. The chip could be used to process large-scale neural networks millions of times more efficiently than existing computers. In simulations, its theoretical energy limit is 10 million times lower than that of electronic chips.

Neural networks are machine learning models that are widely used for tasks such as robotic object recognition, natural language processing, drug development, medical imaging, and powering driverless vehicles. New optical neural networks, which use optical phenomena to accelerate computation, can run faster and more efficiently than their electronic counterparts.

But as traditional and optical neural networks grow more complex, they consume a great deal of energy. To address this problem, researchers and major technology companies including Google, IBM, and Tesla have developed "AI accelerators," specialized chips that improve the speed and efficiency of training and testing neural networks.

10 million times below the energy limit of traditional electronic accelerators

Simulated training of neural networks on the MNIST image classification dataset suggests that the accelerator can theoretically process neural networks at 10 million times below the energy limit of traditional electronic accelerators and 1,000 times below the energy limit of other photonic accelerators. The researchers are now working on a prototype chip to test the results experimentally.

"People are looking for a technology that can compute beyond the fundamental limits of energy consumption," said Ryan Hamerly, a postdoctoral fellow at the Research Laboratory of Electronics (RLE). "Photonic accelerators are promising ... but our motivation is to build a [photonic accelerator] that can scale up to large neural networks."

Practical applications of these technologies include reducing energy consumption in data centers. "The demand for data centers running large neural networks is growing, and as demand grows, the computation becomes increasingly intractable," said Alexander Sludds, a co-author and graduate student in the Research Laboratory of Electronics. The aim is "to meet computational demand with neural network hardware ... to address the bottleneck of energy consumption and latency."

Joining Sludds and Hamerly on the paper are co-author Liane Bernstein, an RLE graduate student; Marin Soljacic, an MIT professor of physics; and Dirk Englund, an MIT associate professor of electrical engineering and computer science, a researcher in RLE, and head of the Quantum Photonics Laboratory.

A more compact, energy-efficient "optoelectronic" scheme

A neural network processes data through many computational layers of interconnected nodes, called "neurons," to find patterns in the data. Each neuron receives input from its upstream neighbors and computes an output signal that is sent to neurons further downstream. Each input is also assigned a "weight," a value based on its importance relative to all other inputs. As data propagates deeper through the layers, the network learns progressively more complex information. Finally, the output layer generates a prediction based on the calculations across all the layers.

All AI accelerators aim to reduce the energy required to process and move data during one particular linear algebra step in a neural network, called "matrix multiplication." There, neurons and weights are encoded into separate tables of rows and columns, which are then combined to calculate the outputs.
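
As a minimal sketch of that step, the Python below computes one layer's outputs as a matrix-vector product. The function name `layer_output`, the example numbers, and the ReLU nonlinearity are illustrative assumptions, not details taken from the paper.

```python
def layer_output(weights, inputs):
    """Return one layer's outputs: each row of weights dotted with the inputs."""
    outputs = []
    for row in weights:
        # Weighted sum of all upstream neuron values for one output neuron.
        total = sum(w * x for w, x in zip(row, inputs))
        # ReLU nonlinearity (an assumption; other activations could be used).
        outputs.append(max(0.0, total))
    return outputs


# Hypothetical layer: 3 input neurons feeding 2 output neurons.
weights = [[0.5, -1.0, 2.0],
           [1.0, 1.0, -1.0]]
inputs = [1.0, 2.0, 3.0]
result = layer_output(weights, inputs)
print(result)
```

Each output neuron needs one multiplication per input, so a layer with N inputs and N outputs costs N x N multiply operations. That is the work an accelerator tries to make cheap.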

In a conventional photonic accelerator, pulsed lasers encoded with information about each neuron in a layer flow into waveguides and through beam splitters. The resulting optical signals are fed into a grid of square optical components called "Mach-Zehnder interferometers," which are programmed to perform matrix multiplication. The interferometers, each encoded with information about one weight, use signal interference to process the optical signals and weight values and compute the output of each neuron. But there is a scaling problem: each neuron needs its own waveguide, and each weight needs its own interferometer. Because the number of weights grows with the square of the number of neurons, those interferometers take up a lot of space.

"You quickly realize that the number of input neurons can never exceed 100 or so, because you can't fit that many components on the chip," Hamerly said. "If your photonic accelerator can't process more than 100 neurons per layer, then it is difficult to implement large neural networks in that architecture."

The researchers' chip relies on a more compact, energy-efficient "optoelectronic" scheme that encodes data with optical signals but uses "balanced homodyne detection" for matrix multiplication. This technique produces a measurable electrical signal after calculating the product of the amplitudes (wave heights) of two optical signals.
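
The idea of turning two amplitudes into a product can be sketched in a few lines of Python. This is an idealized textbook model, assuming real-valued amplitudes, a lossless 50/50 beam splitter, and noiseless detectors; the function name `balanced_homodyne` is hypothetical.

```python
import math


def balanced_homodyne(a, b):
    """Return the difference photocurrent for two real field amplitudes."""
    # Ideal 50/50 beam splitter: the two output ports carry the
    # normalized sum and difference of the input amplitudes.
    port_plus = (a + b) / math.sqrt(2)
    port_minus = (a - b) / math.sqrt(2)
    # Each photodetector responds to intensity, the square of the amplitude.
    i_plus = port_plus ** 2
    i_minus = port_minus ** 2
    # Subtracting the two currents cancels a**2 and b**2, leaving 2*a*b.
    return i_plus - i_minus


signal = balanced_homodyne(0.3, -0.7)
print(signal)
```

The squared terms cancel in the subtraction, so the electrical signal is proportional to the product of the two amplitudes, including its sign.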

Light pulses encoded with information about the input and output neurons of each neural network layer, which are needed to train the network, flow through a single channel. Separate pulses encoded with entire rows of weights from the matrix multiplication table flow through separate channels. The optical signals carrying the neuron and weight data fan out to a grid of homodyne photodetectors. The photodetectors use the amplitudes of the signals to compute an output value for each neuron. Each detector feeds the electrical output signal for its neuron into a modulator, which converts the signal back into a light pulse. That optical signal becomes the input to the next layer, and so on.
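
The data flow described above can be imitated in software. The sketch below assumes that input pulses arrive one per time step, that each detector accumulates the homodyne product over those steps, and that a ReLU is applied before the signal is re-encoded as light. The function names, the time-step ordering, and the nonlinearity are illustrative assumptions rather than the paper's design.

```python
def optical_layer(weights, inputs):
    """Simulate one layer: detector i integrates x_j * w_ij over time steps j."""
    charges = [0.0] * len(weights)         # one homodyne detector per output neuron
    for j, x in enumerate(inputs):         # one input pulse per time step, fanned out
        for i, row in enumerate(weights):  # each detector has its own weight channel
            w = row[j]
            # Balanced homodyne product: ((x+w)^2 - (x-w)^2) / 4 equals x*w.
            charges[i] += ((x + w) ** 2 - (x - w) ** 2) / 4
    # Electrical readout feeding the modulator; the ReLU is an assumption.
    return [max(0.0, q) for q in charges]


def run_network(layers, inputs):
    """Feed each layer's outputs into the next layer as new input pulses."""
    for weights in layers:
        inputs = optical_layer(weights, inputs)
    return inputs


# Hypothetical two-layer network: 2 inputs -> 2 neurons -> 1 neuron.
layers = [[[1.0, 2.0], [0.0, -1.0]],
          [[1.0, 1.0]]]
out = run_network(layers, [1.0, 1.0])
print(out)
```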

This design requires only one channel per input and output neuron, and only as many homodyne photodetectors as there are neurons, not weights. Because there are always far fewer neurons than weights, this saves a lot of space, so the chip can scale to neural networks with more than one million neurons per layer.
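
A rough count makes the difference concrete. The sketch below assumes a square, fully connected layer with n input and n output neurons, and counts only the components named in the text; the function names are hypothetical and real chips contain other parts as well.

```python
def mzi_mesh_components(n):
    """Conventional scheme: one waveguide per neuron, one interferometer per weight."""
    return {"waveguides": n, "interferometers": n * n}


def homodyne_components(n):
    """Homodyne scheme: one detector per output neuron, independent of weight count."""
    return {"detectors": n}


for n in (100, 1_000_000):
    mesh = mzi_mesh_components(n)
    homodyne = homodyne_components(n)
    print(n, mesh["interferometers"], homodyne["detectors"])
```

At 100 neurons per layer the conventional layout already needs 10,000 interferometers. At one million neurons it would need 10^12, while the detector count grows only linearly.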

Finding the sweet spot

With photonic accelerators, there is unavoidable noise in the signal. The more light that is fed into the chip, the lower the noise and the higher the accuracy, but that becomes very inefficient. Less input light increases efficiency but hurts the performance of the neural network. But there is a "sweet spot," Bernstein said, that uses the minimum optical power while maintaining accuracy.
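
The trade-off can be illustrated with a toy simulation. The noise model here is a simplifying assumption, not the paper's: each multiplication result is corrupted by Gaussian noise whose standard deviation shrinks as one over the square root of the number of photons used. The function name `rms_error` and all parameter values are hypothetical.

```python
import math
import random


def rms_error(photons_per_op, trials=2000, seed=0):
    """Root-mean-square error of noisy products under the assumed noise model."""
    rng = random.Random(seed)
    # Assumed shot-noise-like scaling: more photons, less relative noise.
    sigma = 1.0 / math.sqrt(photons_per_op)
    total = 0.0
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)
        w = rng.uniform(-1.0, 1.0)
        measured = x * w + rng.gauss(0.0, sigma)
        total += (measured - x * w) ** 2
    return math.sqrt(total / trials)


for photons in (10, 1000, 100000):
    print(photons, rms_error(photons))
```

Under this model, every tenfold increase in light cuts the error by only about a factor of three, which is why spending more optical power quickly stops paying off.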

The sweet spot for an AI accelerator is measured by how many joules it takes to perform a single operation of multiplying two numbers, as in matrix multiplication. Today, traditional accelerators are measured in picojoules, or trillionths of a joule. Photonic accelerators are measured in attojoules, which is a million times more efficient. In their simulations, the researchers found that their photonic accelerator can operate at sub-attojoule efficiency. "There is some minimum optical power you can send in before losing accuracy. The fundamental limit of our chip is much lower than that of traditional accelerators ... and lower than that of other photonic accelerators," Bernstein said.
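
The unit arithmetic behind the "million times" figure can be checked directly. The sketch below works in integer attojoules to avoid rounding; the layer size and the per-operation energies of exactly 1 pJ and 1 aJ are round illustrative values, not measurements from the paper.

```python
ATTOJOULES_PER_JOULE = 10**18
PICOJOULE_IN_AJ = 10**6   # 1 pJ = 10^-12 J = 10^6 aJ
ATTOJOULE_IN_AJ = 1


def layer_energy_joules(n_inputs, n_outputs, energy_per_op_aj):
    """Energy for one fully connected layer, one multiply per weight."""
    operations = n_inputs * n_outputs
    return operations * energy_per_op_aj / ATTOJOULES_PER_JOULE


ratio = PICOJOULE_IN_AJ // ATTOJOULE_IN_AJ
electronic = layer_energy_joules(1000, 1000, PICOJOULE_IN_AJ)
photonic = layer_energy_joules(1000, 1000, ATTOJOULE_IN_AJ)
print(ratio, electronic, photonic)
```

For this hypothetical 1000-by-1000 layer, the multiplications cost about a microjoule at 1 pJ per operation and about a picojoule at 1 aJ per operation.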

For electronic chips, including most AI accelerators, there is a theoretical minimum limit on energy consumption. Recently, MIT researchers began developing photonic accelerators for optical neural networks. These chips are orders of magnitude more efficient, but they rely on bulky optical components that limit their use to relatively small neural networks.

In a paper published in Physical Review X, the MIT researchers describe a new type of photonic accelerator that uses more compact optical components and optical signal processing techniques to dramatically reduce both power consumption and chip area. This allows the chip to scale to neural networks several orders of magnitude larger than its counterparts can handle.
