
US AI chip "dark horse" arrives: the largest chip in history

According to multiple foreign media reports, the US AI chip startup Cerebras Systems recently launched the largest chip ever built. The chip, called "The Cerebras Wafer Scale Engine" (hereinafter WSE), has 1.2 trillion transistors.

In chip history, Intel's first processor, the 4004 released in 1971, had only 2,300 transistors, and a recent Advanced Micro Devices (AMD) processor has only 32 billion. Samsung has built a flash memory chip (an eUFS chip) with 2 trillion transistors, but it is not suited to AI computing.

The record-breaking WSE was born for AI computing.

The data show that the 46,225-square-millimeter chip has 400,000 cores connected by a fine-grained, all-hardware, on-chip mesh communication network that provides 100 petabits per second of aggregate bandwidth. More cores, more local memory, and a low-latency, high-bandwidth fabric add up to an ideal architecture for accelerating AI work. The WSE is 56.7 times larger than the largest GPU and carries 18 GB of on-chip SRAM.
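
Those two figures are mutually consistent. As a rough sanity check (assuming "the largest GPU" refers to NVIDIA's roughly 815 mm² V100 die, which the article does not name):

\[ \frac{46{,}225\ \mathrm{mm}^2}{815\ \mathrm{mm}^2} \approx 56.7 \]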

In fact, most of today's chips are multi-chip assemblies cut from 12-inch silicon wafers. The Cerebras Systems chip is different: it is a single chip whose transistors are interconnected across one entire crystalline silicon wafer. This interconnected design allows all of the transistors to operate together at high speed, as a single whole.

Put plainly, this product is built entirely for machine learning. Its computing power and storage bandwidth reach a level that calls for new vocabulary: petabytes (1 PB = 1024 TB ≈ 10^6 GB = 2^50 bytes). Its speed is about 3,000 times that of NVIDIA's largest graphics processor (GPU; measured in floating-point computing power and commonly used in AI research), and its storage bandwidth is about 1,000 times greater.

Such powerful capability comes from the 1.2 trillion transistors on the chip. Intel's 4004 processor had 2,300 transistors in 1971; by Moore's Law, "the number of transistors on a chip doubles every 18 months," so by this year the count should be on the order of a trillion transistors, and every additional transistor adds a little more realizable computing power. Second, the chip's architecture and its interconnect and communication scheme are also very advanced, keeping the 1.2 trillion transistors working in tight synchrony with nanosecond-level latency. At runtime, the 1.2 trillion transistors stay in step as if they were a single transistor.
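
A quick back-of-envelope check of that projection (using the article's own 18-month doubling period): 1971 to 2019 spans 48 years, or 32 doubling periods, so

\[ 2{,}300 \times 2^{48 \times 12 / 18} = 2{,}300 \times 2^{32} \approx 9.9 \times 10^{12}, \]

which lands at the 10^12 to 10^13 scale, matching the trillion-transistor figure in order of magnitude.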

In the field of artificial intelligence, chip size matters a great deal, because large chips process information faster and produce answers in less time. Reducing time-to-answer, or "training time," allows researchers to test more ideas, use more data, and solve new problems. Google, Facebook, OpenAI, Tencent, Baidu, and many others believe that the fundamental limitation on artificial intelligence today is that models take too long to train; reducing training time would therefore remove a major bottleneck to the industry's progress.

Of course, chip makers usually avoid producing large chips, and for good reason. Defects inevitably appear on a wafer during manufacturing, and a single defect can cause a chip to fail; the defects on one wafer can ruin several chips. If only one chip is fabricated on a wafer, the probability that it contains a defect approaches 100%, and a defect would ordinarily kill the chip. But the Cerebras Systems design builds in redundancy, ensuring that one defect, or a handful of them, will not invalidate the entire chip.
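
A textbook Poisson yield model shows why a wafer-scale die is hopeless without such redundancy (the defect density below is an assumed illustrative value, not a figure from the source). With die area A and defect density D_0, the chance of a defect-free die is

\[ Y = e^{-A D_0} = e^{-462\ \mathrm{cm}^2 \times 0.1\ \mathrm{cm}^{-2}} = e^{-46.2} \approx 10^{-20}, \]

so essentially every wafer-scale die will contain defects, and the design must tolerate them rather than avoid them.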

Cerebras Systems CEO Feldman said in a statement: "The company's WSE chip is designed for artificial intelligence and contains fundamental innovations that solve technical challenges that have limited chip size for decades, such as cross-reticle connectivity, yield, power delivery, and packaging. Every architectural decision was made to optimize performance for AI work. As a result, depending on the workload, the WSE delivers hundreds or thousands of times the performance of existing solutions, in a fraction of the power and space."

These performance improvements come from accelerating every element of neural network training. A neural network is a multi-stage computational feedback loop: the faster inputs pass through the loop, the faster the loop learns, or "trains." The way to move inputs through the loop faster is to speed up the computation and communication inside it.
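
As a minimal illustration of that feedback loop (a generic sketch in plain NumPy, not Cerebras code): each pass computes a forward prediction, measures the loss, propagates gradients backward, and updates the weights. The wall-clock time of this loop, repeated millions of times at scale, is what "training time" measures.

```python
import numpy as np

# Toy data: learn y = 2x + 1 with a single linear neuron.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(256, 1))
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # parameters
lr = 0.5          # learning rate

for step in range(100):
    # Forward pass: compute predictions.
    y_hat = w * x + b
    # Loss: mean squared error.
    loss = np.mean((y_hat - y) ** 2)
    # Backward pass: gradients of the loss w.r.t. the parameters.
    grad_w = np.mean(2 * (y_hat - y) * x)
    grad_b = np.mean(2 * (y_hat - y))
    # Update: close the feedback loop.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

Everything the article describes (more cores, more local memory, a faster fabric) is aimed at shrinking the time each iteration of a loop like this takes, at vastly larger scale.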

In the communication architecture, the WSE's on-chip fabric sidesteps the bandwidth and latency penalties that traditional communication technologies pay for relaying traffic through intermediate processors, and the power they consume doing so. By connecting the WSE's 400,000 processors in a two-dimensional array, the fabric achieves low latency and high bandwidth, with aggregate bandwidth of up to 100 petabits per second (10^17 bits per second). Even without additional software, such a structure supports global message handling, with each message processed by its destination processor.
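
To make the two-dimensional array concrete, here is a small sketch of dimension-ordered ("XY") routing, a common technique for meshes like this; the grid size and routing policy are illustrative assumptions, not Cerebras' disclosed design:

```python
def xy_route(src, dst):
    """Route a message on a 2D mesh: move along X first, then Y.

    Each hop goes to a directly connected neighbor, so latency is
    proportional to the Manhattan distance between the two cores.
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                 # walk along the X dimension
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then walk along the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# On a hypothetical 632 x 632 grid (~400,000 cores), the worst-case
# hop count is just the Manhattan distance across the grid:
path = xy_route((0, 0), (631, 631))
print(len(path) - 1)  # 1262 hops, each a short on-chip link
```

Because every hop is a short on-wafer wire rather than a board-level link, each hop costs on the order of nanoseconds instead of the microseconds typical of chip-to-chip communication, which is the substance of the low-latency claim.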

For this product, mass production and heat dissipation may be the main challenges. Even so, the WSE's arrival has highlights enough of its own.

Linley Group principal analyst Linley Gwennap said in a statement: "Cerebras Systems has made great strides in wafer-scale packaging technology, delivering processing performance on a single piece of silicon far beyond what anyone imagined possible. To achieve this feat, the company has solved a series of engineering challenges that have plagued the industry for decades, including enabling high-speed die-to-die communication, working around manufacturing defects, packaging such a large chip, and providing high-density power delivery and cooling. Bringing together top engineers from different disciplines, creating new technologies, and delivering a product in just a few years is an impressive achievement."

Tirias Research principal analyst and founder Jim McGregor said in a statement: "Until now, repurposed graphics processors have met artificial intelligence's enormous demand for computing power. Today's solutions connect hundreds of these repurposed GPUs together, take months to install, draw hundreds of kilowatts of power, and require extensive modification of the AI software, sometimes taking further months to become functional. In contrast, the sheer size of the single-chip WSE enables more computation, higher-performance memory, and greater bandwidth. Through its wafer-scale integration techniques, the WSE avoids the traditional performance limits inherent in loosely connected, slow-memory, cache-based, graphics-centric processor chips."

Founded in 2016, Cerebras Systems has kept a mysterious, low profile since its inception, focusing on products for data-center AI training. CB Insights has named it one of "the world's 100 most anticipated chip companies." The company completed a $25 million Series A financing in 2016, led by the well-known venture firm Benchmark, and subsequently raised multiple further rounds. As of September 2017, it had raised a total of $112 million at a valuation of $860 million.

The founding team's background is also very strong. Co-founder and CEO Andrew Feldman previously founded the chip company SeaMicro, which AMD acquired in 2012 for $334 million. After the acquisition, most of SeaMicro's people moved into AMD to continue their work, so when Andrew Feldman set out to found a new company, many old colleagues chose to follow him, and most of the other core team members had likewise worked with him before.

One person worth singling out is Gary Lauterbach. In the 1990s, when Sun was at its peak, Gary Lauterbach served as a senior chip designer there; later, at SeaMicro, he worked mainly on low-power server design. The company thus started out with a deep bench of veteran low-power chip designers, an advantage the average startup cannot match.

Then, in 2018, another heavyweight joined Cerebras Systems: Dhiraj Mallick, formerly a vice president of architecture and CTO in Intel's data center business, officially became vice president of engineering and business. During his tenure at Intel, data-center revenue in the second quarter of 2018 grew by $1 billion year-on-year, and in the first half of 2018 the unit's revenue reached $10 billion; he is a recognized technical and business talent. He is also an old colleague of Andrew Feldman's from SeaMicro and AMD. The company now has 194 employees.

Cerebras Systems still has a long road ahead, but it is not hard to see that AI is driving a new wave of computer architecture and chip packaging technology. We can expect to witness more interesting, even unexpected, AI chips.