Arm replaces CPU and GPU flagships, and more

Arm has announced flagship processors for phones: Cortex-X1 CPU, Cortex-A78 CPU, Mali-G78 GPU and Ethos-N78 neural network processor (NPU).

Offered as part of Arm’s custom programme, Cortex-X1 is the most powerful Cortex processor yet, according to the company, with 30% greater peak performance than the current Cortex-A77 CPU, and a 22% single-thread integer performance improvement over the just-announced Cortex-A78.

“This short high-performance burst is best for reactivity and responsiveness when using devices, enabling the highest performance ever for smartphones and large screen devices,” said Arm. “Furthermore, Cortex-X1 offers 2x machine learning performance improvements over Cortex-A77. This is part of our wider push for more local compute performance.” (see diagram right – note memory differences)

Cortex-A78 is tweaked for efficiency: “unquestionably our most efficient Cortex-A CPU ever designed for mobile”, according to the company. It has the same architecture as Cortex-A77, with a modified micro-architecture to increase performance/W and performance/area. “We have maximized efficiency through reducing structures that have low performance and area, such as on the L1-I and L1-D caches,” said Arm. “We have then optimised existing structures to consume less power, such as the branch prediction structures. This leads to 4% less power and 5% less area.”



Arm claims a 20% increase in sustained performance over Cortex-A77-based devices within a 1W power budget.

Cortex-X1 is a higher performance processor because it has more resources than Cortex-A78: its decode bandwidth is 25% up, to five instructions decoded per cycle (diagram left), and MOP cache throughput has been increased by 33% to 8 MOP/cycle.
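The percentage uplifts quoted above imply baseline figures for Cortex-A78 that the article does not state directly. As a rough check, and assuming each percentage is measured against the Cortex-A78 value, the arithmetic can be sketched as follows (the function and variable names are illustrative only):

```python
def baseline_from_increase(new_value: float, percent_increase: float) -> float:
    """Return the value that existed before a percentage increase was applied."""
    return new_value / (1 + percent_increase / 100)

# Figures quoted for Cortex-X1: five instructions decoded per cycle (25% up)
# and 8 MOP/cycle from the MOP cache (33% up).
# Assumption: both percentages are relative to Cortex-A78.
decode_a78 = baseline_from_increase(5, 25)
mop_a78 = baseline_from_increase(8, 33)

print(f"Implied Cortex-A78 decode width: {decode_a78:.2f} instructions/cycle")
print(f"Implied Cortex-A78 MOP cache throughput: {mop_a78:.2f} MOP/cycle")
```

On these assumptions, the implied Cortex-A78 figures come out at four instructions decoded per cycle and roughly 6 MOP/cycle, the 33% being a rounded value.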

On Cortex-X1, the Neon engine gets two additional pipes, doubling its compute capacity over Cortex-A78. Cortex-X1 also supports 64kbyte L1 and up to 1Mbyte L2 cache.

Previously, the DynamIQ cluster was four Cortex-A77 CPUs and four Cortex-A55 CPUs.

Performance comparison with 4x Cortex-A78 and 4Mbyte of L3 cache

This can now be upgraded to four Cortex-A78 CPUs and four Cortex-A55 CPUs, which “provides 20% sustained performance improvements in 15% less area”, said Arm, and then: “Cortex-X1 DynamIQ cluster has been upgraded to now support 8Mbyte of L3 for ultimate performance. This larger L3 can also be used by Cortex-A78 when used in conjunction with Cortex-X1” – the latter combination appears to be 3x A78 and 1x X1.

Mali-G78 

Like the Mali-G77 GPU, Mali-G78 has Arm’s Valhall architecture, but is said to deliver 25% more graphics performance.

“With support for up to 24 cores, these advances are made possible via asynchronous top level, tiler enhancements, and improved fragment dependency tracking,” according to Arm, which, at the same time as Mali-G78, announced Mali-G68: “The first in a new sub-premium tier of GPUs, which supports up to six cores and inherits all the latest Mali-G78 features.”

Ethos-N78 

Ethos-N78 is a neural network processor that, compared with the Ethos-N77, delivers “greater on-device ML capabilities and up to 25% more performance efficiency. Ethos-N78 also offers unprecedented levels of configurability, with available configurations starting at 1Top/s on up to 10Top/s,” said Arm.

 

All performance comparisons above come with a list of caveats which, to Arm’s credit, it details on its website.

