Friday, June 9, 2023
Broadcom Enhances AI and ML with Tomahawk 5




Artificial intelligence (AI) and machine learning (ML) are about more than just algorithms: the right hardware to accelerate AI and ML calculations is key.

To speed up job completion, AI and ML training clusters require high bandwidth and reliable transport with predictable, low tail latency (tail latency is the slowest 1% or 2% of a job's responses, the ones that trail the rest). High-performance interconnects can optimize data center and high-performance computing (HPC) workloads across hyperconverged AI and ML training clusters, resulting in lower latency for better model training, improved packet utilization and lower operational costs.
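To illustrate why the tail matters more than the average, here is a minimal Python sketch. The flow times are invented for illustration, and it assumes a synchronized training step that cannot finish until its slowest flow has completed; the function names are hypothetical.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def job_completion_time(flow_times_ms):
    """Assumption: the step is synchronized, so it ends with the slowest flow."""
    return max(flow_times_ms)

# Hypothetical flow completion times for one training step, in milliseconds:
# 98 flows finish in 10 ms, two stragglers take much longer.
flow_times_ms = [10.0] * 98 + [45.0, 60.0]

median = percentile(flow_times_ms, 50)
p99 = percentile(flow_times_ms, 99)
step_time = job_completion_time(flow_times_ms)
print(f"median={median} ms, p99={p99} ms, step time={step_time} ms")
```

In this made-up sample the median flow is fast, yet the step as a whole is only as fast as the two stragglers, which is why predictable tail latency shortens job completion.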

As AI and ML training jobs become more prevalent, it is critical to have higher-radix switches, which allow for lower latency and power consumption, and higher port speeds to build larger training clusters with flat network topologies.
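The link between radix and flat topologies can be shown with a little arithmetic. The sketch below uses the standard sizing formulas for non-blocking leaf-spine (two-tier) and fat-tree (three-tier) fabrics built from identical switches; the radix values are examples rather than figures from the article, and the function names are hypothetical.

```python
def two_tier_hosts(radix):
    """Hosts in a non-blocking two-tier leaf-spine fabric.

    Each leaf uses half its ports for hosts and half for uplinks;
    each spine port connects one leaf, so there are at most `radix` leaves.
    """
    hosts_per_leaf = radix // 2
    max_leaves = radix
    return max_leaves * hosts_per_leaf

def three_tier_hosts(radix):
    """Hosts in a non-blocking three-tier fat tree of radix-k switches: k^3 / 4."""
    return radix ** 3 // 4

for radix in (32, 64, 128):
    print(radix, two_tier_hosts(radix), three_tier_hosts(radix))
```

Doubling the radix quadruples the number of hosts a two-tier fabric can reach, so a cluster that would otherwise need a third switching tier (and the extra hops, latency and power that come with it) can stay flat.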

Ethernet switching for performance optimization

While data center network bandwidth requirements continue to rise sharply, there is also a strong push to combine general-purpose computing and storage infrastructure with optimized AI and ML training processors. As a result, AI and ML training clusters, in which multiple machines are assigned to a training job, are driving the need for fabrics with high-bandwidth connectivity, high radix and faster job completion while operating at high network utilization.


Speeding up job completion requires efficient load balancing for high network utilization and congestion control mechanisms for predictable tail latency. Virtualized, efficient data infrastructure combined with powerful hardware can also improve CPU offloading and help network accelerators improve neural network training.

Ethernet-based infrastructure now serves as a unified network. It combines low power consumption with high bandwidth and radix and the fastest serializer and deserializer (SerDes) speeds, with bandwidth predictably doubling every 18 to 24 months. With these advantages and its vast ecosystem, Ethernet can provide the highest performance-per-watt and performance-per-dollar interconnect for AI, ML and cloud-scale infrastructure.

According to IDC, the worldwide Ethernet switch market grew 12.7% year over year to $7.6 billion in the first quarter of 2022 (Q1 2022). Broadcom offers the Tomahawk family of Ethernet switches to support the next generation of unified networks.

Today, San Jose-based Broadcom announced the StrataXGS Tomahawk 5 switch family, which provides 51.2 Tbps of Ethernet switching capacity in a single chip, twice as much as the Tomahawk 4.

“The Tomahawk 5 has twice the capacity of the Tomahawk 4. So it’s one of the fastest switching chips in the world,” said Ram Velaga, senior vice president and general manager of Broadcom's Core Switching division. “New additions of specific features and capabilities to optimize AI and ML network performance make [the] Tomahawk 5 twice as fast as previous versions.”

The Tomahawk 5 switch chips are designed to help data center and HPC environments accelerate AI and ML capabilities. The chips use a Broadcom approach called cognitive routing, along with advanced shared packet buffering, programmable in-band telemetry and hardware-based link failover, all built into the chip.

Cognitive routing optimizes network link utilization by automatically selecting the least-loaded link in the system for each flow passing through the switch. This is especially important for AI and ML workloads, which often combine short-lived and long-lived high-bandwidth flows with low entropy.

“Cognitive routing is a step beyond adaptive routing,” Velaga said. “When using adaptive routing, you only know about the data congestion between two points, but not the other side.”

Cognitive routing, he added, lets the system learn about conditions beyond the next neighbor and reroute onto the best path, providing better load balancing while avoiding congestion.
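The difference Velaga describes can be pictured with a toy example. The Python sketch below is not Broadcom's algorithm: the `Path` type, the utilization numbers and both selection functions are hypothetical, and it simply contrasts a choice based on the local link alone with one that also sees congestion further along the path.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    name: str
    local_util: float       # utilization of the link leaving this switch (0..1)
    downstream_util: float  # utilization of the busiest link further along the path

def pick_local_only(paths):
    """Choose by the local link alone, with no view of what lies beyond it."""
    return min(paths, key=lambda p: p.local_util)

def pick_with_downstream_view(paths):
    """Choose by the worst link anywhere on the path, local or downstream."""
    return min(paths, key=lambda p: max(p.local_util, p.downstream_util))

# Hypothetical state: the first path looks best locally but is congested downstream.
paths = [
    Path("via-spine-1", local_util=0.20, downstream_util=0.90),
    Path("via-spine-2", local_util=0.30, downstream_util=0.10),
]

print("local view picks:     ", pick_local_only(paths).name)
print("downstream view picks:", pick_with_downstream_view(paths).name)
```

With only the local view, the flow is sent toward the congested far side; with visibility beyond the next neighbor, it takes the path whose worst link is lightly loaded.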

Tomahawk 5 includes real-time dynamic load balancing, which monitors link usage both at the switch and downstream in the network to determine the best path for each flow. It also monitors the status of hardware links and automatically redirects traffic away from failed links. These features increase network utilization and reduce congestion, resulting in faster job completion times.
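A rough way to see why load-aware placement helps with a few large, low-entropy flows is to compare it with static hashing. The sketch below is a simplified software model, not the chip's implementation: the flow names, the 400 Gbps sizes, the CRC32 hash and both function names are assumptions made for illustration.

```python
import zlib

def static_hash_balance(flows, num_links):
    """Place each flow on the link chosen by a hash of its ID, ignoring load."""
    loads = [0] * num_links
    for flow_id, gbps in flows:
        link = zlib.crc32(flow_id.encode()) % num_links
        loads[link] += gbps
    return loads

def dynamic_balance(flows, num_links, failed=frozenset()):
    """Place each flow on the least-loaded healthy link, skipping failed links."""
    healthy = [i for i in range(num_links) if i not in failed]
    if not healthy:
        raise ValueError("no healthy links available")
    loads = [0] * num_links
    for flow_id, gbps in flows:
        link = min(healthy, key=lambda i: loads[i])
        loads[link] += gbps
    return loads

# Hypothetical workload: four large flows of 400 Gbps each across four links.
flows = [(f"gpu{i}->gpu{i + 1}", 400) for i in range(4)]

print("static hash:      ", static_hash_balance(flows, 4))
print("dynamic:          ", dynamic_balance(flows, 4))
print("dynamic, link 2 down:", dynamic_balance(flows, 4, failed={2}))
```

With so few flows, a hash can easily land two of them on the same link while another link sits idle; the load-aware version spreads them evenly and keeps traffic off a link marked as failed.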

The Future of Ethernet for AI and ML Infrastructure

Ethernet has the characteristics required for high-performance AI and ML training clusters: high bandwidth, end-to-end congestion management, load balancing and fabric management, at a lower cost than alternatives such as InfiniBand.

It is clear that Ethernet is a powerful ecosystem that continues to grow through rapid innovation. Broadcom has shown that it will keep improving its Ethernet switches to keep pace with innovation in the AI and ML industries, and that it intends to remain part of HPC infrastructure in the future.

VentureBeat's mission is to be a digital town square for technology decision-makers to gain knowledge about transformative enterprise technology and transact.
