Innovative Solution Significantly Accelerates GPU Transactions, Addresses the Critical Power Budget Issue, and Reduces Carbon Emissions for Hyperscalers and Enterprises
SAN JOSE, Calif., Nov. 13, 2024 – Addressing the critical issue of constrained power budgets, Pliops is enabling AI-powered businesses and hyperscalers to boost performance while optimizing power usage, reducing costs, and shrinking their carbon footprint. Next week at SC24, Pliops will spotlight its XDP LightningAI solution, which enables sustainable, high-efficiency AI operations when paired with GPU servers.
Organizations are increasingly concerned about tight power budgets in data centers, particularly as AI infrastructure and emerging AI applications drive up energy footprints and strain cooling systems. As they scale their AI operations and add GPU compute tiers, escalating power and cooling demands, coupled with significant capital investments in GPUs, are eroding margins. A monumental challenge looms as data centers struggle to secure essential power, creating significant pressure for companies striving to expand their AI capabilities.
Pliops knows that efficient infrastructure solutions are essential to address these issues. The company’s newest Extreme Data Processor (XDP), the XDP-PRO ASIC, together with a rich AI software stack and distributed XDP LightningAI nodes, addresses GenAI challenges by using a GPU-initiated Key-Value I/O interface as its foundation, creating a memory tier for GPUs below HBM. Pliops XDP LightningAI connects easily to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed Key-Value service. Pliops has focused on LLM inferencing, a crucial and rapidly evolving area of GenAI that demands significant efficiency improvements, and its demo at SC24 centers on accelerating LLM inferencing applications. The same memory tier applies seamlessly to other GenAI applications that Pliops plans to introduce over the coming months.
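To make the tiering idea concrete, here is a minimal Python sketch of what a key-value memory tier below HBM could look like to application code. Everything in it (the KVTier class, its put/get methods, the LRU behavior) is an illustrative assumption for this article, not Pliops’ actual API; a dictionary stands in for the NVMe-oF-attached distributed KV service.

```python
# Hypothetical sketch of a key-value memory tier below HBM.
# Names and behavior are illustrative only, not Pliops' actual API.

from collections import OrderedDict

class KVTier:
    """Models a two-level store: a small 'HBM' cache in front of a
    large key-value backend (standing in for NVMe-oF-attached nodes)."""

    def __init__(self, hbm_capacity: int):
        self.hbm_capacity = hbm_capacity       # max entries resident in "HBM"
        self.hbm = OrderedDict()               # LRU cache of hot KV-cache blocks
        self.backend = {}                      # stand-in for the distributed KV service

    def put(self, key: str, value: bytes) -> None:
        """Write a KV-cache block; it stays hot in HBM and spills to the backend."""
        self.backend[key] = value              # always held in the KV tier
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        if len(self.hbm) > self.hbm_capacity:  # evict the coldest block from HBM
            self.hbm.popitem(last=False)

    def get(self, key: str) -> bytes:
        """Read a block: from HBM if hot, otherwise fetched from the backend."""
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key]
        value = self.backend[key]              # simulated NVMe-oF fetch
        self.put(key, value)                   # re-promote into HBM
        return value

# Example: cache attention KV blocks for two inference sessions.
tier = KVTier(hbm_capacity=2)
tier.put("session42/layer0", b"...kv block...")
tier.put("session42/layer1", b"...kv block...")
tier.put("session7/layer0", b"...kv block...")        # evicts session42/layer0 from HBM
assert tier.get("session42/layer0") == b"...kv block..."  # refetched from backend
```

The point of the sketch is the shape of the interface: GPUs issue gets and puts against keys, and blocks that no longer fit in HBM remain reachable in the larger tier instead of being recomputed.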
In LLM inference serving today, GPU prefill operations are heavily compute-bound and largely determine the batch size. Because prefill can fully utilize GPU resources, increasing the batch size beyond a certain point only lengthens Time to First Token (TTFT) without improving the prefill rate. GPU decode operations, by contrast, are HBM bandwidth-bound and mainly influenced by model and KV cache sizes; they benefit significantly from larger batch sizes through higher HBM bandwidth efficiency. Pliops’ solution improves prefill time, allowing larger batch sizes without violating the user SLA for prefill operations. That directly lifts decode performance, which benefits greatly from the increased batch size. As a result, improving prefill time yields nearly proportional improvements in end-to-end throughput.
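The batch-size arithmetic behind that claim can be shown with a toy model. Every constant below (the 2-second TTFT budget, the per-request prefill times, the per-token decode cost) is an illustrative assumption, not a Pliops measurement or GPU specification.

```python
# Toy model of the prefill/decode trade-off. All constants are
# illustrative assumptions, not measured figures.

TTFT_SLA_S = 2.0  # assumed per-request Time-to-First-Token budget, in seconds

def max_batch(prefill_s_per_req: float) -> int:
    """Largest batch whose prefill work still fits the TTFT budget.
    Prefill is compute-bound, so batched requests serialize on the GPU."""
    return int(TTFT_SLA_S / prefill_s_per_req)

def decode_tokens_per_s(batch: int, per_token_s: float = 0.02) -> float:
    """Decode is HBM-bandwidth-bound: each step streams the model once and
    serves the whole batch, so throughput grows ~linearly with batch size."""
    return batch / per_token_s

baseline = max_batch(prefill_s_per_req=0.25)       # -> batch of 8
accelerated = max_batch(prefill_s_per_req=0.125)   # prefill 2x faster -> batch of 16

print(f"baseline:    batch={baseline},  decode ~{decode_tokens_per_s(baseline):.0f} tok/s")
print(f"accelerated: batch={accelerated}, decode ~{decode_tokens_per_s(accelerated):.0f} tok/s")
# Halving prefill time doubles the feasible batch under the same TTFT budget,
# and with it decode throughput: the 'nearly proportional' end-to-end gain.
```

Under these assumed numbers, a 2x prefill speedup raises the feasible batch from 8 to 16 and doubles modeled decode throughput from about 400 to about 800 tokens per second, which is the proportional relationship described above.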
“By leveraging our state-of-the-art technology, we deliver advanced GenAI and AI solutions that empower organizations to achieve unprecedented performance and efficiency in their AI-driven operations,” said Ido Bukspan, Pliops CEO. “As the industry’s leading HPC technical conference, SC24 is the ideal venue to showcase how our solutions redefine AI infrastructure, enabling faster, more sustainable innovation at scale.”
Highlights at the Pliops booth #1559 on the SC24 show floor of the Georgia World Congress Center include:
- Pliops XDP LightningAI running with Dell PowerEdge servers
- Pliops XDP enhancements for AI VectorDB
Pliops can also be found at the SC24 PetaFLOP reception at the College Football Hall of Fame on Tuesday, November 19 from 7:00 p.m. to 11:00 p.m. local time.