LG AI Research taps FuriosaAI to get 2.25X better LLM inference performance vs GPUs

LG AI Research has used chips from FuriosaAI to achieve 2.25 times better large language model (LLM) inference performance vs. graphics processing units (GPUs).

FuriosaAI said its RNGD accelerator has successfully passed rigorous performance tests with LG AI Research’s ExaOne models.

Following this successful validation, FuriosaAI and LG AI Research will partner to supply RNGD servers to enterprises utilizing ExaOne across key sectors like electronics, finance, telecommunications, and biotechnology for diverse applications.

Because it can be deployed in a wide range of settings, RNGD addresses the growing need for sovereign AI by offering an integrated hardware and software solution that allows enterprises like LG AI Research to own and control their AI stack and deploy advanced LLMs.

FuriosaAI said the biggest barrier to scaling AI is the unsustainable power consumption and cost of traditional GPU hardware, the leading AI processing solution from companies like Nvidia.

FuriosaAI called the validation a major step toward solving this challenge: the RNGD (pronounced “Renegade”) accelerator passed those tests while running LG AI Research’s ExaOne models.

Following this successful deployment – which delivered high performance, met low-latency service requirements, and achieved significant improvements in energy efficiency compared to previous GPU solutions – the RNGD Server is now available to enterprise customers leveraging LLMs.

These include LG businesses across electronics, chemicals, and telecommunications, showcasing real-world enterprise generative AI deployments and providing a reference case for other global enterprises.

“After extensively testing a wide range of options, we found RNGD to be a highly effective solution for deploying ExaOne models. RNGD provides a compelling combination of benefits: excellent real-world performance, a dramatic reduction in our total cost of ownership, and a surprisingly straightforward integration,” said Kijeong Jeon, product unit leader at LG AI Research, in a statement. “For a project of this scale and ambition, the entire process was quite impressive.”

Testing power consumption as well as performance

LG AI Research first announced plans two years ago to evaluate RNGD, assess the accelerator’s efficiency and, if successful, integrate it into various ExaOne-based services across LG. FuriosaAI unveiled RNGD, which uses a unique Tensor Contraction Processor chip architecture to deliver up to 512 TFLOPS of FP8 performance at a thermal design power (TDP) of just 180W, at Hot Chips last summer and began sampling with customers last fall.

RNGD Server aggregates the power of eight RNGD accelerators into a single, air-cooled 4U chassis, enabling high compute density. Up to five RNGD Server Systems can be deployed within a single, standard 15kW air-cooled rack.
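The stated specs make the density math easy to check. A quick sketch from the numbers above – 512 TFLOPS FP8 and 180W TDP per card, eight cards per 4U server, five servers per 15kW rack – counting accelerator TDP only, not host CPUs, fans, or power-supply losses:

```python
# Back-of-the-envelope figures from the article's stated specs.
# Accelerator TDP only; host and cooling overhead are excluded.

CARD_TFLOPS_FP8 = 512
CARD_TDP_W = 180
CARDS_PER_SERVER = 8
SERVERS_PER_RACK = 5

server_tflops = CARD_TFLOPS_FP8 * CARDS_PER_SERVER      # 4096 TFLOPS per 4U server
server_accel_watts = CARD_TDP_W * CARDS_PER_SERVER      # 1440 W of accelerator TDP
rack_tflops = server_tflops * SERVERS_PER_RACK          # 20480 TFLOPS per rack
tflops_per_watt = CARD_TFLOPS_FP8 / CARD_TDP_W          # ~2.84 TFLOPS per watt per card

print(server_tflops, server_accel_watts, rack_tflops, round(tflops_per_watt, 2))
```

At 1,440W of accelerator TDP per server, five servers fit comfortably inside the 15kW rack budget the article cites, with headroom left for host components.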

LG AI Research chose RNGD for its power efficiency, cost-effectiveness, and scalability in delivering LLM services. The team evaluated RNGD against demanding, real-world benchmarks using the 7.8-billion-parameter and 32-billion-parameter versions of EXAONE 3.5, each available with 4K and 32K context windows.

Performance and efficiency results

LG AI Research’s direct, real-world comparison demonstrates a fundamental leap in the economics of high-performance AI inference.

RNGD achieved 2.25x better performance per watt for LLMs compared to a GPU-based solution.

An RNGD-powered rack can generate 3.75x more tokens for ExaOne models compared to a GPU rack operating within the same power constraints.

Using a single server with four RNGD cards and a batch size of one, LG AI Research ran the ExaOne 3.5 32B model and achieved 60 tokens/second with a 4K context window and 50 tokens/second with a 32K context window.
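Those throughput figures translate directly into per-token latency, which is what interactive, low-latency serving budgets are written against:

```python
# Convert the quoted single-stream (batch size 1) throughput into
# per-token latency for the ExaOne 3.5 32B results above.

def ms_per_token(tokens_per_second: float) -> float:
    return 1000.0 / tokens_per_second

latency_4k = ms_per_token(60)   # ~16.7 ms per token at the 4K context window
latency_32k = ms_per_token(50)  # 20 ms per token at the 32K context window

print(round(latency_4k, 1), round(latency_32k, 1))
```

At roughly 17–20 ms per token, output streams faster than most users read, which is the practical bar for chat-style services.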

Deployment and integration

After installing RNGD hardware at its Koreit Tower data center, LG AI Research collaborated with the FuriosaAI team to launch an enterprise-ready solution. They successfully optimized and scaled ExaOne 3.0, 3.5, and now 4.0 models, progressing from a single card to two-card, four-card, and then eight-card server configurations. To achieve this, FuriosaAI said it applied tensor parallelism not only across multiple processing elements but also across multiple RNGD cards.

To maximize the performance of tensor parallelism, FuriosaAI optimized PCIe paths for peer-to-peer (P2P) communication, communication scheduling, and compiler tactics to overlap inter-chip DMA operations with computation. In addition, FuriosaAI utilized the global optimization capabilities of Furiosa’s compiler to maximize SRAM reuse between transformer blocks.
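Tensor parallelism of the kind described splits a layer’s weight matrix across devices so each card computes a partial result that is then gathered back together. A minimal NumPy sketch of the idea – illustrative only, not Furiosa’s actual compiler output:

```python
import numpy as np

# A linear layer's weight matrix is split column-wise across "cards";
# each card computes its partial matmul independently, and the shards are
# concatenated. That concatenation stands in for the inter-chip transfer
# (e.g. over PCIe P2P) that the article says is overlapped with compute.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))      # activations: batch x hidden
W = rng.standard_normal((256, 512))    # full weight matrix

n_cards = 4
shards = np.split(W, n_cards, axis=1)          # each card holds 512/4 = 128 columns
partials = [x @ shard for shard in shards]     # computed independently per card
y_parallel = np.concatenate(partials, axis=1)  # "gather" of the per-card shards

assert np.allclose(y_parallel, x @ W)          # matches the single-device result
```

The communication step is the bottleneck this scheme introduces, which is why the P2P path optimization and DMA/compute overlap described above matter.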

This successful integration highlights the maturity and ease-of-use of the software stack, including the vLLM-compatible Furiosa-LLM serving framework. The migration demonstrates the platform’s programmability and simplified optimization process.

It also showcases key advantages required in real-world service environments, such as support for an OpenAI-compatible API server, monitoring with Prometheus metrics, Kubernetes integration for large-scale deployment in cloud-native environments, and easy deployment through a publicly available SDK.
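Because the serving framework exposes an OpenAI-compatible API, existing client code should work with only a base-URL change. A hedged sketch – the URL and model identifier below are placeholders, not documented values; substitute whatever your deployment’s SDK configures:

```python
import json
import urllib.request

# Sketch of a chat completions request against an OpenAI-compatible
# endpoint like the one the article describes. BASE_URL and the model
# name are hypothetical placeholders, not Furiosa-documented values.

BASE_URL = "http://localhost:8000/v1"      # hypothetical serving endpoint
payload = {
    "model": "exaone-3.5-32b",             # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Summarize this quarterly report."}
    ],
    "max_tokens": 256,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)       # uncomment against a live server
print(req.full_url)
```

This drop-in compatibility is the practical meaning of the “simplified integration” claim: applications written against OpenAI-style endpoints can be redirected without rewrites.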

LG AI Research’s ChatExaOne, an ExaOne-powered Enterprise AI Agent, provides robust capabilities including document analysis, deep research, data analysis, and Retrieval-Augmented Generation (RAG).

Moving forward, LG AI Research plans to expand ChatExaOne’s availability to external clients, utilizing RNGD to facilitate this expansion.

Next Steps for RNGD and LG AI Research

Furiosa and LG AI Research are committed to enabling businesses to deploy advanced models and agentic AI sustainably, scalably, and economically.

Furiosa has added support for EXAONE 4.0 and is working with LG AI Research to develop new software features, expand to additional customers and markets, and provide a powerful and sustainable AI infrastructure stack for advanced AI applications.

LG AI Research aims to continue optimizing RNGD software and hardware for EXAONE models around specific business use cases.

Dean Takahashi

Dean Takahashi is editorial director for GamesBeat. He has been a tech journalist since 1988, and he has covered games as a beat since 1996. He was lead writer for GamesBeat at VentureBeat from 2008 to April 2025. Prior to that, he wrote for the San Jose Mercury News, the Red Herring, the Wall Street Journal, the Los Angeles Times, and the Dallas Times-Herald. He is the author of two books, "Opening the Xbox" and "The Xbox 360 Uncloaked." He organizes the annual GamesBeat Next, GamesBeat Summit and GamesBeat Insider Series: Hollywood and Games conferences and is a frequent speaker at gaming and tech events. He lives in the San Francisco Bay Area.