
WEKA Integrates NeuralMesh with NVIDIA STX to Address AI Inference Memory Bottlenecks



April 10, 2026
WEKA has announced the integration of its NeuralMesh platform with the NVIDIA STX reference architecture, establishing its Augmented Memory Grid as a key building block for next-generation AI infrastructure. The combined solution addresses one of the most significant bottlenecks in large-scale inference environments: memory constraints that directly affect performance, total cost of ownership, and scalable growth.

Operating through NeuralMesh, WEKA’s Augmented Memory Grid expands GPU memory by externalizing and persisting key-value caches. When deployed with NVIDIA STX, this architecture delivers high-throughput context memory storage for agentic AI workloads, supporting long-context reasoning across sessions, tools, and end-to-end workflows. According to the company, configurations combining NVIDIA Vera Rubin NVL72 systems, BlueField-4 DPUs, and Spectrum-X Ethernet can boost context memory token throughput by 4x to 10x. The platform is also projected to deliver at least 320 GB/s read and 150 GB/s write throughput, more than doubling the performance of traditional AI storage architectures.
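To put the quoted read bandwidth in perspective, the sketch below estimates how long loading a long-context KV cache would take at 320 GB/s. The model shape used here (80 layers, 8 KV heads, head dimension 128, fp16, 128k-token context) is a hypothetical example for illustration, not a specific product configuration:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, dtype_bytes, n_tokens):
    """Approximate KV cache size: keys and values stored for every
    layer, KV head, and token (the factor of 2 covers K plus V)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
    return per_token * n_tokens

# Hypothetical model shape (an assumption, not a named product):
# 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes), 128k tokens.
size = kv_cache_bytes(80, 8, 128, 2, 128 * 1024)   # ~43 GB of KV state
read_seconds = size / 320e9                        # at 320 GB/s read
```

At the quoted throughput, even a ~43 GB context loads in roughly a tenth of a second, which is why externalized KV storage can substitute for recomputing a long prefill.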


Memory Infrastructure Becomes the Inference Bottleneck


WEKA centers this integration on the growing memory wall challenge in modern AI deployments. Within today’s inference pipelines, limited high-bandwidth GPU memory forces frequent KV cache evictions, leading to repeated recomputation and diminished operational efficiency. As system concurrency rises, these inefficiencies multiply, increasing infrastructure expenses and reducing performance predictability.

The company promotes shared KV cache infrastructure as the solution. By preserving persistent context across users and sessions, shared caching eliminates redundant processing and stabilizes token throughput. NVIDIA STX provides the validated reference architecture for this model, while WEKA delivers the storage and memory extension layer.
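Conceptually, a shared KV cache behaves like a two-tier store: a small hot tier standing in for GPU memory, backed by a persistent external tier, with evictions spilling rather than discarding. The following is a minimal Python sketch of that idea under those assumptions; the class and tier names are illustrative and do not reflect WEKA's implementation:

```python
import hashlib
from collections import OrderedDict

def prefix_key(tokens):
    """Key a cache entry by a hash of its token prefix, so any
    session that shares the prefix can reuse the same entry."""
    return hashlib.sha256(repr(tokens).encode("utf-8")).hexdigest()

class TieredKVCache:
    """Toy two-tier KV cache: a small LRU 'GPU' tier backed by a
    persistent external tier. Evictions spill to the external tier
    instead of being dropped, so a later request with the same
    prefix avoids a full prefill recomputation."""

    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()   # hot tier (stands in for HBM)
        self.external = {}         # cold tier (stands in for external storage)
        self.gpu_capacity = gpu_capacity

    def put(self, tokens, kv):
        key = prefix_key(tokens)
        self.gpu[key] = kv
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_kv = self.gpu.popitem(last=False)
            self.external[old_key] = old_kv    # spill, don't discard

    def get(self, tokens):
        key = prefix_key(tokens)
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key], "gpu"
        if key in self.external:
            kv = self.external.pop(key)
            self.put(tokens, kv)               # promote back to hot tier
            return kv, "external"
        return None, "miss"                    # full prefill required
```

In this toy model, a "miss" is the expensive path the article describes: the prefill must be recomputed from scratch, whereas an "external" hit only pays a cache fetch.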

NeuralMesh and Augmented Memory Grid Architecture


NeuralMesh acts as WEKA’s distributed storage platform, built to integrate seamlessly across the full NVIDIA STX stack. It delivers high-performance data services optimized for AI workloads, while the Augmented Memory Grid serves as a dedicated memory expansion layer that consolidates KV cache outside of GPU memory.

This design allows inference environments to sustain long-context sessions without overloading GPU resources. By retaining cache state and enabling reuse across workloads, the platform maintains high utilization and consistent performance as deployments scale.

WEKA notes that the Augmented Memory Grid, first unveiled at GTC 2025 and now generally available, has been validated on NVIDIA Grace CPU platforms paired with BlueField DPUs. The architecture delivers measurable gains in inference efficiency, including drastically faster time-to-first-token, higher per-GPU token throughput, and stable performance under increased concurrency. Offloading the data path to BlueField-4 also reduces CPU overhead and alleviates I/O bottlenecks.

Performance and Efficiency Gains


In production-like environments, the platform is engineered to enhance responsiveness and infrastructure efficiency. WEKA states that the Augmented Memory Grid can reduce time-to-first-token by 4x to 20x, while increasing per-GPU token output by up to 6.5x. These improvements stem from higher KV cache hit rates and fewer recomputation cycles, enabling systems to maintain performance as context sizes and user counts expand.
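The arithmetic behind such gains is straightforward: expected time-to-first-token is a weighted average of the cache-fetch path and the full-prefill path. The sketch below uses purely illustrative timings (a 2000 ms long-context prefill versus a 100 ms cache fetch, both assumptions rather than WEKA benchmark figures) to show how a high prefix hit rate translates into a large TTFT reduction:

```python
def expected_ttft(hit_rate, fetch_ms, prefill_ms):
    """Expected time-to-first-token when a fraction of requests can
    load a persisted KV cache (fetch) instead of recomputing prefill."""
    return hit_rate * fetch_ms + (1.0 - hit_rate) * prefill_ms

# Illustrative numbers only (assumptions, not measured results):
baseline = expected_ttft(0.0, 100, 2000)   # no cache reuse: every request prefills
cached   = expected_ttft(0.9, 100, 2000)   # 90% of requests hit a shared prefix
speedup  = baseline / cached               # roughly 6.9x under these assumptions
```

Under these assumed timings, a 90% hit rate cuts expected TTFT from 2000 ms to 290 ms, which is the same mechanism, at different scales, behind the 4x to 20x range the company cites.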

Firmus, an AI infrastructure provider, is highlighted as an early adopter leveraging NeuralMesh with NVIDIA-based infrastructure. The firm reports improved token throughput and lower latency at scale, with gains coming from more efficient use of existing GPUs rather than additional hardware deployments.

Implications for AI Infrastructure Design


This integration highlights a shift in AI system design, where memory and storage strategies increasingly define overall performance and cost efficiency. As agentic AI workloads expand and context windows widen, DRAM-only approaches become unsustainable due to rising recomputation costs and underutilized GPUs.

WEKA positions persistent, shared KV cache as a foundational capability for AI factories. Organizations adopting this model can achieve higher GPU utilization, lower energy consumption per inference task, and more predictable scaling. In contrast, environments relying exclusively on local GPU memory will likely face rising operational costs and diminishing returns as workloads grow.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!
Contact Details
Beijing Qianxing Jietong Technology Co., Ltd.

Contact Person: Ms. Sandy Yang

Tel: 13426366826
