Google Cloud now offers a fully managed version of the Lustre parallel file system. The Google Cloud Managed Lustre service went live (“general availability”) globally on July 8.
An open source, high-performance file system, Lustre has long been used in high-performance computing (or supercomputing) systems, for large-scale work such as weather modelling.
Google is presenting its Lustre offering as a lower-headache way of running those supercomputing jobs. And Lustre’s ability to stream data in the range of terabytes per second should make it appealing for large-scale AI jobs as well. Those hungry GPUs need feeding; you can’t get to artificial general intelligence without plenty of data.
The service offers 1TB/s of read throughput per namespace, with sub-millisecond latency.
Deployments can start at 18TiB — a tebibyte is 2 to the 40th bytes, or very approximately 1TB — and can scale up to 8PiB or more.
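For the unit-curious, the TiB/TB gap is about 10%, which matters at petabyte scale. A quick back-of-the-envelope check:

```python
# A tebibyte (TiB) is 2**40 bytes; a terabyte (TB) is 10**12 bytes.
TIB = 2**40
TB = 10**12

print(TIB)                      # 1099511627776 bytes per TiB
print(round(TIB / TB, 3))       # ~1.1 TB per TiB

# So the 18 TiB minimum deployment is roughly 19.8 decimal terabytes.
print(round(18 * TIB / TB, 1))
```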
With this release, Google Cloud has caught up with other cloud providers in offering a cloud-based Lustre. It competes with Azure Managed Lustre, AWS's Amazon FSx for Lustre and Oracle's OCI File Storage with Lustre.
But this service comes with some special sauce: it relies heavily on EXAScaler, a Lustre deployment optimized for throughput and scalability, courtesy of a partnership with DDN.
“By integrating DDN’s enterprise-grade data platforms and Google’s global cloud capabilities, organizations can readily access vast amounts of data and unlock the full potential of AI,” suggested NVIDIA Director of Accelerated Computing Products Dave Salvator, in a statement.
In other words, Lustre could help keep those NVIDIA GPUs busy.
The Lustre Way
Created by Carnegie Mellon researcher Peter J. Braam in 1999 specifically for HPC systems, Lustre addressed supercomputing’s need for extremely high throughput. At the time, traditional network file systems, such as NFS, had difficulty scaling to the levels required by HPC.
Lustre is a parallel file system: The files themselves are sharded across multiple servers for faster access, and each server can deliver its bits simultaneously.
Metadata for each file (file locations, permissions, directory hierarchies, etc.) is kept separately on a metadata server, which also handles file system operations like creating and deleting files. Separating metadata from data in this way eliminates I/O bottlenecks at the storage servers.
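To make the striping idea concrete, here is a minimal sketch (not Lustre’s actual implementation) of how round-robin striping maps a file’s byte offsets onto multiple storage servers; the stripe size and server count are hypothetical stand-ins:

```python
# Hypothetical round-robin striping, illustrating the parallel file
# system idea: consecutive stripes of a file land on different servers,
# so a large read can pull from all servers at once.

STRIPE_SIZE = 1 << 20  # 1 MiB per stripe (hypothetical)
NUM_SERVERS = 4        # hypothetical object storage targets (OSTs)

def server_for_offset(offset: int) -> int:
    """Return which storage server holds the byte at `offset`."""
    stripe_index = offset // STRIPE_SIZE
    return stripe_index % NUM_SERVERS

# The first four 1 MiB stripes go to servers 0, 1, 2, 3; the fifth
# wraps back around to server 0.
print([server_for_offset(i * STRIPE_SIZE) for i in range(5)])  # [0, 1, 2, 3, 0]
```

A client reading a multi-gigabyte file therefore issues requests to all four servers in parallel, which is where the aggregate terabytes-per-second figures come from.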
How Much Does It Cost?
Google’s Lustre service is globally available in four throughput tiers: 125 MB/s, 250 MB/s, 500 MB/s and 1,000 MB/s for each TB of data. Each namespace can be as large as 1PB.
A Managed Lustre instance is billed in 1-second increments, based on the instance’s provisioned capacity. As an example, storing a gibibyte on the 125MB/s tier will run $0.145 a month.
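Taking the stated $0.145 per GiB-month rate at face value (and ignoring the per-second billing granularity by assuming a full month at constant capacity), a rough monthly estimate for a deployment looks like this:

```python
# Back-of-the-envelope monthly cost on the 125 MB/s tier, using the
# $0.145 per GiB-month figure quoted above. Assumes constant provisioned
# capacity for the whole month; per-second proration is ignored.

RATE_PER_GIB_MONTH = 0.145  # USD, 125 MB/s tier
GIB_PER_TIB = 1024

def monthly_cost_usd(capacity_tib: float) -> float:
    return capacity_tib * GIB_PER_TIB * RATE_PER_GIB_MONTH

# The minimum 18 TiB instance:
print(f"${monthly_cost_usd(18):,.2f}")  # $2,672.64
```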
There’s a whole page dedicated to pricing:

Sample managed Lustre pricing list (Google).
The managed Lustre service integrates natively with other Google Cloud services, such as Compute Engine, Google Kubernetes Engine (GKE), IAM, VPC Service Controls and Google Cloud’s Vertex AI platform.
The Google HPC Stack
At Google Next 25, Wyatt Gorman, Google solutions manager for HPC and AI infrastructure, further discussed the managed Lustre offering.
Google set up the service to keep “the scratch space as close to the compute as possible,” he said.
The managed Lustre offering provides persistent scratch space, though at the conference, Gorman also spoke about Parallelstore, a complementary service built on the open source DAOS project that would provide super-speedy non-persistent scratch space.

The ideal Google HPC stack would also include H4D CPU VMs (now in preview), which are built on AMD EPYC processors and are RDMA-enabled, as well as the Terraform-based Cluster Toolkit for repeatable deployments.
For more about Google’s HPC stack, check out the entire Google Next HPC presentation.
The post Google Brings the Lustre Parallel File System to Its Cloud appeared first on The New Stack.
The high-performance computing file system could find a second life with AI users hungry for 'tebibytes' of fast data.