High-performance computing (HPC) does not refer to a single fast computer but rather to a set of computers that work together to run complex computations, such as DNA analysis and physics simulations. ITS has been working for several years to develop a true HPC environment so that Pomona faculty and staff have every advantage in their scholarly work. As we continue to expand our HPC environment, we are fast approaching our storage capacity.
Normally, when you need more storage on your computer, you buy a USB flash drive or another type of external hard drive. While this is a simple and relatively inexpensive solution for a personal computer, it doesn’t transfer to the HPC environment. With flash drives or external hard drives, storage can only be used by a single computer at a time. In the HPC environment, multiple computers need access to the same files at the same time. That is why Pomona College needs an HPC-specific storage solution.
I researched several storage options that would meet the following criteria:
- Open-source – This means that ITS will have access to the code and can modify it for our use if needed.
- Easy to install – This means ITS can set up and configure the software with little effort.
- Scalable – This means we can add more computers with little or no loss in performance.
- No proprietary hardware – This allows ITS to buy any computer from any company. If we were locked into buying from a single company, that company could charge whatever it wanted, and support could disappear if the company went bankrupt.
Product Investigation Summary
The storage solutions I looked at were BeeGFS, Ceph, GlusterFS, GPFS (IBM Spectrum Scale), and Lustre.
The first options I cut were Ceph and GlusterFS because both require purchasing at least three computers. I also could not find HPC environments using either of these storage solutions, which raised a red flag.
I next ruled out GPFS because it is proprietary software. Furthermore, its licensing is based on how much storage our HPC environment will use, which would become expensive. For example, based on what we plan to use, it would cost us $4,584.60 per month, or $109,512.00 if we purchased a permanent license. Finally, installing and maintaining this system is complicated.
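To put the two GPFS prices side by side, the short sketch below works out the annual subscription cost and the point at which the monthly payments would exceed the permanent license. It uses only the two figures quoted above and assumes, for illustration, that both prices stay fixed and that there are no additional support or renewal fees; the function name is hypothetical.

```python
import math

# Figures quoted in the article for our planned storage usage.
MONTHLY_COST = 4584.60      # subscription price per month, in dollars
PERMANENT_COST = 109512.00  # one-time permanent license, in dollars


def break_even_months(monthly: float, permanent: float) -> int:
    """Return the first whole month in which cumulative subscription
    payments meet or exceed the one-time permanent license price.

    Assumes fixed prices and no extra fees (an illustrative simplification).
    """
    return math.ceil(permanent / monthly)


annual_cost = round(MONTHLY_COST * 12, 2)
months = break_even_months(MONTHLY_COST, PERMANENT_COST)

print(f"Annual subscription cost: ${annual_cost:,.2f}")  # $55,015.20
print(f"Break-even point: month {months}")               # month 24
```

Under these assumptions, the subscription overtakes the permanent license at about the two-year mark, and either way the cost is tied to how much storage we use.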
Early on, I was interested in Lustre because it is completely free, with no license fees attached. However, upon closer examination, I discovered that the software only runs on an older operating system, which is a major security concern. Thus, I ruled out Lustre as well.
Ultimately, BeeGFS beat out the other storage solutions because it is open-source, scalable, and easy to install and configure. Additionally, it would not require us to obtain any special hardware. The only downside is that some of the features I want to use require a license, but that license is still much cheaper than the GPFS license and, unlike GPFS, places no limit on storage.
How BeeGFS Works
BeeGFS is made up of three services: management, metadata, and storage. The management service allows the different BeeGFS components to communicate with each other. The metadata service stores information about files and folders, such as their permissions and owner. The storage service is where the data itself is kept. Files are broken into smaller pieces that are then distributed across the storage computers. As a result, BeeGFS makes it possible for multiple clients to access the storage at the same time.
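The idea of breaking a file into pieces and spreading them across storage computers can be illustrated with a small sketch. This is a simplified model, not BeeGFS's actual implementation: the function names, the round-robin placement, and the tiny chunk size are all assumptions chosen to keep the example readable.

```python
from typing import Dict, List


def stripe(data: bytes, num_targets: int, chunk_size: int) -> Dict[int, List[bytes]]:
    """Split data into fixed-size chunks and assign them to storage
    targets in round-robin order (hypothetical placement policy)."""
    if num_targets < 1 or chunk_size < 1:
        raise ValueError("num_targets and chunk_size must be positive")
    targets: Dict[int, List[bytes]] = {t: [] for t in range(num_targets)}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        targets[index % num_targets].append(data[offset:offset + chunk_size])
    return targets


def reassemble(targets: Dict[int, List[bytes]]) -> bytes:
    """Rebuild the original data by reading chunks back in the same
    round-robin order in which they were written."""
    num_targets = len(targets)
    total_chunks = sum(len(chunks) for chunks in targets.values())
    return b"".join(
        targets[i % num_targets][i // num_targets] for i in range(total_chunks)
    )


# A 10-byte "file" striped in 2-byte chunks across 3 storage targets.
layout = stripe(b"ABCDEFGHIJ", num_targets=3, chunk_size=2)
print(layout)              # {0: [b'AB', b'GH'], 1: [b'CD', b'IJ'], 2: [b'EF']}
print(reassemble(layout))  # b'ABCDEFGHIJ'
```

Because each target holds only part of the file, several clients can read different pieces from different storage computers at the same time, which is what makes this layout attractive for HPC workloads.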
BeeGFS is an excellent choice for our HPC environment. It will allow us to easily add new storage and scale out to multiple terabytes with little setup required.
More importantly, BeeGFS will allow you, Pomona College faculty and staff, to store large datasets, have multiple computers process that data, and get results quickly. With the help of BeeGFS, we look forward to expanding our HPC system twentyfold in the next 6 to 12 months. The future of HPC at Pomona is bright!