This post is the first in a series about the storage capacity chargeback system at Harvard Faculty of Arts and Sciences Research Computing (FASRC), which is enabled by Starfish Storage. The solution currently recovers close to $2M in storage costs annually and delivers additional data management benefits to the university. In this chapter we’ll cover how Research Computing leadership overcame the obstacles that many large institutions face when implementing chargeback, and what drove its early success.
Growing storage costs at Harvard FASRC were unsustainable
Harvard’s FASRC provides HPC and storage for thousands of researchers. The group is headed up by Raminder Singh, Senior Director of Data Science and Research Facilitation. FASRC manages over 99PB of storage, which has grown at double-digit rates year over year, and Raminder realized that giving researchers unlimited access to storage was unsustainable.
Four major obstacles stood in the way of implementing a chargeback solution that would bring costs under control
FASRC had a long-standing objective to implement storage chargebacks, but there were key obstacles that made it seem impossible. As Raminder explains:
“Implementing chargebacks in an environment as big and as complex as ours is seemingly impossible. First you have to get the data out of our massive storage systems, then you have to make sense of the files that are spread all over, then you have to meet the regulatory requirements for accountability, and finally you have to overcome the objections of our major stakeholders who are used to getting storage for free.”
Each of these obstacles had its own unique expression at Harvard:
- Getting the data – Harvard’s file systems contain billions of files, and they’re constantly churning. Conventional means of scanning file systems simply take too long at this scale: By the time a scan is complete, the data has changed and is no longer actionable.
- Making sense of the data – There would need to be a simple way to match files to grant programs. Grant-related files tend to be stored in multiple places and belong to multiple people, projects and departments.
- Regulatory / governance requirements – Storage consumption would need to be measurable in a consistent way, and the cost data must be auditable and reproducible.
- Cultural resistance – No one likes to pay for something that used to be free. The biggest grant-holders at Harvard are very influential, considering that they’re the ones bringing in the revenue.
An innovative integration of ColdFront and Starfish overcame these challenges
Based on his prior experience using Starfish, Raminder came up with the innovative idea of combining Starfish Storage and a product called ColdFront to overcome these obstacles. He undertook the creation of a prototype interface between Starfish and ColdFront which links consumption of compute resources to grant and project cost centers.
Raminder’s prototype successfully provided proof of concept that together, Starfish and ColdFront could deliver an automated system for charging storage consumption back to the corresponding grants that fund the research.
Both ColdFront and Raminder’s integration, which is continually developed and maintained by programmers at Harvard, are open source and available at the following sites:
- ColdFront: https://coldfront.dev/
- Harvard FASRC’s Starfish/ColdFront Integration: https://github.com/fasrc/sftocf
Starfish provided the detailed, credible data about data storage usage needed to drive acceptance of the solution
- Getting the Data: Starfish has patented methodologies that greatly accelerate file scanning. Starfish gathers the data needed for chargebacks quickly enough to provide up-to-date information for monthly, weekly or even daily billing.
- Making Sense of the Data: Starfish has a novel metadata system that—among other things—allows for tags to be applied to files and directories. Directory tags are used to match file content from anywhere in the file storage environment, logically group it, and match it to the grant code. Tags can be applied programmatically: Raminder prototyped the code to automatically tag directories with grant codes.
- Accountability: Starfish provides an enumeration of every directory and every file that’s charged back to the grant. The information is retained for every pay period and is auditable.
- Cultural Resistance: Once Raminder was able to track storage use by grant, it became clear that there was no correlation between the dollar value of a grant and the amount of storage capacity it consumed. This signaled to the major grant holders that they had inadvertently been subsidizing the storage needs of other researchers who consumed more storage relative to the size of their grants. The chargeback system fixed this and brought a level of transparency and fairness that caused the most influential members of the faculty to wholeheartedly support it.
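As a rough illustration of the tag-based approach described above, the following Python sketch groups tagged directories by grant code and produces a charge with an itemized list of directories for each grant. It does not use the Starfish or ColdFront APIs. The `DirUsage` record, the grant codes, the paths, and the flat rate are all hypothetical, and the sketch assumes that per-directory sizes and grant tags have already been exported from a scan.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical flat rate for illustration only; not FASRC's actual pricing.
RATE_PER_TB_MONTH = 5.00


@dataclass(frozen=True)
class DirUsage:
    path: str        # directory path as reported by a scan
    grant_tag: str   # grant code attached to the directory as a tag
    size_bytes: int  # total size of the files under the directory


def build_invoice(usages, rate_per_tb=RATE_PER_TB_MONTH):
    """Group tagged directories by grant and compute one charge per grant.

    Each grant entry keeps its line items (path, TB) so the total can be
    traced back to the directories that produced it.
    """
    by_grant = defaultdict(list)
    for usage in usages:
        by_grant[usage.grant_tag].append(usage)

    invoice = {}
    for grant, items in sorted(by_grant.items()):
        # Sort line items by path so the output is stable from run to run.
        line_items = [
            (u.path, u.size_bytes / 1e12)
            for u in sorted(items, key=lambda u: u.path)
        ]
        total_tb = sum(tb for _, tb in line_items)
        invoice[grant] = {
            "total_tb": total_tb,
            "charge": round(total_tb * rate_per_tb, 2),
            "line_items": line_items,
        }
    return invoice


# Made-up sample data: one grant's files live in two unrelated locations.
sample_usages = [
    DirUsage("/lab_a/raw", "GRANT-001", 2_000_000_000_000),
    DirUsage("/scratch/lab_a/tmp", "GRANT-001", 500_000_000_000),
    DirUsage("/lab_b/results", "GRANT-002", 1_000_000_000_000),
]

invoice = build_invoice(sample_usages)
for grant, entry in invoice.items():
    print(grant, entry["total_tb"], "TB", entry["charge"])
    for path, tb in entry["line_items"]:
        print("   ", path, tb, "TB")
```

Keeping the line items next to each total is a design choice made for this sketch: it mirrors the idea that every charge should be traceable to the directories behind it, and storing one such invoice per billing period would give a simple audit trail.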
The Starfish/ColdFront solution is now successfully recovering storage costs and incentivizing researchers to delete unwanted data
After an initial pilot period, the program was deployed broadly across Harvard FAS and began generating substantial returns, both in revenue charged back to grants and in improved storage management. For the first time, researchers had both the tools and the incentive to manage their own storage like a true project resource, and they have gradually become much more proactive about archiving data and deleting what isn’t needed. The program’s initial results tell the story unequivocally:
- Users are in the process of deleting 20PB of existing data they don’t need.
- First year revenue was $500,000.
- Revenue expanded to $1.5 million in the second year.
- Now, in 2025, the system brings in approximately $2 million a year.
Harvard’s chargeback system is newsworthy because Raminder overcame the classic obstacles to implementing chargeback—at one of the world’s largest and most diverse research institutions. And while the chargeback system generates enormous revenue for Raminder’s department, that’s just the beginning. Stay tuned for Chapter Two.
Many thanks to Raminder Singh, Senior Director of Data Science and Research Facilitation at Harvard University’s Faculty of Arts and Sciences Research Computing (FASRC) for collaborating with Starfish on this story.
____________________________
NOTE: We first reported on this system in a case study published in 2023. Since then the chargeback system has taken on a life of its own, delivering additional benefits that come from a combination of the right tools, the right personnel, and the impetus that results when users pay for the resources they consume.