Research Data Management (RDM) covers everything that happens to research data from the moment it’s generated (by an instrument, simulation, or field observation) to its long-term preservation or responsible destruction. It includes data management plans required by funding agencies, storage and backup strategies, access controls, documentation standards, and sharing protocols.
RDM matters because modern research generates staggering data volumes. A single genomics lab can produce petabytes per year. Climate simulations, particle physics experiments, and AI training runs create similarly massive outputs. Without structured management, this data becomes dark data: stored but unfindable, costing money while delivering no scientific value. Funding agencies like NIH, NSF, and DOE increasingly require formal data management plans and adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable) as conditions of grant awards.
In HPC and multi-researcher environments, RDM gets complicated fast. Data spans parallel file systems, scratch spaces, archive tiers, and cloud storage. Multiple researchers share systems with no clear ownership boundaries. Effective RDM at this scale requires metadata-driven visibility across the entire storage environment: the ability to track provenance, enforce retention policies, manage storage chargeback, and give researchers self-service tools to manage their own data without burdening IT.
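The core of metadata-driven visibility can be illustrated with a minimal Python sketch. It walks one directory tree, attributes file sizes to owners using filesystem metadata (`st_uid`, `st_size`, `st_mtime`), flags files that have not been modified within a retention window, and turns per-owner usage into a chargeback figure. Everything here is an illustrative assumption: the function names `scan_tree` and `monthly_chargeback`, the 365-day retention window, and the per-TiB rate are hypothetical, and this is not how Starfish or any other product implements these features. A production system would index many file systems incrementally rather than rescanning a single tree.

```python
import os
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OwnerUsage:
    """Per-owner totals gathered from file metadata (hypothetical structure)."""
    total_bytes: int = 0
    stale_bytes: int = 0  # bytes in files not modified within the retention window
    stale_files: list = field(default_factory=list)


def scan_tree(root, retention_days=365, now=None):
    """Walk `root` and aggregate usage per owner UID.

    Assumption: "stale" means mtime is older than `retention_days`; real
    retention policies may use other criteria (project end date, atime, tags).
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    usage = defaultdict(OwnerUsage)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            rec = usage[st.st_uid]
            rec.total_bytes += st.st_size
            if st.st_mtime < cutoff:
                rec.stale_bytes += st.st_size
                rec.stale_files.append(path)
    return dict(usage)


def monthly_chargeback(usage, rate_per_tib=10.0):
    """Convert bytes to TiB and apply a hypothetical monthly rate per TiB."""
    tib = 1024 ** 4
    return {uid: rec.total_bytes / tib * rate_per_tib for uid, rec in usage.items()}
```

In use, `scan_tree("/path/to/project")` returns one `OwnerUsage` record per UID, which can drive both a retention report (the `stale_files` lists) and a chargeback report (`monthly_chargeback`). Keeping the metadata scan separate from the policy and billing logic means the same scan can feed several reports.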
Starfish Storage supports RDM at HPC scale with metadata-driven visibility, automated policy enforcement, and storage chargeback. Harvard FAS Research Computing uses it to recover $2M/year in storage costs while giving researchers self-service data management tools.
Related Links
- Starfish Storage for Higher Education | Starfish Storage
- Harvard FAS Research Computing Case Study | Starfish Storage
- Research Data Management | ICPSR
- Research Data Management | University of Denver
- NIH Data Management and Sharing Policy | NIH
