Guest Post: File Lifecycle Management in the Cloud Era
The idea of a lifecycle for data is not new. However, in my experience, the topic has usually been framed as a “document lifecycle” in the context of enterprise content management (ECM).
The problem is that ECM is a fringe practice (think records management). A typical enterprise with an ECM system will find that for every “document” placed into it, there are hundreds if not thousands of files that live their entire lifecycle in file shares, forever outside of any ECM framework.
Organizations have too much information to manage in a heavy-handed way.
Does that mean we give up?
Giving up is not an option because there are real costs and risks involved (e.g., brand damage from data loss or data privacy failures).
My view is that we need to think about file lifecycle management from a practical angle, which is to say that IT owns the problem, the needs of users come first, and we must consider protecting the business throughout the lifecycle of information.
Thus, I am going to propose an updated, and more practical, file lifecycle management framework that speaks to the reality of how we see IT leaders taking advantage of the cloud to manage the file lifecycle.
The status quo file lifecycle approach
The typical organization has little visibility into, or control over, most of its information. (That’s all changing with cloud, of course, which we’ll get into shortly.)
For many organizations, machine- and user-generated information lives on network-attached storage devices (file shares). And it is common for files dating back decades to occupy space on the same file server as new data.
Once created, a good number of files typically see modification and retrieval activity within the first week to 30 days. After 30 days, most files are rarely (if ever) accessed again.
When it comes time for a hardware refresh, IT administrators may take the opportunity to move some of the ancient data to a lower-cost storage array. If not, all information continues to sit on SSD or spinning disk and require backup and replication.
It is not the role of people in IT to purge any of the information. And since users do not clean up their older data, the de facto lifecycle in most organizations is that all data stays on primary storage indefinitely.
Cloud-era information lifecycle management
There is a new information lifecycle paradigm available with the combination of software-defined storage (SDS), cloud storage, and cloud services.
For starters, SDS solutions can fully automate the task of storage optimization, including encryption, deduplication, compression, and movement of files to low-cost storage tiers as retrieval activity diminishes. SDS can help you take advantage of cloud storage for backup, tiering, and long-term archival so that primary NAS devices can be sized for active workloads only.
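To make the tiering idea concrete, here is a minimal sketch in Python of an age-based tiering rule. It is not how any particular SDS product works. The tier names, the `choose_tier` function, and the one-year archive cutoff are all assumptions made for illustration; only the 30-day window echoes the access pattern described earlier.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the 30-day window mirrors the access pattern
# described earlier; the one-year archive cutoff is purely an assumption.
ACTIVE_WINDOW = timedelta(days=30)
ARCHIVE_AFTER = timedelta(days=365)


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return a hypothetical tier name based on how long a file has been idle."""
    idle = now - last_access
    if idle <= ACTIVE_WINDOW:
        return "primary"   # still in active use, keep on the NAS
    if idle <= ARCHIVE_AFTER:
        return "cool"      # rarely read, move to lower-cost cloud storage
    return "archive"       # untouched for over a year, long-term archive


now = datetime(2020, 6, 1)
print(choose_tier(datetime(2020, 5, 20), now))  # idle for 12 days
print(choose_tier(datetime(2019, 12, 1), now))  # idle for about six months
print(choose_tier(datetime(2015, 3, 1), now))   # idle for years
```

A real policy would likely weigh more than last-access time (file type, owner, sensitivity tags), but even a rule this simple shows why primary storage can be sized for active workloads only.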
Next, cloud services can perform a variety of computational work to support a modern file lifecycle management flow. For example:
- Files can be analyzed to detect any private or sensitive contents and tagged accordingly.
- Chargeback analytics can attribute storage consumption to specific groups or departments automatically.
- Access controls can be queried and analyzed to detect over-privileged access and orphaned data.
- Ongoing access activity can be monitored to refine access controls and to optimize the storage tiers based on retrieval demand.
- Retention management allows for immutable retention periods and automatic deletion to help ensure the right information is kept untampered and that purging of data can occur where it makes sense.
- Finally, data privacy and legal requests for holds, collection, and deletion are more straightforward due to the nature of queryable object storage, sophisticated policy controls, and indexing services that include relevancy ranking, speech-to-text, and optical character recognition capabilities.
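As one illustration of the list above, the retention and legal-hold items can be sketched as a small decision function. This is a simplified sketch, not a real cloud service API. The `FileRecord` fields, the `retention_action` function, and the action names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FileRecord:
    path: str
    created: date
    retention_days: int        # hypothetical per-file retention period
    legal_hold: bool = False   # hypothetical flag set by a legal request


def retention_action(record: FileRecord, today: date) -> str:
    """Decide what a policy engine might do with a file today."""
    if record.legal_hold:
        return "hold"      # a hold overrides any scheduled deletion
    expires = record.created + timedelta(days=record.retention_days)
    if today < expires:
        return "retain"    # still inside the immutable retention period
    return "delete"        # retention has lapsed, eligible for purging


today = date(2020, 6, 1)
old_contract = FileRecord("hr/contract.pdf", date(2012, 1, 15), 2555)
held_contract = FileRecord("hr/contract.pdf", date(2012, 1, 15), 2555, legal_hold=True)
recent_report = FileRecord("finance/q3.xlsx", date(2019, 9, 1), 365)

print(retention_action(old_contract, today))
print(retention_action(held_contract, today))
print(retention_action(recent_report, today))
```

Checking the legal hold first is a deliberate choice in this sketch: a file under hold should never be purged, even if its retention period expired long ago.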
The benefits of modernizing information lifecycle management
The modern file lifecycle process involves less hardware, less risk, and less cost than the old way, in which IT storage admins grew massive storage footprints inside the corporate datacenter without any agile insight into, or control over, the data.
At a time when workers are doing their jobs remotely, the resulting shift in IT infrastructure towards cloud-based solutions offers us the opportunity to revisit our information lifecycle management practices. Opening up our data management approach to cloud services can give us new ways to understand and optimize stored information on a corporate scale. Ultimately, file lifecycle management in the cloud era is about avoiding threats and brand damage while operating a more agile, business-friendly IT infrastructure at less cost than before.
This guest post was written by Geoff Bourgeois – CEO @ HubStor