“It’s cheaper to keep data than delete it.” So said Bob Page, vice president of products at Hortonworks, at the most recent eMetrics Summit. It sounds absurd at first, but his premise is sound.
The cost of throwing more hardware at your storage system or increasing your rental space in the cloud might just be lower than the cost of deciding what to delete.
I’m not referring to medical records, primary research data, financial records, and the like. “Standard accounting practices,” research protocols, and the IRS cover those instances. I’m talking about the customer-related data you keep for advertising and marketing purposes.
Data governance usually revolves around what data to collect, how it will be cleaned and managed, and who may access or manipulate it. But deletion seldom enters the conversation.
A philosophy, strategy, policy, and methodology for data deletion all need to be discussed, aligned, rolled out, managed, and maintained, and that takes a fair amount of time and resources.
How Long Is Data Valid and Valuable?
The U.K.’s Data Protection Act says, “Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.” One finds Zen koans in the strangest places.
Today, data is collected and kept on the chance that somebody will think of an interesting question to ask of it. Deleting data too soon may cause trouble.
The United Parcel Service decided that the 200-some addresses I had entered into its database were not worth maintaining (see “Where’s My Freakin’ Data You B@stards?”) and obliterated them without asking me.
How Long Until Data Is Dangerous?
Your legal department will tell you that data becomes dangerous when it is kept so long that it might fall into the wrong hands: hackers or opposing attorneys.
Data protection is getting more and more attention these days, and your IT department is tasked with keeping it all safe and sound. After all, your legal liability grows the more data you keep and the longer you keep it.
“Discoverability” is also a serious concern to your legal beagles. When the other side in a lawsuit asks you to produce electronic evidence, it’s best if you can point to a policy that states, “customer data shall be destroyed after X years,” along with a well-documented procedure that manages the deletion process.
But data also becomes dangerous when it no longer represents the truth.
Amazon can show me everything I’ve bought from them since my first purchase on March 26, 1996. That data is still valid. It may not be valuable, but it is still true. However, what I searched for in 1996 no longer represents my intent to purchase, is no longer true, and is actually harmful to an algorithm trying to help me find and buy new stuff.
Amazon’s approach is to let its customers do the decision-making by offering them the chance to “Improve Your Recommendations.” Customers are invited to rate the items they’ve purchased, identify which ones they bought as gifts, or simply check the box that says, “Don’t use for recommendations.”
How steep is your data decay curve? When does your data become toxic and corrupt the veracity of the answers you seek?
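One way to make the decay-curve question concrete is to give every behavioral signal (a search, a click, a page view) a weight that shrinks with age, and to treat it as stale once the weight falls below a cutoff. The sketch below assumes a simple exponential decay model; the half-life, the threshold, and the names `signal_weight` and `is_stale` are all hypothetical illustrations, not anything Amazon or any other company is known to use.

```python
from datetime import date

# Hypothetical half-life: a signal loses half its weight every 180 days.
HALF_LIFE_DAYS = 180
# Hypothetical cutoff: below this weight, the signal is treated as stale.
STALE_THRESHOLD = 0.05


def signal_weight(event_date: date, today: date,
                  half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    age_days = (today - event_date).days
    return 0.5 ** (age_days / half_life_days)


def is_stale(event_date: date, today: date,
             threshold: float = STALE_THRESHOLD) -> bool:
    """True once a signal's decayed weight drops below the cutoff."""
    return signal_weight(event_date, today) < threshold


# Example: a search from the first day of Amazon purchases versus a recent one.
old_search = is_stale(date(1996, 3, 26), date(2015, 3, 26))
recent_search = is_stale(date(2015, 2, 24), date(2015, 3, 26))
```

With these made-up numbers, a signal goes stale after about 778 days, a little over two years. A steeper curve (a shorter half-life) retires old intent faster; the right values depend entirely on how quickly your customers’ interests actually change.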
Clearly, it’s necessary to delete some data from time to time. The question is, what data is not worth the effort of even worrying about?