The Big Data Dilemma

Question: What data is needed, and for how long?

Answer: All of it, forever.

That was a rhetorical question posed during the opening customer keynote at the Teradata Partners conference in October 2011. While the question and answer were hypothetical in context, they immediately caused a flood of tweets, and as I looked around the room packed with over 3,500 attendees, I could almost see the thought bubbles emerging from the heads of many…

  • How can my business determine which data is most valuable?
  • How long should we store this “important” data?
  • What are the cost implications of collecting everything and storing it forever?
  • Is it even legal to store data in perpetuity?
  • Who in the world is going to go back multiple years and begin conducting new analysis on really old data?

Collecting and storing all your digital data is a luxury that most businesses today do not have. While data storage options are becoming more economical all the time, most businesses cannot – and should not – collect every bit, byte, or petabyte that flows through their enterprise applications. However, as the importance of digital channels grows, so too do our massive quantities of digital data. Here are a few tips to help you think through the big data dilemma in a rational manner:

Need to know vs. nice to know data. As businesses operating in the digital age, we have an understandable inclination to collect every piece of information possible, especially because we have the means to do so with advanced data collection tools, massively scalable storage environments, and options that parse data to disparate systems across the enterprise. Yet most businesses don’t effectively use the data they already collect, and adding more information to the mix doesn’t help matters.

Understanding what data matters to your business requires empathizing with business stakeholders, examining marketing programs, and getting to the mission-critical values of the organization. In my experience, simply asking business stakeholders what metrics or KPIs are most important to them is a futile endeavor. For starters, they don’t speak our language of analytics, and even those who do are hard-pressed to articulate their business needs in neat, metric-sized bites. The task of discerning which data matters involves investigation, collaboration, and refinement. Oftentimes this requires stepping away from your daily grind to see the big picture or bringing in an outsider to help you see what data is right in front of you. Either way, the goal is to interpret the business needs and translate them by packaging them up in a way that makes the business salivate for your data, because they not only need it, but they thrive on it!

Archived vs. accessible data. The next big thing to consider after you’ve determined which data matters is how long you really need to keep it active for your analysis, marketing, and business intelligence applications. Even the most egregious hoarders of digital data typically roll off data at specified intervals so that they can work with a manageable set of data and reduce processing times and storage costs. Whether this happens after 24, 36, or 60 months depends on how you’re using your data and, in some cases, what the legal requirements are for your industry.

Yet a few things to consider are who is using the data and for what purpose. If your teams are conducting digital click-stream analysis to evaluate the usability of your digital properties, each time you redesign or modify your online destinations you’re introducing new variables that make historical data less comparable. In most cases, maintaining the processed high-level data is sufficient for trending purposes. Some analytics pros will make the case that data exploration requires raw data sets and volumes of historical data, and they are in fact correct. Yet my experience tells me that only a handful of companies have the time or resources to “play” with data and explore trends and anomalies, because getting out the weekly report or answering a top-priority request usually trumps deep data diving for fun.
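To make the roll-off idea concrete, here is a minimal Python sketch. It assumes each record carries a collection date and that one retention window applies to everything (36 months by default, one of the intervals mentioned above). The function names and the record layout are hypothetical, and a real policy would also have to reflect any legal retention requirements for your industry.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Return the number of whole calendar months from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # If the day of the month has not been reached yet, the last month is incomplete.
    if later.day < earlier.day:
        months -= 1
    return months


def split_by_retention(records, today, retention_months=36):
    """Partition (record_date, payload) pairs into active and archived lists.

    Records younger than the retention window stay active; records that
    have reached or passed it are rolled off to the archive.
    """
    active, archived = [], []
    for record_date, payload in records:
        if months_between(record_date, today) < retention_months:
            active.append((record_date, payload))
        else:
            archived.append((record_date, payload))
    return active, archived
```

Calling `split_by_retention(records, date.today(), retention_months=24)` would apply a shorter window. In practice the archived set might keep only processed, high-level aggregates rather than raw click-stream data.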

Complexity vs. simplicity. One of the overarching themes at the Teradata Partners conference was the simple fact that complexity becomes more and more apparent as we grow our digital data stockpiles. This complexity stems from having multiple systems collecting data and processing information across myriad customer touch points that fire off responses at real-time speeds. To do anything less is insufficient and oftentimes futile. Yet consumers for the most part don’t deal well with complexity. They need a simplified experience that masks the complexity of the big data business world.

Thus, your challenge becomes simplifying an incredibly complex environment by shielding the customer from an overwhelming world of statistics, algorithms, and business logic to present seamless online experiences. While showing the mechanics of a precision watch may be fascinating to some, most just want to know what time it is. Therein lies the challenge of meeting consumer demands in an empowered customer-driven ecosystem. The best that you can do is to offer an experience that reveals answers in an instant and offers multiple levels of depth for those who request more.

As the thought bubbles rose to the ceiling and more information was presented, it wasn’t long after the opening customer keynote that the real answer to the hypothetical question was delivered. The CEO, Mike Koehler, stated to the audience: “Not all data is created equal. Some data is more valuable than others.”

This column was originally published Oct. 6, 2011.
