
What Every Marketer Needs to Know About Hadoop

August 30, 2012

With "big data" on everybody's lips, here's all you need to know to keep up your end of the conversation.

"Big data." There's no escaping it.

It's catchy. It's generic enough that everybody is using it for everything. It's a one-size-fits-all phrase.

It's so all-encompassing that the best definition I've seen recently is from Stéphane Hamel, who put it this way:

[Image: tweet by Stéphane Hamel]

So with "big data" on everybody's lips, here's all you (the marketing executive) need to know to keep up your end of the conversation.

A. Disk drives got cheaper so we can store more data. The ways and means of collecting all sorts of data have proliferated faster than Twitter traffic or TSA lines at the airport. We have more data, more types of data, and it's coming at us faster (real time) than anyone ever dreamed possible. That's what makes up "volume, variety, velocity."

So, the ability to replace big, honkin' disk drives with many smaller, cheaper drives that we can wire together is the first significant technical advance.

B. We can split up the processing. The second advance is the ability to augment the big, honkin' processors with many smaller, cheaper servers. We have distributed the processing to the data instead of waiting for the data to rocket back and forth from disk farm to processor.

[Image: Connected Cloud]

So What?

So, there are two things to keep in mind when your marketing budget is being allocated to what seems like pure IT projects.

  1. The more data you throw into the pot, the more likely you are to find some sort of relationship (correlation) to act on. More on that can be found in a July column I called "Consilience - The Intrinsic Value of Big Data."
  2. This practice of splitting up the data, solving smaller problems, and bringing it back together (MapReduce) is very useful for some specific types of processing. Getting this under your belt gives you voting rights when discussing options.
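
For the curious, here is a rough sketch of that split, solve, and recombine pattern in plain Python. It is a toy, not Hadoop itself: the click records, the campaign names, and the three "servers" are all made up for illustration, and the chunks are processed one after another here rather than in parallel on a real cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical clickstream records: (visitor_id, campaign) pairs.
clicks = [
    ("v1", "summer_sale"), ("v2", "summer_sale"), ("v3", "free_shipping"),
    ("v4", "summer_sale"), ("v5", "free_shipping"), ("v6", "loyalty"),
]

def split(records, n_chunks):
    """Split the data into roughly equal chunks, one per (imaginary) server."""
    return [records[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Map step: each server counts clicks per campaign in its own chunk."""
    return Counter(campaign for _visitor, campaign in chunk)

def reduce_counts(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right

chunks = split(clicks, 3)
partials = [map_chunk(c) for c in chunks]  # on a cluster, these run side by side
totals = reduce(reduce_counts, partials, Counter())
print(totals.most_common())
```

The point to notice is that no single step ever needs to see all of the data at once. Each chunk produces a small tally, and only the tallies travel back to be combined.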

Big, honkin' analytics processors are very good at finding hidden pieces in a hurry. (Show me all the customers who have bought in the past three months after clicking on these special offers and abandoning their shopping carts.)

But those types of questions are known unknowns. You know the things you're going to ask and the entire database is set up that way. You know you'll want to see things by date, by region, by product line, etc. That is what gives these enterprise data warehouses their power: they are designed in advance to answer the questions you know you might ask, and they can answer them very quickly so you can refine your questions - as long as you have deep knowledge about what data you have and how it is structured in the database.
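
To make "designed in advance" concrete, here is a small sketch of the shopping-cart question above, run against an in-memory SQLite database from Python. The table names, column names, customers, and dates are all invented for the example, and it assumes one particular reading of the question: the customer clicked the offer, then abandoned a cart, then bought within the three months before a fixed reference date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is decided up front, which is what makes the question easy to ask.
conn.executescript("""
    CREATE TABLE offer_clicks (customer_id TEXT, offer TEXT, clicked_on TEXT);
    CREATE TABLE abandoned_carts (customer_id TEXT, abandoned_on TEXT);
    CREATE TABLE purchases (customer_id TEXT, purchased_on TEXT);
""")
conn.executemany("INSERT INTO offer_clicks VALUES (?, ?, ?)", [
    ("alice", "spring_promo", "2012-06-01"),
    ("bob", "spring_promo", "2012-06-03"),
    ("carol", "spring_promo", "2012-01-10"),
])
conn.executemany("INSERT INTO abandoned_carts VALUES (?, ?)", [
    ("alice", "2012-06-02"),
    ("carol", "2012-01-11"),
])
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [
    ("alice", "2012-07-15"),
    ("bob", "2012-07-01"),
    ("carol", "2012-02-01"),
])

AS_OF = "2012-08-30"  # fixed reference date so the example is repeatable

query = """
    SELECT DISTINCT p.customer_id
    FROM purchases p
    JOIN offer_clicks c ON c.customer_id = p.customer_id
    JOIN abandoned_carts a ON a.customer_id = p.customer_id
    WHERE c.offer = ?
      AND c.clicked_on <= a.abandoned_on
      AND a.abandoned_on <= p.purchased_on
      AND p.purchased_on >= date(?, '-3 months')
    ORDER BY p.customer_id
"""
buyers = [row[0] for row in conn.execute(query, ("spring_promo", AS_OF))]
print(buyers)
```

The query only works because somebody already decided there would be a purchases table with a date column and a customer ID that lines up with the other tables. That is the deep knowledge of structure the warehouse depends on.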

[Image: Hal's Eye]

But the other data - the messy data - is chock-full of unknown unknowns. We know the information might be valuable, but we don't know what to ask.

Hadoop is great as a low-cost storage medium for unstructured data, and its MapReduce processing is great for refining that data into a more structured form for heavy analysis. Social media data, call center transcripts, clickstream data, website content, and sensor data all start out unstructured.

MapReduce is ideal for pre-processing text, turning all those tweets into numerical models of opinion (sentiment analysis), which can then be fed to the big, honkin' analytics machines for correlation discovery and problem solving. It's great for asking slower questions of larger amounts of data. It's great for finding a representative sample of data so the big, honkin' processors don't have to juggle all of the bits at once.
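
Here is a deliberately simple sketch of what "turning tweets into numbers" can look like, written in Python. The word lists and the tweets are hypothetical, and counting positive and negative words is only one crude way to score opinion; real sentiment models are far more sophisticated. The shape of the job is what matters: messy text goes in, a tidy row with a number comes out.

```python
import re

# Hypothetical, deliberately tiny word lists; real lexicons are far larger.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "angry"}

def score(text):
    """Return (positive word hits - negative word hits) for one piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Made-up tweets standing in for the unstructured input.
tweets = [
    "Love the new checkout, so fast!",
    "Site is slow and my cart is broken. Angry.",
    "Ordered shoes today.",
]

# Each tweet becomes a structured row that an analytics database could load.
rows = [{"text": t, "sentiment": score(t)} for t in tweets]
for row in rows:
    print(row["sentiment"], row["text"])
```

Because each tweet is scored on its own, this is exactly the kind of work that can be split across many cheap servers and recombined afterward.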

So the next time somebody throws "Hadoop" into the conversation, you'll know more than the fact that it was named after Doug Cutting's son's stuffed elephant.

Connected Cloud and Hal's Eye images via Shutterstock.


ABOUT THE AUTHOR

Jim Sterne

Jim Sterne is an international consultant who focuses on measuring the value of the Web as a medium for creating and strengthening customer relationships. Sterne has written eight books on using the Internet for marketing, is the founding president and current chairman of the Digital Analytics Association and produces the eMetrics Summit and the Media Analytics Summit.
