MapReduce and Marketing: Are Small Bits of Big Data Meaningful?

There are many truisms around managing big data that I think are actually true, like the one that says just because a technology solution can be used for some purpose doesn’t mean that it is the best option for that purpose. I feel the same way about the business truism that says the only way to solve big problems is to break them down into small problems and solve those in turn.

However, I’m not convinced that it’s a good idea to follow advice that I often hear from several quarters, that you should “break down big data into small data” so that you can manage and understand it. Perhaps it’s semantics, but the point is to keep big data big – that is where the power and opportunity lie.

I recommend tapping technology tools that can identify the key bits of data that are really important to your decisioning or messaging, and using those in context to create an omnichannel view of the customer. This is only possible with effective management of the big data. Big data is not simply the sum of the small data parts. It’s a view of the customer profile, intent, and behavior that is only possible because marketers have access to and can utilize all the data to improve the offer timing and content.

At the same time, there is a lot of big data that is useless to marketers – and it often gets captured and stored anyway. A better solution is to skim just what you need out of a big data set – while keeping the context intact. This approach is not new, and it is increasingly available to marketers through their data warehouse, data management, or campaign management solutions. MapReduce is a tool that helps marketers handle the unstructured and semi-structured resources that are not easy to analyze with traditional tools. MapReduce is commonly defined as a programming framework that “supports distributed computing on large data sets on clusters of computers” – essentially, a way to simplify data processing across massive data sets. We hear lots of talk about Hadoop too, which is an open source framework from the Apache Software Foundation and the best-known implementation of the MapReduce model.

Unstructured or semi-structured data are things like web session logs, clickstream data, web analytics and optimization streams, social data, and other types that do not fit the “rows and columns” structure that is easy to analyze with relational database tools.

MapReduce can help sort through the masses of data and pull out the important parts. Many large data streams like web logs have a lot of data in them that has no long-term value. It doesn’t make sense to spend a lot of time and processing power to upload data to a persistent location (the database) when you only need it for a short time. This is true for things like sentiment analysis or when publishing an event-based word cloud – when the event is over, the data is no longer needed, but the cloud itself is worth keeping.
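To make the word cloud idea concrete, here is a minimal single-machine sketch of the map and reduce steps in plain Python. It is an illustration rather than how Hadoop or any particular product does it: the sample posts and the function names (map_post, reduce_counts, word_cloud_counts) are made up for this example. The raw posts can be thrown away once the counts exist, and only the small result is kept.

```python
from collections import Counter
from functools import reduce

# Hypothetical social posts captured during an event (illustrative data only).
POSTS = [
    "Great keynote great demo",
    "The demo was great",
]

def map_post(post):
    """Map step: turn one post into per-word counts."""
    return Counter(post.lower().split())

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def word_cloud_counts(posts):
    """Run the map step on every post, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_post, posts), Counter())

counts = word_cloud_counts(POSTS)
```

The resulting counts are all a word cloud needs, which is why the underlying stream does not have to be loaded into the database first.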

Another great example of useless data getting in the way is an automated browse messaging scheme. What you really want is to comb through the entire web log and find all customers who browsed but didn’t buy. You don’t need all the other data (the length of the session, the other products viewed, the ads that were viewed, and so on) in order to trigger an email follow-up with the right product and offer based on the non-purchased item.
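The browse-but-didn’t-buy case can be sketched the same way. The log layout, the event names, and every function below are assumptions made for illustration, and a real job would run across a cluster rather than in a single Python process. The point is that the map step discards fields such as session length right away, so only the customer, event, and product move forward.

```python
from collections import defaultdict

# Hypothetical web-log records: (customer_id, event, product_id, session_seconds).
WEB_LOG = [
    ("c1", "view", "shoes", 120),
    ("c1", "purchase", "shoes", 120),
    ("c2", "view", "jacket", 45),
    ("c2", "view", "hat", 45),
    ("c2", "purchase", "hat", 45),
    ("c3", "view", "jacket", 300),
]

def map_event(record):
    """Map step: keep only the fields the trigger needs, keyed by customer."""
    customer_id, event, product_id, _session_seconds = record
    if event in ("view", "purchase"):
        yield customer_id, (event, product_id)

def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_customer(customer_id, events):
    """Reduce step: products this customer viewed but never purchased."""
    viewed = {product for event, product in events if event == "view"}
    bought = {product for event, product in events if event == "purchase"}
    return customer_id, sorted(viewed - bought)

def browsed_not_bought(log):
    """Return {customer_id: [products]} for customers with unpurchased views."""
    pairs = (pair for record in log for pair in map_event(record))
    grouped = shuffle(pairs)
    results = (reduce_customer(c, events) for c, events in grouped.items())
    return {c: products for c, products in results if products}

follow_ups = browsed_not_bought(WEB_LOG)
```

Only the small follow_ups result would need to be handed to the campaign management tool; the rest of the web log never has to be stored.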

MapReduce is not a database. It has no querying power and no knowledge of what other data sets exist. It runs processes in parallel and is especially adept at pulling small sets of data out of the big data set and summarizing them so they can be used as part of a larger picture. Lots of such jobs can be run at the same time and without any connection to each other – until the results get into the main database. Please note that it usually requires a specialist to implement and optimize – many great database teams do not have this experience (yet).
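As a rough picture of jobs running side by side with no connection to each other, the sketch below counts page hits in separate log chunks and merges the partial results only at the end. It uses threads on one machine as a stand-in for the nodes of a cluster, and the chunk data and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log chunks, as if the web log had been split across three workers.
LOG_CHUNKS = [
    ["/home", "/product/hat", "/home"],
    ["/product/hat", "/checkout"],
    ["/home"],
]

def count_pages(chunk):
    """Map step: count page hits in one chunk, with no knowledge of the others."""
    return Counter(chunk)

def run_parallel(chunks):
    """Run the map step on every chunk concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        partials = list(pool.map(count_pages, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Reduce step: the only point where results meet.
    return total

page_hits = run_parallel(LOG_CHUNKS)
```

Because each chunk is processed independently, adding more workers or more chunks does not change the logic, only how the work is spread out.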

Big data is just the latest generation of intimidating data sets – and tools like MapReduce can help tame big data by preprocessing it and passing important pieces on for further analysis. It lets you see and utilize small data inside the big data context. I think that is an important distinction – and opportunity.

Please comment below and let me know how your company is using various big data tools to help you manage big data insights.

