Customer Data Munging and Reconciliation for Correlation

August 18, 2011

What custom clothiers and tailors have in common with analytics professionals.

You want custom-made clothing? Step right up and we'll measure you. We'll find out how tall, wide, and thick you are in any number of places. Well, 33 places, to put a specific number on it.

At least, that's how many are used by MGL Industries and burlesque costumer Glitz by Linda Joyce, and that's not even counting ear height, glove length, or pastie size.

Now imagine that your custom clothier were to measure your neck with calipers, your arms with a yardstick, your wrist with a micrometer, your hat size with a protractor, your inseam with a tape measure, your arms with a laser rangefinder, and your waist with a Smart Finger.

Not only would the process be time consuming, the results would be a mishmash of not quite relatable numbers.

I've been pondering the dilemma of differing customer data for a while. I remain hopeful but not immediately confident. Much the same as I feel about medicine, law, and government. Will we ever be able to put all our digital data eggs into one customer warehouse basket and come out with a reliable omelet?

Data management mechanics have long been mapped out: capture, cleanse, store, extract, etc. But it's the transformation of all that customer behavioral data, the step that comes just before loading it all into the master warehouse, that has me concerned. Customer data come in all shapes, sizes, weights, densities, and values.

It is a given that any two advertising servers will record their performance in slightly different ways, that any two web analytics tools on the same site will generate different numbers, and that any two customer satisfaction indexes will differ. This is merely the problem of the man with two watches who does not know what time it really is.

This issue is put to bed by giving up hope for standard, industrial strength metrics, acknowledging that every yardstick is slightly dissimilar. Organizations succeed when they settle for internal consistency over galactic exactitude.

Data cleansing is not as problematic. It draws on the services of a data dictionary. In system A, men and women are identified as either M or W; in system B, as either M or F; and in system C, as either 1 or 2. A quick cross-reference puts all things to rights as long as "Decline to state," "A little of each," and "Not sure yet" are accounted for.
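To make the cross-reference concrete, here is a minimal sketch in Python. The system names, the canonical labels, the choice of which of system C's codes means which, and the function name `normalize_gender` are all assumptions made for illustration. Anything not in the dictionary falls through to a catch-all value rather than being guessed at.

```python
# Hypothetical data dictionary: one code table per source system.
# The canonical labels and the meaning of system C's 1/2 are assumptions.
UNKNOWN = "unspecified"

GENDER_DICTIONARY = {
    "A": {"M": "male", "W": "female"},
    "B": {"M": "male", "F": "female"},
    "C": {"1": "male", "2": "female"},
}


def normalize_gender(system, raw_value):
    """Map a source-system code to a canonical label.

    Unrecognized systems, missing values, and answers such as
    "Decline to state" all map to UNKNOWN instead of raising an error.
    """
    if raw_value is None:
        return UNKNOWN
    code = str(raw_value).strip().upper()
    return GENDER_DICTIONARY.get(system, {}).get(code, UNKNOWN)
```

With this sketch, `normalize_gender("A", "W")` and `normalize_gender("C", 2)` both yield the same canonical label, while `normalize_gender("B", "Not sure yet")` yields the catch-all.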

Merging or joining all of these data so they make sense requires a thorough understanding of how each is calibrated. In one case, a week's worth of data represents data collected between Monday morning and Sunday night. In another, it's Sunday morning to Saturday night. In a third, it's simply the monthly total divided by 4, 4.25, or 4.33333. Messy, but manageable.
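As a rough illustration of the calibration problem, the sketch below re-buckets the same daily counts into weeks that start on either Monday or Sunday. It assumes daily detail is available, which the sources described above may not provide; the function names `week_start` and `weekly_totals` are hypothetical.

```python
from datetime import date, timedelta


def week_start(day, first_weekday=0):
    """Return the first day of the week that contains `day`.

    `first_weekday` follows date.weekday() numbering:
    0 = Monday (Monday-to-Sunday weeks), 6 = Sunday (Sunday-to-Saturday weeks).
    """
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)


def weekly_totals(daily_counts, first_weekday=0):
    """Sum a {date: count} mapping into weeks keyed by each week's first day."""
    totals = {}
    for day, count in daily_counts.items():
        key = week_start(day, first_weekday)
        totals[key] = totals.get(key, 0) + count
    return totals
```

The same three days of data land in different weeks depending on the convention, which is why two weekly reports can disagree without either being wrong. A source that reports only a monthly total divided by a constant has no daily detail to re-bucket, so it would presumably have to be flagged as an estimate rather than realigned.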

The real tricky bit comes when trying to attribute said data to individual individuals. For that, a common key is needed. If we all had one and only one telephone number, email address, customer ID number, or ship-to address, then all the information about one person could be correlated to all of the other information about that one person. Multiply that multi-headed hydra by the number of cookies we have on the number of devices we use and the problem becomes nail-biting.
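One way to picture the missing common key is to link records transitively: if two records share any identifier value, treat them as the same person. The sketch below does this with a small union-find structure. It is an illustration under stated assumptions, not a recommended method: the field names are hypothetical, each record is assumed to contain only identifier fields, and exact matching on shared values will wrongly merge people who share a phone or a device.

```python
def resolve_identities(records):
    """Group record indexes that share at least one identifier value.

    `records` is a list of dicts such as {"email": ..., "cookie": ...}.
    Returns a sorted list of groups, each a sorted list of record indexes.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        # Merge two groups, keeping the smaller index as the root.
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    first_seen = {}  # (field, value) -> index of first record carrying it
    for idx, record in enumerate(records):
        for field, value in record.items():
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

For example, a record with an email and a cookie, a second with that cookie and a phone number, and a third with that phone number and a new cookie would all end up in one group, even though the first and third share nothing directly.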

The additional challenge is something I have heard referred to as "data munging." This is the art of associating apples and orangutans. The two have very little in common and have dramatically different attributes. Nevertheless, we are compelled to assume that their coexistence in the same database will reveal hitherto unrealized returns on investigatory investment.

Social media influences, advertising exposures, click-through activities, email opens, blog post sentiments, shopping cart inclusions, likes, shares, and +1s are not measurable in the same way by the same scale and in any standard form. And yet...

During a recent interview, Brandt Dainow from ThinkMetrics asked about the complexity of data reconciliation. I could not give him a clear answer. I was, instead, frustrated that the term "data reconciliation" would be perfect for this problem if it were not already in vogue to describe rectifying errors introduced by measurement noise.

Now that you've read this far, I have two pleas to make to you:

  1. What is the proper term for the conjoining of disparate data types to build a truly useful model (in this case, of customer behavior) for the purpose of optimizing marketing?
  2. And, does anybody have any ideas they'd like to share on how this can be done in a way that is useful across more than one instance (industry/product line/campaign)?

I'm all ears. (3 ¼" x 1 ¾" each.)


ABOUT THE AUTHOR

Jim Sterne

Jim Sterne is an international consultant who focuses on measuring the value of the Web as a medium for creating and strengthening customer relationships. Sterne has written eight books on using the Internet for marketing, is the founding president and current chairman of the Digital Analytics Association and produces the eMetrics Summit and the Media Analytics Summit.
