A Not-So-Unique Definition of Big Data

The Oxford English Dictionary is under constant assault. The antecedent of today’s “Wiktionary” was, like so many good things, launched by an early English entrepreneur by the name of Professor James Murray. Coincidentally, Murray’s technical co-founder was an American military surgeon named William Chester Minor who happened to be an inmate at an insane asylum. Tech incubators take note. They worked together remotely, and for almost 20 years Murray apparently didn’t know his colleague was crazy.

The original OED was a compendium of Murray’s and Minor’s efforts, more Minor than Murray. But one man’s effort only scales so far. Eventually, using only first-party, hand-collected data on scraps of paper can’t keep up with the march of progress. And so, recognizing that words and phrases come and go, the venerable vetter of all matters lexically legitimate comes out with a periodic refresh, overseen by its editors and a league of independent contributors. In its latest update in December 2012, the OED added words like Captcha, iced tea, base case, and xolo to the listing of acceptable words – a great resource for those of us who are trying to figure out what to do with the “X” in Scrabble.

One phrase that doesn’t show up in the OED is “big data.” You know what “iced tea” means, so what is the prognosis for the addition of “big data” to the “big dictionary”? The chances that it makes it into the OED seem slim to me. That’s too bad because big data has been around and in use for a long time, but don’t tell anyone where it came from, because then it won’t be cool anymore. Why?

Because long before anyone called it big data, email marketing was the original big data. (The same goes for “native advertising,” but that’s for another post.) How can that be?

Email marketing databases managed by sophisticated marketers and large retailers are among the more data-rich stores available, and they drive millions of sales and interactions every single day. Email marketing is big data at work in the real world.

So let’s first help the OED out with some definitions. Big data is simply the collection of large data sets from numerous sources organized and related by a single or several key fields (email address, UDID, household address, PID) that unite a dataset under a specific object, individual, or event (person, place, or thing). Email has been doing this for almost two decades. Users create the entry (subscribe to something, log in, Facebook connect), they interact with a received message (on a device, in a place, on a page), and they take action (open, click, buy, reply). Then this data is enhanced via a method referred to as “appending,” using data from different data sets, but related via the key fields found within the original customer record. Eventually through enough interactions, a lattice is formed and there are records of what an individual bought, when people clicked on links, what devices were used for clicking, how many units were bought, and so on. And this email-driven big data is what drives modern commerce and CRM.
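To make that lattice concrete, here is a minimal sketch in Python of records from separate data sets being united under one key field, the email address. Every name and value in it (the sample subscribers, the event fields, the `household_zip` append, the `build_profiles` helper) is a made-up assumption for illustration, not a description of any real marketer's database.

```python
# Hypothetical sample data: all field names and values are invented for illustration.
subscribers = [
    {"email": "Ana@Example.com", "source": "newsletter signup"},
    {"email": "ben@example.com", "source": "Facebook connect"},
]
events = [
    {"email": "ana@example.com", "action": "open", "device": "phone"},
    {"email": "ana@example.com", "action": "click", "device": "phone"},
    {"email": "ana@example.com", "action": "buy", "device": "laptop", "units": 2},
    {"email": "ben@example.com", "action": "open", "device": "tablet"},
]
# A separate data set used for "appending", related only by the key field.
appended = {
    "ana@example.com": {"household_zip": "10001"},
}


def normalize(email):
    """Treat the email address as the key field: trim and lowercase it."""
    return email.strip().lower()


def build_profiles(subscribers, events, appended):
    """Unite the three data sets into one record per email address."""
    profiles = {}

    # 1. Users create the entry (subscribe, log in, connect).
    for row in subscribers:
        profiles[normalize(row["email"])] = {
            "source": row["source"],
            "events": [],
            "devices": set(),
            "units_bought": 0,
        }

    # 2. Interactions and actions accumulate on the same record.
    for event in events:
        profile = profiles.get(normalize(event["email"]))
        if profile is None:
            continue  # no subscriber record to attach this event to
        profile["events"].append(event["action"])
        profile["devices"].add(event["device"])
        if event["action"] == "buy":
            profile["units_bought"] += event.get("units", 1)

    # 3. Appending: enrich the record with fields from another data set.
    for email, extra in appended.items():
        key = normalize(email)
        if key in profiles:
            profiles[key].update(extra)

    return profiles


profiles = build_profiles(subscribers, events, appended)
```

After this runs, `profiles["ana@example.com"]` holds what that person did, on which devices, how many units they bought, and the appended household field, all hanging off a single email address.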

What’s a better unique identifier to tie all these elements together than an email address? While cookies can be made long enough to be considered unique, you can’t build a lasting database on them – people fear and clear cookies, they change computers, and they use different devices all the time. But they keep the same email address for years, and some people even reserve an email address for their children as soon as they are born.

The email address is the ideal PID for big data, at least in the CRM sense. What other string of letters have you memorized that is uniquely you? And what is the connective fiber in the B2C databases that fuel today’s commerce?

Call me crazy, but when the story of big data is written, don’t be surprised if email is left out of it. Just as Professor Murray could go years without noticing that his co-founder was an accused murderer in an insane asylum, today’s big data purveyors will be free to deny email’s legacy in the creation of today’s big buzzword.

If you want to learn more about this incredible partnership, I urge you to read “The Professor and the Madman.”

