How to Handle the Truth

A 13-year-old boy with a six-figure income just registered at your site. You know he's lying about something -- but how do you get to the truth?

A truism among interactive marketers says the best and easiest way to find out about your customers is to ask them about themselves. So you do. You ask for age, gender, and Zip Code. You may even be querying site registrants about their job titles, income levels, and marital status. You think you know a lot about your customers. But do you really?

Internet users — especially those savvy about marketing practices — often try to protect themselves and their privacy by assuming another identity. In other words, they lie.

“There aren’t really any statistics on this behavior, but if you do a sampling of your friends and you do a sampling of how you have filled out Web questionnaires, you know what is happening,” said Rakesh Agrawal, a researcher at IBM’s Privacy Research Institute.

As consumers, we understand and respect people’s desire for privacy. As marketers, we desire the data that will enable us to target content, offers, and advertising.

“Marketers are getting a raw deal. They basically have to build a model based on people picking things out of thin air,” said Agrawal.

Finding the Truth With Technology

How do we get truthful data but also help people feel secure about their personal information? There are many potential solutions to the dilemma. Being who he is, Agrawal set out to develop a technological solution. It’s called Privacy-Preserving Data Mining and is based on the idea that randomizing people’s information as they enter it can result in data nearly as good as the real thing — if it’s subjected to some post-processing. The software takes what’s input and adds or subtracts to achieve a random value. The level of that randomization — and the resulting privacy — depends on the software settings.

For example, Susan enters her age as 30. It’s randomized to 42. Mary enters her age as 34, which is randomized to 28. This continues for every person who enters her age. The resulting aggregate randomized data is processed and “corrected” by the software. IBM says experiments show only a 5 to 10 percent loss in accuracy, even for 100 percent randomizations, after corrections are applied.
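To make the idea concrete, here is a minimal sketch of randomize-then-correct. It is not IBM's software or its correction method, which the article does not detail. It assumes the noise is drawn uniformly from a fixed range, and it recovers only two aggregate figures, the mean and the spread of ages. The function names and the `spread` setting are hypothetical.

```python
import random
import statistics

def randomize(value, spread, rng):
    """Add uniform noise in [-spread, +spread] to one entered value.

    Assumption: uniform additive noise; `spread` stands in for the
    software's privacy setting.
    """
    return value + rng.uniform(-spread, spread)

def corrected_summary(randomized, spread):
    """Estimate the mean and standard deviation of the true values.

    Zero-mean noise leaves the average unchanged in expectation.
    Independent noise adds its own variance (spread**2 / 3 for uniform
    noise on [-spread, +spread]), so that amount is subtracted back out.
    """
    mean = statistics.fmean(randomized)
    noise_variance = spread ** 2 / 3
    variance = max(statistics.pvariance(randomized) - noise_variance, 0.0)
    return mean, variance ** 0.5

# Simulated registrants; the site would only ever store `noisy_ages`.
rng = random.Random(0)
true_ages = [rng.randint(18, 65) for _ in range(20000)]
noisy_ages = [randomize(age, 15, rng) for age in true_ages]

est_mean, est_sd = corrected_summary(noisy_ages, 15)
print(f"estimated mean age: {est_mean:.1f}, estimated spread: {est_sd:.1f}")
```

No individual's stored age can be trusted, yet with enough registrants the corrected aggregate figures land close to the real ones. A wider `spread` buys more privacy at the cost of needing more data for the same accuracy.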

It sounds like an interesting idea and has apparently gotten the IBM team some attention in the privacy community. For marketers who will use that data, it’s another matter. Even if the final data turns out to be only a little bit off, it’s still off. Can you still target, say, emails based on this data? Agrawal says yes.

There are three possible approaches. The first has the user’s true data residing on his own computer, with a kind of “dialogue” occurring once a marketing message reaches the computer. (This is Gator’s way of dealing with the privacy issue.) The second involves a trusted third party (à la Microsoft Passport) keeping the information, rather than the site. The third, and most complex, involves a sophisticated dialogue (“oblivious function computation”) in which the marketer could specify a rule, such as that the message should go to women between 18 and 34. The technology would determine whether the individual met the conditions of that rule, without that person’s age or the rule information actually being exchanged.
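The first of these options can be sketched in a few lines. This is only an illustration of the general idea, with the profile kept on the user's machine and the targeting rule checked there. It is not Gator's implementation, and it is not oblivious function computation, which would also hide the rule from the user's side and requires cryptographic protocols well beyond this sketch. The `Profile`, `Rule`, and `matches` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """True data that never leaves the user's computer (hypothetical shape)."""
    age: int
    gender: str

@dataclass(frozen=True)
class Rule:
    """Targeting rule that arrives with the marketing message."""
    gender: str
    min_age: int
    max_age: int

def matches(profile, rule):
    """Check the rule locally; only this yes/no answer is acted on."""
    return (
        profile.gender == rule.gender
        and rule.min_age <= profile.age <= rule.max_age
    )

# The rule from the text: women between 18 and 34.
rule = Rule(gender="F", min_age=18, max_age=34)
local_profile = Profile(age=30, gender="F")

show_message = matches(local_profile, rule)
print("show message" if show_message else "skip message")
```

The marketer never receives the age itself. At most, it learns whether the message was displayed.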

Then there’s the challenge of convincing your users their data is safe because of this technology. You’d have to say something like, “It’s OK to enter your real data, because we’ll be processing it and we won’t store the real numbers.”

Finding the Truth With Customer Loyalty

The other approach is a simpler, less technological one, but it won’t work for everyone. To get to the bottom of it, I spoke with Barbara Rice, group director of research at New York Times Digital, and Dave Morgan, founder of Real Media and CEO of Tacoda Systems, a start-up that helps publishers manage and act on their data. Why The Times? As one of the first sites that ever required registration, it’s built up a business heavily dependent on ads being targeted based on registration information. It had better be right.

Rice believes it is. Her faith is partly based on a user survey the company conducted and partly on third-party information from @plan. Both profiles of The Times audience match up pretty well with registration data. In an environment in which everyone agrees users are lying, they apparently aren’t lying to The Times.

The big advantage The Times has is just being The Times. “When it comes to trusted branded news sources, people tend to give accurate information,” said Morgan. “I don’t know what some of these brands are going to do with my email. With The New York Times or The Weather Channel, I have a good feeling that they’re not going to screw around. I know they’ve been around for a long time, and I know they want to be around for a lot longer.”

Another technique The Times uses is quid pro quo. Rice believes giving something to the user, a newsletter, for example, makes it more likely the person will provide accurate information. (The email address must be correct, of course, or the person won’t receive the newsletter.) The same goes for providing a customized Web site based on information users provide.

“If you tie the information to the experience, you can get better information,” said Morgan. “If it’s my sports teams, I’m going to tell them what sports teams I like. If it’s my weather, I’m going to tell them where I live. I don’t want to lie about that.”

There’s also the direct reward approach. The Times has, for the past few months, conducted a sweepstakes in which people who update their information on the site are entered to win a prize. You’ve got to enter correct information, or it can’t notify you if you’ve won.

The bottom line is getting people to trust you. They will do that if they like you, want to be associated with your brand, and want to be a member of your “club.” Or they can do it because you’ve convinced them you’re using technology that effectively protects their personal data. Whatever your approach, you need to get the truth.

Melaney Smith is on vacation.
