User Search Data: AOL Gives Away the Farm

AOL this morning released search log data on more than half a million of its users. A hefty 439 MB file, available on the company’s research Web site, shows queries conducted during a three-month period earlier this year. While AOL said the data was anonymized (users are identified only by an ID number), it’s entirely possible to use the file to compile very personal and detailed search histories on individual users, and possibly to identify them. The tech blogs are covering this angle extensively. AOL has since yanked the file, but it has already been widely mirrored and downloaded, and the situation looks to be an unmitigated disaster for the company.

Affiliate and search marketers will be all over the keyword data the file provides, including high-traffic search terms and common misspellings. As some have pointed out, a whole crop of low-quality AdSense sites optimized for this data is going to spring up, lowering the quality of organic results. Needless to say, AOL search partner Google will not be pleased.

Update: AOL just sent me the following statement with additional facts:

“This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.

“Although there was no personally-identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.

“Here was what was mistakenly released:

* Search data for roughly 658,000 anonymized users over a three month period from March to May.

* There was no personally identifiable data provided by AOL with those records, but search queries themselves can sometimes include such information.

* According to comScore Media Metrix, the AOL search network had 42.7 million unique visitors in May, so the total data set covered roughly 1.5% of May search users.

* Roughly 20 million search records over that period, so the data included roughly 1/3 of one percent of the total searches conducted through the AOL network over that period.

* The searches included as part of this data only included U.S. searches conducted within the AOL client software.”
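
For anyone who wants to sanity-check the percentages in AOL's statement, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above; the variable names are mine, and the total search volume it prints is merely what the "1/3 of one percent" figure implies, not a number AOL reported.

```python
# Figures as quoted in AOL's statement (rounded by AOL, so results are approximate).
users_in_release = 658_000          # "roughly 658,000 anonymized users"
may_unique_visitors = 42_700_000    # comScore Media Metrix figure for May
records_in_release = 20_000_000     # "roughly 20 million search records"
share_of_searches = (1 / 3) / 100   # "roughly 1/3 of one percent"

# Share of May search users covered by the release.
user_share = users_in_release / may_unique_visitors

# Total searches over the period implied by AOL's own ratio
# (an inference from the statement, not a reported number).
implied_total_searches = records_in_release / share_of_searches

print(f"Share of May search users: {user_share:.2%}")
print(f"Implied total searches, March to May: {implied_total_searches:,.0f}")
```

The user share works out to about 1.54%, in line with AOL's "roughly 1.5%," and the search-record ratio implies something on the order of 6 billion searches across the AOL network over the three months.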
