Twitter has made its entire catalogue of public tweets searchable, meaning that around 500 billion documents are now available to sift through online.
The social network launched around eight and a half years ago, and now claims to have some 284 million monthly active users.
Twitter engineer Yi Zhuang said in a blog post that the improved and enlarged search required a lot of work.
“Since that first simple tweet over eight years ago, hundreds of billions of tweets have captured everyday human experiences and major historical events,” he wrote.
“Our search engine excelled at surfacing breaking news and events in real time, and our search index infrastructure reflected this strong emphasis on ‘recency’.
“But our long-standing goal has been to let people search through every tweet ever published. We [have] built a search service that efficiently indexes roughly half a trillion documents and serves queries with an average latency of under 100ms.”
The search engine has the entire canon of 140-character messages at its disposal, and Zhuang suggested that it would have a range of applications, particularly for companies or individuals looking for content on an incident or event.
The new index is 100 times larger than the previous one, and grows by “several billion tweets a week,” according to the engineer. This could be a good time for Twitter users to go back through their accounts and delete any dubious tweets before they resurface and cause embarrassment.
The system is a mix of batched data aggregation, partitioning, and indexing, and is scalable, easy to use, and dependable, according to Zhuang.
“Our fixed-size real-time index clusters are non-trivial to expand; adding capacity requires re-partitioning and significant operational overhead. We needed a system that expands in place gracefully.”
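To see why growing a fixed-size cluster can mean re-partitioning, consider a minimal sketch. It assumes documents are assigned to partitions by hashing their ID modulo the partition count, and contrasts that with assigning them by time bucket. Both schemes, and the names `hash_partition`, `moved_fraction` and `time_partition`, are hypothetical illustrations and not a description of Twitter's actual implementation.

```python
import hashlib


def hash_partition(doc_id: int, num_partitions: int) -> int:
    """Assign a document to a partition by hashing its ID (assumed scheme)."""
    digest = hashlib.md5(str(doc_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def moved_fraction(doc_ids, old_n: int, new_n: int) -> float:
    """Fraction of documents whose partition changes when the cluster is resized."""
    moved = sum(
        1 for d in doc_ids
        if hash_partition(d, old_n) != hash_partition(d, new_n)
    )
    return moved / len(doc_ids)


def time_partition(timestamp: int, partition_span: int) -> int:
    """Assign a document to a time bucket; the result does not depend on
    how many buckets exist, so new buckets can simply be appended."""
    return timestamp // partition_span


# Hypothetical document IDs, used only for illustration.
doc_ids = list(range(10_000))
fraction = moved_fraction(doc_ids, old_n=4, new_n=5)
print(f"hash partitioning, 4 -> 5 partitions: {fraction:.0%} of documents move")
```

Under the hash-modulo assumption, adding a single partition reassigns most existing documents, which is the kind of operational overhead the quote describes. With time buckets, older documents stay where they are and new capacity is added at the end, which is one possible way for a system to expand in place.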
just setting up my twttr
— Jack (@jack) March 21, 2006
This article was originally published on http://searchenginewatch.com/sew/news/2382397/twitter-lets-users-search-for-every-public-tweet-ever-sent.