Going Non-English: A Look at Chinese Search and Social Listening

For those of us who come from English-dominant cultures, where Google is often the top search provider, thinking of search means thinking of Google. We talk about Google’s algorithm, site rankings, search result relevance, and more.

I have conducted searches in English since search engines were born, so working in Asia has opened my eyes to the intricacies and challenges of non-English languages when analyzing search data and, now, social media behavior.

With technology generally being developed in the West and adopted in the East, the digital space has historically represented an over-simplified version of the world, which positioned the English language and Western culture at its center. But times are changing.

By any count of the world population, non-English speakers outnumber English speakers. We also know that China and India are adding to the population of Digital Natives faster than the rest of the world. While English is not going away as a major language of commerce, we cannot deny the rising demand for non-English capabilities, which is especially evident in the digital marketing business.

I was recently in a meeting with Baidu, China’s leading search engine. We were chatting about consumer search queries, and how Baidu collates and analyzes its search data. For English search results, searchers have come to expect results of high relevance. This is evident at the most basic level, in which the keywords being searched appear frequently and in the right context within the page content. Think for a moment about how that may work for the Chinese language. A Chinese keyword phrase can consist of one or more characters, and the way those characters are positioned together carries a specific meaning; at times, simply changing the order of the characters within the phrase can entirely alter the meaning of the phrase being searched. It is not a matter of learning the search algorithms of the West and applying them to non-English search. I’m certainly not suggesting that the people behind Baidu attempted to do so. Put simply, the search algorithms from the West have little relevance.
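
To make the point about character order concrete, here is a minimal sketch in Python. It is my own illustration, not a description of how Baidu or any other search engine works, and the word pairs are examples I chose rather than anything from that meeting. It compares an order-blind check (do the two phrases contain the same characters?) with an exact phrase check.

```python
from collections import Counter

# Illustrative pairs (my own choice): same characters, different order,
# different meaning.
PAIRS = [
    ("蜂蜜", "蜜蜂"),  # honey vs. bee
    ("牛奶", "奶牛"),  # milk vs. dairy cow
    ("上海", "海上"),  # Shanghai vs. at sea
]


def same_characters(query: str, phrase: str) -> bool:
    """Order-blind comparison: True when both strings use the same characters."""
    return Counter(query) == Counter(phrase)


def same_phrase(query: str, phrase: str) -> bool:
    """Order-aware comparison: True only when the character sequence is identical."""
    return query == phrase


def compare(pairs):
    """Return (a, b, order_blind_match, exact_match) for each pair."""
    return [(a, b, same_characters(a, b), same_phrase(a, b)) for a, b in pairs]


if __name__ == "__main__":
    for a, b, loose, exact in compare(PAIRS):
        print(f"{a} / {b}: same characters={loose}, same phrase={exact}")
```

Every pair passes the order-blind check and fails the exact check, which is why matching on the presence of characters alone would treat a search for honey and a search for bees as the same query.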

Fundamentally, it is about understanding the language, the traditions and cultures that influence it, and the juxtaposition of characters and phrases, in order to develop an algorithmic system that delivers the most relevant results to non-English searchers.

The same goes for social media. I was also humbled recently by an exercise we were conducting internally: the search for a social listening tool for our Asian markets, focusing, among other factors, on non-English (that is, Asian-language) capabilities. Most of these providers were founded in the West and have, to varying degrees, developed their listening tools to crawl the web for non-English sentiments. Most can do the “crawling” job fairly well, but few can provide reliable automated interpretations of those conversations to determine their context and sentiment.

Companies that do this often rely on a team of native language speakers and interpreters, who laboriously go through each thread, reading the sentiments and manually dropping them into “positive” and “negative” buckets. While that process is tedious and vulnerable to human error, I don’t see any other way. For example, there may be a sentiment tracked about a brand, so the crawlers would tell us that such a piece of data has been identified. But the nature of the piece is not immediately clear until someone reads and interprets it. The consumer may have stated that they liked the brand’s product quality but did not receive good customer care. This sentiment would be dropped in the positive bucket for “product quality” but the negative bucket for “customer care.” What this implies is that brands need to determine what to listen out for, and which common phrases are used to reference and talk about them.
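
The bucketing described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, under the assumption that a human reviewer has already read each mention and assigned the labels; the `Mention` class, the `bucket` function, and the aspect names are hypothetical and do not come from any listening tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

POLARITIES = ("positive", "negative")


@dataclass
class Mention:
    """One crawled piece of conversation plus the labels a reviewer gave it."""
    text: str
    # Maps an aspect (e.g. "product quality") to "positive" or "negative".
    labels: Dict[str, str] = field(default_factory=dict)


def bucket(mentions: List[Mention]) -> Dict[str, Dict[str, int]]:
    """Count positive and negative labels per aspect across all mentions."""
    buckets = defaultdict(lambda: {p: 0 for p in POLARITIES})
    for mention in mentions:
        for aspect, polarity in mention.labels.items():
            if polarity not in POLARITIES:
                raise ValueError(f"unknown polarity: {polarity!r}")
            buckets[aspect][polarity] += 1
    return dict(buckets)


if __name__ == "__main__":
    reviewed = [
        Mention(
            text="Great product quality, but customer care let me down.",
            labels={"product quality": "positive", "customer care": "negative"},
        ),
    ]
    print(bucket(reviewed))
```

A single mention can land in more than one bucket, which is exactly what a single overall “positive” or “negative” score per thread would miss.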

Our digital lead from the Korea team put it best when she said, “Due to the homonym and metaphorical nature of Korean language, social listening tools are not enough to achieve a valid social analysis. To obtain meaningful data, a marketing professional who has a strong understanding of the overall market and audience insights, media usage, and behavior must carefully select and manage the right keywords to listen to. The right keywords will deduce key data findings and further manual intervention of the programs will provide a more succinct study of the audiences’ online presence.”

Until the day when technology can satisfactorily automate the interpretation of non-English sentiments, people are our best bet. From where I stand, I quite like the fact that technology is not replacing human intelligence. We may one day reach the point where algorithms can do this just as well, but we are not there yet.
