Are search engines ‘semantic’?
Let’s get to the crux of the matter.
It never fails to make me smile when people misuse the term ‘semantic search’, so really… what is semantic web search?
We often talk of semantic search as if it’s something new, but it goes back years, even centuries. Here is a simple way to define it…
Picture a village. Mark and Claire have a seven-year-old son, James, and a six-month-old baby, Olivia, who has asthma. Mark and Claire want to go to a dinner party next Thursday, alone, so they need a babysitter. Not just any babysitter, but one who will keep James away from video games after 7pm and who has experience minding newborns and managing asthma.
If Mark and Claire were to go to a search engine, they might type something like [babysitter no video games] or [experience of new born asthma babysitter], or combine the two into a single semantic query: [babysitter no video games and new born asthma experience].
And that is before we have even added the Thursday date or a location…
The engine fails to return a list of babysitters who are strict and know how to work video games (say James puts a game on and, in the most extreme case, the babysitter has to take the game out or disconnect the console) and who also have experience minding newborns, or a certification in infant childminding that covers asthma management.
The day an engine returns a list of babysitters that meets all of Mark and Claire’s requirements is the day we have semantic search. We are a long way off from this.
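To make the gap concrete, here is a minimal Python sketch of what answering Mark and Claire’s request would involve if each requirement were structured data rather than a keyword. The `Babysitter` record, its fields, the `matches` helper and the sample candidates (including the village names) are all hypothetical; no engine exposes babysitter data like this, which is rather the point.

```python
from dataclasses import dataclass


@dataclass
class Babysitter:
    # Hypothetical record: every field below is an illustrative assumption.
    name: str
    village: str
    available_days: set[str]
    enforces_screen_curfew: bool  # keeps children off video games after 7pm
    newborn_experience: bool
    asthma_trained: bool  # e.g. an infant childminding certificate covering asthma


def matches(sitter: Babysitter, village: str, day: str) -> bool:
    """True only if the sitter satisfies every one of Mark and Claire's requirements."""
    return (
        sitter.village == village
        and day in sitter.available_days
        and sitter.enforces_screen_curfew
        and sitter.newborn_experience
        and sitter.asthma_trained
    )


candidates = [
    Babysitter("Anna", "Hillside", {"Thursday", "Friday"}, True, True, True),
    Babysitter("Ben", "Hillside", {"Thursday"}, True, False, False),
    Babysitter("Cara", "Riverton", {"Thursday"}, True, True, True),
]

# Keep only the candidates that satisfy all constraints at once.
shortlist = [s.name for s in candidates if matches(s, "Hillside", "Thursday")]
print(shortlist)
```

A keyword engine ranks pages that happen to contain "babysitter", "asthma" and "video games"; the sketch instead treats the request as a conjunction of constraints over people, which is what returning "a list of babysitters that meets all of Mark and Claire’s requirements" would actually require.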
Google’s Knowledge Graph and the Knowledge Bases of Bing, Yahoo and Baidu have all been working on presenting media objects to the user in an extra SERP snippet or two.
Knowledge Graph and the other engines’ bases do touch on semantic search, but an engine first needs to understand the query. Google’s Hummingbird, now combined with artificial intelligence, is starting to lead the way here.
Just look at translation services: ask a native speaker and you will see that search engines still cannot properly translate queries into full, local dialect. And this is over two and a half years after Hummingbird was released!
Google Hummingbird attempts to parse queries, usually those longer than two keywords, and first works out which keywords are required and which are optional. There must always be one required keyword, which is also the subject keyword.
Subject keywords are searched for semantically, but today this often means little more than synonyms, a bit like an online thesaurus. For true semantic search, the engine must deconstruct the whole query, reformulate it into variations matched on meaning, and construct a sub-query for each combination.
To do this properly, engines need to add a segmented, semantic tab to their index.
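The deconstruction described above can be sketched in a few lines of Python. This is a toy illustration, not how Hummingbird actually works: the `OPTIONAL_WORDS` filter, the tiny `SYNONYMS` thesaurus and both helper functions are hypothetical. It splits a query into required and optional keywords, insists on at least one required (subject) keyword, then expands every required keyword, not just the subject, into its variations and builds one sub-query per combination.

```python
from itertools import product

# Hypothetical filler words treated as optional. "no" is deliberately absent,
# because in [babysitter no video games] it carries real meaning.
OPTIONAL_WORDS = {"with", "and", "of", "for", "a", "the"}

# Toy thesaurus standing in for today's synonym-style expansion.
SYNONYMS = {
    "babysitter": ["babysitter", "childminder", "nanny"],
    "newborn": ["newborn", "infant"],
}


def split_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into (required, optional) keywords using a naive heuristic."""
    words = query.lower().split()
    required = [w for w in words if w not in OPTIONAL_WORDS]
    optional = [w for w in words if w in OPTIONAL_WORDS]
    if not required:
        raise ValueError("a query needs at least one required (subject) keyword")
    return required, optional


def sub_queries(query: str) -> list[str]:
    """Expand each required keyword and build one sub-query per combination."""
    required, _ = split_query(query)
    variations = [SYNONYMS.get(w, [w]) for w in required]
    return [" ".join(combo) for combo in product(*variations)]


for q in sub_queries("babysitter with newborn asthma experience"):
    print(q)
```

Even this toy shows why the approach needs dedicated index support: three variations of one keyword and two of another already yield six sub-queries, and the count multiplies with every keyword that has alternatives.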
Media objects, such as webpages, images, audio clips and social media profiles, have always been connected within the current web. These connections, not semantics, are what Knowledge Graph and the Knowledge Bases use.
Knowledge Graphs and Bases are often described as semantic search, and media objects are often called entities. Semantic search goes further by also connecting media objects to the real-world objects they are about (e.g. people, places, organisations and events).
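The difference can be illustrated with a small set of (subject, predicate, object) triples. In this hypothetical Python sketch, the URLs, the `person:`/`place:`/`event:` identifiers and the predicate names are all made up. A `links_to` edge is the kind of media-to-media connection the current web already has; `describes` and `depicts` edges connect media objects to the people, places and events themselves, which is what lets a query hop from a profile page to where someone lives.

```python
# Each triple is (subject, predicate, object). Media objects are URLs; real-world
# objects use made-up prefixed identifiers. All data here is hypothetical.
triples = {
    ("https://example.com/claire-profile", "links_to", "https://example.com/claire-photo.jpg"),
    ("https://example.com/claire-profile", "describes", "person:Claire"),
    ("https://example.com/claire-photo.jpg", "depicts", "person:Claire"),
    ("person:Claire", "lives_in", "place:Hillside"),
    ("person:Claire", "attends", "event:dinner-party"),
}


def objects_of(subject: str, predicate: str) -> set[str]:
    """All objects reachable from `subject` along edges labelled `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def media_about(entity: str) -> set[str]:
    """All media objects that describe or depict a real-world entity."""
    return {s for s, p, o in triples if o == entity and p in {"describes", "depicts"}}


profile = "https://example.com/claire-profile"

# Link-only view: the profile merely points at another media object.
print(objects_of(profile, "links_to"))

# Semantic view: follow the profile to the person it describes, then to her village.
for person in objects_of(profile, "describes"):
    print(person, "->", objects_of(person, "lives_in"))
```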
The current approaches described above cause an unfair coverage bias, because neither Schema markup nor HTML scrapers cover all websites. Schema is only used by those who know what it is and can code, whereas HTML scrapers are biased towards popular websites, in the same way that the engines’ top-ranked results are crawled more frequently than less popular websites.
The middleman can easily be left out. This isn’t semantic web search.
The internet has changed how our brains obtain and store information. We search to retrieve information rather than store it. This is why search is a popular online activity.
Engines want to keep users happy and returning; otherwise they miss out on all the paid search activity that keeps them afloat.
Knowledge Graph and the Knowledge Bases simply keep the searcher on the engine for as long as possible and greatly reduce how often we visit other websites. This lets us subconsciously feel that the engine itself is more trustworthy, which reinforces search loyalty.
No. They are starting to move in the right direction with their use of semantic technology (Google has been building an entity network within images since 2006 by naming them with numeric values as opposed to text strings), but you only need to play around with a search translation tool to see how rusty this is.
Or, even worse, ask the engine for a babysitter…