Google has launched a new Google Scholar search service, providing the ability to search for scholarly literature located across the Web.
“The goal is to allow and enable users to search over scholarly content,” said Anurag Acharya, a Google engineer leading the project.
Much of the material was added to Google over the past few months. The new service lets searchers restrict their queries to just that academic material.
Opening Up Invisible Content
Google has worked with publishers to gain access to some material that search spiders couldn't ordinarily reach because it's locked behind subscription barriers.
For example, in a search for “search engines,” the fifth result currently listed is a paper called “ProFusion: Intelligent fusion from multiple, distributed search engines.” That paper is only available to those with password access to the Journal of Universal Computer Science, which requires a subscription.
Normally, such material would never be spidered by search engines like Google, so the material would remain “invisible” to Web searchers. But Google made arrangements with publishers to get into these password-protected areas.
The advantage for searchers is a suddenly much greater ability to locate material that may be of interest. The catch is that reading the full text of such documents, which Google does index, is only possible for those with a subscription or other relationship with the publisher. Google says it doesn't earn money from any new subscriptions generated between searchers and publishers.
This may cause problems for some searchers. In the above example, not only could I not read the paper, as I don't have a subscription, I couldn't even read an abstract. Instead, a password prompt kept reappearing even after I cancelled it, making it extremely difficult to close the window.
This is probably unusual, since one of Google's requirements for inclusion in Google Scholar is that publishers display abstracts to searchers.
The special publisher access flies in the face of Google's anti-cloaking policy. Google is being shown material that regular users can't see, which matches its own definition of cloaking. It's a good thing for searchers, but the company must amend its cloaking policy so as not to appear hypocritical.
Indeed, that's long overdue. It has been a problem since I first reported a similar issue earlier this year. One possible fix is for Google to finally move forward with formalizing such programs for all publishers.
Citation Extraction and Analysis
When spidering the content, Google tries to identify each paper's authors and formal title, as well as the other papers and documents that cite it. These citations are a key part of the special ranking algorithm Google Scholar uses.
Google says extracting citations allows it to see connections between papers, even when those connections aren't made via links. As a result, it can use citation analysis to try to put the best papers at the top of results.
Next to each paper listed is a “Cited by” link. Clicking it shows the citation analysis in action: it displays all the documents that cite the original paper in their text. For example, “A technique for measuring the relative size and overlap of public Web search engines” lists 135 citations Google knows about through Google Scholar.
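To make the “Cited by” idea concrete, here is a toy sketch in Python. It is not Google's algorithm, which the company hasn't described in detail. It simply assumes each document's reference list has already been extracted as plain title strings, matches those titles after a crude normalization (lowercasing and stripping punctuation), builds a reverse index from each cited title to the documents citing it, and orders papers by how many documents cite them. The function names and sample titles are hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Crude title normalization (an assumption, not Google's method):
    lowercase, drop punctuation, collapse whitespace."""
    t = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return " ".join(t.split())


def build_cited_by(citations):
    """citations maps a citing document's title to the list of titles
    found in its reference text. Returns a reverse "cited by" index:
    normalized cited title -> set of citing documents."""
    cited_by = defaultdict(set)
    for citing, refs in citations.items():
        for ref in refs:
            cited_by[normalize_title(ref)].add(citing)
    return cited_by


def rank_by_citations(titles, cited_by):
    """Order titles by citation count, most-cited first.
    Ties keep their input order because sorted() is stable."""
    return sorted(
        titles,
        key=lambda t: len(cited_by.get(normalize_title(t), ())),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical reference data, not real Google Scholar records.
    citations = {
        "Paper A": ["A Survey of Search Engines", "Crawling the Web"],
        "Paper B": ["a survey of search engines!"],
        "Paper C": ["Crawling the Web", "A survey of search engines"],
    }
    cited_by = build_cited_by(citations)
    print(sorted(cited_by[normalize_title("A Survey of Search Engines")]))
    print(rank_by_citations(["Crawling the Web", "A Survey of Search Engines"], cited_by))
```

Because the three differently formatted references normalize to the same string, all three papers show up under one “Cited by” entry, even though none of them links to the cited paper. Real citation matching also has to cope with abbreviations, typos and author-name variants, which this sketch ignores.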
The same paper might be hosted in more than one place, of course. In such instances, Google picks what it believes is the best version and links to other versions after the paper’s description.
In some cases, the material isn't actually online. Google may know about a paper only through references it has seen in other papers. In these cases, Library Search and Web Search links appear next to the paper's or book's title.
Library Search lets you check whether a library near you carries the paper or book, through the Open WorldCat program. This is the same system integrated into a special version of the Yahoo Toolbar launched last week.
Web Search generates a Google Web search to help find more information about the material across the entire Web.
More on the program can be found on Google Scholar’s About page.
Driving New Traffic to Libraries?
On ResourceShelf, Shirl Kennedy and Gary Price co-authored another look at the program that's well worth a read. They love it, despite the fact that it includes some material they don't consider truly scholarly. They also mention other citation tools and point out how much of this material is already available to the public, if only the public knew to visit libraries.
That points to another likely benefit of Google Scholar. Sure, some material may already be available. But if the public doesn't realize it, it remains invisible. More and more, people turn to search engines for all types of information. Ironically, then, a search engine may raise awareness and use of libraries as an offline research resource.
That's even more likely as Google's competitors, like Yahoo, follow suit. Yahoo already has long-standing arrangements to gather material from academic publishers through its Content Acquisition Program. What Yahoo doesn't currently provide is a specialized way to search just this material. In my view, it's likely this will come.
Want more search information? ClickZ SEM Archives contain all our search columns, organized by topic.