Beyond HTML: Security Concerns With Google

You want them to find your site, but new search capabilities make users vulnerable to viruses.

Author: Danny Sullivan

Date published: December 26, 2001

Now that Google is indexing a wide range of document types beyond HTML and plain text formats, potential security concerns are cropping up, both for searchers and Webmasters.

From the searcher’s point of view, the concern is that you might unwittingly open yourself up to viruses that are embedded in non-HTML files, such as Word macro viruses.

Until recently, search engines only delivered you to “safe” HTML or text files. It was possible that even these types of files might try to harm you, such as via JavaScript exploits. However, anyone who browses the Web is already exposed to such potential threats routinely and generally doesn’t have problems.

In contrast, people do not routinely open data documents such as Word or Excel files from those they do not know. Google has changed this, because its search results now contain direct links to such files from across the Web. These direct links mean that users might unwittingly open infected files.

For example, try a search for “clearcutting and fish populations in idaho.” The second result is an oddly named document called “Clearcutting in.” If you were to click on this link, the document would not load in your browser. Instead, your computer would launch Microsoft Word (assuming you have it installed).

This is because the link leads to a .doc file, a data file used by Microsoft Word. Such files can contain viruses, and if you opened one without protection, you’d be exposed to any virus inside.
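As a rough illustration of how a cautious searcher (or a tool acting for one) might spot such links before clicking, here is a minimal Python sketch. The function name and the extension list are assumptions made for this example, not anything Google publishes, and a file extension is only a hint: what really decides how a file opens is the content type the server sends.

```python
from urllib.parse import urlparse
import posixpath

# Illustrative assumption: a few extensions that typically open in an
# external application rather than in the browser. This is NOT Google's
# list of indexed formats.
RISKY_EXTENSIONS = {".doc", ".xls", ".ppt", ".rtf"}


def opens_outside_browser(url: str) -> bool:
    """Return True if the URL's path ends in one of the listed extensions."""
    path = urlparse(url).path               # ignore query string and fragment
    ext = posixpath.splitext(path)[1].lower()
    return ext in RISKY_EXTENSIONS
```

A result whose link ends in .doc would be flagged, while an ordinary .html page would not.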

The safe alternative is to always view such results using the “View as HTML” link that Google provides. You’ll see this link any time Google lists a file that is in neither HTML nor plain text format. By following it, you will be shown a safe, HTML version of the listing in your browser.

Ideally, Google would switch things around. I think by default the main link should bring up the safe HTML version while the “View as HTML” link would instead say something like “View Original File Type.” That would greatly reduce the odds of searchers getting accidentally infected by a virus. Google says it’s something it’ll consider.

“We’re going to continue to take a close look at this because, as you know, our users and their experience with Google is our number-one priority,” said spokesperson David Krane.

Krane also said that Google is noticing that when non-HTML content is offered, many users are opting to use the “View as HTML” choice. Aside from avoiding viruses, another good reason to do this is that the HTML versions are typically smaller than the actual data files, which means they load faster.

It is also worth noting that while the potential for viruses to hit searchers exists, this doesn’t seem to have actually happened.

“We’ve yet to see email from any of our users complaining about computer viruses that they obtained via our search results,” Krane said.

Meanwhile, some Webmasters are reportedly shocked to discover that Word documents, Excel files, and other material they make available through public Web sites can now be found by searching at Google. There’s even the further concern that some of these documents might contain sensitive information, such as credit card numbers or password information.

The reality is that Google hasn’t created a security problem with these documents. It has simply exposed them. Any document that is made available on an Internet server (be it Web, FTP, Usenet, etc.) can be found by anyone. People can (and do) even create their own spiders to seek documents of particular types, such as email harvesters that roam the Internet in search of email addresses.
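To show how little effort such a custom spider takes, here is a minimal Python sketch of its core step: pulling links to documents of particular types out of a page. The class and function names, and the choice of .doc and .xls, are assumptions for illustration. The sketch only parses HTML it is handed and does not fetch anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DocumentLinkFinder(HTMLParser):
    """Collect absolute URLs of <a href> links whose path has a given extension."""

    def __init__(self, base_url, extensions=(".doc", ".xls")):
        super().__init__()
        self.base_url = base_url
        self.extensions = tuple(extensions)  # str.endswith accepts a tuple
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).path.lower().endswith(self.extensions):
            self.found.append(absolute)


def find_document_links(html, base_url):
    """Return every matching document link found in the given HTML text."""
    finder = DocumentLinkFinder(base_url)
    finder.feed(html)
    return finder.found
```

Anything a page links to publicly is within reach of code like this, whether or not a major search engine has indexed it.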

If a document is sensitive, don’t place it on the Internet, period. What if you must expose it to the Internet, so that selected individuals outside your company or organization can access it? Then establish a password protection or “authentication” system for your Web server, and make these documents available only to those who have a username and password.
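Most Web servers offer this as a configuration option, so you would rarely write it yourself. Still, a minimal Python sketch of the check behind one common scheme, HTTP Basic authentication, may make the idea concrete. The function names and the single username/password pair are assumptions for this example, and a real setup would store credentials properly and serve the pages over an encrypted connection, since Basic credentials are only encoded, not encrypted.

```python
import base64
import hmac


def is_authorized(authorization_header, expected_user, expected_password):
    """Check an HTTP Basic 'Authorization' header against one username/password."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        encoded = authorization_header[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids revealing how much of a value matched via timing.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    password_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and password_ok


def status_for_request(headers, expected_user, expected_password):
    """Return 200 for valid credentials, otherwise 401 (Unauthorized)."""
    if is_authorized(headers.get("Authorization"), expected_user, expected_password):
        return 200
    return 401
```

A visitor, human or spider, that arrives without credentials gets a 401 response instead of the document.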

Authentication systems will stop crawler-based search engines in their tracks. It’s an even better solution than using a robots.txt file, because a robots.txt file that lists the sensitive material you don’t want spiders to index is essentially a menu showing any human who reads the file where to find that information. An authentication system reveals nothing, and it has the added plus of keeping humans out as well.
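The “menu” problem is easy to demonstrate. The following Python sketch, with a hypothetical function name and made-up sample paths, lists every path a robots.txt file asks spiders to stay away from. Anyone who downloads the file can do the same by eye.

```python
def disallowed_paths(robots_txt):
    """List every path named in a Disallow line of a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Made-up example content, for illustration only.
sample = """User-agent: *
Disallow: /private/salaries.xls
Disallow: /admin/  # staff only
"""
print(disallowed_paths(sample))
```

Well-behaved spiders will honor those lines, but the file itself tells a curious human exactly where to look.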

Keep this in mind: None of the major search engine spiders will try to access authenticated information. However, a custom spider or a nefarious human may still try to hack in. Authentication is a barrier to them, but not absolute protection.
