Monday, December 11, 2006

Google Tag To Remove Content Spamming

Content spamming, in its simplest form, is the taking of content from other
sites that rank well on the search engines, and then either using it as is
or running it through a utility such as Articlebot to scramble the text
until plagiarism software can no longer detect it. In either case, your
good, search-engine-friendly content is stolen and used, often as part of a
doorway page, to draw the search engines' attention away from you.

Everyone has seen examples of this: the page that looks promising but
contains lists of terms (like term - term paper - term papers - term limits)
that link to other similar lists, each carrying Google advertising. Or the
site that contains nothing but content licensed from Wikipedia. Or the site
that plays well in a search but contains nothing more than SEO gibberish,
often ripped off from the site of an expert and minced into word slaw.

These sites are created en masse to provide a fertile ground to draw
eyeballs. It seems a waste of time when you receive a penny a view for even
the best-paying ads - but when you put up five hundred sites at a time, and
you've figured out how to get all of them to show up on the first page or
two of a lucrative Google search term, it can be surprisingly profitable.

The losers are the people who click on these pages, thinking there is
content of worth on them - and you. These spammers take your places in the
top ten. Google is working hard to lock them out, but there is more you can
do to help.

Using The Antispam Tag

But there is another loser. One of the strengths of the Internet is that it
allows two-way public communication on a scale never seen before. You write
a blog post or set up a wiki; your audience comments on your blog, or adds
to and edits your wiki.

The problem? Normally you have complete control over a website and its
contents, but sites that allow user communication hand much of that control
to your readers. There is no way to stop readers of an open blog from
posting unwanted links, other than removing them by hand. Even then, a link
can hide behind a single comma or period used as its text, making it nearly
impossible to catch everything.

This leaves you open to accusations of link spam - for links you never put
there in the first place. And while you may police your most recent posts,
no one polices the ones from several years ago, yet Google still crawls and
indexes them. By 2002, bloggers everywhere were begging Google for an ignore
tag of some sort to keep its spiders away from comment areas.

Bloggers would not be the only ones to benefit, they argued: everyone
running uncontrolled two-way communication - wikis, forums, guest books -
needed this from Google. Each of these types of sites has been inundated
with spam at some point, forcing some to shut down completely. And Google
itself needed it to help curb the rampant spam in the industry.

In 2005, Google finally responded to these concerns. Though its solution is
not everything the online community wanted (for instance, legitimate links
get discounted along with the spam), it does at least let you flag the links
in the public parts of your blog. It is the "nofollow" attribute.

"Nofollow" lets you mark a link on your web page, whether it sits in a blog
comment or in paid advertising, as one that Google should not follow or
count as an endorsement when ranking the page it points to. The great thing
about it is that not only does it keep your rankings from suffering from
spam, it also discourages spammers from cluttering your comments section
with junk text, since their links no longer earn them any ranking.

The most basic use of this attribute is to add it to a hyperlink as
rel="nofollow", as in <a href="http://www.example.com/" rel="nofollow">.
This lets you manually flag links, such as those in paid advertising, as
links Google should ignore. But what if the content is user-generated? Then
the problem remains, because you certainly don't have time to go through and
mark up all those links yourself.
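For a link you place yourself, such as a paid advertisement, the attribute is simply written into the anchor tag. The Python sketch below shows that, assuming you generate your page HTML from code rather than typing it by hand; `nofollow_link` is a hypothetical helper name, not part of any blogging system.

```python
from html import escape


def nofollow_link(url: str, text: str) -> str:
    """Build an <a> tag carrying rel="nofollow" (hypothetical helper).

    The URL and link text are HTML-escaped so characters such as & or <
    cannot break the surrounding markup.
    """
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


# A paid link: visitors can still click it, but it is marked as not vouched for.
print(nofollow_link("http://www.example.com/sponsor", "Our sponsor"))
# prints: <a href="http://www.example.com/sponsor" rel="nofollow">Our sponsor</a>
```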

Fortunately, blogging systems have been quick to respond. Whether you use
WordPress or another blogging system, most either add "nofollow" to links in
their comment sections automatically or offer plugins you can install
yourself to prevent this sort of spamming.
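Roughly what such automation does can be sketched in Python with the standard library's `html.parser`. This is an assumption about the general approach, not the code WordPress or any plugin actually uses, and `NofollowRewriter` and `add_nofollow` are hypothetical names. It rewrites every `<a>` tag in a comment, including one whose visible text is just a comma or a period, so hidden links get flagged too. It only adds the attribute: it does not strip scripts or other unsafe markup, and it silently drops HTML comments, so a real comment system would need more sanitizing than this.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emits comment HTML, forcing rel="nofollow" onto every <a> tag."""

    def __init__(self) -> None:
        # convert_charrefs=True hands us text with entities already decoded
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def _emit_tag(self, tag, attrs, self_closing=False):
        if tag == "a":
            # Discard any rel value the commenter supplied, then add nofollow
            attrs = [(k, v) for k, v in attrs if k != "rel"] + [("rel", "nofollow")]
        parts = [tag] + [k if v is None else f'{k}="{escape(v, quote=True)}"'
                         for k, v in attrs]
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text, since entities were decoded on the way in
        self.out.append(escape(data, quote=False))


def add_nofollow(comment_html: str) -> str:
    """Hypothetical helper: return comment_html with every link marked nofollow."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


# A link hidden behind a single period still gets flagged.
print(add_nofollow('Nice post<a href="http://spam.example/">.</a>'))
# prints: Nice post<a href="http://spam.example/" rel="nofollow">.</a>
```

Rewriting the tags when the comment is saved, rather than asking moderators to do it, is what makes the approach scale to years of old posts.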

This does not solve every problem, but it is a great start. Make sure you
know how your blog, wiki, or forum software handles it; in most cases, a
software update will add this for you.

Is This Spamming And Will Google Block Me?

There's another problem with the spamming crowd. When you're fighting search
engine spam and start seeing the different forms it can take - and,
disturbingly, realizing that some of the techniques you use on your
legitimate site are similar - you have to wonder: Will Google block me for
my search engine optimization techniques?

This happened recently to BMW's corporate site. Its webmaster, dissatisfied
with the site's position when web users searched for several terms (such as
"new car"), created and posted a gateway page: a page stuffed with
keyword-rich text that then redirects visitors to a different, often
graphics-heavy page.

Google found it and, rightly or wrongly, promptly cut the site's PageRank to
zero by hand. For weeks, searches for the company turned up plenty of spam
and dozens of news stories, but to find its actual site you had to dig to
the bottom of the results, which is not easy to do in Googleworld.

This is why you really need to understand what Google counts as search
engine spam, and follow its rules even if everyone else doesn't. Never
create a gateway page, particularly one stuffed with spammy text. Instead,
use legitimate techniques like descriptive alternate text on images and real
text on the page. Look for ways to get other sites to link to yours -
article submission, for instance, or directory submission. And always keep
your content fresh.
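To check that none of your own pages behaves like the gateway page described above, a rough audit can be sketched in Python. It assumes the two common client-side redirect forms, a `<meta http-equiv="refresh">` tag and inline JavaScript that touches `location`; the script check is a crude substring heuristic, server-side redirects are invisible to it, and it says nothing about how Google itself detects gateway pages. `RedirectFinder` and `find_redirects` are hypothetical names.

```python
from html.parser import HTMLParser


class RedirectFinder(HTMLParser):
    """Collects signs that a page immediately sends visitors elsewhere."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.findings: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag == "meta":
            a = {name: (value or "") for name, value in attrs}
            # <meta http-equiv="refresh" content="0; url=..."> redirects the browser
            if a.get("http-equiv", "").lower() == "refresh":
                self.findings.append("meta refresh: " + a.get("content", ""))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Crude heuristic: any inline script that mentions "location" is reported
        if self.in_script and "location" in data:
            self.findings.append("script: " + data.strip()[:60])


def find_redirects(page_html: str) -> list[str]:
    finder = RedirectFinder()
    finder.feed(page_html)
    finder.close()
    return finder.findings
```

An empty list means neither pattern was found; anything reported is worth a manual look rather than proof of a gateway page.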

While duplicated text is often a sign of serious spamming, Google's
engineers recognize two things: first, the original text is probably still
out there somewhere, and it would be unfair to drop its author's rankings
along with those of the people who stole it; and second, certain types of
duplicated text, like articles or blog entries, are to be expected.

Their answer to the first issue is to credit the site where a particular
text was first catalogued as its creator, and to push sites that obviously
copied it down the rankings. The second issue is handled by looking at the
rest of the site around the duplicated text; if the entire site appears to
be copied, it, too, is dropped. Provided you are not duplicating text across
many websites to inflate your ranking fraudulently, you're safe. Ask
yourself: are you using the same content on several sites registered to you
in order to maximize your chances of being read? If the answer is yes, this
is a bad idea and will be classified as spamdexing. If your content would
not be useful to the average Internet surfer, it is also likely to be
classed as spamdexing.
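Google has not published exactly how it compares pages, so the following is not its method. A common textbook way to measure how much of one text appears in another is to split each into overlapping runs of words ("shingles") and compute the Jaccard similarity of the two sets: shared shingles divided by all distinct shingles. The Python sketch assumes 4-word shingles; `shingles` and `jaccard` are illustrative names.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Overlapping k-word runs from the lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 4) -> float:
    """Share of distinct shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "Content spamming is the taking of content from other sites that rank well"
tweaked = "Content spamming is the taking of text from other sites that rank well"
print(jaccard(original, original))  # identical texts score 1.0
print(jaccard(original, tweaked))   # one swapped word: 6 of 14 shingles shared
```

Changing a single word in the middle of that 13-word sentence breaks the four shingles that contain it, dropping the score to 6/14, about 0.43. Unrelated text scores 0.0. This is the kind of gap that word-scrambling tools try to widen.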

There is a very thin line between search engine optimization and spamdexing,
and you should become very familiar with it. Start by understanding
hidden/invisible text, keyword stuffing, metatag stuffing, gateway pages,
and scraper sites.
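Keyword stuffing is easy to spot with a simple count of how often each word appears relative to the total. The Python sketch below computes that share; the stopword list is a small illustrative assumption, `keyword_density` is a hypothetical name, and no particular percentage is an official Google limit, so treat the numbers only as a way to find words repeated far more often than natural writing would.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); extend it for real use.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "that", "it"}


def keyword_density(text: str, top: int = 5) -> list[tuple[str, float]]:
    """Return the `top` most frequent non-stopwords with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [(word, n / len(words)) for word, n in counts.most_common(top)]


print(keyword_density("term - term paper - term papers - term limits", top=1))
```

Run on the term list quoted at the start of this post, it returns `[("term", 4/7)]`: a single word makes up more than half the text.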

About The Author:
http://www.chauy.com/2006/07/googles-tag-to-remove-content-spamming/
