Attention Webmasters and Web Site
Owners
You have been deceived!
Most of the information on SEO
web sites and forums about hidden
text on web pages is not true!
Search engines have the technology to detect hidden text,
but they do not cover every
possible way to introduce hidden
text. This is simply because it is
computationally expensive for search
engines to analyze all pages for all
kinds of SEO spam.
Instead, search engines rely on the
general public to report spammy
sites. Sites using nontrivial
techniques to hide text are reviewed
and penalized manually, and only when
they are reported to search engines.
We provide a service that detects hidden text
no matter what tricks were used to
introduce it! We encourage you to
try our service to look for hidden
text on your own site and on your competitors' sites.
We believe that our service is
unique. It uses sophisticated
OCR
technology to actually "look" at the
screen and to compare what it sees with the
actual HTML. Even though this
technique does not yield
instantaneous results, the generated
report will even cover text
that some people have trouble
reading because of a poor choice of
colors or backgrounds.
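As a rough illustration of the comparison idea (not the service's actual implementation), the sketch below assumes an OCR step has already produced the list of words seen on screen. The `ocr_words` argument and all function names are hypothetical, and no real OCR library is called. It reports words that appear in the HTML text but were never seen on screen.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def words(text):
    """Lower-cased set of alphanumeric words in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def html_words(html):
    """All words found in the text nodes of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return words(" ".join(parser.chunks))


def unseen_words(html, ocr_words):
    """Words present in the HTML text but absent from the (assumed) OCR output."""
    return html_words(html) - words(" ".join(ocr_words))
```

A non-empty result is only a hint that something in the markup is not visible; words the OCR step misread would show up here as well.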
What is hidden text on web pages?
People want their web sites
to rank highly and to match
many more keywords than competing sites.
Unfortunately, building search
engine popularity takes time, and
good placement is never guaranteed.
To speed things up, people
sometimes hire SEO
(Search Engine Optimization)
consultants. This is
where things can get ugly.
Many SEO consultants use
questionable techniques that often
fall in the realm of SEO spam.
SEO spam may yield only
temporary results. The long-term effects
are either a reduction in page rank or
complete removal from the search index.
If your site is found to be spammy
and is reported by the competition,
chances are it will be
completely de-listed.
Why hide text from the user's view?
The goal is to make search
engines index as much content as
possible. That is, to associate as
many keywords as possible with a
particular site. For example, if a
site offers cell phone accessories
and someone is looking for a
particular accessory for a
particular device, it would be a good idea
to bring that potential customer to a
web page that lists all cell phone
accessories available for sale. However, a web
page listing all accessories for all
devices would be unreadable, so the
list is hidden from visitors while
remaining visible to search engines.
Hidden text on web pages for the
purpose of deceiving search engines
is illegitimate
Search engines are in the
business of providing accurate
search results. Taking users to
irrelevant sites hurts search
engines' image.
This is why all major search engines
have a way of reporting SEO spam,
with hidden text being one of the
major offenses.
Search engines are perfectly
capable of detecting sites that use these techniques
Search engines analyze various
aspects of the data
they index. Some of this analysis is
dedicated to spam detection. SEO
spam detection is a computationally
tractable problem.
Different spam techniques, however,
require different computational
resources.
What does it take to effectively
detect a spammy page?
SEO spam detection usually falls
into one of two problem domains:
Space domain
A page needs to be compared with other pages on
the internet. Examples include detection of cloned
sites. This is not very CPU intensive; the
complication is scanning through a large index,
whose size is bound by available memory (disk).
Effectiveness improves with: available memory
Ability to parallelize: excellent
Complexity domain
Structure analysis is performed on a single web
page. The page is reconstructed in memory
along with its corresponding scripts and stylesheets,
and then analyzed for hidden and barely visible
text. This is a very CPU intensive operation, and
it requires a lot of CPU cycles to yield good results.
Effectiveness improves with: CPU speed
Ability to parallelize: poor
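To make the space-domain case more concrete, here is a minimal sketch of one common way to spot near-identical pages: word shingles compared with Jaccard similarity. This is an assumption chosen for illustration; the text above does not say which method search engines use, and the function names and the 0.8 threshold are hypothetical.

```python
import re


def shingles(text, k=3):
    """Set of k-word sequences ("shingles") found in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_clones(pages, threshold=0.8):
    """pages maps URL -> page text; returns URL pairs at or above the threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    urls = sorted(sets)
    pairs = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            if jaccard(sets[first], sets[second]) >= threshold:
                pairs.append((first, second))
    return pairs
```

Each comparison is cheap, but every page must be checked against the stored shingle sets of many others, which is why this kind of check depends on how much of the index can be held and scanned rather than on raw CPU speed.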
What techniques are used to put
searchable, but not visible text on
a web page?
In the past, webmasters simply
matched the color of the text with
its background. It was possible to
reveal this text with a simple Ctrl+A
(select all). That check is no longer
sufficient, simply because there are now too
many ways of hiding searchable text.
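The oldest trick, text colored to match its background, can be checked by computing the contrast between the two colors. The sketch below uses the WCAG relative luminance and contrast ratio formulas; the function names and the 1.5 cutoff are assumptions picked for illustration, and the code only handles plain hex colors, not the many other ways a page can set colors.

```python
def parse_hex(color):
    """Convert '#rgb' or '#rrggbb' into an (r, g, b) tuple of 0-255 ints."""
    color = color.lstrip("#")
    if len(color) == 3:
        color = "".join(ch * 2 for ch in color)
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1 = relative_luminance(parse_hex(foreground))
    l2 = relative_luminance(parse_hex(background))
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def looks_hidden(foreground, background, min_ratio=1.5):
    """True when text and background are too similar to read (assumed cutoff)."""
    return contrast_ratio(foreground, background) < min_ratio
```

For example, `looks_hidden("#fefefe", "#ffffff")` flags near-white text on a white background, while black on white passes.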
Detecting hidden text efficiently
is a hard thing to do
Because there are so many
ways to introduce hidden text in
modern browsers, search engines face
an increasingly hard problem of
keeping track of them. Only some of
the more obvious techniques are detected
easily. Most others require complex
algorithms that take a lot of CPU
cycles.
Not every page is scanned for
hidden text
Because of CPU constraints, it is
only practical to analyze pages
occasionally. That is, a set of
factors determines whether a
site will be analyzed for possible
spam content, and to what degree.
These are somewhat complex
probability rules that are based on
many factors:
- The exact rules and their
inputs are kept secret
- They often change with
software updates
- They are optimized for speed
while giving few false positives
- They are trying to keep up
with browser capabilities to
hide spam
- They are steadily becoming
more sophisticated as search
engine companies hire more smart
people
I got my site optimized by
someone else. How do I know it's
free of hidden text?
The textual content of a page (the text
in the HTML you see when you do
'view source') should not exceed
what you see on the screen. Keep in
mind that the source does not have
to be neatly formatted for viewing
by a human. Often, the majority of
the content sits outside the
normal viewing area of a Notepad window,
which is approximately 120
characters wide when maximized. Make
sure you have "word wrap" turned on
and that you have checked the HTML content in
its entirety.
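If your editor has no word wrap, the same effect can be approximated with a few lines of script. This is a minimal sketch that assumes you have already saved the page source as a string; `wrap_source` is a hypothetical helper, and the width of 120 simply mirrors the figure mentioned above.

```python
import textwrap


def wrap_source(html_source, width=120):
    """Re-wrap every source line so nothing extends past `width` characters."""
    wrapped = []
    for line in html_source.splitlines():
        # textwrap.wrap returns an empty list for blank lines; keep them as "".
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return wrapped
```

Printing the returned lines gives a version of the source in which no text can sit off to the right of the visible area.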
Or you can try our
service >>