Site-specific stop words in Google: what they tell us about indexation quality

This is the 4th article in the Googleometry Project Series.

Saturation is usually defined as the number of indexed pages of a website. However, supplemental results can be a significant part of those indexed pages, with no ranking value whatsoever. So a deeper analysis of saturation and indexed pages is needed.

We define 3 kinds of poorly indexed pages:

– Foreign Pages: pages not assigned to any known language, so they show up only if the searcher selects “all the Web” in the Language Preferences.

– Pages not associated with keywords: they appear in the listings only when you request site:domain.com, but no keyword query retrieves them. So they are useless.

– Pages in the Reduced Indexation Set: these pages are shown only when a Stop Word appears in the search query. Their indexation is probably limited to the page Title alone.

We researched the effect of combined searches such as:

site:domain.com keyword1 OR keyword2
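These query variants can be generated programmatically. A minimal sketch, assuming placeholder values for the domain, the keywords, and the stop word (the function name is mine, not part of any tool described here):

```python
# Build the Google query strings used in the saturation experiments.
# "domain.com", the keywords, and the stop words are placeholder values.

def build_queries(domain, keywords, stop_words):
    """Return the site: query variants described above."""
    queries = [f"site:{domain}"]  # baseline: all indexed pages
    # Pairwise OR queries: pages associated with at least one keyword
    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):
            queries.append(f"site:{domain} {keywords[i]} OR {keywords[j]}")
    # Stop-word queries: probe the Reduced Indexation Set
    for sw in stop_words:
        queries.append(f"site:{domain} {sw}")
    return queries

for q in build_queries("domain.com", ["keyword1", "keyword2"], ["the"]):
    print(q)
```

Running each generated query in the same day, as noted below, keeps the result counts comparable.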

Experiments were performed over several days, but each data set needs to be collected within the same day, because there is some day-to-day variation.

We found these consistent indicators of website indexation quality:

– the number of pages in an English-filtered search versus pages for all the Web. This setting is modified in the Preferences section of Google. Google not only supplies English pages, but also quality-filtered pages: most pages in any Web search are discarded in the English-only search, although they are in perfect English. This works equally for Spanish or French pages.
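This indicator boils down to a simple ratio between the two result counts. A minimal sketch, using made-up counts for illustration:

```python
# Sketch: indexation-quality indicator from result counts.
# `all_web` and `english_only` are the result counts Google reports
# for the same site: query under "all the Web" vs. English-only preference.
# The counts below are invented for illustration.

def language_filter_ratio(all_web, english_only):
    """Fraction of pages that survive the language/quality filter."""
    return english_only / all_web if all_web else 0.0

print(language_filter_ratio(12400, 3100))  # 0.25
```

A low ratio would suggest many indexed pages carry little ranking value, per the reasoning above.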

For some reason, the English searches tend to place the Supplemental results behind a link, the well-known “In order to show you the most relevant results, we have omitted some entries…”, while the Web searches directly add the Supplementals at the end of the regular organic results.

Stop Words are specific to each site.

Ask us for specific experiments that you want us to run…

Please bet on this article’s final Diggs and win…

Do you know how to Digg this article? It is easy enough: go to Digg.com, get an account (all you need is an email address) and vote for it. This is the URL:

Digg here

The number of Diggs in the first 24 hours is critical for the article’s positioning in Digg. The most-voted articles go to the top, and eventually make it to the front page, gaining lots of visibility.

However, Digg.com and the other sites using a similar engine are very hard to tame. Most of the thousands of incoming articles per hour get no Diggs, and are quickly sent to nowhere land and forgotten.

The article needs to be very attractive and well written to achieve some readership.

The positive aspect is that it is FAST: you can climb from non-existence to fame in 10 minutes. And popular, because millions of readers come to the site every day to check the news.

I have tried several strategies in this blog to seduce the Diggers, or rather the Top Diggers: those users with a good in-site reputation (sometimes called Karma) and heavy voting power. The Digg-like news aggregators are non-democratic: some users have valuable votes, while most have negligible voting power.

This article uses bets, a popular way to draw a crowd. Second only to naked girls, which was the subject of my previous post in DomainGrower.com…

So, you need to bet on how many Diggs this article will have 72 hours from now, that is, Saturday, December 1st, at 2 PM London time.

Your votes should be posted as comments to this article.

You are not allowed to artificially inflate Diggs.

The winner will get 3 Digg accounts, each at least 1 year old, and 10 Diggs for his/her own story.

Why I have 2 accounts in each of these 80 social networks

I want to be able to test every social network for its promoting power for my stories. Some of them are going to be more receptive than others, depending on their size, difficulty, subject matter, and the importance they give to old, reliable accounts.

Social networks assign a value to each user, sometimes called ‘karma’, and that value is useful for promoting stories, either the user’s own stories or stories from ‘friends’ and strangers.

I am sticking to 2 accounts per network because it is well known that they detect features that could point to spam, namely the IP address. Of course, IP checks can be defeated by using a navigation proxy, but that requires information, expertise, and a potentially self-destructive desire to spam the sites. The second account is used if the first one loses value, or to start polemic discussions, which are often followed with more attention.

I read about a “snowball” effect when promoting stories in the Digg-like sites: start from a minor network, where it should be easier to get noticed, and bring users/friends/voters along to the other sites. It helps if the home of your stories includes links pointing to the other social networks where the visitor can vote for you. For that, I included a couple of plugins in my WordPress blog.

Stories are improved by user feedback and testing as they pass through the networks.

I am starting to test the power of this promotion technique, not too fast, because I need my accounts to be mature enough. An account is mature when it has had some time and healthy activity in its network. As in real-life networks, you cannot just arrive, post your story, and expect everyone to admire you.

It is also good if your stories refer to the same subject, and if you develop virtual ‘friends’ who show their trust in you. It is important to complete your profile and include a photo.

This is the partial list of the networks where I am now. If this story gets enough Diggs, Propellers, Reddits, and so on, I plan to add the age, votes, and karma of all the accounts, to help value them.

I want to get my ideas across the Web, and make money from them…

I have been studying the best way to communicate my business ideas across the Web, and maybe find a buyer, a partner, an investor, or another kind of supporter.
I have some expertise in SEO, so I can rank my sites quite well in Google and Yahoo. However, there are social networks that are faster and probably more targeted.
So, I am experimenting with Digg, Meneame and many others.
It is not easy to get a story promoted by those sites, unless you have a lot of time to spend increasing your karma.
This is done by reading many stories every day and voting for the best ones.

The links that I obtain by publishing on those sites are very helpful for my medium-term efforts to rank in Google. So the two strategies reinforce each other.

Our AlgoCracker Software

We are preparing different experiments with the current version of AlgoCracker. We first want to establish the value of different TLDs on Google ranking. If non-standard TLDs are not discriminated against, domains with keywords should be very valuable for ranking purposes.

The problem with TLDs from different countries is that we cannot control for page quality: the .com pages tend to be of better quality, because they are in higher demand and thus more expensive.

Which keyword should be analyzed? It should be a keyword whose value is neutral with respect to the TLD.

To solve that, we will use the names of the capitals of countries.
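The experiment then reduces to comparing rankings of keyword domains built from the same capital name across TLDs. A minimal sketch of how that query/domain matrix could be laid out (the capitals and TLDs below are sample values, not the actual experiment set):

```python
# Sketch: build the domain matrix for the TLD-value experiment.
# Capital names serve as keywords that are neutral with respect to any TLD.
# The capitals and TLDs below are illustrative sample values.
capitals = ["paris", "lisbon", "vienna"]
tlds = [".com", ".net", ".info", ".com.ar"]

# For each capital, the candidate keyword domains to track in the rankings.
experiment = {cap: [f"{cap}{tld}" for tld in tlds] for cap in capitals}
print(experiment["paris"])  # ['paris.com', 'paris.net', 'paris.info', 'paris.com.ar']
```

If the different TLD versions of the same capital rank at comparable positions, that would suggest Google does not discriminate against non-standard TLDs.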

More later.

What is Algo Cracker?

It is a module in a software package that used to be named Keyword Thief. The product started as a spider that extracted keywords from the top-ranked sites in Google.

In a second stage, it incorporated 3 modules:

– Metatag Thief, which does the same as Keyword Thief, but for all metatags

– Algo Cracker, an analyzer of keyword density in different parts of the Google top-ranked websites. It uses ranges (clusters) of pages to allow averaging and comparison

– Metatag Analyzer, a spider that enters a single site, and checks the contents of all metatags.

These products are now sold together, but may later evolve separately.

See http://keywordthief.com.ar for details and free download.