Algo Crack -- This blog had a very old platform and is being moved to Web Promotion Service Blog

September 8, 2008

These business proposals are Incredibly Smart or Incredibly Stupid, with nothing in between

I happen to conceive ideas that seem almost too good to me. It seems I could be missing something huge.
For instance, I want to raise termites, feed them with discarded cellulose (paper, wood, crop remains) and use the collected termites to feed chickens, pigs and fish. See Termites Blog at http://netic.com.ar/termites .
I also want to invent Own Ink, ink with individually formulated chemical tracers, for extra safety in signing or personal printing. See http://OwnInk.com
Finally (for now), I want to create Bendable Trees, for easy fruit collection. See http://netic.com.ar/BusinessBrokering/bendable.htm
Similar strange ideas are published at http://business-ideas.com.ar
General
0 Comments/Trackbacks

August 26, 2008

Blog coding and SEO

For some time we used this old blog code with decent results. However, it proved to be a bad choice, because the product was discontinued and everyone seemed to be moving to WordPress. WordPress has a number of advantages, and thus we moved to it 4 years ago. This old blog was scarcely used, and all visitors are encouraged to move to the new Promotion Blog (see menu on the top right and link in the header).
The URLs in our new blogs are keyword-oriented, not number-oriented. The new blog has a save feature, so drafts are not lost if you are disconnected from the editing screen. It has many plugins available, original templates and page-related metatags.
Consider changing to WordPress if you keep any old or third-party-hosted blog. Ask us about it.

September 24, 2007

How Does Copyscape find Plagiarism?

Copyscape is the leader in plagiarism detection on the Web. It keeps a Web index almost as extensive as Google's, and users can compare any text against its index for free. Plagiarism is instantly flagged.

Note that Copyscape and Google detect content duplication, not necessarily plagiarism. A judge in an intellectual property court would not necessarily agree with the algorithms used by Copyscape or Google. A court would use more subjective and often blurry criteria.

We set out to determine how the Copyscape algorithm works. It was necessary for our web content generation operations, which tend to be more or less automated. Unlike this article, which took 7 days to research and 2 days to write, our standard web content is generated in minutes and by the kilogram.

Beating Copyscape is not exactly equal to avoiding plagiarism, but it takes care of the main enemy of Web content mass-producers. If Copyscape does not detect your content duplication, chances are neither Google nor the original owner will.

How the experiments were done

We took indexed texts from the Web and started to modify them, one at a time. We took strings of different lengths and substituted synonyms for common words, manually or with synonymizer software.

There were two substitution modes: global, where a long text suffered replacements in 15 to 35% of its words, and short-ranged, where one in every 3-8 words was replaced.

The edited texts were uploaded to a server for Copyscape testing. We overcame the free service limitation of 10 tests per month by switching domains.

Copyscape watches short text strings

We used Phrase Mixer, one of the features of the synonymizer software, to see if Copyscape is fooled by phrase scrambling. It is not.

Synonym replacement is a good alternative because it does not destroy the meaning of the phrase. However, any word will do. Copyscape does not discriminate between a meaningful replacement and a senseless one. Anything that prevents a 4-6 word text string from being an exact duplicate of another keeps it from being flagged. You can use any word or letter, but punctuation marks will not do the trick.
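
The behavior described above is consistent with a detector that compares short word strings after stripping punctuation. The following is a minimal sketch of that idea, not Copyscape's actual algorithm, which we do not know. The function names, the 5-word window and the sample sentences are our own assumptions for illustration.

```python
import re

def shingles(text, n=5):
    """Return the set of n-word strings in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_fraction(candidate, original, n=5):
    """Fraction of the candidate's n-word strings that also occur in the original."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(original, n)) / len(cand)

# Hypothetical sample texts, made up for this sketch.
original = "the quick brown fox jumps over the lazy dog near the old river bank"
punctuated = "The quick, brown fox jumps over the lazy dog; near the old river bank!"
edited = "the quick brown cat jumps over the sleepy dog near the ancient river bank"

print(shared_fraction(punctuated, original))  # punctuation changes alone
print(shared_fraction(edited, original))      # one word in every four replaced
```

In this toy model, punctuation changes leave every 5-word string intact, while replacing one word in every four leaves no 5-word string unchanged.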

It is even possible to insert a small black i over an almost black background, every 3 words in a long plagiarized text, and Copyscape will not notice it. There could be a problem with Google, because that technique has been used in the past by SEOs for keyword-stuffing purposes, and it is now against the Google TOS. However, if the colors are not identical there should not be a problem.

Numbers can be used to mask the identity of a text, replacing the letter 'i' with the number '1', 'o' with '0', 'G' with '9', 'g' with '6' or even 's' with '5'. But keep in mind that you still need to replace one in every 3-4 words.

Short vs. long text

Copyscape starts looking for plagiarism in texts longer than 14 words.

If the duplicated text is less than 70% of the total document, you will need 1 in every 6 words replaced. If your duplicated (borrowed, stolen, pirated, plagiarized) text is over 70% of the total web page, you will need 1 in every 4 words replaced.

More experiments to follow...

September 9, 2007

Web Promotion Experiments

Having been in the Web Promotion field since 2000 or earlier, I receive lots of promotions for new products. Every one of them promises "Instant google ranking", "Fast money making", "Be First or your money back", "Overnight Targeted Traffic" and the like. However, most of them fail to live up to the big expectations.
Many times I feel tempted to try them, and I do some exploring in the SEO and SE Marketing forums. Most experts vote against the magic products, but few of them have actually tried them.
I did not find any dedicated "Promotion Tool Testing Lab" that will buy the product, try it, and inform the subscribed audience about the results.
The main impartiality condition should be that the Lab is not affiliated with the product, and that the raw results are made public.
I proposed an Experimenter Meeting Point in an earlier posting here, and I am running several SEO experiments with our dedicated Algo Cracker tool. We have some interesting raw data and preliminary findings, partly described on Algo Cracker.
I now have a concrete experiment to perform. I intend to buy a product named Atomic Blogging, http://www.atomicblogging.com/.
This product, with excellent graphic design and strong marketing talk, offers 'a solution to post articles on any blogs and instantly update them with all major Web 2.0 sites' (exclamation marks removed). Apparently it is a complete compilation of all the sites where you can ping your news and publish a short notice about your blog or latest blog posting.
It costs about 50 dollars, which is not such a large amount. However, I would say 95% of these products are scams, and chances are you will waste your money and 5 hours of your time. For that reason, I want to do the experiment with your help, and report the result to those who helped me cover the costs.
I will make a full description of how this product works, show my own promotion experiments and draw the conclusions.
This report will cost USD 5 if ordered before I buy the product, with a 20 day delay until the results are obtained, or ten times that value afterwards.
Check Webprom Testing Labs for the prices, description and status.

Write us at Domain Grower contact if you are interested.

September 4, 2007

A statistical approach to SEO

When trying to solve the site ranking logic used by search engines, we tried several tools. For keyword density, market leader IBP analyzes the top ten ranked sites for a given keyword. By comparing the top ranked sites with the site to be ranked, the webmaster obtains useful data.
IBP analyzes keyword density in all parts of a web page: metatags, alt text, body text, header text, from H1 to H6, anchor text and other parts.
However, we felt that this was cumbersome, restrictive and not very effective. While some factors pointed out by IBP are true, there are several that are not. Or maybe they were true at some time, but Google and Yahoo changed their algorithms to avoid copycat sites in their top results (SERP).
Our approach was to create a similar product that would consider not only the top ten ranked sites, but all of them. This is 1000 sites for Google, because getting SERPs beyond that is tricky (but not impossible).
The huge amount of data obtained in that way needed a statistical tool in order to produce useful information. Thus, we prepared our tool to take page ranges and calculate averages. Pages ranked in SERPs 1-10 are compared to 20-30, 40-50 and so on. In that way, we can find those keyword densities that correlate with rank.
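
The averaging by page range described above can be sketched as follows. This is a minimal illustration, not the Algo Cracker code. The function name, the data layout and the sample figures are hypothetical and made up for the example.

```python
from statistics import mean

def average_by_rank_range(pages, ranges):
    """Average a measured value over each rank range.

    pages:  list of (rank, keyword_density) tuples
    ranges: list of (low, high) rank bounds, both inclusive
    Returns a dict mapping each range to its average, or None if it is empty.
    """
    result = {}
    for low, high in ranges:
        values = [density for rank, density in pages if low <= rank <= high]
        result[(low, high)] = mean(values) if values else None
    return result

# Hypothetical measurements: (SERP position, keyword density in percent).
sample_pages = [(1, 4.0), (5, 2.0), (25, 1.0), (45, 3.0)]
sample_ranges = [(1, 10), (20, 30), (40, 50), (60, 70)]

for rank_range, average in average_by_rank_range(sample_pages, sample_ranges).items():
    print(rank_range, average)
```

Comparing the averages across ranges is what shows whether a given density goes together with a better rank. With real data, each range would hold many more pages than in this toy sample.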
Our tool answers questions such as:
- are keywords in the URL important for ranking?
- in the domain?
- in the subdomain?
- in the page name?
- in the different metatags?

Other interesting questions that can be answered with Algo Cracker:

- Which are the optimal keyword densities for the different parts of the web page?
- Does Google penalize/favor sites that use non-standard TLDs?
- Does Google penalize/favor sites that use any particular string of text or code, like .php, .css, Javascript?


The results are beginning to flow. We now have hard data to prove or discard SEO myths or facts, and we are ready to discuss them with other SEO professionals.

We can offer custom raw data to those who have the math ability to extract useful conclusions from them and share them with us.

We also intend to publish some Excel data on this blog, for those who see money at the other end of this thread...

Check Algo Cracker.

August 24, 2007

Our Algo Cracker already brings useful data

Our Beta version of Algo Cracker is already spidering the Web and bringing us useful data.
We are able to select differently ranked sets of pages from Google, analyze them, average keyword density values, and show them in Excel format.
After checking keyword densities in different parts of the pages (domain, subdomain, directory, file name, metatags, page body) we know the optimal values associated with top rankings, medium or lousy ones.
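
As a rough sketch of this kind of measurement (not the Algo Cracker code), keyword density can be computed separately for each part of a URL. The function names and the example URL are hypothetical. For simplicity the host is kept whole, because splitting domain from subdomain correctly requires a list of TLDs.

```python
import re
from urllib.parse import urlparse

def keyword_density(text, keyword):
    """Percentage of the words in text that are exactly the keyword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

def url_parts(url):
    """Split a URL into host, directory and file name (naive rule: a last
    path segment containing a dot is treated as the file name)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    has_file = bool(segments) and "." in segments[-1]
    file_name = segments[-1] if has_file else ""
    directory = "/".join(segments[:-1] if has_file else segments)
    return {"host": parsed.hostname or "", "directory": directory, "file_name": file_name}

# Hypothetical URL, used only for illustration.
parts = url_parts("http://blog.example.com/seo/tips/keyword-density.html")
for part_name, part_text in parts.items():
    print(part_name, part_text, keyword_density(part_text, "seo"))
```

The same density function can be applied to metatags or body text once they are extracted from the page.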
We also drew some conclusions about where keywords are important and where not.
Stay tuned.

August 17, 2007

Experimenter Meeting Point

Experiments are usually hard to do without a significant structure. Even though simple experiments can be performed by an individual, significant results are only obtained by qualified individuals or teams, with good planning, coordination and result analysis.

A successful experiment needs a good team, and assembling one is complicated. Experts are not always available for hiring, and a new approach, as is often needed, might require a whole new set of specialized technicians or workers.

Measuring the quality and appeal of the main idea is paramount for success. If the idea does not appeal to enough powerful people, the experiment dies before maturity.

I propose a Web platform for Experiments, where the promoter will define a project, the requirements, the cost and the expected results. If enough people sign up for participation, the experiment is born, and after the predefined time, the results are published.

The sponsors can provide funding, expertise, or simply approval votes. Votes help the experiment climb to the top of the list. Successful experiments provide good rankings to those who voted for the experiment. They also provide their full results to those who actually participated in the experiment, and a share of the profits, if that was part of the agreement.

Just as some people play in virtual Wall Street sites, buying and selling virtual shares, others will vote for their favorite experiments and help them be born.

Experiments can be of all sorts: technical, commercial, scientific, Web, medical, financial.

This data will be included for every new experiment:

Name - Category - Description - Requirements - Timeframe - Short Result (public) - Long result (private).

As the e-Xperiment is born, new fields are added:

Voters - Contributors - Contributions - Feedback

When the experiment is finished, the Short Result is published on the site, and those Voters who guessed the outcome are mentioned and see their ranking improve. At the same time, the contributors receive the Full Report, and maybe a partnership is born in order to further develop the e-Xperiment.
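
The record described above could be modeled roughly as follows. This is only a sketch under our own assumptions: the class name, the field types and the `visible_result` helper are hypothetical, and the sample values are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    # Fields defined when the experiment is proposed.
    name: str
    category: str
    description: str
    requirements: str
    timeframe: str
    short_result: str = ""  # public
    long_result: str = ""   # private, for contributors only
    # Fields added once the experiment is born.
    voters: list = field(default_factory=list)
    contributors: list = field(default_factory=list)
    contributions: list = field(default_factory=list)
    feedback: list = field(default_factory=list)

    def visible_result(self, user):
        """Everyone sees the short result; contributors also get the long one."""
        if user in self.contributors:
            return self.short_result + "\n" + self.long_result
        return self.short_result

# Placeholder values, for illustration only.
demo = Experiment(
    name="Promotion tool test",
    category="Web",
    description="Buy a promotion tool and measure what it does",
    requirements="Funding for the purchase",
    timeframe="20 days",
    short_result="short summary",
    long_result="full report",
)
demo.voters.append("visitor")
demo.contributors.append("sponsor")
print(demo.visible_result("visitor"))
print(demo.visible_result("sponsor"))
```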

I recently read an article about the revolutionary news site, digg.com, being developed for only $200.

The person who conceived the idea found a developer in a programming marketplace. This is an example of a successful experiment, like many others on the Web. However, the list of failed Web experiments is so long that I could fill a book myself. (Actually, I already did. See my Cibernegocios (CyberBusiness) book, at http://cibernegocios.netocios.com).

I always wanted to set up a virtual laboratory to test new website promotion tools. Those tools come up very often, and they promise huge results with little cost and effort. A few of them are probably useful and worth their cost. But someone should test them first.

At this point I am not sure how expensive this e-Xperiment.com website can be. Probably under $5000, and it will be ready in 45 days. Those who participate in this First Experiment of the e-Xperimenter.com site, namely the creation of the site, will own 50% of it. Write me if you want to join in.

------
The closest approach to this idea is a News Aggregator, like Digg.com or Meneame.net, only for business ideas. See our own implementation, already live: Business-ideas.com.ar
Many ideas there are the ones originally published here.
1 Comments/Trackbacks

May 31, 2007

New Study on Google Ranking Factors

I read this nice article posted by L.Odden on May 22 in:
http://www.toprankblog.com/2007/05/new-study-on-google-ranking-factors/

Which page elements offered the most influence on rankings

  • Keywords in the title tag

  • Targeted keywords in the body tag

  • Keywords in H2-H6 headline tags seem to have an influence on the rankings while keywords in H1 headline tags don't seem to have an effect.

  • Using keywords in bold or strong tags - slight effect

  • Keywords in image file names

  • Keywords in image alt attributes

  • Keyword in the domain name - although, using domain names as link text may explain this

  • Web pages that use very few parameters in the URL (?id=123, etc.)

  • PageRank

  • Inbound links - The top result on Google has usually about four times as many links as result number 11.



Additional notes:

Keywords in the file name don't seem to have a positive effect.
The file size doesn't seem to influence the ranking of a web page on Google although smaller sites tend to have slightly higher rankings.

We will keep this in mind. It mostly agrees with our previous insight and experiments.

May 23, 2007

Should I have won the 21st. Century Journalism Contest?: Validated News Language

Simple News Content Description


News in the near future will be obtained from a Web MultiDimensional Map where news can be located. For that, we need a Standard Hyper News Description Language, in XML format.

The MultiDimensional aspect of the news map can currently be expressed with dynamic symbols.

The proposed language refers to the content of the News, not to its publishing format. A News Publishing Format Standard already exists, and covers a few aspects of news publication.

You, as a Human News Reader, will open the browser and you will see your Personal News Map, with color symbols representing News Variables.

You will navigate to the News you want by applying a Filter Set to the News Variables. The Filter Set will be easily saved in the News Browser, and can be shared with other users.

Validated News Descriptors

Some Variables need Validation. For instance, when a Source is declared, an ID code may be necessary. When Support Material is available, a web address should hold the materials, or specify how to obtain them.

If a news item claims Public Domain Status, someone will need to validate that Status, at least by doing searches in the proper databases.

Each Validation-required Variable will need a special method for proper Online Validation. Agencies, Journalists or other News Producers will need access codes to the Validation mechanisms.

These are some proposed variables or fields. Each one is listed with its Category, Description, Example, Filter example and Validation:

Category: Source
Description: Who is the author of the News unit.
Example: Newspaper, news agency, company, individual, website
Filter example: Check boxes for every accepted source
Validation: The source needs to be validated by a company official, website owner or trade association. Phone, email or address are required for some validations of a news source.

Category: Geographical location
Description: Use the mouse to assign levels of interest to several cities, countries or regions.
Example: Street address, city, country or area
Filter example: Only news from a certain area
Validation: Author

Category: Time of occurrence
Description: Timeframe for the fact: start, climax, end. Can be pinpointed or diffuse. Day, week, month, year.
Example: Events occurring on a certain date or period, past, present or future
Filter example: Look for weekend events, historical facts or next-year projections.
Validation: Author

Category: Timeframe
Description: Refers to the period during which the news will be current. Short-lived news are event announcements, weather reports, sports forecasts. Long-lived news are deep reports, opinion, editorials.
Example: A meeting call will be current until its planned occurrence time. A forecast will be current until the fact actually occurs.
Filter example: Urgent or Last Moment News are current for short spans. Analysis or deep reports are alive for a longer period.
Validation: Author

Category: Credibility
Description: Depends on the source. The credibility is established by history and voting from qualified referents.
Example: The Wall Street Journal has 10 points, while False-News.com has 0 points.
Filter example: Only news with high credibility.
Validation: Credibility needs to be certified by independent, registered entities.

Category: Support material available
Description: For news that have associated hard data: more photos, tapes, stats, signed declarations, serious sources or other
Example: Unsubstantiated, documented, rumored, believed
Filter example: Exclude or include rumors or unsupported news
Validation: Author

Category: Interactivity
Description: News about events where the reader can interact.
Example: Movies, shows, rallies, voting, conferences, online forums or blogs
Filter example: Show events for the weekend in my town.
Validation: Author

Category: Reader feedback
Description: Some news can be associated with surveys. Companies providing online survey mechanisms will need to adhere to standards for collecting and displaying information. The requirements for participating in a survey need to be specified in a standard form.
Example: Weather data can be validated by local residents; artists can be rated by the public; crimes can be witnessed; politicians can be supported or discredited.
Filter example: Reader feedback can be turned on or off. Light surveys can be accepted, while time-consuming ones can be filtered out.
Validation: Author

Category: Likeliness
Description: Weird, unusual, unpredictable news, as opposed to predictable news
Example: A weather report or sport result is predictable or probable, while a crime is not
Filter example: Look for unusual news, like a person biting a dog.
Validation: Author – Independent validator

Category: Personal
Description: When you look for news where the protagonist is an important component
Example: Name, age, sex, national origin.
Filter example: News from an artist, politician or neighbor
Validation: Author

Category: Reader age
Description: News are sometimes oriented to young or adult audiences
Example: Children, young adults, adults, senior citizens.
Filter example: Fantastic news, music events or interactive meetings are for the young, while credible, long-validity news are usually for the adults.
Validation: Author

Category: Reader gender
Description: News directed to specific audiences, by gender or sexual preference: male, female, gays.
Example: Magazines oriented to women are a good example of women-oriented news.
Filter example: Filtered for emphasis on male, female or gay news
Validation: Author

Category: Matchmaking
Description: When you look for persons or companies with compatible needs
Example: Love, friendship, relationships, in dances, parties or bars
Filter example: Filtered by area, time or personal criteria
Validation: Author

Category: Commercial
Description: When you look for commerce
Example: Sales, garage sales, auctions, business appeals.
Filter example: Look to buy, look to sell.
Validation: Some commercial news will need a tax ID

Category: Language
Description: The language in which the original news is written, or its accepted translations
Example: English, Spanish, French.
Filter example: Check boxes for every accepted language
Validation: Author

Category: Subject
Description: A thematic tree, like in the web directories
Example: Technology – Computers – Internet – blogs
Filter example: Assign a level of acceptance for most subjects: 10 is very interesting, 0 is I do not care.
Validation: Thematic trees need to be provided by a standard News Subject Directory

Category: News type
Description: About the news unit itself
Example: Editorial, announcement, report, press release, infomercial, other
Filter example: Check boxes for every accepted news type
Validation: Author

Category: News format
Description: About the news unit itself
Example: Text, image, sound, video, music, software, website
Filter example: Check boxes for every accepted news type
Validation: Formats will be standardized as much as possible

Category: News rights
Description: Intellectual property
Example: Creative commons, copyrighted, public domain, other
Filter example: Check boxes for every accepted news type
Validation: Some rights types have specific requirements, like a piece of HTML code.

Category: Ideology
Description: For ideologically biased or oriented news
Example: Left, right, liberal, conservative, religious, ecological, others.
Filter example: Check boxes for every accepted news type
Validation: Author – Independent critics

Category: Originality
Description: Some news has exclusive ownership, while others are widely known.
Example: A news protagonist will offer an exclusive interview to a certain media outlet, providing 100% originality. Press conferences with wide attendance have low originality.
Filter example: Look for original stories, or look for any good story
Validation: Author

Category: Price
Description: Some news will require payment for complete access or reproduction rights.
Example: Financial news, stock data and detailed meteorology data.
Filter example: Allow news with less than a fixed price, or within a certain budget.
Validation: Author

Category: Advertisement
Description: News carrying obvious ads will be flagged.
Example: News carrying an offer for paid extra info, ads for a book.
Filter example: It will be necessary to disclose affiliation with the seller
Validation: Author – Independent critics

This is the first step towards a News Metatag Language.
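
As an illustration only, a news unit described with a few of these variables could be written as XML and checked against a Filter Set like this. The element names (`news`, `source`, `language`, `news_type`, `credibility`) and both helper functions are hypothetical, since no schema has been defined yet.

```python
import xml.etree.ElementTree as ET

def build_news_item(fields):
    """Build a <news> element with one child element per descriptor field."""
    item = ET.Element("news")
    for name, value in fields.items():
        child = ET.SubElement(item, name)
        child.text = str(value)
    return item

def matches(item, filter_set):
    """filter_set maps a field name to the set of values the reader accepts."""
    for name, accepted in filter_set.items():
        node = item.find(name)
        if node is None or node.text not in accepted:
            return False
    return True

# Hypothetical news unit using some of the proposed variables.
item = build_news_item({
    "source": "newspaper",
    "language": "English",
    "news_type": "report",
    "credibility": 8,
})
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)

# A saved Filter Set: accept English or Spanish reports only.
my_filter = {"language": {"English", "Spanish"}, "news_type": {"report"}}
print(matches(item, my_filter))
```

A real language would also need the Validation data for each field, which this sketch leaves out.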


May 11, 2007

Assassinations for $2000 publicly announced

Have gun, will travel everywhere...

Reading free classified ads I found a disturbing announcement for Inexpensive Hit Men. The ad links to a free hosting service in Spain, where the hosted page describes the offer in great detail. Free email accounts are provided for contact, and the publishers promise "we end with your enemies" and "we provide you the peace that you long for".



The website is in Spanish, and it offers "testimonials from satisfied clients in South America". I am within reach, so I am cautious about my writings.

The website alone does not mention killings, but the ads do.

I wrote to the webmaster of the hosting service, but I did not get an answer. Thus, I am writing this posting to see how the online community reacts to it.

I erased the contact data from the reproduced pages, but they are available for law enforcement officers.

I wonder where the responsibility of webmasters and search engines lies for a posting like this. The combined effect of a website, free ads and indexation creates a potentially deadly business.

As a webmaster, I often need time to clean spam from comments or forum postings. But I do not police all of them, and I risk hosting dangerous ads like this.

Is a Web Police the solution? A WebPolice website? A complaint to search engines? Please comment.



We also need a Web police to stop DDoS attacks, virus spread, piracy, bad porn... But that is another story...