Google falls on dots

After years of great satisfaction, today I found Google missing an important feature: correct handling of punctuation marks inside queries.

I was looking for some information about getting comments in RSS from .Text blogs (my old blog was hosted on a .Text site and I want to import the comments). So I asked Firefox to show my homepage (= Google) and typed this query: .text rss comments. I got many result pages containing text, -text and other variants, but not one with .text in the first 30 results (out of 107,000,000). OK, I thought, I must use quotation marks, and I tried “.text” rss comments. No way: exactly the same results I got with the first try.

I spent a few minutes reading Google's documentation and found this (full article here):

Google doesn’t recognize special characters such as exclamation points, question marks, or the @ sign. These types of characters are so common that including them in a search would greatly slow the delivery of the search results. Additionally, the use of punctuation on the web is so inconsistent (for example, there’s no obvious way to decide between Mr. and Mr) that including it in the query often does more harm than good to the relevance of your search results.

No, guys! Generally speaking, punctuation has great importance, and in some languages much more than in English. Sometimes the meaning of a sentence can be totally different with or without a comma or a dot. In Italian, for example, punctuation marks can be so important that we have a proverb about it: “Per un punto, Martin perse la cappa!” (translated: “Because of a period, Martin lost his post!”). The explanation can also be found on Wikipedia:

(the origin of this proverb is a tale, in which an acolyte monk, Martin, was told to write the Latin phrase “Porta patens esto, nulli claudatur honesto”: “Be the door (always) open. Be not closed to any honest (person)”, referring to the door of the monastery. He instead supposedly wrote on that door “Porta patens esto nulli. Claudatur honesto.”: “Be the door open to no one. Be it closed to honest (people).” Thus, he lost “the cape” (i.e.: the right of taking vows as a monk) because of a period, or dot (Italian language uses the same word). This symbolizes how small details make a big difference in meaning or results.)

Another proof? Try this: “.net” learning. 7 of the first 10 results (and 2 of the first 3!!!) are not related to the .Net Framework because Google handles “.Net” and “Net” the same way.
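To make the effect concrete, here is a minimal Python sketch of a query normalizer that throws punctuation away before matching. It is only my guess at the general technique, not Google's actual code, and `normalize_query` is a hypothetical name I made up for the illustration.

```python
import re

def normalize_query(query):
    """Hypothetical normalizer (an assumption, not any real engine's code):
    lowercase the query and keep only runs of ASCII letters and digits,
    so every punctuation mark, quotes included, is silently dropped."""
    return re.findall(r"[a-z0-9]+", query.lower())

# With this kind of normalization the leading dot never reaches the index lookup.
print(normalize_query('".net" learning'))   # ['net', 'learning']
print(normalize_query('net learning'))      # ['net', 'learning']
```

Under this assumption the two queries become indistinguishable, which is consistent with getting the same results with and without the dot, and with and without the quotation marks.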

Update: September 1, 2006 – 18.30
Altavista and Lycos act the same way. Still, I’m convinced that this is the wrong way. The assumption “… the use of punctuation on the web is so inconsistent (for example, there’s no obvious way to decide between Mr. and Mr) that including…” shouldn’t ever be made. The search engine should not decide in this case; it should simply do what I want: search for “Mr” when I ask for “Mr” and for “Mr.” when I ask for “Mr.”.
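The behaviour I am asking for can be sketched in a few lines of Python. This is a toy under my own assumptions: `tokenize_literal` and `matches` are hypothetical names, the "index" is just one string, and no real search engine is claimed to work this way.

```python
def tokenize_literal(text):
    """Hypothetical tokenizer: lowercase and split on whitespace only,
    so punctuation stays attached to the term exactly as typed."""
    return text.lower().split()

def matches(query, document):
    """Return True when every query term appears verbatim among the document terms."""
    doc_terms = set(tokenize_literal(document))
    return all(term in doc_terms for term in tokenize_literal(query))

print(matches("Mr.", "Dear Mr. Smith"))  # True
print(matches("Mr", "Dear Mr. Smith"))   # False
print(matches("Mr", "Dear Mr Smith"))    # True
```

The sketch is deliberately naive: a word followed by a comma or a sentence-ending period would keep that mark too, so a real engine would need smarter rules. The point is only that the term I type is the term that gets searched.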


~ by Matteo on September 1, 2006.
