Archive for the ‘Search’ Category
Internet Explorer’s Trailing Comma Woes
Internet Explorer is notorious for breaking on trailing commas in JavaScript object and array literals, e.g.
var obj = {
  a: 1,
  b: 2,
};
fails on IE, while all other browsers just ignore the innocuous trailing comma after the second element.
Weeding out these commas from JavaScript code is an absolute PITA. However, here is a regular expression search string I wrote to find such instances in the code:
,\s*\n+\s*[\}\)\]]
Even better,
,\s*\n+(\s*\/\/.*\n)*\s*[\}\)\]]
also matches across multiple newlines and intervening // comment lines.
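To try the two patterns outside an editor, here is a minimal sketch in JavaScript that runs them over a source string and reports the line of each suspicious comma. The function name findTrailingCommas and the sample snippet are made up for illustration. It is still a plain text search, not a parser, so commas inside strings or comments can fool it.

```javascript
// The two search patterns from this post, written as JavaScript regex literals.
const SIMPLE = /,\s*\n+\s*[\}\)\]]/g;
const WITH_COMMENTS = /,\s*\n+(\s*\/\/.*\n)*\s*[\}\)\]]/g;

// Hypothetical helper: return the 1-based line numbers of the commas
// matched by `pattern` in `source`.
function findTrailingCommas(source, pattern = WITH_COMMENTS) {
  // Fresh global copy so repeated calls don't share lastIndex state.
  const re = new RegExp(pattern.source, "g");
  const lines = [];
  let m;
  while ((m = re.exec(source)) !== null) {
    // Count the newlines before the comma to get its line number.
    lines.push(source.slice(0, m.index).split("\n").length);
  }
  return lines;
}

const sample = [
  "var obj = {",
  "  a: 1,",
  "  b: 2,",
  "};",
].join("\n");

findTrailingCommas(sample); // returns [3]
```

For an array whose last line is a commented-out element such as // 3, the first pattern matches only the comma inside that comment, while the second skips over the comment line and reports the real trailing comma on the line before it.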
Robots-Nocontent for Page Sections
From my relatively little but significant “web-crawling” experience, one of the major problems is scavenging meaningful content from a page, which requires that navigation crap, menus, JavaScript and adverts not be indexed. Since there is no standard way for web developers to design navigation, menus etc., it is impossible to code a parser that works 100% of the time, and trying is a big PITA.
However, this piece on the Yahoo! Search Blog is welcome news:
webmasters can now mark parts of a page with a ‘robots-nocontent’ tag which will indicate to our crawler what parts of a page are unrelated to the main content and are only useful for visitors.
If the trend catches on and becomes a standard (it would need Google’s support), it would be a great help.
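From the crawler’s side, honoring the marker boils down to dropping the marked elements before extracting text. Below is a rough sketch, assuming the marker is applied as a class value on ordinary HTML elements (e.g. class="robots-nocontent"). stripNoContent is a hypothetical helper, and it scans reasonably well-formed HTML with regular expressions rather than using a real HTML parser, so comments, scripts and malformed markup can fool it.

```javascript
// Elements that never have a closing tag.
const VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "param", "source", "track", "wbr"]);

// Hypothetical crawler-side helper: remove every element whose class list
// contains `marker` (and everything nested inside it) before indexing.
function stripNoContent(html, marker = "robots-nocontent") {
  const openTag = /<([a-zA-Z][\w-]*)([^>]*)>/g; // opening tags only
  let out = "";
  let pos = 0; // start of the text not yet copied to `out`
  let m;
  while ((m = openTag.exec(html)) !== null) {
    const name = m[1].toLowerCase();
    const cls = /(?:^|\s)class\s*=\s*["']([^"']*)["']/i.exec(m[2]);
    if (!cls || !cls[1].split(/\s+/).includes(marker)) continue;

    let end = openTag.lastIndex; // void or self-closing: drop just the tag
    if (!VOID_TAGS.has(name) && !/\/\s*$/.test(m[2])) {
      // Find the matching close tag, counting nested tags with the same name.
      const sameName = new RegExp(`<(/?)${name}\\b[^>]*>`, "gi");
      sameName.lastIndex = openTag.lastIndex;
      let depth = 1;
      end = html.length; // unclosed element: drop the rest of the page
      let t;
      while (depth > 0 && (t = sameName.exec(html)) !== null) {
        depth += t[1] === "/" ? -1 : 1;
        if (depth === 0) end = sameName.lastIndex;
      }
    }

    out += html.slice(pos, m.index); // keep the text before the marked element
    pos = end;
    openTag.lastIndex = end; // resume scanning after the removed element
  }
  return out + html.slice(pos);
}
```

For example, a navigation div marked with class="nav robots-nocontent" is removed together with the menu markup nested inside it, while the surrounding paragraphs are kept for indexing.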
Google ditches its SOAP API
I noticed this the very next day, when I was looking for some documentation on proxy auto-configuration script support in the SOAP API. The replacement AJAX API not only has limited application (websites only), but it also promises to show Google ads beside the results.
Not that the Google Search API has ever been very stable: it works only about 80% of the time, so one has to pray and hope that it works with every call. Now even support has been dropped, and the usage samples along with the FAQ have been removed.
Damn! Not good at all.
My Little Experiments with Google Search
I’ve been doing little experiments with Google lately. This blog has been up for a little more than two weeks now, since I imported all the posts from my old blog. That old blog used to be the #1 result on Google for ‘Reverberations’ (#2 as of now) and ‘brajesh’ (eh! I do my share of egosurfing), probably because of my Slashdot backlinks, though this blog has none yet. Yahoo! has been less benevolent to me, but MSN has been pretty favorable.
I acted link-conscious by linking to this blog with ‘reverberations’ from Blogger and elsewhere. For the first week, this blog was on the first page* of the results for ‘reverberations’ as well as ‘brajesh’ (*my default preference is 100 search results per page). But by the second week, this blog had surprisingly disappeared from the first page. What’s happening here? I guess it has to do with some Google algorithm that detects unnatural linking, and/or the Google Sandbox.
There are some other experiments I did with search engines, e.g. deliberate spelling mistakes and cross-linking. Perhaps I’ll write about them sometime later.
Links are the new (or not so new) currency of the Web. That is why Mike Arrington at TechCrunch can ask for contributions in exchange for linking back.
By the way, it looks like there have been some major updates to Google PageRank recently.
Peer-to-Peer Techniques in Searching
So far, searching the web has been a mostly centralized affair (Google et al.). There have been a few attempts to develop a distributed search framework, Sun’s JXTA being a prominent one.
A new research project funded by the European Commission aims to produce a prototype P2P search engine, which has been described as an attempt to create a “P2P Google”.
“The aim is to produce a system that offers higher quality results and more robustness than a centralised system such as Google,” says Dr David Hales from Bologna University, a researcher working on the project. Professor Gerhard Weikum from the Max Planck Institute, who’s leading the effort, and which involves computer scientists from across Europe, says “We’re in the early stages of the project but are making rapid progress.”
Koders: code search on the web
I recently came across Koders BETA while I was searching for something on piece tables.
This is what it says about itself:
“Koders is a search engine for source code. It enables developers to easily search and browse source code in thousands of projects hosted at hundreds of open source repositories.”
Perhaps this was a much-needed initiative for reducing the time and effort involved in searching for relevant code. Google search is just not good enough: for “piece table” you get all kinds of furniture, chairs, chess and some Katmandu dinner set, all on the first page, and to find what you are actually looking for you need the eyes of a hawk.
Koders requires a code website to get itself listed before it can be searched, which is pretty reasonable considering the ease of search that follows. Perhaps more niche search engines will come up in the near future.
mind your desktop Part-II (can’t help it)
Well... I have been using the Copernic Desktop Search tool for some time now. It has some amazing features, apart from being very “light” on the system. While Google Desktop Search requires 1GB of space on C: to hold the index of your entire electronic existence, I didn’t find any such requirement for Copernic. Copernic doesn’t include secure web pages in its index, whereas Google does. That is probably a good thing: you don’t want your visits to pages that are meant to be secure, such as bank websites, Gmail and other e-commerce sites, to be searchable in the first place.
So by its very design, Copernic seems to offer more security(?), whatever that means. But overall, I found myself more comfortable with Google Desktop Search.
Anyway, instant desktop search is a revolutionary concept, whatever the tool. So, bye-bye to the old days of Windows Search Companion, which used to take a painfully long time to find an MP3 or a file containing a phrase, with unreliable results.
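The “instant” part comes from searching a prebuilt index instead of scanning files at query time. Here is a toy sketch of the idea as an inverted index in JavaScript. It is hypothetical and not how Copernic or Google Desktop Search are actually implemented; the skipSecure option only loosely mimics the “don’t index secure pages” behavior described above, and the document ids are made up.

```javascript
// Toy inverted index: each word maps to the set of documents containing it,
// so a query costs a few map lookups instead of a scan of every file.
class DesktopIndex {
  constructor({ skipSecure = true } = {}) {
    this.skipSecure = skipSecure; // if true, never index https:// pages
    this.postings = new Map();    // word -> Set of document ids
  }

  // Lower-case the text and split it into alphanumeric words.
  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  // Index a document; returns false if it was skipped.
  add(id, text) {
    if (this.skipSecure && id.startsWith("https://")) return false;
    for (const word of DesktopIndex.tokenize(text)) {
      if (!this.postings.has(word)) this.postings.set(word, new Set());
      this.postings.get(word).add(id);
    }
    return true;
  }

  // Ids of documents containing every query word (AND search), sorted.
  search(query) {
    let result = null;
    for (const word of DesktopIndex.tokenize(query)) {
      const ids = this.postings.get(word) || new Set();
      result = result === null
        ? new Set(ids)
        : new Set([...result].filter((id) => ids.has(id)));
    }
    return result === null ? [] : [...result].sort();
  }
}

const idx = new DesktopIndex();
idx.add("C:/music/notes.txt", "My favourite MP3 playlist");
idx.add("C:/docs/report.doc", "Quarterly report and playlist ideas");
idx.add("https://bank.example/statement", "Account statement"); // skipped
idx.search("mp3 playlist"); // returns ["C:/music/notes.txt"]
```

Building the index is the slow, disk-hungry part (hence the space requirement on C:), and that cost is paid once up front rather than on every search.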