Last week, Google released its new toy: Google Code Search. It lets us search code posted publicly on the Internet using a regular expression. Yes, it lets us search with a regex! I thought regex indexing wasn’t possible, or at least not practical: it would seem to require a MASSIVELY HUGE index compared to the data it represents.
Or maybe they aren’t using an index at all? Without an index, though, a search would take a very long time. For example, simply grepping the entire Linux kernel source tree takes about 5 minutes here. A search on Google Code Search, however, returns its results within a few seconds. And consider that their haystack is supposedly all the source code that gets posted on the Internet.
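To make the cost of an index-free search concrete, here is a minimal sketch of what a grep-style scan boils down to. It is only an illustration, not how grep or Google Code Search is actually implemented, and the function name `grep_tree` is made up for this example. The point is that every line of every file has to be read and tested against the regex, so the work grows with the size of the haystack.

```python
import os
import re


def grep_tree(root, pattern):
    """Scan every file under root and yield (path, line_number, line) for each match.

    Hypothetical helper for illustration: with no index, each query must
    read the whole tree again from start to finish.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going over non-UTF-8 files
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, line_number, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it
                continue
```

Calling something like `list(grep_tree("linux", r"kmalloc\s*\("))` on a kernel checkout (the directory name is just an assumption) touches every file for every single query, which is why a scan like this takes minutes rather than seconds.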