Archive for the 'Google' Category

Dear Google, Please fix Reader!!!

I’ve been a loyal Google Reader user now for quite a while. And I’m about to quit it, forever, because of a very STUPID decision by the Google engineers and product managers. You see, despite it saying so, there isn’t really an option to “keep unread” your, well, unread items. I often queue items up to read later, or keep them unread at the bottom of my new posts for later reference (say, software I want to try but don’t have time for yet).

Well, now Reader has marked many of those items read. It seems that after 30 days, even items I’ve explicitly checked to “keep unread” get marked as read. So Google is lying to me. Furthermore, when I’m given the choice between email feeds and RSS, I choose RSS out of convenience, assuming it will behave the same as email, i.e. unread messages stay unread until they, well, aren’t.

So Google, you are about to lose a Reader customer. And considering I use many other services of yours on the belief that Google puts its users’ data and preferences first, I’m not sure I can continue trusting those services in the same way. I mean, you didn’t even warn me.

So now I must go back through hundreds of blog posts I’ve already read to find the four you’ve taken away from my unread list, despite my telling you explicitly not to, using the very option you asked me to use. Shame on you, Google.

1 Trillion and Counting… Only 3.3 Years Until Google’s Index Reaches Infinity

OK, so I’m kidding about the infinity thing, but I’m not kidding about this: a couple of Google search engineers announced today that Google’s systems had hit a historic milestone: 1,000,000,000,000 (one trillion) unique URLs on the web!

We’ve known it for a long time: the web is big. The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. Over the last eight years, we’ve seen a lot of big numbers about how much content is really out there. Recently, even our search engineers stopped in awe about just how big the web is these days — when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!
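Out of curiosity, here’s a quick back-of-the-envelope check of what those milestones imply, using only the numbers quoted above (26 million pages in 1998, 1 billion in 2000, 1 trillion in 2008). This is my own figuring, not anything from Google:

```python
# Back-of-the-envelope growth rate implied by the milestones quoted above.
# These three data points are the only inputs; the rest is extrapolation.
milestones = {
    1998: 26e6,   # first Google index: 26 million pages
    2000: 1e9,    # index hits one billion pages
    2008: 1e12,   # systems have seen one trillion unique URLs
}

years = 2008 - 2000
factor = (milestones[2008] / milestones[2000]) ** (1 / years)
print(f"implied growth: ~{factor:.1f}x per year between 2000 and 2008")
# ~2.4x per year -- the web Google sees more than doubles annually,
# which is why these numbers get absurd so quickly.
```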

What I find really cool is that their massive supercomputer is able to index all of this content, 1 trillion pages and new additions, nearly continuously (see the sketch below the quote):

To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google’s index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it’d be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.

As you can see, our distributed infrastructure allows applications to efficiently traverse a link graph with many trillions of connections, or quickly sort petabytes of data, just to prepare to answer the most important question: your next Google search.
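To make the “computing the PageRank graph” part a bit more concrete, here is a tiny, single-machine sketch of the classic power-iteration version of PageRank on a made-up link graph. This is just the textbook algorithm for illustration, not Google’s code; the real computation runs distributed across a trillion-node graph rather than the four toy pages used here:

```python
# Toy PageRank via power iteration on a made-up four-page link graph.
# Keys are pages; values are the pages they link to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a", "c"],
}

damping = 0.85                                # standard damping factor
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}   # start with a uniform distribution

for _ in range(50):                           # iterate until ranks roughly converge
    new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share         # pass rank along each outbound link
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The 1998 batch setup described above was essentially this computation run once over 26 million pages on a single workstation; the difference today is that the same kind of link-graph processing happens over roughly a trillion URLs, several times a day.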

I am surprised by that number. I figured maybe in a couple years…but wow. I don’t know about you, but my money says Googlenet is already more powerful than Skynet.

Read more on the Google Blog.