What is Your Quest?

Recommendations are the holy grail of aggregation. That's what Brad asked for in his post on the subject, and that's what Mark Pilgrim was getting at. Feed recommendations have interested me ever since I first saw Mark's application (now offline). One of the first projects I worked on at NewsGator was to improve the recommendations page.

The recommendations page was fairly frustrating to change. The original implementation ran very slowly but came up with decent, if a little boring, recommendations. I cleaned up the SQL for the query and it ran much faster (in the average case), but the recommendations were horrible. Everyone got the same 10: New York Times, BBC, standard news outlets, and maybe a sprinkling of A-list bloggers. The basic implementation hadn't changed: NewsGator maintains a sparse matrix correlating how many people subscribe to a given pair of feeds; if there are N feeds in the system, the matrix holds some fraction of the N^2/2 possible pairs. I wrote a query to select the highest correlated feeds, given an input list of feeds from your existing subscription list. The realization I had was that NewsGator's signup process tended to encourage people to pick from a list of well-known feeds, so NewsGator has a very pronounced A-list effect towards those feeds. Now, the A-list effect has been a problem in every recommendations engine I've tried, but in this case it was particularly bad. I did some research and found out some pretty interesting facts about NewsGator users' subscription lists, but I couldn't translate that into producing better recommendations.
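To make the pair-count idea concrete, here's a rough sketch in Python. This is not the actual NewsGator SQL. The function names, the dictionary-of-pairs representation, and the choice to score a candidate by summing its pair counts against your feeds are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_pair_counts(subscriptions):
    """Count, for each unordered pair of feeds, how many users subscribe to both.

    `subscriptions` maps a user id to an iterable of feed ids (hypothetical shape).
    Only pairs that actually co-occur are stored, so the result is sparse.
    """
    pair_counts = Counter()
    for feeds in subscriptions.values():
        # Sort so each pair is stored once, as (smaller, larger).
        for a, b in combinations(sorted(set(feeds)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def recommend(pair_counts, my_feeds, limit=10):
    """Rank feeds the user lacks by summed co-subscription count with their feeds.

    Summing is an assumed scoring rule; the original query may have ranked differently.
    """
    mine = set(my_feeds)
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a in mine and b not in mine:
            scores[b] += count
        elif b in mine and a not in mine:
            scores[a] += count
    return [feed for feed, _ in scores.most_common(limit)]
```

With a sketch like this, any feed that nearly everyone subscribes to co-occurs with nearly everything, so it floats to the top of every list. That is the A-list effect in miniature.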

Around that time Chris Anderson's Long Tail article was published. That article had lots of interesting observations about marketing (some of which have turned out to be somewhat fallacious). But that article catalyzed my thinking around the subject: what if we cut off the A-list? So I altered the query to chop off correlations higher than some arbitrary cutoff. I don't remember what number I settled on, but it's a couple of orders of magnitude below the subscriber count for the most-subscribed-to feeds. There were some small changes in seeding the recommendations (if you've rated any posts 4 or 5 stars, those feeds get added to your initial seed list), but the A-list cutoff was the key. Suddenly, recommendations started producing something like a compelling result; at least, something that correlated to my actual interests. There were a few problems: people automatically go looking for recommendations when they first sign up for NGOS, and the recommendations process flat doesn't work if you don't have at least a few (~10) subscriptions. But it was pretty effective in the general case.
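Here's a minimal sketch of the cutoff and the rating-based seeding, again in Python rather than the real SQL. The names `seed_feeds`, `recommend_with_cutoff`, and `max_count` are hypothetical, the pair counts are assumed to live in a dictionary keyed by feed pair, and the cutoff value is left as a parameter since I don't remember the real one.

```python
from collections import Counter


def seed_feeds(subscribed, post_ratings, min_stars=4):
    """Build the seed list: subscribed feeds plus feeds with a highly rated post.

    `post_ratings` is an iterable of (feed_id, stars) tuples (assumed shape).
    """
    seeds = set(subscribed)
    seeds.update(feed for feed, stars in post_ratings if stars >= min_stars)
    return seeds


def recommend_with_cutoff(pair_counts, seeds, max_count, limit=10):
    """Rank candidate feeds, ignoring any pair whose count exceeds `max_count`.

    Dropping the heaviest correlations is what removes the A-list from the results.
    """
    seeds = set(seeds)
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if count > max_count:
            continue  # A-list correlation: skip it entirely
        if a in seeds and b not in seeds:
            scores[b] += count
        elif b in seeds and a not in seeds:
            scores[a] += count
    return [feed for feed, _ in scores.most_common(limit)]
```

Note that a sketch like this returns little or nothing for a brand-new user, since an empty or tiny seed list matches very few pairs once the big correlations are gone.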

I got busy with other projects and forgot about recommendations for a while, but just recently started looking at them again. I was really surprised by what happened; NewsGator's subscriber base has multiplied a few times since the original work was done in January, and the network of subscriptions has grown considerably. My recommendations used to be a B-list of .NET developers, which I'm interested in, but not to the exclusion of all else (like, say, Lisp developers :-) ). That's definitely changed, and what's more, the recommendations change daily (technically, a little more frequently as we cache the results for 18 hours). So the NewsGator Recommendations RSS Feed becomes much more useful as a result.

Of course, the weak link in all this is the fact that you have to commit to subscribing to a feed. The point isn't finding interesting subscriptions; what people really want is interesting content. Keyword and URL search feeds are crude ways of achieving that, but the problem is more subtle: people use words to mean different things (look at some of the overloadings on tags used on del.icio.us), and people use multiple words to refer to heterogeneous concepts. And many concepts don't fit neatly into one or two words. The natural suspicion is that linking is the place to start, and maybe Bayesian categorization (or maybe not). But it's clear that this sort of searching is the holy grail of aggregation.

— Gordon Weakliem at permanent link

Do You Need Influencers?

Dare offers a counterpoint to jr conlin's "Your Target Audience Isn’t Who You Think It Is". Dare demonstrates that the sentiment goes both ways. I met Dare in person a few weeks ago and he talked a lot about MSN Spaces; the one thing that stuck with me was his reaction to the initial criticism of Spaces: "we didn't make that service for Scoble, we made it for the 15 million people who've signed up in the last 6 months". It's a great point: if Spaces had held up its release to please Robert Scoble, it would have put the project launch back 6 months in order to win over one user.

To be fair, jr conlin's point was that if you want to reach "influencers", you need to bear in mind that these people differ significantly from the general public. The question is whether or not it's worth trying to reach the influencers. MSN Spaces probably has near 0% penetration among the influencers of the world, but you'd be wrong to say Spaces isn't a success story.

— Gordon Weakliem at permanent link