Bing is an Improvement over Live, but Still Not Google Quality: Evaluating Bing With Mechanical Turk
June 10th, 2009 by Lukas Biewald
Microsoft’s new search engine, Bing, has recently gotten a lot of attention. Several people have already built tools to compare Google and Bing.
Since all the engines are fairly similar, it’s hard to separate true quality from our preconceptions. For example, one of Google’s internal tests is reported to have shown that “users still prefer the results with the Google logo, even if they’re not Google results.”
Are the new Bing search results really better than the old Live search results? Are they better than Google?
We took 100 random real-world queries and showed their results from each engine to workers on Mechanical Turk. For a single query, we showed the results from two engines side-by-side and asked workers to judge which result set was better. For each query, here’s the aggregate judgment from several workers:
Bing versus Google
Bing (Microsoft today) versus Live (Microsoft as of March)
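To make the per-query aggregation concrete, here is a minimal sketch of how several workers' judgments might be combined into one score per query. The post does not give the actual rating encoding, so the scale used here (-2 and -1 when the first engine's results look better, +1 and +2 when the second engine's do) and the function name `aggregate_by_query` are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_query(judgments):
    """Average worker scores per query.

    `judgments` is an iterable of (query, score) pairs. The score scale is
    hypothetical: negative favors the first engine, positive the second.
    """
    by_query = defaultdict(list)
    for query, score in judgments:
        by_query[query].append(score)
    # One aggregate score per query: the mean of its workers' scores.
    return {query: mean(scores) for query, scores in by_query.items()}

def count_wins(aggregate):
    """Count queries where each side has the better aggregate score (ties dropped)."""
    second = sum(1 for s in aggregate.values() if s > 0)
    first = sum(1 for s in aggregate.values() if s < 0)
    return first, second
```

Averaging keeps the strength of each preference, while the win counts only record which side came out ahead on each query. Both views matter when reading the summary numbers.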
Summary
We found that Google is preferred to Bing by a statistically significant margin (p < 0.04), though the difference is rather small: Google is preferred on 55 percent of the queries, and on average it scores two tenths of a standard deviation better than Bing (0.141 on a four-point scale).
On the other hand, we found that users preferred Bing's new results to the older Live search results 55 percent of the time, but this result wasn't statistically significant: the two are virtually tied in aggregate.
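A 55 percent preference can be significant in one comparison and not in another because a test on average scores uses the size of each preference, not just its direction. The post does not say which test was used, so the sketch below is an assumption: it shows an exact sign test on win counts alongside a normal-approximation test on the mean per-query score. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def sign_test_p(wins: int, losses: int) -> float:
    """Exact two-sided binomial sign test against a 50/50 null (ties dropped)."""
    n = wins + losses
    k = max(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def mean_score_p(scores) -> float:
    """Two-sided test that the mean per-query score is zero.

    Uses a normal approximation to the t distribution, which is reasonable
    for around 100 queries but not for very small samples.
    """
    n = len(scores)
    z = mean(scores) / (stdev(scores) / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))
```

Under these assumptions, a 55-to-45 split of 100 queries with no ties gives a sign-test p-value of roughly 0.37, nowhere near significance. A mean score that sits two tenths of a standard deviation from zero over 100 queries gives a z of about 2, which is borderline significant. The same win rate can therefore lead to different conclusions depending on how strongly workers preferred the winning side.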
In conclusion, Bing's quality seems to be improving, but it hasn't yet caught up to Google. Of course, relevance is just one component of the search engine user experience. It's clear that all the major engines are quite close, and there exists a large set of queries where Bing significantly outperforms Google.








