Archive for the ‘Miscellaneous’ Category

Crowdsourcing Artwork

Monday, April 14th, 2008

Note: Unlike the other projects, this one was not done by Dolores Labs, but it was too interesting not to share.

Sheep

In 2006, Aaron Koblin used Mechanical Turk to produce 10,000 hand drawn sheep. You can check them out (and buy some) at http://www.thesheepmarket.com.

Recently, he and Takashi Kawashima worked together to make an art project called Ten Thousand Cents, where they broke a one hundred dollar bill into ten thousand pieces and had turkers copy each piece.

Hundred Dollar Bill Example

The result is this (you can buy a copy for $100):

One Hundred Dollar Bill

It’s a cool project on multiple levels. It struck me how visually obvious it is who is taking the task seriously and who isn’t (the boxes that look grainy in the above picture are probably examples of people who didn’t really do the stated task). Yet even with the noise, there’s a very clear signal that comes through. In fact, it looks like they made such a good replica bill that Google Checkout shut down their orders.

-Lukas

Search engine relevance - an empirical test

Thursday, April 3rd, 2008

Search engines control the information we see and use. Their key component is a ranking algorithm that tries to determine the most relevant web pages for your query. How good are these algorithms? Publicly, there’s a lot of hype, while privately, all the big engines run proprietary quality evaluation efforts. But there’s virtually no real data out there for the rest of us.

Using Mechanical Turk, we can evaluate engine relevance. We tried an experiment where we took five hundred queries and ran them against the top 4 English language web search engines: Ask, Google, Live, and Yahoo. The queries were a random sample from a real-world set of search queries. We had annotators rate the relevance of the top five results for each engine. Our results:

engine_comparison1.png

Ask clearly performed the worst. The other three engines were in a statistical tie. Their ordering was Google, Yahoo, then Live, but the differences were minuscule: the top 3 engines all answer about 80% of queries effectively.

What do these results mean?

People often talk about Google as being the most relevant search engine, with the best algorithms and the like. This study finds little evidence to support that. Sure, our methods are preliminary and could be improved in any number of ways; we can probably shrink those error bars and find more statistical differences. However, it is the case that for 500 typical queries, a rough but pretty objective measurement of search quality found that Google, Live, and Yahoo all performed about the same.
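To make the idea of a "statistical tie" concrete, here is a minimal sketch of one way per-engine error bars could be computed. It is not necessarily how the graph above was produced: it assumes each query gets a single yes/no "answered effectively" label per engine and uses a normal-approximation 95% interval. The engine names, counts, and function names are hypothetical placeholders, not data from the study.

```python
import math


def proportion_interval(successes, total, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts of queries answered effectively out of 500
# (illustrative only, not the study's numbers).
counts = {"engine_a": 400, "engine_b": 395, "engine_c": 330}
intervals = {name: proportion_interval(k, 500) for name, k in counts.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.3f} to {high:.3f}")
```

Overlapping intervals are only a rough heuristic for a tie, not a formal significance test, but they show why a gap of a few queries out of 500 is hard to distinguish from noise.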

Note that these results don’t speak to the entire user experience. To be able to compare between engines, we extracted only the core web results with their titles, urls, and snippets. But a search engine also includes much more: the presentation, branding, video and image results, ads, etc. We only tested the relevance of core web search.

Many more details below.

(more…)

Crowdsourcing to find media bias: Hillary vs. Obama

Thursday, March 27th, 2008

As anyone who follows political races knows, different sources can report the same event in very different ways. We took nearly six thousand recent articles over the past month about Clinton and Obama and sent them to Mechanical Turk to be classified as favorable or unfavorable for the respective candidates. We scraped the articles from Google News restricted to several sources, and threw in front page headlines from Digg.

Here is the graph for favorability scores, aggregated by source. We found that Digg was far and away the most favorable for Obama.

obama-hillary-bysource3.png
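As a rough illustration of what "aggregated by source" can mean, the sketch below averages favorable and unfavorable labels per source and candidate. It assumes a simple +1 / -1 scoring, which may differ from the scoring behind the graph; the source names, labels, and the function name are hypothetical.

```python
from collections import defaultdict


def favorability_by_source(labels):
    """Average +1 (favorable) / -1 (unfavorable) labels per (source, candidate)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of scores, label count]
    for source, candidate, label in labels:
        score = 1 if label == "favorable" else -1
        totals[(source, candidate)][0] += score
        totals[(source, candidate)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical labels, for illustration only.
labels = [
    ("source_x", "Obama", "favorable"),
    ("source_x", "Obama", "favorable"),
    ("source_x", "Clinton", "unfavorable"),
    ("source_y", "Obama", "unfavorable"),
    ("source_y", "Obama", "favorable"),
]
scores = favorability_by_source(labels)
```

A score near +1 means almost every article from that source was judged favorable to the candidate, and a score near 0 means the labels were evenly split.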

The next graph tracks overall news favorability by date. To provide some context, we compared it with the change in Obama stock on the Intrade prediction market.

obama-hillary-overtime2.png

More details after the jump:

(more…)

Less white people, more football: Sports Illustrated covers since 1954

Thursday, March 13th, 2008

Human annotators are great at providing basic information about images. We were wondering if we could find something interesting about magazine covers. Stumbling upon 2800 Sports Illustrated cover images going back to 1954, we sent them to Mechanical Turk, asking people to identify the race and gender of the person featured (if any), and what sport was depicted. There are lots of interesting things in this data; this post will touch on just a few we’ve had time to whip together some graphs for.

Here is a historical graph of how often people of different races appear on the cover of Sports Illustrated. The story is simple and striking:

Next: which sports get featured on the cover? Here’s a chart for several sports over that same time.

It might be possible to find links between the careers of famous athletes and rises and falls in their sports’ popularity; for example, boxing peaks in the 70’s (Muhammad Ali?), basketball peaks in the 90’s (Michael Jordan?) and golf bounces back in the 90’s after a long decline (Tiger Woods?).

Many other sports appear in the data, too; for this chart, we made sure to pick the three most common, and a few other particularly interesting ones. Percentages don’t add up to 100% because we didn’t plot all the other sports, including things like horse racing which used to be much more popular. If you’re really curious, here’s the full chart of all sports we asked about, including many of the smaller ones.
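For readers wondering why the plotted percentages fall short of 100%, here is a small sketch of the arithmetic. It assumes one sport label per cover and groups covers by decade; the cover list and function name are hypothetical, and the real chart may use a different time bucket.

```python
from collections import Counter, defaultdict


def sport_share_by_decade(covers, sports_to_plot):
    """Percentage of covers per decade for each plotted sport."""
    per_decade = defaultdict(Counter)
    for year, sport in covers:
        per_decade[year // 10 * 10][sport] += 1

    shares = {}
    for decade, counts in per_decade.items():
        total = sum(counts.values())  # includes sports that are not plotted
        shares[decade] = {s: 100 * counts[s] / total for s in sports_to_plot}
    return shares


# Hypothetical (year, sport) labels, for illustration only.
covers = [
    (1972, "boxing"),
    (1975, "football"),
    (1978, "horse racing"),
    (1979, "football"),
    (1994, "basketball"),
    (1996, "football"),
]
shares = sport_share_by_decade(covers, ["football", "boxing", "basketball"])
```

In this toy data the 1970s shares sum to 75% because the horse racing cover counts toward the total but is not one of the plotted sports.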

-Brendan

The Manifesto

Thursday, March 13th, 2008

The first time I used Amazon’s Mechanical Turk was at a search engine startup, Powerset, where I used it to compare the quality of a few versions of our early internal algorithm with Yahoo and Google. We were thinking we would have to hire a team of people who would spend their entire day comparing the quality of results.

As an experiment, I set up a task with no quality control, put in about fifty bucks and let it run overnight. The data that came back was noisy but I was able to find meaningful differences between the search engines. Completely on my own. I didn’t have to get approval to hire people, put my experimental design through a committee and wait a month for the results to come back. I could design the experiment empirically, doing meta experiments on the data collection process itself.

Back when I was thinking about what machine learning papers to write at Stanford, the conversation always hinged on what kind of data sets were available. We’d go research what data was out there and then figure out what we wanted to do. We’d spend a ton of time wrangling data designed for one purpose into another. I think it’s the same in lots of disciplines that use data.

Here at Dolores Labs, we’ve built tools and processes to quickly and efficiently collect lots of data on Mechanical Turk and other places. I hope that this blog gives us a chance to play with our technology. Back when I made my first AMT jobs, I thought about all the crazy experiments I wanted to run. Overnight, could you figure out which airline was the cheapest? Could you find the exact threshold where what most people call “red” becomes what most people call “orange”? Could you quantify the difference in sentiment between Fox News and NPR?

When I was in college, I had an art teacher who made everyone draw twenty pictures a day. I hope these experiments are like those pictures. Sloppy and fun and occasionally brilliant.

We’ve been brainstorming experiments that we’d like to run, but if there’s any data set that you’d like to suggest, send us an email. Maybe we can make this deal: if you have a cool idea, we’ll collect the data for you, and you guest post a short analysis.

Our first experiments will be posted shortly, with many more to come. I hope you enjoy ’em!

-Lukas