The CrowdFlower Blog

Wisdom of small crowds, part 3: another worker visualization

This is a follow-up to the previous post on individual workloads and rates. Here are the submission times and durations for every worker on the same graph. Each worker is one horizontal line. An assignment starts at a dot, and its duration is the length of the line segment extending to the right.

[Figure: submission-durations-wide1.png]

The particular data set isn’t the same as in the previous post, but it was for a similar task and exhibits a similar structure. Worker rates differ substantially. Some workers do a few HITs, but others work on as many as are available. Some work rapidly with breaks (workers 19, 36). Some assignment durations are as long as 5-10 minutes (13, 37). Some work very intermittently (29).

This view makes the parallelism of AMT apparent. At any vertical timeslice you can see how many workers are active at that time. The entire job ends on the right side when the available HITs run out.
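The "vertical timeslice" reading can also be done numerically. Here is a minimal sketch in R, assuming each assignment is reduced to a start and an end time; the data frame, its column names, and the `active_workers` helper are all made up for illustration and are not part of the original analysis.

```r
# Toy assignments: start and end in seconds from job start (made-up values).
assignments <- data.frame(
  worker = c("w1", "w1", "w2", "w3", "w3"),
  start  = c(0, 40, 10, 30, 100),
  end    = c(30, 90, 60, 80, 120),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct workers with an assignment
# open at time t (start inclusive, end exclusive).
active_workers <- function(df, t) {
  open <- df$start <= t & t < df$end
  length(unique(df$worker[open]))
}

# Count active workers at evenly spaced time slices.
slices <- seq(0, 120, by = 20)
counts <- sapply(slices, function(t) active_workers(assignments, t))
print(data.frame(time = slices, active = counts))
```

Each row of the printed table corresponds to one vertical line drawn through the plot: the count is the number of worker rows whose segment crosses that line.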

[ This article is part of a series, Wisdom of Small Crowds, on crowdsourcing methodology. ]


Comments

  1. Excellent demonstration of worker times (both this one and the previous post).

    Have you thought of examining more closely the HITs that tend to take longer to complete than the rest? I am wondering if they are “more difficult” than the rest, or if they fall into some specific category.


  2. Also, can you post the code for these visualizations? They are pretty cool and very revealing at the same time.


  3. Thanks! The code is all in R and pretty minimal, so I’ll just put it inline here.

    This is working off of the new CSV format from the new AMT interface, which has one row per assignment:

    a = read.csv("amt_assignments_file.csv")
    # some lame datetime cleanups - amazon uses strftime("%c"), totally dumb...
    # (the format below has no %Y, so strptime fills in the current year)
    lame_convert <- function(x)  strptime(x, "%A %b %d %T")
    # convert each timestamp column to POSIXct
    for (col in c('AcceptTime','AutoApprovalTime','CreationTime','Expiration','SubmitTime'))
      a[,col] = as.POSIXct( lame_convert(a[,col]) )
    

    Here's the parallelism plot:

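    # one row per worker: a dot where each assignment starts,
    # and a segment running to the right until its submit time
    worker_parallelism_plot <- function(a, w_pos=NULL, ...) {
      if (is.null(w_pos)) {
        # earliest start time per worker (dfagg is a helper, not base R)
        w_starts = (dfagg(a,a$WorkerId, function(x) min(x$SubmitTime - x$WorkTimeInSeconds)))
        # stack workers in order of their first start
        w_pos = rank(w_starts, ties.method='first')
      }
    
      plot(a$SubmitTime - a$WorkTimeInSeconds, w_pos[a$WorkerId],  type='p', ...)
      segments(a$SubmitTime - a$WorkTimeInSeconds, w_pos[a$WorkerId],   a$SubmitTime, w_pos[a$WorkerId])
      # text(sort(w_starts), 1:length(w_pos), sprintf("%s", 1:length(w_pos)), pos=2)
    }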
    

    The one-box-per-worker plot in the other post is just

    library(lattice)
    xyplot(WorkTimeInSeconds ~ SubmitTime | WorkerId, data=a)
    

  4. oops, that code uses a function (“dfagg”) from the utility file http://github.com/brendano/dlanalysis/tree/master/util.R
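For readers who would rather not pull in that utility file, the per-worker aggregation can probably be approximated in base R. This is a sketch only: it assumes `dfagg` is being used to get the minimum start time per `WorkerId`, and the toy data frame and the names `worker_positions` and `plot_worker_timeline` are invented here for illustration.

```r
# Toy data standing in for the AMT assignments file (values are made up).
a <- data.frame(
  WorkerId = c("A", "B", "A", "C", "B"),
  SubmitTime = as.POSIXct("2009-02-03 12:00:00", tz = "UTC") +
    c(60, 50, 200, 400, 300),
  WorkTimeInSeconds = c(30, 40, 100, 50, 20),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the dfagg step: earliest start per worker,
# then rank the workers so the earliest starter gets row 1.
worker_positions <- function(a) {
  starts <- as.numeric(a$SubmitTime) - a$WorkTimeInSeconds
  w_starts <- sapply(split(starts, a$WorkerId), min)
  rank(w_starts, ties.method = "first")
}

# Same plot as above, with rows looked up by worker name so it behaves
# the same whether WorkerId is a character vector or a factor.
plot_worker_timeline <- function(a, w_pos = worker_positions(a), ...) {
  start <- a$SubmitTime - a$WorkTimeInSeconds
  y <- w_pos[as.character(a$WorkerId)]
  plot(start, y, type = "p", xlab = "time", ylab = "worker", ...)
  segments(start, y, a$SubmitTime, y)
}

print(worker_positions(a))
```

Calling `plot_worker_timeline(a)` then draws the toy version of the figure. Indexing by `as.character(a$WorkerId)` is a deliberate choice: indexing a named vector by a factor uses the factor's integer codes rather than its labels, which only lines up if the positions happen to be stored in level order.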


  5. Oh, as for HITs that take longer: I haven’t looked at that too much. I’ve only done this timing analysis for a task that’s really easy for all HITs.


©2011 CrowdFlower