<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Wisdom of small crowds, part 1: how to aggregate Turker judgments for classification (the threshold calibration trick)</title>
	<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/</link>
	<description></description>
	<pubDate>Wed, 07 Jan 2009 00:05:09 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: brendano</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-440</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Fri, 15 Aug 2008 18:28:18 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-440</guid>
		<description>Bob - yes, indeed some of the high-disagreement cases are genuinely hard, though not all of them.  I've only looked at this anecdotally.</description>
		<content:encoded><![CDATA[<p>Bob - yes, indeed some of the high-disagreement cases are genuinely hard, though not all of them.  I&#8217;ve only looked at this anecdotally.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-438</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Thu, 14 Aug 2008 22:27:02 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-438</guid>
		<description>I'd expect that blending in an estimate of the accuracy of the users would help separate out the confident assignments from the others.  This might not work in this setting, though, where annotators only have a go at very few data points.  Estimating the prevalence of a given category can also provide helpful information.

Have you looked at uncertain or borderline cases for which there is a lot of disagreement to see if they are in fact hard cases?</description>
		<content:encoded><![CDATA[<p>I&#8217;d expect that blending in an estimate of the accuracy of the users would help separate out the confident assignments from the others.  This might not work in this setting, though, where annotators only have a go at very few data points.  Estimating the prevalence of a given category can also provide helpful information.</p>
<p>Have you looked at uncertain or borderline cases for which there is a lot of disagreement to see if they are in fact hard cases?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brendano</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-339</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Tue, 17 Jun 2008 21:01:22 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-339</guid>
		<description>Logan: It's all about the R!  I wrote the routines myself for the separation dotplot and the confusion barplot.  I've never seen anything quite like the confusion barplot before; I'd be curious if anyone else has used it.  (It would need modifications for problems where the class distribution isn't as balanced; for example, where true negatives dominate, just ditch the TN bars.  I have a bunch of plots here where I tried all possible stacking orders; it's fun, they emphasize different ratios among the confusion categories.)  The code is currently at http://github.com/brendano/dlanalysis but it's rather tangled up with other things.

I really recommend the ROCR package (in R) as a well-done, reusable package for all the standard confusion plots, e.g. precision/recall and ROC curves, expected costs, etc.  I was thinking of porting my plotting routines into their system, eventually...</description>
		<content:encoded><![CDATA[<p>Logan: It&#8217;s all about the R!  I wrote the routines myself for the separation dotplot and the confusion barplot.  I&#8217;ve never seen anything quite like the confusion barplot before; I&#8217;d be curious if anyone else has used it.  (It would need modifications for problems where the class distribution isn&#8217;t as balanced; for example, where true negatives dominate, just ditch the TN bars.  I have a bunch of plots here where I tried all possible stacking orders; it&#8217;s fun, they emphasize different ratios among the confusion categories.)  The code is currently at <a href="http://github.com/brendano/dlanalysis" rel="nofollow">http://github.com/brendano/dlanalysis</a> but it&#8217;s rather tangled up with other things.</p>
<p>I really recommend the ROCR package (in R) as a well-done, reusable package for all the standard confusion plots, e.g. precision/recall and ROC curves, expected costs, etc.  I was thinking of porting my plotting routines into their system, eventually&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Logan</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-338</link>
		<dc:creator>Logan</dc:creator>
		<pubDate>Tue, 17 Jun 2008 18:24:49 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-338</guid>
		<description>Great explanation. What did you use to make the red and blue confusion matrix plot? Is that in an R package? Nice way to visualize it.</description>
		<content:encoded><![CDATA[<p>Great explanation. What did you use to make the red and blue confusion matrix plot? Is that in an R package? Nice way to visualize it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul (from Belgium)</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-330</link>
		<dc:creator>Paul (from Belgium)</dc:creator>
		<pubDate>Tue, 17 Jun 2008 09:50:28 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-330</guid>
		<description>Hey Luke and the others,
Now I get the service that your company is providing on top of Amazon Mechanical Turk, and it's quite cool, I must say!</description>
		<content:encoded><![CDATA[<p>Hey Luke and the others,<br />
Now I get the service that your company is providing on top of Amazon Mechanical Turk, and it&#8217;s quite cool, I must say!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
