<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments for Dolores Labs Blog</title>
	<link>http://blog.doloreslabs.com</link>
	<description></description>
	<pubDate>Sun, 06 Jul 2008 06:39:03 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<item>
		<title>Comment on Our color names data set is online by فروشگاه اینترنتی</title>
		<link>http://blog.doloreslabs.com/2008/03/our-color-names-data-set-is-online/#comment-353</link>
		<dc:creator>فروشگاه اینترنتی</dc:creator>
		<pubDate>Fri, 04 Jul 2008 15:44:12 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/03/our-color-names-data-set-is-online/#comment-353</guid>
		<description>thank you for this post. i will add this blog to my favorites list.</description>
		<content:encoded><![CDATA[<p>thank you for this post. i will add this blog to my favorites list.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on FaceStat scales! by Brandon Franklin</title>
		<link>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-341</link>
		<dc:creator>Brandon Franklin</dc:creator>
		<pubDate>Sat, 21 Jun 2008 22:00:31 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-341</guid>
		<description>Ohhh I get it, I guess the Ruby people don't understand that "running on a JVM" doesn't mean "coding in Java".  Might wanna do a little research first, peeps.  See:  JRuby, Jython, and Groovy.</description>
		<content:encoded><![CDATA[<p>Ohhh I get it, I guess the Ruby people don&#8217;t understand that &#8220;running on a JVM&#8221; doesn&#8217;t mean &#8220;coding in Java&#8221;.  Might wanna do a little research first, peeps.  See:  JRuby, Jython, and Groovy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on FaceStat scales! by Brandon Franklin</title>
		<link>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-340</link>
		<dc:creator>Brandon Franklin</dc:creator>
		<pubDate>Sat, 21 Jun 2008 21:58:03 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-340</guid>
		<description>LOL I'd like to see anybody code the medical imaging client/server application I work on in Ruby instead of Java and get it to perform at all!

Besides I'm not talking about Java vs. Ruby.  I'm talking about Grails vs. Ruby on Rails.  Groovy != Java.

As far as I can tell, Grails is more beautiful and elegant than RoR.

It is amusing to see Ruby fanbois on the attack though.  I'll admit I've never seen that before.</description>
		<content:encoded><![CDATA[<p>LOL I&#8217;d like to see anybody code the medical imaging client/server application I work on in Ruby instead of Java and get it to perform at all!</p>
<p>Besides I&#8217;m not talking about Java vs. Ruby.  I&#8217;m talking about Grails vs. Ruby on Rails.  Groovy != Java.</p>
<p>As far as I can tell, Grails is more beautiful and elegant than RoR.</p>
<p>It is amusing to see Ruby fanbois on the attack though.  I&#8217;ll admit I&#8217;ve never seen that before.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Wisdom of small crowds, part 1: how to aggregate Turker judgments for classification (the threshold calibration trick) by brendano</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-339</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Tue, 17 Jun 2008 21:01:22 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-339</guid>
		<description>Logan: It's all about the R!  I wrote the routines myself for the separation dotplot and the confusion barplot.  I've never seen anything quite like the confusion barplot before; I'd be curious if anyone else has used it.  (It would need modifications for problems where the class distribution isn't as balanced; for example, where true negatives dominate, just ditch the TN bars.  I have a bunch of plots here where I tried all possible stacking orders; it's fun, they emphasize different ratios among the confusion categories.)  The code is currently at http://github.com/brendano/dlanalysis but it's rather tangled up with other things.

I really recommend the ROCR package (in R) as a well-done, reusable package for all the standard confusion plots, e.g. P/R and ROC, expected costs, etc etc.  I was thinking of porting my plotting routines into their system, eventually...</description>
		<content:encoded><![CDATA[<p>Logan: It&#8217;s all about the R!  I wrote the routines myself for the separation dotplot and the confusion barplot.  I&#8217;ve never seen anything quite like the confusion barplot before; I&#8217;d be curious if anyone else has used it.  (It would need modifications for problems where the class distribution isn&#8217;t as balanced; for example, where true negatives dominate, just ditch the TN bars.  I have a bunch of plots here where I tried all possible stacking orders; it&#8217;s fun, they emphasize different ratios among the confusion categories.)  The code is currently at <a href="http://github.com/brendano/dlanalysis" rel="nofollow">http://github.com/brendano/dlanalysis</a> but it&#8217;s rather tangled up with other things.</p>
<p>I really recommend the ROCR package (in R) as a well-done, reusable package for all the standard confusion plots, e.g. P/R and ROC, expected costs, etc etc.  I was thinking of porting my plotting routines into their system, eventually&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Wisdom of small crowds, part 1: how to aggregate Turker judgments for classification (the threshold calibration trick) by Logan</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-338</link>
		<dc:creator>Logan</dc:creator>
		<pubDate>Tue, 17 Jun 2008 18:24:49 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-338</guid>
		<description>Great explanation. What did you use to make the red and blue confusion matrix plot? Is that in an R package? Nice way to visualize it.</description>
		<content:encoded><![CDATA[<p>Great explanation. What did you use to make the red and blue confusion matrix plot? Is that in an R package? Nice way to visualize it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search engine relevance - an empirical test by brendano</title>
		<link>http://blog.doloreslabs.com/2008/04/search-engine-relevance-an-empirical-test/#comment-337</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Tue, 17 Jun 2008 17:23:46 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/04/search-engine-relevance-an-empirical-test/#comment-337</guid>
		<description>Jeff, thanks for the suggestion.  I agree that the metric is extremely simple.  I actually experimented a bit with other metrics that fit into this absolute judgment framework, including the count of high-relevance documents in the top five as you mentioned.  The results are about the same.  I think the raw AOL query set is just pretty easy for standard search engines today -- lots of 1 or 2 word topic-oriented queries.

I skimmed through the Carterette paper and it's interesting.  My concern with the pairwise setup is that, in order to get comparability among query-result pairs, you need to get annotators to do an O(N^2) amount of work.  (Unless you do something horribly complicated with partial orders.)  The absolute judgment task scales linearly, of course.  Given the AMT environment and a fixed budget, if I stay in the smaller-volume task, instead of spending a lot on a quadratic taskload, I can simply get a higher number of workers per result and boil out more noise.  Of course, if it's true the pairwise judgment task is easier -- as the paper claims -- that might make my spending more efficient.  But since the pairwise workload is quadratic, no matter the cost/benefit ratios, there has to be a tipping point: a data set size beyond which you'd always want to switch back to absolute judgments.

Absolute judgments are just so much easier to compute with -- both for analysis and to use as machine learning training data.  I really don't want to have fancy utility inference or stopping rule schemes just to know the relative ranking of my data.  (And I think real-valued scores will always become a necessity.  Theoretical microeconomists have made boatloads of theorems about representing preferences by pairwise comparisons.  It turns out that when you add enough rationality assumptions -- e.g. the sort that are demanded of search engine ranking tasks anyways -- then your fancy ordering can always be mapped back to a real-valued utility function.)

I'd be most interested in a paper that compares real-valued scores derived from some sort of pairwise comparison task, versus absolute judgments, and is mindful of the cost tradeoffs in service of an actual goal, like ranking algorithm training.</description>
		<content:encoded><![CDATA[<p>Jeff, thanks for the suggestion.  I agree that the metric is extremely simple.  I actually experimented a bit with other metrics that fit into this absolute judgment framework, including the count of high-relevance documents in the top five as you mentioned.  The results are about the same.  I think the raw AOL query set is just pretty easy for standard search engines today &#8212; lots of 1 or 2 word topic-oriented queries.</p>
<p>I skimmed through the Carterette paper and it&#8217;s interesting.  My concern with the pairwise setup is that, in order to get comparability among query-result pairs, you need to get annotators to do an O(N^2) amount of work.  (Unless you do something horribly complicated with partial orders.)  The absolute judgment task scales linearly, of course.  Given the AMT environment and a fixed budget, if I stay in the smaller-volume task, instead of spending a lot on a quadratic taskload, I can simply get a higher number of workers per result and boil out more noise.  Of course, if it&#8217;s true the pairwise judgment task is easier &#8212; as the paper claims &#8212; that might make my spending more efficient.  But since the pairwise workload is quadratic, no matter the cost/benefit ratios, there has to be a tipping point: a data set size beyond which you&#8217;d always want to switch back to absolute judgments.</p>
<p>Absolute judgments are just so much easier to compute with &#8212; both for analysis and to use as machine learning training data.  I really don&#8217;t want to have fancy utility inference or stopping rule schemes just to know the relative ranking of my data.  (And I think real-valued scores will always become a necessity.  Theoretical microeconomists have made boatloads of theorems about representing preferences by pairwise comparisons.  It turns out that when you add enough rationality assumptions &#8212; e.g. the sort that are demanded of search engine ranking tasks anyways &#8212; then your fancy ordering can always be mapped back to a real-valued utility function.)</p>
<p>I&#8217;d be most interested in a paper that compares real-valued scores derived from some sort of pairwise comparison task, versus absolute judgments, and is mindful of the cost tradeoffs in service of an actual goal, like ranking algorithm training.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search engine relevance - an empirical test by Jeff Dalton</title>
		<link>http://blog.doloreslabs.com/2008/04/search-engine-relevance-an-empirical-test/#comment-331</link>
		<dc:creator>Jeff Dalton</dc:creator>
		<pubDate>Tue, 17 Jun 2008 12:55:15 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/04/search-engine-relevance-an-empirical-test/#comment-331</guid>
		<description>My biggest suggestion is to work on the evaluation metric used.  Precision@5 is the number of relevant documents retrieved divided by the number retrieved, for the top five results.  Your metric of having at least one highly relevant result in the top five isn't P@5 and seems easy to attain.

For the future, I would suggest using pairwise preference judgments as an alternative (Here or There: Preference Judgments for Relevance by Ben Carterette, et al.).</description>
		<content:encoded><![CDATA[<p>My biggest suggestion is to work on the evaluation metric used.  Precision@5 is the number of relevant documents retrieved divided by the number retrieved, for the top five results.  Your metric of having at least one highly relevant result in the top five isn&#8217;t P@5 and seems easy to attain.</p>
<p>For the future, I would suggest using pairwise preference judgments as an alternative (Here or There: Preference Judgments for Relevance by Ben Carterette, et al.).</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Wisdom of small crowds, part 1: how to aggregate Turker judgments for classification (the threshold calibration trick) by Paul (from Belgium)</title>
		<link>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-330</link>
		<dc:creator>Paul (from Belgium)</dc:creator>
		<pubDate>Tue, 17 Jun 2008 09:50:28 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comment-330</guid>
		<description>Hey Luke and the others,
Now I get the service that your company is providing on top of Amazon Turk, and it's quite cool I must say!</description>
		<content:encoded><![CDATA[<p>Hey Luke and the others,<br />
Now I get the service that your company is providing on top of Amazon Turk, and it&#8217;s quite cool I must say!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on FaceStat scales! by James Higginbotham</title>
		<link>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-329</link>
		<dc:creator>James Higginbotham</dc:creator>
		<pubDate>Tue, 17 Jun 2008 01:54:13 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-329</guid>
		<description>Brendan,

I was just curious how you found Slicehost to work in this situation and if Slicehost had a different solution that they weren't announcing publicly for situations like yours. 

I have personally steered away from the round robin DNS approach in the past. If one of your frontline servers is down, it can take a little time to propagate DNS updates to remove the server from the round-robin. Some DNS servers will cache the old IP for a period of time, thereby preventing access to your app until their cache is flushed or until the slice is operational once again. RR DNS is not the best solution, but it is the best one available with Slicehost, and it gets the job done in a pinch. A reverse proxy would be the better way to prevent this, as you mentioned.

NAT is the one feature I'd love to see Slicehost add, as it would allow larger customers to add new slices behind their single public IP without depending on a single slice to do the reverse proxy work or running into stale DNS entries when using RR DNS. I noticed that Amazon EC2 now has this, though I haven't investigated it fully.</description>
		<content:encoded><![CDATA[<p>Brendan,</p>
<p>I was just curious how you found Slicehost to work in this situation and if Slicehost had a different solution that they weren&#8217;t announcing publicly for situations like yours. </p>
<p>I have personally steered away from the round robin DNS approach in the past. If one of your frontline servers is down, it can take a little time to propagate DNS updates to remove the server from the round-robin. Some DNS servers will cache the old IP for a period of time, thereby preventing access to your app until their cache is flushed or until the slice is operational once again. RR DNS is not the best solution, but it is the best one available with Slicehost, and it gets the job done in a pinch. A reverse proxy would be the better way to prevent this, as you mentioned.</p>
<p>NAT is the one feature I&#8217;d love to see Slicehost add, as it would allow larger customers to add new slices behind their single public IP without depending on a single slice to do the reverse proxy work or running into stale DNS entries when using RR DNS. I noticed that Amazon EC2 now has this, though I haven&#8217;t investigated it fully.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on FaceStat scales! by brendano</title>
		<link>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-328</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Tue, 17 Jun 2008 01:02:03 +0000</pubDate>
		<guid>http://blog.doloreslabs.com/2008/06/facestat-scales/#comment-328</guid>
		<description>James: that's right, we currently use round-robin between web server slices.  Every slice has its own public IP; I think I'm not understanding your concern.  We certainly could point our DNS records at a single slice and run a reverse proxy on there.

PJ, Chris: To be fair, Grails is in the Groovy language, not Java.  I rather liked Groovy the last time I used it; the biggest issue was that the compiler was hideously slow, though I'm sure it's faster now.  Probably F#/.NET is the most advanced dynamic language platform anyway :)</description>
		<content:encoded><![CDATA[<p>James: that&#8217;s right, we currently use round-robin between web server slices.  Every slice has its own public IP; I think I&#8217;m not understanding your concern.  We certainly could point our DNS records at a single slice and run a reverse proxy on there.</p>
<p>PJ, Chris: To be fair, Grails is in the Groovy language, not Java.  I rather liked Groovy the last time I used it; the biggest issue was that the compiler was hideously slow, though I&#8217;m sure it&#8217;s faster now.  Probably F#/.NET is the most advanced dynamic language platform anyway <img src='http://blog.doloreslabs.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
</channel>
</rss>
