<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Search engine relevance - an empirical test</title>
	<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/</link>
	<description></description>
	<pubDate>Fri, 12 Mar 2010 04:18:51 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: Naoma Lowing</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-2450</link>
		<dc:creator>Naoma Lowing</dc:creator>
		<pubDate>Tue, 02 Mar 2010 09:28:22 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-2450</guid>
		<description>Fantastic post, I bookmarked your blog so I can visit again in the future, Thanks, Naoma Lowing</description>
		<content:encoded><![CDATA[<p>Fantastic post, I bookmarked your blog so I can visit again in the future, Thanks, Naoma Lowing</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: seo edinburgh</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-2360</link>
		<dc:creator>seo edinburgh</dc:creator>
		<pubDate>Thu, 18 Feb 2010 10:58:58 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-2360</guid>
		<description>Great article; I've just added it to my bookmark list.</description>
		<content:encoded><![CDATA[<p>Great article; I&#8217;ve just added it to my bookmark list.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: search engine expert</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-2128</link>
		<dc:creator>search engine expert</dc:creator>
		<pubDate>Thu, 14 Jan 2010 12:54:35 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-2128</guid>
		<description>It's quite an interesting article. I'm just curious: how long have you been interested in this subject? I've seen many blogs, but yours is really informative.</description>
		<content:encoded><![CDATA[<p>It&#8217;s quite an interesting article. I&#8217;m just curious: how long have you been interested in this subject? I&#8217;ve seen many blogs, but yours is really informative.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vitaly</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-860</link>
		<dc:creator>Vitaly</dc:creator>
		<pubDate>Wed, 04 Feb 2009 04:43:05 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-860</guid>
		<description>"Search engine relevance - an empirical test": just run a search with a search engine, copy the URL, and paste it into the Insert Text or URL box at www.ClassEngine.com, which is a free online document/web content classification tool. You will see whether the returned categories are close to the search query.</description>
		<content:encoded><![CDATA[<p>&#8220;Search engine relevance - an empirical test&#8221;: just run a search with a search engine, copy the URL, and paste it into the Insert Text or URL box at <a href="http://www.ClassEngine.com" rel="nofollow">http://www.ClassEngine.com</a>, which is a free online document/web content classification tool. You will see whether the returned categories are close to the search query.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Liam</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-457</link>
		<dc:creator>Liam</dc:creator>
		<pubDate>Tue, 26 Aug 2008 15:57:44 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-457</guid>
		<description>Outstanding Brendan! Get you a case of beer for that one.</description>
		<content:encoded><![CDATA[<p>Outstanding Brendan! Get you a case of beer for that one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brendano</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-337</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Tue, 17 Jun 2008 17:23:46 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-337</guid>
		<description>Jeff, thanks for the suggestion. I agree that the metric is extremely simple. I actually experimented a bit with other metrics that fit into this absolute judgment framework, including the count of high-relevance documents in the top five, as you mentioned. The results are about the same. I think the raw AOL query set is just pretty easy for standard search engines today: lots of one- or two-word, topic-oriented queries.

I skimmed through the Carterette paper and it's interesting. My concern with the pairwise setup is that, to get comparability among query-result pairs, you need annotators to do an O(N^2) amount of work. (Unless you do something horribly complicated with partial orders.) The absolute judgment task scales linearly, of course. Given the AMT environment and a fixed budget, if I stick with the smaller-volume task instead of spending a lot on a quadratic taskload, I can simply get more workers per result and boil out more noise. Of course, if the pairwise judgment task really is easier, as the paper claims, that might make my spending more efficient. But since the pairwise workload grows quadratically while the absolute one grows linearly, no matter the cost/benefit ratios there has to be a tipping point: above some data set size, you'd always want to switch back to absolute judgments.

Absolute judgments are just so much easier to compute with, both for analysis and as machine learning training data. I really don't want to need fancy utility inference or stopping-rule schemes just to know the relative ranking of my data. (And I think real-valued scores will always become a necessity. Theoretical microeconomists have proved boatloads of theorems about representing preferences by pairwise comparisons. It turns out that when you add enough rationality assumptions, e.g. the sort demanded of search engine ranking tasks anyway, your fancy ordering can always be mapped back to a real-valued utility function.)

I'd be most interested in a paper that compares real-valued scores derived from some sort of pairwise comparison task, versus absolute judgments, and is mindful of the cost tradeoffs in service of an actual goal, like ranking algorithm training.</description>
		<content:encoded><![CDATA[<p>Jeff, thanks for the suggestion. I agree that the metric is extremely simple. I actually experimented a bit with other metrics that fit into this absolute judgment framework, including the count of high-relevance documents in the top five, as you mentioned. The results are about the same. I think the raw AOL query set is just pretty easy for standard search engines today: lots of one- or two-word, topic-oriented queries.</p>
<p>I skimmed through the Carterette paper and it&#8217;s interesting. My concern with the pairwise setup is that, to get comparability among query-result pairs, you need annotators to do an O(N^2) amount of work. (Unless you do something horribly complicated with partial orders.) The absolute judgment task scales linearly, of course. Given the AMT environment and a fixed budget, if I stick with the smaller-volume task instead of spending a lot on a quadratic taskload, I can simply get more workers per result and boil out more noise. Of course, if the pairwise judgment task really is easier, as the paper claims, that might make my spending more efficient. But since the pairwise workload grows quadratically while the absolute one grows linearly, no matter the cost/benefit ratios there has to be a tipping point: above some data set size, you&#8217;d always want to switch back to absolute judgments.</p>
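<p>A back-of-the-envelope count makes the scaling argument concrete. This Python sketch assumes that every unordered pair of results is judged (no partial-order shortcuts) and that the relative cost of one pairwise judgment versus one absolute judgment is a free parameter; the function names, worker counts, and cost ratio are hypothetical illustrations, not figures from the experiment.</p>

```python
from math import comb

def absolute_judgments(n_results, workers_per_item):
    # One graded label per (result, worker): linear in n_results.
    return n_results * workers_per_item

def pairwise_judgments(n_results, workers_per_pair):
    # Every unordered pair of results is compared once per worker:
    # n(n-1)/2 pairs, so quadratic in n_results.
    return comb(n_results, 2) * workers_per_pair

def tipping_point(workers_per_item, workers_per_pair, pair_cost_ratio=1.0, max_n=100_000):
    """Smallest n at which the pairwise design costs more than the absolute one.

    pair_cost_ratio is the assumed cost of one pairwise judgment divided by
    the cost of one absolute judgment. For any positive ratio, a cheaper
    pairwise task only moves the crossover point; it never removes it.
    """
    for n in range(2, max_n + 1):
        pairwise_cost = pair_cost_ratio * pairwise_judgments(n, workers_per_pair)
        absolute_cost = absolute_judgments(n, workers_per_item)
        if pairwise_cost > absolute_cost:
            return n
    return None  # no crossover found up to max_n

for n in (5, 10, 50):
    print(n, absolute_judgments(n, 5), pairwise_judgments(n, 1))
print(tipping_point(5, 1))       # 12
print(tipping_point(5, 1, 0.5))  # 22
```

<p>With five workers per result against a single worker per pair, the loop prints 25 vs. 10 judgments at 5 results, 50 vs. 45 at 10, and 250 vs. 1225 at 50. Halving the assumed cost of a pairwise judgment pushes the crossover from 12 results to 22, but does not eliminate it.</p>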
<p>Absolute judgments are just so much easier to compute with, both for analysis and as machine learning training data. I really don&#8217;t want to need fancy utility inference or stopping-rule schemes just to know the relative ranking of my data. (And I think real-valued scores will always become a necessity. Theoretical microeconomists have proved boatloads of theorems about representing preferences by pairwise comparisons. It turns out that when you add enough rationality assumptions, e.g. the sort demanded of search engine ranking tasks anyway, your fancy ordering can always be mapped back to a real-valued utility function.)</p>
<p>I&#8217;d be most interested in a paper that compares real-valued scores derived from some sort of pairwise comparison task, versus absolute judgments, and is mindful of the cost tradeoffs in service of an actual goal, like ranking algorithm training.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Dalton</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-331</link>
		<dc:creator>Jeff Dalton</dc:creator>
		<pubDate>Tue, 17 Jun 2008 12:55:15 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comment-331</guid>
		<description>My biggest suggestion is to work on the evaluation metric used. Precision@5 is the number of relevant documents retrieved divided by the number retrieved, counted over the top five results. Your metric of having at least one highly relevant result in the top five isn't P@5 and seems easy to attain.

For the future, I would suggest using pairwise preference judgments as an alternative (Here or There: Preference Judgments for Relevance, by Ben Carterette et al.).</description>
		<content:encoded><![CDATA[<p>My biggest suggestion is to work on the evaluation metric used. Precision@5 is the number of relevant documents retrieved divided by the number retrieved, counted over the top five results. Your metric of having at least one highly relevant result in the top five isn&#8217;t P@5 and seems easy to attain.</p>
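<p>The gap between the two metrics shows up on a toy ranking. This Python sketch assumes graded labels on a hypothetical 0-2 scale (0 = not relevant, 1 = relevant, 2 = highly relevant) and counts any label of 1 or more as relevant for P@5; the scale, thresholds, and function names are illustrative assumptions.</p>

```python
# Hypothetical graded labels: 0 = not relevant, 1 = relevant, 2 = highly relevant.
# The 0-2 scale and both thresholds are assumptions for illustration.

def precision_at_k(labels, k=5, threshold=1):
    """P@k: fraction of the top-k results whose label meets the relevance threshold."""
    top = labels[:k]
    return sum(1 for label in top if label >= threshold) / k

def hit_at_k(labels, k=5, threshold=2):
    """1.0 if at least one highly relevant result appears in the top k, else 0.0."""
    return 1.0 if any(label >= threshold for label in labels[:k]) else 0.0

ranking = [2, 0, 0, 0, 0]  # one highly relevant result at rank 1, four misses
print(precision_at_k(ranking))  # 0.2
print(hit_at_k(ranking))        # 1.0
```

<p>A single highly relevant result at rank 1 is enough for a perfect score on the at-least-one metric, while P@5 still penalizes the four irrelevant results, which is why the former is much easier to attain.</p>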
<p>For the future, I would suggest using pairwise preference judgments as an alternative (Here or There: Preference Judgments for Relevance, by Ben Carterette et al.).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
