Monday, October 20, 2008

qn re: the use of Rocchio method




While doc-doc similarity computation can be costlier than doc-short-query similarity, the cost is still considered small (after all, you have to compute it anyway to cluster documents, for example).

Having said that, the most obvious place where current search engines use this method is the "similar results" link under a result. (See also the discussion in http://nlp.stanford.edu/IR-book/html/htmledition/relevance-feedback-on-the-web-1.html )
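To make the cost argument concrete, here is a minimal sketch of cosine similarity over sparse vectors. It assumes each vector is a Python dict from term to weight; the function name and the toy vectors are made up for illustration. In this sketch the dot product loops over the shorter of the two vectors, so a long (document-sized) query costs more than a short one, but no more than a doc-doc comparison does.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term -> weight dicts."""
    # Iterate over the shorter vector; only shared terms add to the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors: a short query against a longer document.
short_query = {"rocchio": 1.0}
doc = {"rocchio": 1.0, "feedback": 1.0}
similarity = cosine(short_query, doc)
```

The norms are recomputed on every call here to keep the sketch short; a system that compares many vectors would more likely store each norm once.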

rao



On Mon, Oct 20, 2008 at 10:58 AM, Liang Sun <sun.liang@asu.edu> wrote:
Dear Prof.,
  I have one question about the Rocchio method. In the Rocchio method, we use documents to update the query. Initially, the query is very short. After feedback, however, the query is very long, and its length is comparable to that of a document. Thus, it may take a long time to compute the similarity between the new query and the documents. My question is: is the Rocchio method used in real IR systems, e.g., Google? Thanks.


On Mon, Oct 20, 2008 at 9:23 AM, Subbarao Kambhampati <rao@asu.edu> wrote:
Recall that in relevance feedback, we show K docs to the user, who marks r of them as relevant (and thus K-r of them as irrelevant). We then compute the new query as alpha * (old query) + beta * (sum of relevant doc vectors)/r - gamma * (sum of irrelevant doc vectors)/(K-r).

Since we are showing only *one* document to the user (K=1), and the user says it is relevant (r=1), the number of irrelevant documents that we have shown so far is 0, which means that the gamma term drops out.
[You can't assume that 1023 are irrelevant documents--we haven't shown them to the user--so how do we know that she would have found them to be irrelevant?]
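As a rough illustration of the update described above, here is a small Python sketch using dense term-weight lists. The function name and the example vectors are hypothetical (the actual Q and D from the specimen midterm are not reproduced here), and the irrelevant centroid is subtracted, as in the standard Rocchio formulation. When no irrelevant documents have been shown, that term is simply dropped.

```python
def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Return alpha*query + beta*centroid(relevant) - gamma*centroid(irrelevant)."""
    def centroid(docs):
        # With no documents in a group, its term contributes nothing.
        if not docs:
            return [0.0] * len(query)
        return [sum(col) / len(docs) for col in zip(*docs)]

    rel_c = centroid(relevant)
    irr_c = centroid(irrelevant)
    return [alpha * q + beta * r - gamma * i
            for q, r, i in zip(query, rel_c, irr_c)]

# Hypothetical vectors: K = 1 document shown, and it is marked relevant (r = 1).
Q = [1, 0, 1]
D = [0, 2, 1]
new_query = rocchio_update(Q, relevant=[D], irrelevant=[])
print(new_query)  # [1.0, 2.0, 2.0], i.e. Q + D when alpha = beta = 1
```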

3.[3pt] Suppose the user is shown D in response to the query Q, and the user says
that D is relevant to his query. If we now use relevance feedback to modify Q, what
will the query vector become? Assume that alpha, beta and gamma are all 1.


rao

On Wed, Oct 15, 2008 at 8:59 AM, Farooq Khera <fkhera@asu.edu> wrote:
You were going to provide some insight into the specimen midterm question on relevance feedback, specifically regarding irrelevant documents.
I was confused about how you calculate the sum of the irrelevant documents. It would seem there are 1023 irrelevant documents,
and so we know the number of irrelevant documents, such that part of the formula is gamma/(# of irrelevant documents) * (sum of irrelevant document vectors).

However, it's not well defined what the irrelevant document vectors are.
Unless somehow you can just take the whole corpus as a vector and then subtract the relevant document parts... which makes sense now, I think, lol...

-Farooq Khera




--
Liang Sun
Research Associate
Department of Computer Science & Engineering
Arizona State University

