Occasionally, I will send you "thinking cap" questions. These are broad questions (some very simple, some somewhat deep) for which, if you have an answer or opinion, you should post it on the blog (as a comment to the question).
Here goes the first batch:
1. I mentioned that any two vectors, not just the orthogonal ones, can be used as coordinate axes to define a space. We know that if we have a vector
P = (4,7), its "coordinate" in the Y direction is 7 and in the X direction is 4. If our coordinate axes are not the X and Y axes but rather two arbitrary vectors u and v (assume u and v are unit vectors), then what are the coordinates of P in this space? What is the restriction on u and v such that we can always give coordinates to any point in the space this way?
2. How does stemming affect precision and recall?
3. I pointed out two of the problems with the canned relevance judgments used in TREC: that they represent just one user, and that they are typically "binary" judgments. Recalling the form of the R( ) function we discussed in class, can you think of other ways these judgments are not really enough to evaluate a ranked list of results?
26 comments:
2. Stemming always increases recall because it retrieves all the relevant documents that are retrieved without stemming and, probably, some additional relevant ones, while the total number of relevant documents stays the same.
I think that stemming usually decreases precision because the number of relevant morphological forms is usually smaller than the total number of forms. Certainly, there are users and queries for which precision increases.
3. TREC judgments might not be good for various types of queries. Relevance is query-dependent, and the fact that a system is good for one query does not tell us anything about other queries. We can only hope, or argue on theoretical grounds, that a system will work as well on other queries. Probably, with some success, we can build domain-specific query tests (scientific, medical, car-selling, etc.).
1.
P=(4,7), x=(1,0), y=(0,1), u=(u1,u2), v=(v1,v2)
P=(4,7) is in fact P_xy, i.e., P in the xy base. To find P's coordinates in the uv base, or P_uv, we just need a change of base.
A=(u,v) is the matrix of the base change from base xy to base uv (A is the vectors u and v written in the xy base). We know that P_uv = A^-1 * P_xy, so we need to compute A^-1.
A = (u1 v1 ; u2 v2), so A^-1 = 1/(u1*v2 - u2*v1) * ( v2 -v1 ; -u2 u1 )
Then P_uv = A^-1 * P_xy, so P_uv = (p'1, p'2) with
p'1 = (v2*4 - v1*7) / (u1*v2 - u2*v1)
p'2 = (-u2*4 + u1*7) / (u1*v2 - u2*v1)
As long as u and v are not collinear, we can always give coordinates to any point in the space this way.
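A minimal Python sketch of this base change, in case it helps (the 2x2 inverse is written out by hand; the second u, v pair below is just an arbitrary non-orthogonal example):

def coords_in_uv(P, u, v):
    # Columns of A are u and v written in the standard x-y base.
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("u and v are collinear: coordinates are not defined")
    # P_uv = A^-1 * P_xy, with the 2x2 inverse written out explicitly.
    p1 = ( v[1] * P[0] - v[0] * P[1]) / det
    p2 = (-u[1] * P[0] + u[0] * P[1]) / det
    return (p1, p2)

print(coords_in_uv((4, 7), (1, 0), (0, 1)))   # (4.0, 7.0): standard base, coordinates unchanged
print(coords_in_uv((4, 7), (1, 0), (1, 1)))   # (-3.0, 7.0): indeed -3*u + 7*v = (4, 7)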
2. I agree with Dmitry: stemming increases recall and decreases precision. The effect on recall is not very important, but the effect on precision might be (cf. the used-car example).
For 1, I agree with pierre. This may help to understand:
http://en.wikipedia.org/wiki/Coordinate_vector
2) It's interesting to see how over-stemming and under-stemming affect these measures. Over-stemming decreases precision while increasing recall, and under-stemming increases precision while decreasing recall. What's 'over' and what's 'under' is again subjective.
3. Since most IR systems also have a learning module, one important aspect of an evaluation system is to test Learning as well. It might happen that the system doesn't perform well on the initial set of queries and improves over time. TREC would fail to evaluate this behavior.
3) The same search technique may return different results when the search space is as enormous as the web. There are other considerations that come into play when we search the web, such as efficient indexing and ranking, which may fail miserably with a huge search space.
Stemming would decrease precision, as the roots of different words (words which differ in meaning) may be the same. Hence, there may be a considerable number of incorrect results.
On the other hand, stemming would usually increase recall, as variants of the query words are also matched.
2) The fact that stemming increases recall and decreases precision is easy to see from a mathematical point of view (by looking at the equation for each one).
In other words, and as a conclusion from what has been said, stemming increases the volume of documents retrieved, since the technique can be seen as a "generalization" of certain words in a document. This increase may, at the same time, increase the number of retrieved relevant documents, thereby increasing recall while decreasing precision.
3. TREC would use a fixed set of queries on a fixed set of documents and would expect the IR system to return the "so-called" right documents:
In reality the user can give any sort of query outside the TREC set of queries.
A document can also be partially relevant. Binary judgment is not practical, as there is always an element of fuzziness. Judgments based on fuzzy degrees of relevance could be more appropriate than binary ones.
Stemming increases recall and decreases precision, as a larger number of documents is retrieved.
But if we talk of precision in terms of only the highly ranked pages (say the first 20), then stemming may put the most relevant pages at the top, depending upon the query.
Consider this: if you want to learn to multiply numbers, stemming the query 'number multiplication' to 'number multiply' would increase precision in the top-ranked pages. (I'm sure there are better examples than this one :)
I can see that Q2 has been over-analyzed but there was a point about recall I wanted to discuss.
Stemming would no doubt decrease precision, as we are looking at words with lower accuracy and (over)fitting them into categories. If the user gave one word out of such a group in the query, we would have no way to know which one she exactly meant, because we treat them all the same and return everything in the group. The impure results obtained this way would decrease precision.
If we were to apply stemming in a perfect database-like world, where the user is absolutely sure that she is looking for the exact keyword she typed and nothing else (e.g., something like a chemical inventory), recall may not even be affected by stemming. In this case, the search without stemming would already have returned all the relevant results. Adding stemming to this perfect world will only return a few more irrelevant results, but recall will stay the same, since "all" the relevant documents are returned in both cases. In the imperfect web world, however, recall would probably increase, because users don't know exactly what they are looking for and might give ambiguous queries. Stemming may help in fetching some results with keywords the user didn't explicitly mention but was expecting or found relevant later.
Well, guys, one example where stemming increases recall.
Imagine I am looking for something "beautiful", but in the whole corpus there is no document with this word. At the same time, there are some docs containing the word "beauty". These docs are good (relevant) for me because there is something beautiful in them.
Thus, in the case without stemming I get zero documents relevant to my query, but in the presence of stemming I get something (docs with "beauty" in them). So, stemming increased precision from zero to something non-zero.
Moral: don't say "never" or "always" when talking about precision and recall. At the very least, you can never get inside my head or sense my mood. (I mean you as the developer of an IR system.)
Another "crazy" example of an IR system: one which always returns only the top N docs. I am curious how stemming changes the output of such a system.
Probably we should specify the systems (and the users :-)) more clearly before making our judgments. Or we can start reading books :)
// Arguments are welcome.
It's interesting to note that at one point in time, Google was not using stemming technology. Their user help page used to read
"To provide the most accurate results, Google does not use "stemming" or support "wildcard" searches. Rather, Google searches for exactly the words that you enter into the search box."
However, things have changed as their page currently reads
"Google now uses stemming technology. Thus, when appropriate, it will search not only for your search terms, but also for words that are similar to some or all of those terms. If you search for pet lemur dietary needs, Google will also search for pet lemur diet needs, and other related variations of your terms. Any variants of your terms that were searched for will be highlighted in the snippet of text accompanying each result."
I agree with Dmitry, except for the typo in the first line :) (it is an example where stemming increases "precision", not recall). It is true that the documents retrieved with stemming are a superset of the documents retrieved without stemming. Depending on what fraction of the documents added by stemming are relevant to the user, stemming might increase or decrease precision (recall will never decrease). Since we are not sure whether the user also wants the variations of a word, it is safer not to stem if the number of documents retrieved without stemming is sufficient, since precision might go down. For most search engines with a corpus of billions of documents, this is often the case.
The comment on Google stemming is interesting too, but search engines like Google and MSN use hundreds of parameters to retrieve search results (PageRank is one of them), and stemming might be effective only for some classes of queries and documents.
2. I believe there are different ways of stemming and many algorithms for it. Here I assume stemming just replaces the suffix of each query word.
Stemming might decrease both recall and precision in some cases. For example, if the query "matrices" is stemmed to "matrix" and searched, the search engine might return results about the movie rather than the mathematical object. But there are only a few such examples. As the number of query words increases, the ambiguity between two or more different contexts decreases. So if the query words are stemmed, more results will match. This increases recall, but precision, on the other hand, tends to decrease, as some results will be out of context.
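A toy illustration of this trade-off (the tiny corpus, the relevance labels, and the crude suffix-stripping "stemmer" below are all made up for the example; a real stemmer such as Porter's behaves differently on particular words):

docs = {
    1: "train schedules for the city line",    # relevant
    2: "schedule of commuter trains",          # relevant
    3: "training schedule for new employees",  # not relevant
}
relevant = {1, 2}
query = "train schedules"

def crude_stem(word):
    # Deliberately naive: strip a couple of common suffixes.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def retrieve(use_stemming):
    norm = crude_stem if use_stemming else (lambda w: w)
    q_terms = {norm(w) for w in query.split()}
    return {d for d, text in docs.items()
            if q_terms & {norm(w) for w in text.split()}}

for use_stemming in (False, True):
    hits = retrieve(use_stemming)
    rel = hits & relevant
    print("stemming =", use_stemming,
          "precision =", len(rel) / len(hits),
          "recall =", len(rel) / len(relevant))
    # stemming = False: precision = 1.0, recall = 0.5
    # stemming = True:  precision = 2/3, recall = 1.0 ("training" now matches "train")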
1.
I go with pierre.
[P]u,v = A * [P]x,y
Here A is the transformation matrix from the x,y base to the u,v base, and A * A^-1 = I.
Now all the points in the x,y system can be represented in the u,v system. This fails only if u and v are linearly dependent, i.e., u = lambda * v for some real lambda.
3.
In my opinion, judgment is always affected by factors such as the knowledge of the person. Relying on one user or a small group is not a good idea, because millions of people use these systems and each one would have a different opinion, like the comments in this blog :). The sample used for judgment should be larger.
The documents given in TREC could be in different languages or contain many mistakes. For such documents, indexing becomes a problem.
I agree with Dmitry: a search engine might be good for the TREC queries, but in reality each of us expresses the same thing in different ways. A good search engine should try to satisfy all types of queries, not the TREC queries alone.
2. I agree with raju. Stemming might increase/decrease precision but recall would never decrease. Let me explain with an example.
As we know
Recall = |Relevant docs ∩ Retrieved docs| / |Relevant docs|
Precision = |Relevant docs ∩ Retrieved docs| / |Retrieved docs|
Let us assume that for a query, Relevant docs = 10, Docs retrieved = 20, and the intersection of both is 5 (without stemming).
Then recall would be 5/10 = 0.5 and precision would be 5/20 = 0.25.
Now, after stemming, if 30 docs are retrieved and the intersection still remains 5, recall stays the same but precision decreases (to 5/30).
Consider the case where precision increases: if all 10 relevant docs end up among the 30 retrieved, precision becomes 10/30, i.e. about 0.33, which is an increase (and recall rises to 1).
In the other cases, increasing the number of docs retrieved faster than the number of relevant docs retrieved decreases precision. In all these cases we assume the user's notion of relevance stays stable :)
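The same numbers, checked in a few lines of Python (the counts are just the hypothetical ones from the example above):

def precision_recall(n_relevant, n_retrieved, n_overlap):
    # precision = |relevant ∩ retrieved| / |retrieved|
    # recall    = |relevant ∩ retrieved| / |relevant|
    return n_overlap / n_retrieved, n_overlap / n_relevant

print(precision_recall(10, 20, 5))    # (0.25, 0.5)   without stemming
print(precision_recall(10, 30, 5))    # (~0.17, 0.5)  stemming added only non-relevant docs
print(precision_recall(10, 30, 10))   # (~0.33, 1.0)  stemming pulled in all the relevant docs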
2.
Stemming, if done uniformly (all possible query words in the corpus being stemmed), will usually lead to a decrease in precision and, almost always, an increase in recall.
However, if stemming is performed only where needed, increased recall with reasonable precision can be achieved.
For example, the master set of docs can be limited in size (purely as a means to increase precision), or selective stemming (keyword-based) can be employed, as in the sketch below.
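One very simple way to read "selective stemming" is a guarded stemmer; here is a small sketch (the protected-word list and the toy stemmer are made up for illustration):

# Hypothetical list of terms we never stem because their variants change the meaning.
DO_NOT_STEM = {"windows", "news", "species"}

def naive_stem(word):
    # Stand-in for a real stemmer: just strips a trailing "s" for the demo.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def selective_stem(word):
    w = word.lower()
    return w if w in DO_NOT_STEM else naive_stem(w)

print([selective_stem(w) for w in ["cars", "windows", "trains", "news"]])
# ['car', 'windows', 'train', 'news']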
3.
The corpus supplied for the TREC competition is almost static, compared to the ever-evolving web. Also, the TREC retrieval system need not, and cannot, be configured around the documents that the "so-called relevance judges" have already viewed in relation to the query, which contributes very much to relevancy!
3) Assuming that the TREC people use a representative sample of the web user population, and ignoring the issue of binary judgments, there still remains an aspect that they cannot test: how an intelligent search engine uses user responses (clicks, document viewing history) and maybe the user profile too, to modify its relevance function over time. But this is an important factor in determining the quality of a search engine (which is why Google earns big bucks :))
Coming back to binary judgments, the documents cannot have a benchmark rank for a query (or a relative importance metric). Ideally, the search engine should show the user some diversity in the top-k results and then act upon feedback (if possible). This diversity in the top results also cannot be measured by TREC.
1. P=(4,7) in the X & Y space.
Now, u & v are two vectors in the X & Y directions respectively, and not necessarily orthogonal.
To find the coordinates in the u-v space, we need the angle between u and i (call it a) and the angle between v and j (call it b).
Hence the X coordinate of P (in the uv space) would be 4/cos(a) and the Y coordinate of P (in the uv space) would be 7/cos(b).
This is true for u & v lying on either side of i & j, since cos is an even function.
The restriction on u & v is that we need to define the angles they make with i & j in the X & Y directions respectively.
2. Whether stemming improves precision or recall really depends on which word the stemming is performed on. Stemming, when performed, always improves recall, as it gives a larger set of target items to choose from, but it does not necessarily improve precision. Whether precision is improved or worsened depends on the word. Hence, retrieval schemes have to decide for which words stemming will improve precision.
Answer for Q2:
I know this question has been tackled well enough, but I would like to add that stemming can be seen as a form of query expansion, where the initial query is rewritten to improve search results. Doing this helps retrieve more documents that match the expanded query. Hence stemming increases recall, but precision is compromised.