Thursday, September 4, 2008

Lecture reflections

In the vector model we discussed, we mentioned the well-known Curse of Dimensionality. It is commonly observed in data mining that data in a high-dimensional space tends to accumulate in lower-dimensional subspaces. (That is what I clumsily tried to express with "dominant dimensions": the leading eigenvectors give the directions in an n-dimensional space that hold the most information.) Given the context of the last lecture, though, I wonder:




So when you’re doing multidimensional scaling, are you finding the dependencies between words?
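To make my "dominant dimensions" intuition a bit more concrete, here is a minimal sketch in plain Python. It is not multidimensional scaling itself; it finds the leading eigenvector of a term covariance matrix by power iteration, which is the PCA-style view (classical MDS on Euclidean distances is closely related to it). The toy counts and the three terms "car", "auto", and "flower" are made up for illustration.

```python
import math

def covariance(rows):
    """Covariance matrix of the columns of `rows` (one row per document)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue that belongs to v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

# Hypothetical toy counts: columns are the terms "car", "auto", "flower".
docs = [
    [2, 2, 0],
    [4, 4, 0],
    [6, 6, 1],
    [8, 8, 0],
]

cov = covariance(docs)
eigenvalue, direction = dominant_eigenvector(cov)

# Share of the total variance captured by the single dominant direction.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance
```

In this toy data "car" and "auto" always occur together, so the dominant direction puts nearly equal weight on both and very little on "flower", and that one direction accounts for most of the variance. That is at least one sense in which the reduced dimensions reflect dependencies between words.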



Also, with regard to the Rocchio method, I'm not entirely sure I understood it. After the lecture on 9/2, I had the following thought:




It seems to me that the search engine could ask the user, after they visit a result page, whether the page was what they were looking for. The machine could then take that feedback and apply learning algorithms. It could specifically look at ∆Relevance/∆Keyword…


I envisioned a system that worked like the old search-engine approach of asking for direct relevance feedback, but now it seems that Similar Pages is the de facto automated standard for query adjustment. What other methods are there?
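For reference, the feedback loop I described above is close to what the textbook Rocchio update does, as far as I understand it: the new query is alpha times the old query, plus beta times the centroid of the documents marked relevant, minus gamma times the centroid of the documents marked non-relevant. Below is a minimal sketch of that formula. The vocabulary, the vectors, and the weights (alpha=1.0, beta=0.75, gamma=0.15, a commonly quoted default) are my assumptions for illustration, not values from the lecture.

```python
def centroid(vectors, size):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * size
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant documents and away from non-relevant ones."""
    size = len(query)
    pos = centroid(relevant, size)
    neg = centroid(non_relevant, size)
    updated = [alpha * q + beta * p - gamma * n
               for q, p, n in zip(query, pos, neg)]
    # Negative term weights are usually clipped to zero.
    return [max(0.0, w) for w in updated]

# Hypothetical vocabulary: ["jaguar", "car", "cat"]
query = [1.0, 0.0, 0.0]                         # the user typed "jaguar"
relevant = [[1.0, 1.0, 0.0], [1.0, 0.5, 0.0]]   # pages the user marked as useful
non_relevant = [[1.0, 0.0, 1.0]]                # a page the user rejected

new_query = rocchio(query, relevant, non_relevant)
```

With this feedback the adjusted query gains weight on "car" and keeps "cat" at zero, so each keyword's weight shifts according to the relevance judgments, which is roughly the ∆Relevance/∆Keyword idea.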
