Friday, September 26, 2008

Thinking cap questions: Social networks..

Here are a couple of questions you can discuss (feel free to add other questions you want discussed too). You might want to look at the
blog to see other comments already made before you add your own. Also feel free to only answer the hard ones..

A. We considered three types of measures of centrality and prominence: based on degree, based on "closeness" to other entities in the network and
based on "between-ness"--i.e., how often is the entity going to be on the geo-desic path between other actor pairs. W.r.t. these, think of the following:

A0. Consider three canonical networks of size n: a star network, where one entity has (undirected) edges to all other entities; a circle network; and a line network.
 Think of what are the most "important" nodes in each of the networks, if we measure importance by degree based, closeness-based and between-ness based
notions respectively.

A1. are all these measures equally applicable to both directed and undirected social networks?

A2. In the case of directed social networks, can you attach different notions of importance to in-degree and out-degree?

A3. Can you think of generalizing the  in- and out- degree based notions to transitively take into account the importance of the actor from/to
which the edge is directed? (basically, you somehow want to say that the importance of a link from a node is proportional in some way to the
importance of the node itself).

A4. In the questions above, we kept silent about the label of the edge. Suppose in one network, the edges signify the relation "respects" and in
other network they signify the relation "hates". Does this difference throw a wrench into the kind of analysis the measures in 3 are trying to compute?

A5. Can you think of cases where closeness and between-ness measures of importance could be useful in the web-based networks (social networks, blogs or
page networks)?

B. We talked about small-world networks where the largest geo-desic path between any pair of
connected nodes in the network, also called the diameter of the network, is log of the size of the network.

B0: which of the networks in A0 are small-world networks by this definition?

B1: Suppose you are an actor in a small-world network of size N, and you only have local information (i.e., have access to only the links
incident on you). Suppose you are trying to look for a path to another actor/entity in the network with a certain "goal" property g().
We are interested in characterizing how much work you will need to do, in the worst case and the best case, to find this actor. Answer the

B.1.1  What is the difference between this scenario and the usual graph-search problems you learned about in 310 or 471?

B.1.2. What is the best and worst case complexities (you decide how to measure complexity) of the task?

B.1.3. How does your answer to B.1.2. change if you have access to the whole network (i.e., you can see all the connections between all the nodes)

========that is all for now================


Dmitry Voronov said...

A3. One way of generalization.

Let every node has its own importance I(n) (now unknown). If a node links to N other nodes its importance I is distributed equally among them so that each of them gets I/N on a link (if all links are equal; or distributed proportional to links' weights otherwise). In this way we defined importances of nodes through other importances. In essence, it is a set of linear equations. Solving it we can find importances of all nodes I(n).

I guess (though I am not sure) that pagerank is based on a model like this one.

Anonymous said...

It can be said that a node with a high in-degree could be more important than the nodes with a lower in-degree in the case of a network referring to citations in scientific papers or blog popularity for example (meaning that the more links directed to a node, the more cited or more popular that node is). Using the same types of network, a node with high out-degree could be more important than the rest, if we think of it as a source of valuable information for the rest of nodes (meaning, the more links coming out from a node, the more information it provides to the network).

Retaking the example of the network containing nodes providing valuable information to the network, let’s assume that the valuable information consists of “steps to get to other nodes”. Then, a node containing lots of links coming out from it will have a little or no importance if it does not have information on how to get to other nodes passing through it. On the other hand, a node with only one link coming out from it, but containing vital information on how to reach other nodes passing through it, will have a paramount importance.