Tuesday, July 26, 2011

Egonet visualization in igraph

Just for fun, I thought I should also implement yesterday's egonet visualization in the igraph package (yesterday I used the network package). Merely 2 hours later I came up with the right recipe... :-) Basically, the fun was motivated by the not-so-overwhelming visualizations in my last blog post, so I had the idea to export the resulting ego-net subgraph into some Gephi-readable format. In short, we want to go from the first figure to the second figure. Yes, I'm sure it is really the same graph!!




I could not find any helpful write/export function in the network package, but there is a flexible write.graph function in the igraph package. Both packages are quite similar, so a quick translation does the trick. Let's look at the necessary changes:

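library(igraph)
tableEmail <- read.table("email_Arenas.txt")
emailNW <- graph.data.frame(tableEmail, directed=TRUE)
# 10 random seeds; vertex IDs run from 0 to n-1 in this igraph version
randomSample <- sample(0:(vcount(emailNW)-1), 10, replace=FALSE)
neighs <- vector()
for(x in randomSample){
 # neighborhood() returns a list; its first entry holds x and its direct neighbors
 neighs <- c(neighs, neighborhood(emailNW, 1, x, mode="all")[[1]])
}
# induced subgraph on the union of all ego networks (duplicates dropped)
subgraph <- subgraph(emailNW, unique(neighs))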
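When writing the sampling range, the parentheses matter: in R the `:` operator binds more tightly than binary minus, so `0:n-1` evaluates to `(0:n) - 1`, i.e. -1 to n-1, and a sample drawn from it could contain the invalid ID -1. A quick base-R check, where n = 5 is just an illustrative value:

```r
n <- 5

# ':' is evaluated before '-', so this is (0:5) - 1
wrong <- 0:n-1
# parentheses give the intended range 0..n-1
right <- 0:(n-1)

print(wrong)  # -1 0 1 2 3 4
print(right)  # 0 1 2 3 4
```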


The replacements are:
  1. graph.data.frame instead of network. It creates the igraph object out of the data frame.
  2. A random sample from 0 to n-1, where n is the number of nodes, because the igraph version used here numbers vertices from 0 (explained later).
  3. neighborhood instead of get.neighborhood. It makes life a bit easier since it automatically includes all nodes within distance at most the given order (in this case 1), including x itself, so we do not need to add x explicitly. Careful: the result is a list, so make sure to append only its first entry to the vector neighs.
  4. subgraph instead of get.inducedSubgraph.
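For readers on a current igraph release, here is a minimal sketch of the same ego-net extraction. It assumes igraph 1.0 or later, where the functions are called graph_from_data_frame(), neighborhood(), induced_subgraph() and as_ids(), and where vertex IDs are 1-based (igraph switched away from the 0-based IDs used in this post with version 0.6). The small edge list and the two seeds are hypothetical stand-ins for email_Arenas.txt and the random sample, and the seeds are picked by vertex name rather than by numeric index:

```r
library(igraph)

# Hypothetical toy edge list standing in for email_Arenas.txt
edges <- data.frame(from = c(1, 1, 2, 3, 4, 5, 6, 7),
                    to   = c(2, 3, 3, 4, 5, 6, 7, 8))
g <- graph_from_data_frame(edges, directed = TRUE)

# Hypothetical seeds, addressed by vertex name instead of numeric index
seeds <- c("1", "6")

# One vertex sequence per seed; order 1 includes the seed itself
hoods <- neighborhood(g, order = 1, nodes = seeds, mode = "all")

# Union of all ego networks as vertex names, without duplicates
keep <- unique(unlist(lapply(hoods, as_ids)))

ego_net <- induced_subgraph(g, keep)
```

Working with names keeps the seed identities meaningful after the subgraph is taken, which sidesteps the index confusion discussed below.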
At this point the plot looks similar to yesterday's: all vertices are red. Now the fun begins! How can we color the random seed nodes? In the network package, we prepared a vector of colors in which the vertices from the random sample (identified by their index) got a different color. We then restricted this vector to the indices still present in the subgraph, as returned by network.vertex.names(subgraph). This works because in the network package the induced subgraph keeps the old vertex IDs as vertex names:

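# color 2 (red) for all vertices, color 3 (green) for the seeds
color <- rep(2, times=network.size(emailNW))
color[randomSample] <- 3
# look up each remaining vertex's color via its old ID, kept as its vertex name
plot(subgraph, vertex.col=color[network.vertex.names(subgraph)])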
However, igraph does not make it that simple for us. First, it re-assigns vertex IDs in the subgraph to make them consecutive. Second, these IDs run from 0 to (number of nodes - 1), which is already the case in the first igraph object that we created, emailNW. This leads to the first surprising behavior. Let's look at the random sample:

> randomSample
 [1]  670   97  352  346  465   53   37 1092  726   74

Let's look at the vertices in the resulting induced subgraph with the function V(subgraph):
 
> V(subgraph)
 [1] "2"    "3"    "5"    "7"    "18"   "19"   "21"   "22"   "23"   "27"  
 [11] "31"   "38"   "40"   "41"   "45"   "49"   "51"   "54"   "69"   "72"  
 [21] "74"   "75"   "76"   "87"   "98"   "112"  "124"  "143"  "148"  "152" 
 [31] "183"  "185"  "187"  "189"  "191"  "195"  "231"  "233"  "237"  "241" 
 [41] "254"  "267"  "268"  "270"  "275"  "280"  "290"  "314"  "316"  "329" 
 [51] "330"  "331"  "333"  "344"  "345"  "346"  "347"  "348"  "349"  "350" 
 [61] "351"  "352"  "353"  "354"  "355"  "356"  "362"  "378"  "392"  "396" 
 [71] "454"  "462"  "463"  "464"  "465"  "466"  "467"  "468"  "501"  "523" 
 [81] "538"  "549"  "556"  "557"  "558"  "559"  "560"  "561"  "568"  "578" 
 [91] "590"  "598"  "627"  "635"  "636"  "638"  "671"  "680"  "711"  "727" 
[101] "743"  "746"  "748"  "765"  "778"  "836"  "841"  "940"  "941"  "942" 
[111] "943"  "944"  "945"  "946"  "954"  "1030" "1031" "1092" "1093"

Several of the sampled IDs (e.g. 670, 97 and 53) do not show up at all but, suspiciously, for every sampled ID there is a vertex whose name is that ID plus one. What happens is that igraph uses the random sample as indices into the vertices, which are themselves labeled from 1 to n (the number of nodes in the graph). That is, if we address the first ten nodes of emailNW by [0:9], we get the vertices labeled 1 to 10:

> V(emailNW)[0:9]
 Vertex sequence:
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

But (you knew there would be a but, didn't you?) all other attributes assigned to the vertex set are ordinary R vectors and are therefore indexed from 1 to n. For example, the character vector holding the names (labels) of the vertices is indexed 1 to n:
 
> V(emailNW)[0]
 Vertex sequence:
 [1] "1"
> V(emailNW)$name[0]
 character(0)
> V(emailNW)$name[1:10]
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
> V(emailNW)[0:9]
 Vertex sequence:
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

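Beautiful. With this, the coloring in igraph is a piece of cake as well: color[randomSample] <- 3 marks position i for the seed with the 0-based ID i, and that vertex carries the name i+1, so each subgraph vertex is looked up at position name - 1: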
 
color <- rep(2, times=vcount(emailNW))
color[randomSample] <- 3
V(subgraph)$color <- color[as.numeric(V(subgraph)$name)-1]
lay <- layout.fruchterman.reingold(subgraph)
plot(subgraph, layout=lay)

Now, the figure will have 10 green nodes as planned.


Yeah, I know, beautiful layout. That is WHY we need to export it to Gephi and beautify it there.

Btw, the layout is a two-column coordinate matrix with one row per vertex, so it is handed to plot() via the layout argument. Storing it as a vertex attribute instead (V(subgraph)$layout <- ...) would try to squeeze two numbers into each vertex's slot, which is most likely why R throws warnings in that case.
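Because the subgraph's vertex IDs are renumbered while its name attribute survives, a less fragile option is to color by name instead of by index arithmetic. A minimal sketch, assuming igraph 1.0 or later and a hypothetical toy graph with hypothetical seeds; colors are given as color names rather than palette numbers so the result does not depend on which palette is active:

```r
library(igraph)

# Hypothetical toy graph (assumes igraph >= 1.0 with 1-based vertex IDs)
edges <- data.frame(from = c(1, 1, 2, 3, 4, 5, 6, 7),
                    to   = c(2, 3, 3, 4, 5, 6, 7, 8))
g <- graph_from_data_frame(edges, directed = TRUE)
seeds <- c("1", "6")
hoods <- neighborhood(g, order = 1, nodes = seeds, mode = "all")
ego_net <- induced_subgraph(g, unique(unlist(lapply(hoods, as_ids))))

# The subgraph renumbers its vertices 1..k, but keeps the 'name' attribute
ids <- as.numeric(V(ego_net))
nms <- V(ego_net)$name

# Color by name membership: no index arithmetic, hence no off-by-one
V(ego_net)$color <- ifelse(nms %in% seeds, "green", "red")
```

A subsequent plot(ego_net) picks up the color vertex attribute automatically.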

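You might want to check that the right nodes got the right color:

# names of the green vertices in the subgraph...
V(subgraph)$name[V(subgraph)$color==3]
# ...should match the names of the seeds in the full graph
V(emailNW)[randomSample]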
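Exporting for Gephi boils down to a single write call. A minimal sketch writing GraphML, a format Gephi can open, assuming igraph 1.0 or later (where the function is write_graph(); the 2011 name was write.graph()); the toy graph and the output file name egonet.graphml are hypothetical:

```r
library(igraph)

# Hypothetical toy graph standing in for the colored ego-net subgraph
edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 1))
g <- graph_from_data_frame(edges, directed = TRUE)
V(g)$color <- c("green", "red", "red")

# GraphML stores vertex names and other vertex attributes as data fields
out <- file.path(tempdir(), "egonet.graphml")
write_graph(g, out, format = "graphml")
```

The resulting file can then be opened in Gephi for layout and styling.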

Gephi's version
So, here is the Gephi layout. Of course, a hairball is a hairball - but: did you see the isolated component beforehand?
