IIB #25 Gephi Instructional Videos

You guys apparently really want this ...

My first experience with link analysis was with Maltego back in 2010. In 2012 I took Lada Adamic’s Social Network Analysis class on Coursera, and that is where I learned to use Gephi. The class hasn’t been offered since 2014, and I am unaware of any alternative that starts at the same level.

A couple of years ago I recorded a pair of long videos demonstrating how I use Gephi to visualize Twitter mention networks.

Things have evolved a bit since these were made. There is a companion video to these two that shows how to collect the data for the mention network, but the software it references has been replaced several times over by the current Netwar System. Back then we had flat files containing text and CSV relationship data.

Today most of the precursor content is stored in our Elasticsearch cluster. Instead of importing a two column CSV, the system uses flat files containing friend/follower numeric IDs, which it then enriches using Elasticsearch queries. We added a composite metric to simplify layout and directly write a GML file that Gephi can read.
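For anyone who has never looked inside a GML file, here is a minimal sketch of what writing one by hand looks like. This is not the Netwar System code. The function name, the `userid` and `score` attributes, and the sample accounts are all made up for illustration, and the `profiles` dict stands in for whatever an Elasticsearch enrichment step would return.

```python
# Illustrative sketch only: names and data below are hypothetical,
# not taken from the actual Netwar System.

def quote(text):
    # GML strings are double-quoted; swapping embedded quotes for
    # apostrophes is a conservative choice, not a full escaping scheme.
    return str(text).replace('"', "'")


def to_gml(profiles, follows):
    """Build GML text from enriched profiles and (source, target) ID pairs."""
    # Use small sequential node ids and keep the real account ID as a string
    # attribute, since large numeric IDs can be awkward as GML integer ids.
    index = {account_id: i for i, account_id in enumerate(profiles)}
    lines = ["graph [", "  directed 1"]
    for account_id, attrs in profiles.items():
        lines.append("  node [")
        lines.append(f"    id {index[account_id]}")
        lines.append(f'    userid "{account_id}"')
        for key, value in attrs.items():
            if isinstance(value, (int, float)):
                lines.append(f"    {key} {value}")
            else:
                lines.append(f'    {key} "{quote(value)}"')
        lines.append("  ]")
    for source, target in follows:
        # Skip edges that point at accounts we have no profile for.
        if source in index and target in index:
            lines += [
                "  edge [",
                f"    source {index[source]}",
                f"    target {index[target]}",
                "  ]",
            ]
    lines.append("]")
    return "\n".join(lines) + "\n"


# Hypothetical enriched profiles keyed by numeric account ID.
profiles = {
    1001: {"label": "alice", "score": 0.82},
    1002: {"label": "bob", "score": 0.4},
}
# Follower relationships as (source ID, target ID) pairs.
follows = [(1001, 1002), (1002, 9999)]

gml_text = to_gml(profiles, follows)
```

Saving `gml_text` to a file with a `.gml` extension should give you something Gephi can open, with the extra node attributes showing up as columns you can use for sizing or coloring.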

But Gephi has not kept up with the times. Their last update was nearly three years ago. They still require OpenJDK 8, which is becoming a bit archaic. If you dig into the developer chatter, they have been talking about a major shift, to some sort of daemon accessed by a web browser, but that is an enormous leap from the current app. The package makes no use of GPU acceleration and it’s the visual portion of the system that breaks down when facing larger networks. If you zoom out before running a layout like Force Atlas 2, the performance can be a couple of orders of magnitude better(!)

Conclusion

The world is headed towards something like Graphistry, a GPU accelerated visualization tool. Even Gephi’s own plans are aimed in that direction. What is missing is the rich set of filters, metrics, and layouts available in Gephi. You need something like Neo4j on the back end to do what Gephi’s Data Laboratory does, and that requires a whole additional investment and skill set.

The open, flexible nature of Gephi fosters not just visualizing data, it lets you get right into the middle of things and play, which is vital when learning network analysis. Even if we get our transition to Neo4j done, I imagine I’ll always reach for Gephi when facing a new problem space to visualize.

So … should I produce a series of three to five minute videos on various aspects of Gephi?