Visualizations and Data

When I talked to people about the idea of the site and asked for some preliminary comments, I was a little surprised by how often people said something along the lines of "Oooh! You could do a visualization of X!". That took me aback a little, since I was so much not trying to do visualization. I was trying to mix up the text by blowing up the TEI and reformulating it online.

But maybe the requests for visualizations are an intuitive reaction to a blown-up text? Explicitly, my mission was to take a very traditional, scholarly text and re-express the relationships within it -- relationships between scholars, the text, and the presentation of it. Maybe "ooh! visualization!" -- not just here, but throughout DH -- is ultimately an expression of us seeing all of our texts blown up, and wanting an intuitive way to get a handle on the having-been-blown-up text? There's little reason to explore the relationship between knowledge and "seeing" here, but something related to it seems to be going on.

And, I'm not especially visual-minded. When somebody says "visualization", I really still have no connection to what that means, implies, or could be useful for.

So, in the spirit of hacking, I winged it. Easy enough to push out some CSV data based on what was in the online-textual connections.

Also, I'd heard of -- and people mentioned to me -- Gephi for building visualizations and analyzing them. So I figured it was time to play with Gephi, and that meant pushing the data into GEXF, Gephi's graph XML format. I still don't know what it means, but it was time to play in that area, so here's what I've got.

I put some visual things together -- charts built in LibreOffice from some CSV files, and some exports from Gephi. I'm also including the data files here so others can play with them. Feel free to request different combinations of data. The license for all the data files is the Open Data License. Hope that works and is appropriate!


The primary mission of this project was getting a different perspective on the scholars and scholarship around Comedy of Errors. So we've got, for example, this look at the scholars most "in conversation" with other scholars. "In conversation" here means scholars who are cited within the same commentary note or the same paragraph of an appendix. The chart here only includes those who are in conversation with 20 or more scholars.
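For anyone who wants to recompute this from raw data, here is a minimal sketch in Python of how an "in conversation" count like the one described above could be built. The input shape (one list of scholar names per commentary note or appendix paragraph), the function names, and the toy data are all assumptions for illustration; this is not the code that produced the chart.

```python
from collections import defaultdict
from itertools import combinations

def conversation_partners(units):
    """Map each scholar to the set of scholars cited in the same unit.

    `units` is an iterable of scholar-name lists, one list per commentary
    note or appendix paragraph (a hypothetical input shape).
    """
    partners = defaultdict(set)
    for scholars in units:
        # Every pair cited together in one unit counts as "in conversation".
        for a, b in combinations(sorted(set(scholars)), 2):
            partners[a].add(b)
            partners[b].add(a)
    return partners

def most_in_conversation(units, threshold=20):
    """Return (scholar, partner_count) pairs at or above the threshold."""
    partners = conversation_partners(units)
    counts = {name: len(others) for name, others in partners.items()}
    return sorted(
        ((name, count) for name, count in counts.items() if count >= threshold),
        key=lambda item: (-item[1], item[0]),
    )

# Toy data, invented for illustration only.
notes = [
    ["Scholar A", "Scholar B", "Scholar C"],
    ["Scholar A", "Scholar D"],
    ["Scholar B"],
]
print(most_in_conversation(notes, threshold=2))
```

The default threshold of 20 mirrors the cutoff used for the chart; the toy call lowers it to 2 only so the tiny sample returns something.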


We've also got this Gephi visualization, which connects scholars and the speeches that they comment on. Scholars, Speeches, and Commentary are all nodes in the graph. Not sure if that's doing it right or not.


Notes and Speeches

For some example data about speeches, here's a chart of the speeches, showing the ratio of each speech's length (counted in lines) to the number of commentators on that speech. The numbers on the X-axis are the starting lines of the speeches. I didn't expect so much variation in the ratio, and I have no idea why it's there. I'm also really not sure what this measures (if anything). Is it a measure of a "density" of language in the speech? Or is it a slightly different guide to where scholarship has focused?
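As a rough sketch of that ratio, here is how it might be computed in Python from a CSV of speeches. The column names (`start_line`, `line_count`, `commentators`) and the sample rows are assumptions made up for this example and may not match the actual Speeches CSV. I'm also assuming the ratio is lines divided by commentators, and skipping speeches with no commentators, since the ratio is undefined there.

```python
import csv
import io

def lines_per_commentator(rows):
    """Yield (start_line, ratio) for each speech with at least one commentator."""
    for row in rows:
        commentators = int(row["commentators"])
        if commentators == 0:
            continue  # ratio is undefined, so the speech is skipped
        yield int(row["start_line"]), int(row["line_count"]) / commentators

# Invented sample rows; the column names are hypothetical.
sample_csv = """start_line,line_count,commentators
1,2,4
3,30,5
33,8,0
"""
ratios = list(lines_per_commentator(csv.DictReader(io.StringIO(sample_csv))))
print(ratios)
```

Swapping `io.StringIO(sample_csv)` for an opened file would run the same calculation over a real export, once the column names are adjusted.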

Speeches CSV

On the Gephi side, here's a graph where the nodes are speeches (blue) or scholars (red). The edges connect a speech to a scholar.

speeches graph speeches.gexf
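For the curious, here is a minimal sketch of how a speech-to-scholar graph like the one above could be written out as GEXF using only Python's standard library. The node id scheme, the `kind` attribute, and the sample pairs are my assumptions for illustration; the actual speeches.gexf may be structured differently.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def build_gexf(pairs):
    """Return a <gexf> element for (speech_label, scholar_label) pairs."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "undirected"}
    )
    # Declare one node attribute, "kind", to tell speeches from scholars.
    attrs = ET.SubElement(graph, "attributes", {"class": "node"})
    ET.SubElement(attrs, "attribute", {"id": "0", "title": "kind", "type": "string"})
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    seen = set()

    def add_node(kind, label):
        """Create the node on first use and return its id."""
        node_id = f"{kind}:{label}"
        if node_id not in seen:
            seen.add(node_id)
            node = ET.SubElement(nodes_el, "node", {"id": node_id, "label": label})
            values = ET.SubElement(node, "attvalues")
            ET.SubElement(values, "attvalue", {"for": "0", "value": kind})
        return node_id

    # One edge per (speech, scholar) pair.
    for index, (speech, scholar) in enumerate(pairs):
        ET.SubElement(edges_el, "edge", {
            "id": str(index),
            "source": add_node("speech", speech),
            "target": add_node("scholar", scholar),
        })
    return root

# Invented sample pairs, for illustration only.
sample_pairs = [("Speech at line 1", "Scholar A"), ("Speech at line 1", "Scholar B")]
gexf_text = ET.tostring(build_gexf(sample_pairs), encoding="unicode")
print(gexf_text)
```

The `kind` attribute is there so that a tool like Gephi could, in principle, color the two node types differently (blue speeches, red scholars), though I haven't verified that exact workflow here.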

The above are really just examples of the kind of data that can be pulled out. I'm not enough of a data / distant reading / network analysis person to really do an analysis. The data is here under the Open Data License. Let me know if you would like me to pull the data differently.

Appendix Paragraphs

This looks at data gleaned from the appendix, broken up by paragraph. Again, I don't know why, but this intrigued me. Why the spikes? The X-axis is just the paragraph number in the appendix, so it would be interesting to look for a pattern in where in the appendix these spikes occur. The CSV file this comes from contains the HTML text of each paragraph, so there might be further work to do there.

Citations By Appendix Paragraph Appendix Paragraphs.csv

As with the notes, the Gephi data includes the paragraphs and scholars as nodes, with edges between them.