Yesterday, if:book helped present the STORY + conference at Brisbane Writers Festival. The following is an excerpt from a presentation by Simon Groth about the 24-Hour Book, its data, and Willow Patterns. The purpose behind Willow Patterns is to uncover some of the book’s inner workings, to explore the creative and editing process, and to use the entirety of the collected data set to inspire and drive new creative work, true creative remixes and responses that don’t just riff off a finished article.
By 12 June 2012, we had created a database as close as we could get to the complete quantitative story of a single book’s creation. Pressbooks allows you to export the book in a database format, but that database includes only the final text and reader comments. For some reason, Pressbooks never considered that anyone would want to extract every version of every chapter. I mean, what’s up with that? You mean we’re using tools in ways that were never intended by their designers? That’s crazy talk.
Cracking open the Pressbooks data was much more complicated than initially conceived and took around six months to complete. But at the end we had a shiny SQL file that unfortunately I had no way to open or inspect.
We turned the file over to IceLab, a data and design studio who built a wonderful, glorious web site that allows anyone to browse through every version of the text and download data sets, and that establishes an API, an application programming interface, which leaves open the possibility that anyone with a bit of coding skill can readily tap into the data and create visualisations, apps, or whatever they fancy. It’s all there and it’s all accessible. Have at it.
As a coder of the decidedly N00bzy kind, I chose a rather more low-tech approach, downloading the data and compiling it into a massive spreadsheet of doom.
In my defence, I’ve heard this is how Nate Silver works too.
So allow me to share with you a few of the stories within the story, the hidden pictures of how this book was made.
An obvious starting point is word frequency. Working off the final text, the top ten most frequent words in the book are actually a pretty uninspiring bunch: articles and prepositions. Bloody English. The interesting stuff happens further down the list. The first general nouns (not pronouns) begin at around the hundred mark.
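For anyone who would rather skip the spreadsheet of doom, a word frequency count takes only a few lines of Python. This is a minimal sketch, not the method used for the presentation: the sample sentence stands in for the book's final text, and the `FUNCTION_WORDS` set is a tiny hand-picked list I made up to show how the articles and prepositions might be filtered out.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count case-insensitive word occurrences in a block of text."""
    # Treat runs of letters, with optional internal apostrophes, as words.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(words)

# Hypothetical sample standing in for the book's final text.
sample = "The cat sat on the mat and the dog sat by the door."
freq = word_frequencies(sample)

# The most frequent words overall: mostly articles and prepositions.
top_three = freq.most_common(3)

# Illustrative (and very incomplete) list of function words to skip.
FUNCTION_WORDS = {"the", "a", "an", "and", "of", "on", "by", "in", "to"}
content_words = [(w, n) for w, n in freq.most_common() if w not in FUNCTION_WORDS]

print(top_three)
print(content_words[:3])
```

Telling general nouns from pronouns properly would need a part-of-speech tagger, which this sketch deliberately leaves out.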
Strange coincidences in the coupled words happen everywhere and remain a testament to the human ability to look for patterns in the random.
But my favourite word frequency activity was to construct automated poetry from the list. Take a series of words and you have a word frequency in alphabetical order.
Present it like this and Ern Malley, eat your heart out.
So let’s go a bit deeper into how the authors did their thing.
These are scatterplots showing each time a change was made to a chapter. On the Y axis you see the number of words added or removed with each change. On the X axis is time.
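If you want to rebuild the points behind a plot like this yourself, the idea can be sketched as follows. I'm assuming each saved revision is available as a timestamp plus the full chapter text at that moment; the actual export format may well differ, and the revisions below are invented purely for illustration.

```python
from datetime import datetime

def word_deltas(revisions):
    """Return (timestamp, change in word count) for each saved revision.

    `revisions` is a list of (timestamp, text) pairs sorted by time.
    The first save is measured against an empty chapter.
    """
    deltas = []
    previous = 0
    for stamp, text in revisions:
        count = len(text.split())
        deltas.append((stamp, count - previous))
        previous = count
    return deltas

# Made-up revisions with made-up timestamps, for illustration only.
revisions = [
    (datetime(2012, 1, 1, 12, 0), "It was a dark night"),
    (datetime(2012, 1, 1, 12, 30), "It was a dark and stormy night in Brisbane"),
    (datetime(2012, 1, 1, 13, 0), "It was a stormy night"),
]

points = word_deltas(revisions)
for stamp, delta in points:
    print(stamp.strftime("%H:%M"), delta)
```

Each pair is one dot: time on the X axis, words added (positive) or removed (negative) on the Y axis. Note that this measures the net change only, so a save that swaps ten words for ten different ones registers as zero.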
Steven made changes infrequently. He worked offline and pasted text into Pressbooks. He was, in that time though, pretty consistent at 500 words each hour. I’ll leave it to you to make the even Steven link. Oh wait, never mind.
Nick shows a similar save pattern and a pretty similar word rate, though with a gargantuan effort towards the end. As the trend line shows, Nick steadily worked faster as the day progressed.
Rjurik was a little more interesting, making changes and deleting a chunk of text early in the process and finishing at 9:15, one of the earliest authors to hand over (but not our number one overachiever in that regard).
Pam was pretty consistent throughout before a break and a harder push towards the end. You can also see Pam is one of the few authors who re-read and edited her work for a good 45 minutes or so before handing over.
Geoff sails along at a fairly good clip, but he takes some time to find his mojo. After 8PM, he hits the accelerator including this incredibly impressive 1500 words in a little over an hour. He also has an interesting pause before midnight where he reads the text without making any significant changes before adding a final flourish and leaving for bed. Keith and I were getting progressively more crabby around about here.
Okay, the pattern developing here is of a pretty consistent work rate that grinds to a halt about 7:30, then takes off like a rocket at about 8:30 or 9:00. Two things happened in that break. First we ate pizza and drank wine/beer. Priorities, people. Second: we shared our progress. A meeting that was supposed to last fifteen minutes went for about 45 as we simply went around the room sharing our story so far and getting an idea of where we might end up. From what a few writers have told me since, this was when we realised we might actually pull this thing off. It was also incredibly fun and enjoyable.
So, what about the authors working directly in Pressbooks? Whose every move was recorded and preserved?
Angela worked at a furious pace in bursts of around a hundred words or so. I’m not sure, and the data is unclear, whether Ange was working in Pressbooks most of the time or moving between it and a word processor. What seems pretty certain is that Ange ended up offline and dropped a chunk of text at the end before saying, I’m out of here.
Krissy’s graph is much more detailed and demonstrates her incredible work ethic. The data reflects what I saw on the day. It was like she never stopped typing. Well, except for dinner and that break at 2 when she went down to the library café. Also note the time Krissy finished. First, at precisely 9PM. Seriously, take a closer look. Krissy was apparently primarily editing from about 6PM. Say hello to our overachiever of the day. Krissy’s chapter is genuinely weird and slightly shocking, but with deliciously beautiful language. That she did this in nine hours without so much as a glance in the rear view mirror makes me eternally impressed. And slightly jealous.
So being confronted with your own workflow is a bit weird, but here I am. I delete stuff a lot and I save irrationally. I particularly like the section where I go bananas adding text, then delete it all back out again. Good times. The longer I look at this graph, the more I see the slow, grinding process of extracting word after word. Strangely though, in my memory, the hardest part was around 5PM, but the count shows I was actually at my busiest then. Go figure.
And finally: what’s interesting about Chris’s piece is in the deletions. What did we say at that meeting that caused him to immediately start dumping so much work? Whatever it was, I apologise.
The Subtlety of the Edit
One of the things I was very keen to do with this data is draw out information about the editing process. We devoted nearly half the project to this mysterious process and numerical data does a terrible job of capturing the nature of the edit.
How do you quantify something that defies quantification? What we had to fall back on was finding a way to quantify its unquantifiableness.
Stay with me here.
The X axis here represents the number of revisions made to the text for each chapter. Blue bars are saves made by the author, green by the editor. Here we can see some clear differences between the individual editors. Unlike the authors, all nine editors worked in Pressbooks, so the number of revisions is a reasonably accurate representation of the work undertaken.
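Counting saves per chapter and per role is about as simple as this analysis gets. Here is a rough sketch, assuming the revision log can be boiled down to one (chapter, role) entry per save; the chapter names and counts are invented and are not the real figures.

```python
from collections import Counter

def revisions_by_role(saves):
    """Count saves per chapter, split into author and editor saves."""
    counts = Counter(saves)
    chapters = sorted({chapter for chapter, _ in saves})
    return {
        chapter: {
            "author": counts[(chapter, "author")],
            "editor": counts[(chapter, "editor")],
        }
        for chapter in chapters
    }

# Hypothetical revision log: one (chapter, role) entry per save.
saves = [
    ("Chapter 1", "author"), ("Chapter 1", "author"), ("Chapter 1", "editor"),
    ("Chapter 2", "author"), ("Chapter 2", "editor"), ("Chapter 2", "editor"),
    ("Chapter 2", "editor"),
]

summary = revisions_by_role(saves)
for chapter, roles in summary.items():
    print(chapter, roles)
```

The two numbers per chapter correspond to the blue and green bars.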
What’s interesting: my chapter and Chris’s, which both show a lot of deletion throughout the authoring process, receive very light edits. There may be a link between those two facts or it could be random correlation. It’s still interesting…
This is a slightly different metric: the time spent by each author against the time spent by each chapter editor. Again blue for authors, green for editors. Authors are pretty consistent, but the editors spent anywhere from two hours to nearly ten hours on their chapters.
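One way to estimate time spent from save data is to take the span between the first and last save for each role on each chapter. That definition is my assumption for the sake of a sketch, and it ignores breaks (pizza included); the timestamps below are made up.

```python
from datetime import datetime, timedelta

def time_spent(saves):
    """Map each (chapter, role) to the span between its first and last save."""
    first, last = {}, {}
    for chapter, role, stamp in saves:
        key = (chapter, role)
        first[key] = min(first.get(key, stamp), stamp)
        last[key] = max(last.get(key, stamp), stamp)
    return {key: last[key] - first[key] for key in first}

# Hypothetical saves: (chapter, role, timestamp). Not the real data.
saves = [
    ("Chapter 1", "author", datetime(2012, 1, 1, 12, 0)),
    ("Chapter 1", "author", datetime(2012, 1, 1, 21, 0)),
    ("Chapter 1", "editor", datetime(2012, 1, 1, 22, 0)),
    ("Chapter 1", "editor", datetime(2012, 1, 2, 0, 30)),
]

spans = time_spent(saves)
for (chapter, role), span in spans.items():
    print(chapter, role, span)
```

A role with only one save comes out at zero, which is one reason this measure suits authors and editors who saved often better than those who pasted text in from elsewhere.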
It’s worth noting that these were not yet professional editors, so some variation in their workflow is to be expected. I wonder what these graphs would show with nine highly experienced editors. Would there still be this huge variation?
Finally, the quantifiable impact of the edit on the text. What you see here is the number of words added or deleted over the life of the chapter. You know it’s blue for authors, green for editors.
Where did the editors go? The numbers don’t stack up and the editor’s role is lost. I don’t for a second want anyone to think that the editor’s role is therefore unimportant. That’s the worst kind of reductive thinking. Everyone who has read the lengthy word pileups that populate the unedited web will know exactly what I’m talking about. The mark an editor leaves is subtle, so subtle in fact that it is ideally invisible. In an environment of economic rationalism, that’s unfortunate. The true measure of an editor’s alchemy can’t be found in a raw word count. We’re going to keep digging through the data to see what more we uncover.
The visuals from the full presentation are presented below.