
The Processed Book Project did not come into the world fully planned, but rather evolved over a period of several years. As a consequence, some of the terms associated with the project are not necessarily as unambiguous as we might have liked. In particular, the phrase "processed book" is used to refer to several different but related things. To clarify this:
The idea of the Processed Book is a synthesis of many ideas from many people, and an alert reader will pick up on many things that sound familiar. The specific event that sparked the idea took place several years ago, when I happened to see for the first time the Bloomberg online financial-information service. On the screen were various kinds of financial data, organized as tables and charts; the data could be presented in various ways. If you could do this with data for Wall Street bond traders, I wondered, what could you do with newspapers, reference works, even novels? Bloomberg made information seem so malleable. The question I had was whether manipulating information in this way, in particular information that was organized in strings of text, could result in something meaningful, or would the end result simply be gibberish. Casual familiarity with the area of computational linguistics suggested to me that large collections of text could yield emergent properties, just as the mining of quantitative data can point to patterns in such things as retail customer activity. In other words, applying computer processing to texts could teach us something new about those texts.
In 2002 I was working on a consulting project for Hewlett Packard in the area of digital file conversion. In a series of conversations with John Burns of HP Laboratories, the idea for the Processed Book came into focus. The first draft of the essay "The Processed Book" was put together at this time and circulated to a number of people (including Burns, of course) for comment. A revised version was then submitted to Ed Valauskas of First Monday, who agreed to publish the article after some editorial changes were made. The article finally appeared in March 2003.
After the publication of the article, my longtime friend and publishing-industry veteran, Charles Levine, suggested that the essay be expanded into a book. (If this book were ever to see the light of day, there would be yet another iteration of "processed book": The Processed Book. Perhaps a major motion picture deal is next.) Although I was skeptical that a publisher would take on the project (I argued that it was simply not a big enough idea to fill the pages of even a short book), with Levine's prompting and editorial assistance we circulated a book proposal. The proposal did indeed get interested nibbles, but ultimately no one bit, though one publisher said she would consider taking it on if we included software that "enacted" some of the ideas put forth in the essay. This prompted Carter, in his capacity of consultant to the Hewlett Foundation, to bring the idea of the project to the attention of Cathy Casserly and Marshall Smith of Hewlett, who agreed to fund a demonstration project. The Hewlett grant was made to the Monterey Institute for Technology in Education, run by Gary Lopez, at the beginning of 2005, and we were underway.
To create the software for the project (PBOS), I made an arrangement with Lynn Brock of Prosaix, a software development and consulting firm. Brock designed the project; the code was implemented by Wayne Davison. In addition to creating the software, Prosaix has agreed to host the service for three years. After that it is our hope that others will carry the ball, downloading and enhancing PBOS and setting up a growing and increasingly diverse number of Processed Book sites. Downloads are available at the project's SourceForge page.
The objectives of the Processed Book Project and the essay "The Processed Book" were very different. The essay was written pretty much as a goad, with the target audience primarily being traditional publishers. Goading traditional media companies has become something of a national pastime, of course, but I had a very specific aim in mind, namely, that many publishers seemed to believe that they had "figured out" the Web and that all they had to do was put their properties into a digital format (usually locked PDFs) and watch out for copyright pirates. The idea that electronic publishing had matured troubled me; I knew it wasn't true. While publishers were busy distributing PDFs, which precisely mirror printed text, the Wikipedia was blossoming. In the scientific, technical, and medical (STM) journals world, cries for Open Access to all scientific materials were growing. The blogosphere was emerging as a political force, and countless community-based sites (not to mention the omnipresent real-time communications of instant messaging and cell phone "texting") were altering the way information was created, consumed, passed along, read again, tweaked, and passed along yet once more. Like the poor translator who had managed to get the book out of French but had not properly rendered it into English, traditional publishers had gotten their publications out of print, but had not fully gotten them into a robust digital form.
So goad them I did. For this reason "The Processed Book" was deliberately over the top in parts and rhetorically extravagant, and it was larded with literary references to make an old-line publisher feel at home, a home that I hoped to make tremble. I suspect that what was obvious to people in the technology sector (that the essay essentially took the commonplaces of software attributes and mapped them onto literary texts) went largely undetected by readers from the publishing world. The strategy of having the article comment on itself in a spiraling, self-referencing frenzy was designed to have readers look at the essay with a great amount of critical distance, just as I wanted them to view the traditional books they knew and loved (as do I).
The Processed Book Project, on the other hand, had a different objective. It was not enough to scold a bunch of traditional publishers and drive them to do something, anything, but do something! The project had to do what the essay was demanding that publishers do—that is, something, namely, to think of a text as something more than a fixed expression of a single author, immutable over time, ideally written in stone, though ink on paper would do. The project had to get beyond the frozen expression of a PDF or it could be said that the essay had called its own bluff.
Going from the essay to the software project was a challenging act of translation. To begin with, computers can't do anything with rhetorical flourishes or humor, nor could some of the ideas in the essay be implemented in a reasonable amount of time and with a modest budget. The question was what we could implement; and if the gap between what we could implement and the promises of the Processed Book were large, would the implementation still be useful?
Part Two of this essay will go further into matters of implementation, but one aspect of this that I want to note here is our decision not to try to make the Processed Book Project into an enduring enterprise. The project is a demonstration of an idea (with a tail, which I will explain in a moment), not the beginning of a long-lived institution. This would not be a noteworthy item except for the current craze for sustainability in the not-for-profit sector. Like motherhood and apple pie, sustainability is the kind of thing that no one can really be against, but its invocation often seems to stop mental processes right there. Some things are simply not meant to be sustainable, nor should they be. Part of the pleasure of a spring day is its evanescence. Part of the joy of a commercial novel is its fancifulness, which, like a joke, ceases to thrill upon retelling. Part of the value of a research project is that it can bring some things into a new focus, help us rearrange our priorities, and allow us to move on. Unfortunately, bowing before the god of sustainability, many projects in the not-for-profit world are being created to last forever, with a large staff and commensurate funding. There is no doubt that some such projects should indeed be built to last, but many, including the Processed Book, should simply present an idea, thank you for coming, and shake your hand as you leave for the parking lot.
Accordingly, we deliberately set out with a backwards plan: we did not ask, "What resources will we need to build the Processed Book?" but "What can we accomplish with $50,000?" The figure of $50,000 was pulled from the air, though there was some consideration as to what it would cost to get some code written and to mount a Web site. The feature set for the project was scaled to the amount of money available. The Processed Book Project is thus planned to be incomplete and planned to disappear at some point (literally, in three years, when the hosting arrangement with Prosaix expires). Could we have done more with $100,000 or, for that matter, $1 million? Absolutely. But would the demonstration of a $1 million project be twenty times clearer than a $50,000 one? We don't think so.
But, as noted above, it is a project with a tail. The code for the project is being distributed with an open source license. While the Web-mounting of the project disappears at a fixed time, it is hoped that the open source code will be downloaded by many people interested in the Processed Book concept, who will in turn add to the code and launch their own Processed Books site. We ask only that they acknowledge the original project and the support of the Hewlett Foundation, which made this all possible. One way to regard the two-part essay you are reading now, along with the original "The Processed Book," is as part of the documentation for the open source software.
Downloading the open source code is one of three ways, broadly speaking, that a user can engage the project:
A word is in order about the library. We are not planning to build a substantial library ourselves; we prefer to have our users do that. The texts we have selected to date were chosen for very specific reasons. "The Processed Book" (essay) was included for the sake of completeness and to aid in the demonstration of the ideas behind the project. (It also adds to the self-referencing swirl, which is a key attribute of the project.) But "The Processed Book" is a small essay and cannot hope to attract much in the way of annotations, so we cast about for something that could potentially reach a wider audience; and of course that text would have to be in the public domain, as we had no budget for copyright acquisition. We settled on The 9/11 Commission Report (about which we offer no opinion) because it held the promise of bringing in users who had no particular interest in theorizing about electronic publishing. We also added a draft essay entitled "Project Casaubon," which I wrote. Its inclusion in the library stems from the fact that it is referred to below and we wanted a place to point to it online. As for bringing Moby Dick to the Processed Book Project, well, one white whale deserves another. We have noted that as we worked on the project, we kept finding reasons to mount other documents; so by the time the project goes "live," there will be a number of titles in the library, selected with no governing principle other than whim and opportunism.
Another objective was to create a software environment that was pluralistic with regard to Open Access and proprietary content. By "Open Access" we mean content that is available online without any charge to the end-user. (There are many shades of OA content. We do not wish to enter this debate here.) By "proprietary" content we mean the products of traditional publishing, which generally require a payment or toll for access. The world has always been pluralistic in this regard (some information has been free, some has cost money), and certainly that is true today. Furthermore, it is difficult to imagine any scenario in our lifetimes in which pluralism will not prevail. For this reason, we set out to accommodate both forms or ideologies of content. That being said, it should be noted that all the content created for this project is Open Access, which was a stipulation of the Hewlett Foundation with which we were happy to comply, as is the content of the original essay, "The Processed Book." In addition, the software, PBOS, is open source. So this is an "open media" project, even as it looks to work with proprietary media.
An instance of the pluralistic model can be found in the incorporation of proprietary software. Specifically, one of the features of the project is access to a tool called BizVantage, which is the property of Prosaix, the developers of PBOS. BizVantage appears as an option when text is highlighted. (Using Windows, highlight any text in the library and right-click on the mouse. A menu appears that includes the BizVantage option. This is explained more fully in Part Two.) The BizVantage tool begins by using selected keywords and combining the results for those keywords from several search sources. A reader or reviewer of these results, even if he or she merely reads the results and does not interact with them in any way, causes BizVantage to learn the reader's interests. The results are thus dynamic and become more relevant to their original connection to the Processed Book over time. BizVantage is a very sophisticated piece of software and was expensive to create. The PBOS software permits companies like Prosaix to "bolt" their proprietary software right onto the application. Theoretically, Prosaix could then charge for the use of this feature, which would be an added value for any book placed in the Processed Book library. (The use of BizVantage in the Processed Book demonstration project is free, though there is a limit as to the number of topics that can be researched. This economic arrangement was part of the original agreement for creating PBOS.)
Another potential example of pluralism would be for an organization with proprietary content to employ PBOS. In such a scenario, a Processed Book site would be set up, running PBOS. Access to the site would require registration and payment. In the PBOS library there would then be content that is not Open Access. Thus, although the code to PBOS is open source, it can be used for toll-access documents.
Some readers may be shaking their heads and wondering why we chose the pluralist route. Very simply, the answer is that we care about innovation, not ideology, and any mechanism that gets new things to happen is okay with us. Some people are motivated to create Open Access content and open source software (as we are), and that is fine with us. Others wish to invest in proprietary content and software (as do we). God loves all his children. To insist on one side or the other necessarily means that the amount of capital, both human and financial, brought to any problem would be diminished. No one can make a case for starving the future.
With these objectives in mind, how do we measure success? Ultimately the aim is to change the way people think about books and, more importantly, to change the ways books are created and published, with the ultimate aim that these changes will improve the quality of human (and machine) discourse. That's a hifalutin goal for people of skeptical temperament, and it does not, and cannot, lend itself to measurement. The metric we will be watching most closely is the number of downloads from the SourceForge open source site. Each download represents someone who wants to probe what the Processed Book is and can be; thus 1,000 are better than 10 and 10,000 are too much to hope for. Another metric is the number of times the phrase "processed book" appears online. As of this writing (September 18, 2005), a Google search on "processed book" returns 648 sites, not all of which, of course, refer to "The Processed Book" or the Processed Book Project. (This figure has more than doubled during the period in which this essay was drafted.) Will 648 become 1,000? More? And what will be the rate of uptake? Another metric is the results of a search on a BizVantage topic, in this case "processed book." I initiated such a search several months ago and periodically review the results, deleting irrelevant hits ("word-processed book," "a librarian will take the processed book and return it to the stacks," etc.), but watching with growing interest for when BizVantage returns sites that touch on some of the ideas of the Processed Book without using the specific terms "processed book." Ultimately the growth and depth of the BizVantage topical search would be the best metric of all, though as a matter of the agreement with Prosaix, the number of BizVantage topics is limited to 100. We would like to see readers propose other means of measuring the effectiveness of the project.
The term "discoveries" may be too strong for a project of this kind, which set out to build a demonstration of an idea rather than to investigate an unknown domain, but the fact is that we know more about the Processed Book now than when we started. Much of what we learned has to do with the issues and problems of technical implementation, about which more in Part Two. But we also came to understand more about the Processed Book idea, especially insofar as the project developed into something in the real world as opposed to a rhetorical goad for unadventurous publishers.
The primary discovery is that the malleability of the text, which was at the core of the original impulse behind the project, can be taken into areas that were not anticipated and certainly not intended. The original idea of the Processed Book included the possibility that the text created by an author could slip out of an author's control: over time, the text could be amended and commented on by so many people that the author's original text and intent could be obliterated. In effect, we could have authorless texts, or (to state this more precisely) texts with multiple authors, who arise from the community and participate in the creation of an evolving document.
Alert readers will recognize in this an inadvertent description of the wiki form. (For background on wikis, see this site.) Are wikis a form of Processed Book? Well, yes: the Processed Book is an uber-category that encompasses all forms of interactive text-based media. On the other hand, when we lose sight of the original, the word book doesn't seem quite right. There is much to recommend community-based content development (which we believe is likely to become increasingly prominent in coming years), but when the borders of the book extend beyond recognition, we seem to have stumbled upon an entirely new medium.
This fundamental proposition (A book is not a wiki) played an important role in the implementation of PBOS. Our decision was to hold the original text sacrosanct, though we remain uncomfortable with terms like "sacrosanct," as they pull us back in the direction of the traditional book just as we are trying to escape. A book could be loaded into PBOS and it could be annotated in countless ways, including ways that the developers of the first release of the software could not even begin to imagine. All around that book commentary and new processes could be introduced. But the book itself, the basic text that was uploaded to the Processed Book service, that would remain unchanged. Thus, although we acknowledge the theoretical possibility that a wiki can be a Processed Book, we have chosen to implement a non-wiki-like subset of Processed Books. In our implementation the original text and its author continue to occupy a privileged position.
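The "fixed text that others annotate" arrangement can be pictured as a small data model. What follows is only a sketch under stated assumptions, not a description of how PBOS is actually written: the class names (BaseText, Annotation, ProcessedBook), the character-offset anchoring, and the "comment" label are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BaseText:
    """The uploaded text. Frozen, so its fields cannot be reassigned."""
    title: str
    body: str


@dataclass(frozen=True)
class Annotation:
    """A note anchored to a character range [start, end) of the base text."""
    author: str
    start: int
    end: int
    kind: str      # hypothetical labels such as "comment" or "link"
    content: str


@dataclass
class ProcessedBook:
    """A base text plus a growing list of annotations layered around it."""
    text: BaseText
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, author: str, start: int, end: int,
                 kind: str, content: str) -> Annotation:
        # Annotations are added alongside the text; the text is never edited.
        if not (0 <= start < end <= len(self.text.body)):
            raise ValueError("annotation range falls outside the text")
        note = Annotation(author, start, end, kind, content)
        self.annotations.append(note)
        return note

    def annotations_for(self, position: int) -> List[Annotation]:
        """Return every annotation whose range covers the given offset."""
        return [a for a in self.annotations if a.start <= position < a.end]


book = ProcessedBook(BaseText("Moby Dick", "Call me Ishmael."))
note = book.annotate("reader", 8, 15, "comment", "Our narrator introduces himself.")
```

The design choice the sketch tries to capture is that commentary accumulates in a separate list while the uploaded body stays exactly as it arrived; a wiki, by contrast, would let contributors rewrite the body itself.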
Whether wikis are good or bad is beside the point here. My former colleague, Robert McHenry, wrote a brilliant, incisive, and at times hilarious critique of The Wikipedia a while back, but despite the fact that I nodded in agreement at every word of McHenry's piece, I am not sure that he is right about the inherent limitations of wikis. (The comment in McHenry's piece that earned him his 15 minutes of fame was his criticism of the fact that anyone could edit anything, which McHenry said put the reader into a position like that of someone entering a public restroom, without knowing who had been the last person to use the facilities.) Wikis may not be as deliberate or authoritative as traditional works in some respects, but they are not anarchic. What they are not, however, is a fixed text that others comment upon; they are dynamic texts that evolve with the pressures, partially constrained, of the community overall. For the Processed Book Project, we have chosen to implement the "fixed text that others comment upon," but this is not out of disrespect for wikis.
This matter of the "sacrosanct text" also helped to bring into sharper focus another project, which has been running on a parallel track. That project is called (yes, playfully) Project Casaubon, named for the pedant in George Eliot's magisterial novel Middlemarch. Whereas the Processed Book Project in its first implementation concentrates on the single text, Casaubon is being designed to work with a collection. The hypothesis concerning Casaubon is that although individual texts display certain properties, there are emergent properties when texts are organized into a collection. What those properties are or will be, we do not know; the challenge would be to create a suite of software tools or a platform that would invite others to explore for such properties. A highly preliminary draft of a proposal for Casaubon was written by me about one year ago. That draft has been deposited into the Processed Book library for anybody who is interested in reading it and offering comments.
The tools for the Processed Book bring to mind the parable of the blind men and the elephant. No one sees the entire picture and thus every blind man imagines that the small part of the elephant he experiences must be representative of the whole. For each user different tools will come into focus. For me personally the tools of greatest value (measured by how often I wanted to use them) were the ability to affix comments to the text and to create hyperlinks. Although I recognize, at least abstractly, the value of such things as being able to do a quantitative analysis of all the marginalia or of reconfiguring the display in any manner conceivable, as a verbal individual, the ability to "write" on the text had the greatest appeal. One concern I have is that users who come to the Processed Book site may, like the blind men, conclude too rapidly that they have experienced the whole, when in fact it is the virtually unlimited configurability of the Processed Book that is at the center of the idea.
I want to provide an illustration of this last point before handing the cursor over to Lynn. Yes, PBOS permits a user to add comments, links, and other things (collectively called "annotations") to a particular text, but it also permits the addition of entire new classes of annotations, which could be very extensive and involve a significant amount of processing. Let's think for a moment about the college textbook world, which, in the view of many, is about to enter a period of significant change. An instructor may wish to use an electronic version of a textbook for his or her class. This instructor, however, is probably already using a number of electronic tools, including a course-management system of the type associated with companies like Blackboard and WebCT and, in some rare and specialized instances, the open source project known as SAKAI. Such a course-management system could be "bolted on" to PBOS as a new form of annotation, permitting students and teachers to move from the text itself to information on reading assignments, scheduling, and grades. The instructor may also choose to wrap his or her lecture notes around the text or to offer a critique, all in the form of annotations. The Processed Book in the classroom, in other words, potentially becomes the center of a universe of discourse, whose extension is limited only by the imagination and technical prowess of its participants.
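For readers who think in code, the notion of "bolting on" a new class of annotation might be sketched as a simple registry. This is an illustration only, not the PBOS extension mechanism: AnnotationRegistry, the handler functions, and the "comment" and "course" labels are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List

# A handler takes a passage of text and returns what that class of
# annotation has to say about it.
AnnotationHandler = Callable[[str], str]


class AnnotationRegistry:
    """Keeps track of the annotation classes available for a text."""

    def __init__(self) -> None:
        self._handlers: Dict[str, AnnotationHandler] = {}

    def register(self, kind: str, handler: AnnotationHandler) -> None:
        # New classes can be added at any time without touching existing ones.
        if kind in self._handlers:
            raise ValueError(f"annotation class already registered: {kind}")
        self._handlers[kind] = handler

    def kinds(self) -> List[str]:
        return sorted(self._handlers)

    def render(self, kind: str, passage: str) -> str:
        return self._handlers[kind](passage)


def comment_handler(passage: str) -> str:
    return f"Comment on: {passage}"


def course_handler(passage: str) -> str:
    # Stand-in for a bolted-on course-management system.
    return f"Reading assignment covers: {passage}"


registry = AnnotationRegistry()
registry.register("comment", comment_handler)
registry.register("course", course_handler)
```

In this picture a course-management system is just one more handler registered beside comments and links, which is one way to read the claim that the set of annotation classes is open-ended.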