IV

Suppose this essay were written and augmented as a processed book?  How would it differ from what you are reading right now? 

To begin with the obvious point, this essay is a processed book, though the extent of its processing is too conventional to catch anyone's attention.  This document was created in Microsoft Word, which is word-processing (good term) software.  It has been edited and reedited, moved around, and spliced and diced.  It has also been sent over the Internet any number of times to friends who agreed to provide comments.  In sending it around, I chose to leave it in the Word format, which enables editing, rather than freezing the text as a PDF file.  Indeed, one of the alleged virtues of PDF files, namely, that they can prevent tampering with the text, may in the context of a processed book prove to be a liability, as uneditable files are less likely to have a network of comments built around them.  (Some new variants and add-ons to PDF technology permit a base file to remain unaltered even as edited versions, including those with commentary, are displayed alongside.)  An early version of this essay has even been mounted on a Web server, which may in time raise a practical problem for me: How will I make sure readers find their way to the current version?  The answer, for better or worse, is that the processed book inevitably leads to a loss of editorial control.  This makes me wonder if in a world without editorial control, authors may cease to write for attribution. 

But let's work through the five aspects of the processed book and see how they apply to this very document. 

As a portal, "The Processed Book" would include links to many Web sites for further information on people and ideas discussed here.  For example, there would be links to further information on Stevan Harnad, Alan Kay, and Ted Nelson.  There might also be links to earlier drafts of this essay, perhaps including the many e-mails I collected from people who have commented on it, not all of them favorable.  The reference to doing a Google search on "computational linguistics" would be enacted right within the text: click and the search results would appear. 

The essay has already served as a makeshift portal for one reader, a longtime friend who went through it carefully.  He came upon the references to Isabel Archer and Osmond and didn't know what they were.  So he proceeded to Google and soon found himself reading about a novel by Henry James entitled The Portrait of a Lady, whose protagonist, Isabel, marries the scheming Osmond.  Why were these references not spelled out in the text to begin with?  For several reasons.  First, in my (East Coast) circle almost everyone knows who Isabel Archer is, though references to Calvino (for example) require greater explanation.  But you never can tell what the range of reference of a particular reader is; there is, after all, no agreed-upon culture to draw upon, no canon of bedrock ideas; in such a world, which is the one we live in right now, the processed book becomes a means of cultural unification.  One reader (West Coast) will be puzzled by Isabel Archer, another (East Coast) by references to algorithms derived from Bayesian statistics.  A writer has to work with all such possibilities, which is why the processed book-as-portal is inevitable.  Another reason Isabel was not explained in the text was my own anticipation of making the very point I am making now; in other words, I left the reference slightly obscure in order to demonstrate the need for the book-as-portal.  I chose to write the essay under my own surveillance. 

The book-as-portal will become more robust in time.  It is one thing to look up keywords in Google (e.g., proper names), but it is still a considerable challenge to capture some allusions.  The phrase "all men are cousins" appears early in this essay, but what is a search engine to do with it?  An alert reader will pick up the allusion to "all men are brothers," but Google also gives us "all men are pigs," "all men are scum," and "all men are created equal."  (Presumably for some audiences, all these phrases are equivalents.)  The alert reader will also note that the phrase "all men are cousins" is the only time in the document that so-called sexist terms are used—but, the writer protests, I couldn't very well have said "all men and women are cousins," as that would have obscured the allusion to "all men are brothers."  There is cultural content here that would make it very hard for a fully automated process to generate a meaningful link.  But this will improve, and probably soon. 

There is also a reference to Walter Pater in this essay, which I will decline to highlight.  Literary students will pick it up, but for everyone else it will remain hidden.  When the time comes when everyone can find the allusion, the processed book-as-portal will truly have arrived. 

As a self-referencing text this essay could provide comprehensive indexes, which are currently not included.  Besides the aid such indexes would provide to readers, such indexes would also make it easier for search engines to find and classify this essay, which would in turn potentially bring more readers to it, assuming it were posted on the Web, as inevitably it will be.  A self-referencing text would also clear up some possible confusion in the preceding section, where I referenced Harnad, Kay, and Nelson.  Only Harnad has been discussed up to this point; Kay and Nelson are yet to come.  A self-referencing text would permit a reader to see all references to Kay and Nelson simultaneously.  Such a text would also cluster metaphors and categories of information together.  So, for example, all literary references could be highlighted, as could all references to the computer industry.  A harder trick would be to identify all the metacomments, of which there are many.  How can a machine tell the difference between saying something and saying something about saying something?  It may be that self-commentary will be the last bastion of the purely human. 

A self-referencing text could also provide quantitative information.  For example, a publishing friend asked if I intended to publish this essay in book form, which brought up the question of its length.  A word or byte count is a trivial exercise for a machine.  Although it isn't clear why anyone would want to do it, this text could also be analyzed to determine how much space was given to each topic or the distance between literary allusions or the frequency of quotation marks and special characters. 
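To make the point about triviality concrete, here is a minimal sketch in Python of the kind of counting described above.  The function name text_stats and the markers argument are illustrative assumptions, not part of any existing tool; in particular, the markers are a hand-supplied list of strings standing in for "literary allusions," since recognizing an allusion automatically is exactly the hard part.

```python
import re

def text_stats(text, markers):
    """Return simple quantitative facts about a text.

    `markers` is a hypothetical, hand-supplied list of strings (for
    example, proper names) used as a stand-in for literary allusions.
    """
    # Treat every run of non-whitespace characters as one word.
    words = re.findall(r"\S+", text)
    # Word positions at which any marker string occurs.
    positions = [i for i, w in enumerate(words)
                 if any(m in w for m in markers)]
    # Distance, in words, between consecutive marker occurrences.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "words": len(words),
        "bytes": len(text.encode("utf-8")),
        # Count straight and curly double quotation marks.
        "quotation_marks": sum(text.count(q) for q in ('"', "\u201c", "\u201d")),
        "marker_positions": positions,
        "gaps_between_markers": gaps,
    }

sample = 'Isabel met Osmond. "All men are cousins," said Isabel.'
stats = text_stats(sample, ["Isabel", "Osmond"])
```

Run against the full essay rather than a one-line sample, the same function would report its length and how far apart the named references fall, though it would say nothing about why anyone would want to know.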

The value of a self-referencing text grows with the length of the work.  For this essay, self-referencing is not particularly revealing; for Moby Dick, it would be breathtaking.  On the other hand, if all the comments made on the drafts of this essay were to be included as part of the text, self-referencing would become more valuable, as it would trace the evolution of ideas.  This raises the question of whether self-referencing of a text should apply only to a particular network node or to the entire processed network. 

It is very interesting to think of this essay as a platform.  To a small extent, it has already served in that capacity.  One early reader asked me for permission to use one section for a project he was working on, a major reference work in botany.  He wanted to present the book-as-platform idea to the writers; he wanted, in other words, to use a section of "The Processed Book" as material for his own work—he wanted to use the essay as a platform.  Well, this is only a tiny matter of technology: all he needs to do is copy the relevant section and paste it into his document.  But what he perceived is that a book is often copyrighted and that he needed more than cut-and-paste technology to use the essay as a platform. 

If this essay were a platform, it would include tools to enable other writers to "call" its text or a section of the text.  These tools would necessarily include copyright information, without which clearing permissions can become tiresome.  (The fair use aspect of copyright law is relevant here, but I don't want to get into it, as it is complicated and certain to provoke much unproductive argument.)  One company (now defunct) had a technology, likely to be imitated, that made the copyright policies of a particular work pop up on the screen when the mouse passed over the object in question.  What would those policies be?  It depends.  A writer or publisher could take a tough stance on copyright, requiring all uses of the platform to involve permission and fees.  Or there might be a matrix for copyright questions, depending on the size and nature of the use: free for schools, costly for corporations, and so forth.  For that matter, the work could simply be put into the public domain. 

Fascinating work in this area is being put together by Hal Abelson and Lawrence Lessig at their public service organization, Creative Commons.  Among other things, Creative Commons proposes to "brand" the public domain, that is, it is developing a set of signposts so that users will know whether or not a particular information object is under copyright.  As part of this project, a series of intellectual property contract templates is being developed, which will allow the owner of a creative work to determine the copyright status of his or her work.  This is important.  Prior to the work of Creative Commons, much intellectual property was either totally controlled by its owner or not controlled at all, that is, it was in the public domain.  The contracts being developed by Creative Commons would allow me, as the author of this essay, to choose an intermediate position.  I might assert the right for all commercial uses of this essay (not many and not worth much), but I might also stipulate that noncommercial uses require no fees or permissions.  If this were a novel, I might insist that I controlled everything in it, but I might make the characters available to others for free or for a fee for derivative works. 

The point here is that as we think of the processed book, we are dealing not only with what technology can do with content but also with the total set of social and legal issues that surround a work.  Social and business rules can be codified and instantiated within technology.  A reader or user can then draw on these rules without fear of violating anyone's rights.  The book-as-platform may have more to do with copyright law and marketing strategy than with bits and bytes. 

With the book as a machine component, things really begin to get interesting.  Hide as I may try, this essay says a lot about me.  The word choice and syntax are mine, the allusions part of my mental framework.  Words and ideas don't have to be original to say something about the person who uses them.  For example, the fact that I prefer the work of Borges to that of Faulkner, though Faulkner is arguably the superior writer, says something about me, even though I couldn't hope to write a line like Borges; we are, after all, our tastes as well as our expression.  The works of Marshall McLuhan and Ted Nelson are as much a part of me as extraordinary tales of growing up in Fort Lee, N.J.  Computers can take this essay and convert it into a proxy for me through various analyses.  In other words, "The Processed Book" is the raw material that can result in a computer agent. 

What would an agent do?  Just about anything.  I would like a well-crafted agent that would regularly poll the Internet for things of interest and that would also filter out a number of related things.  For example, I am interested in copyright issues on the Internet (as this essay reveals), but hardly want to read all the manifestoes of the information-wants-to-be-free crowd: perhaps an agent can find information on copyright and weed out the histrionics.  An agent could also be used to find things that I don't even know I care about by identifying themes in my writing (e.g., submerged metaphors) and matching them to related themes found on servers anywhere. 

Computer agents are not new.  What is new is the increasing sophistication with which they are being built and their purposes.  All Internet users are familiar with the kind of profiling that ecommerce sites habitually engage in, profiling that says something about the kind of merchandise to offer particular users.  Most of these agents are put together, however, in fairly clumsy ways.  So, for example, the all-important Zip Code is likely to say something about one's household income and education level and many other things besides.  But we all know how imperfect Zip Code analysis is.  On my street we have university faculty, Silicon Valley executives, and (apparently) a couple of New Age households made up of students and former students.  And let's not forget the transplanted retirees (this is a beach town).  But what, someone is bound to ask, does that say about me?  By taking a statistical abstract of a person's writings, these profiles can become more intimate, and their uses can become more interesting than determining which digital camera I am likely to buy. 

One intriguing application of the use of personal content is to create spam filters.  Paul Graham (see http://www.paulgraham.com/spam.html) has written a white paper on the use of Bayesian statistics to develop highly accurate filters to catch unwanted unsolicited e-mail.  This works by breaking a user's incoming e-mail into spam and not-spam (the user determines which is which).  Then a statistical abstract is taken from both groups and all further incoming e-mail is measured against these abstracts.  An additional feature is that the filter becomes better the more you use it, as you continue to build a larger database, which makes the statistical measures increasingly accurate.  It is not hard to imagine similar processes to be applied to the content of "The Processed Book." 
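The procedure just described can be sketched in a few lines of Python.  This is a simplified naive Bayes classifier with add-one smoothing, offered as an illustration of the general idea rather than as the specific formula in Graham's paper; the class name SpamFilter, its methods, and the training messages are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())

class SpamFilter:
    """Toy Bayesian filter (an illustrative assumption, not Graham's code)."""

    def __init__(self):
        # The "statistical abstract": token counts for each group.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one user-labeled message ('spam' or 'ham') to its group."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that a new message is spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior from the share of messages in this group.
            score = math.log((self.messages[label] + 1) / (total + 2))
            n_tokens = sum(self.counts[label].values())
            for tok in tokenize(text):
                # Add-one smoothing so unseen tokens never give log(0).
                score += math.log((self.counts[label][tok] + 1)
                                  / (n_tokens + len(vocab) + 1))
            log_score[label] = score
        # Convert the two log scores into a probability of spam.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])

f = SpamFilter()
f.train("win free money now", "spam")
f.train("free prize claim now", "spam")
f.train("meeting notes attached", "ham")
f.train("lunch tomorrow with the editor", "ham")
p_spam = f.spam_probability("free money")
p_ham = f.spam_probability("meeting tomorrow")
```

Every further call to train enlarges the two abstracts, which is the sense in which such a filter improves with use.  Nothing in the sketch is specific to e-mail: the same two-pile comparison could in principle be run on any body of text sorted by its owner.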

Were this essay to become a machine component, its task would be to serve as my virtual representative—it would become, in other words, the soul of the machine.  Such a machine would incorporate human culture (mine) into its processes and thus become more human-like in the tasks it can take on.  And why stop with this essay?  We could add all the e-mail I write (and give it a high ranking), all the Web pages I view (and give them a lower ranking, because reading is not as close to the bone as writing), and anything that is my personal expression.  This is the ultimate goal of the processed book: to inform a generation of robots, not to make the world more machine-like but to make machines more human. 

It should be clear by now how "The Processed Book" would serve as a network node.  All four of the other aspects of processing would apply here: the portal, self-referencing text, platform, and machine component.  Each of these aspects contributes to the network.  Commentary would sit somewhere between the portal and platform aspects, depending on which text is doing the pointing and which is being pointed to.  "The Processed Book," in other words, like any written document, develops a community around it.  The relative size of that network depends on the importance of any particular book: a small network for this essay, an enormous one for Ted Nelson's Literary Machines.  It is noteworthy that such a network has in fact not sprung up around Literary Machines, despite that work's enormous importance, almost certainly as a result of the author's eccentric decision to self-publish, denying Literary Machines the marketing clout of even a modestly sized publisher. 

It is an interesting marketing exercise to consider how to build such a network for "The Processed Book."  Most obviously, the paper should be mounted on a Web server, where it will be indexed by search engines, which will in turn point users to it.  It can also be distributed in various pre-publication forms, some of which will inevitably end up on the Web as well (this is already happening).  It can be sent around to interested (and uninterested) parties as an e-mail attachment.  Links to it can be posted in newsgroups.  The way to market this book, or any book, in networked mode is to let the network do the work.  This means relaxing some common controls.  Digital Rights Management (DRM), for example, which can reduce or eliminate the copying of digital works, may be a good economic decision for Stephen King and John Grisham, but unknown authors—like that of "The Processed Book"—are better off allowing their work to be copied and sent around—and even in some cases to be changed somewhat.  Since a friend posted a draft of this essay on a Web site, which I noted in two newsgroups, I have been astounded by the number of responses I have received.  The network is working. 

Of course, not all network nodes are created equal.  (Imagine for a moment what a computer could do with that sentence.  Besides picking up the reference to "The Gettysburg Address," it would also note the earlier passage in this essay where the phrase "all men are created equal" appears and then back into "all men are scum," etc.  The poor machine!)  The book-as-network is a new phenomenon and we still don't know what the inherent rules for building out such networks are.  Does every node have the potential of building an ever-growing network, or do some nodes have the potential to diminish or even wipe out the network aspirations of other nodes, as the wake of a large ship will overwhelm that of a tiny rowboat?  We don't know the answer to this at this time, but my guess is that in a networked world, the big shall rule and that the diversity of voices that currently characterizes the Internet will increasingly become dominated by the roars of a handful of media empires, barring a regulatory regime.  The processed book of tomorrow will have to fight for attention just as much as yesterday's primal book.