Web Masters Episode #34: Greg Newby




Greg Newby:

My interests were never purely on the technology side, it was always on the side of what technology helps us understand about ourselves and each other. And so when it came to learning about ebooks, there’s kind of an obvious interest, like, wow, not only can I read a book on a computer, which at that time I didn’t do a whole lot of, I was still reading print books for the most part. But to me, it was more important that the knowledge in a book or the words in a book could become available to me much more easily on a computer than by going down to the library or looking on a bookshelf. Say I want to know something that might be in a thesaurus: I can use the online thesaurus. I can use the online CIA Factbook and a bunch of other reference materials that were digitized in the early days.

These days, we’re mostly doing fiction and nonfiction, but not so much reference materials. Back then, reference materials were a big deal. Or say there’s something that I just wanted to remind myself of, like a passage from Alice’s Adventures in Wonderland. Well, that’s a lot easier to do when it’s a text file on your computer. You just have to search for some text using a word processor or a Linux grep command or what have you. That’s easier than it is to find a book and then page through the book.
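As a rough illustration of the kind of search Greg describes, here is a minimal Python sketch that looks for a phrase in a plain-text ebook, line by line, the way grep does. The function name `find_passage` and the short sample text are assumptions made for this example; with a real ebook you would read the text from a file instead.

```python
def find_passage(text, phrase, context=1):
    """Return (line_number, snippet) for each line containing phrase.

    The match is case-insensitive and line-based, like `grep -i`, so a
    phrase that is split across two lines will not be found.
    """
    lines = text.splitlines()
    needle = phrase.lower()
    hits = []
    for index, line in enumerate(lines):
        if needle in line.lower():
            # Include a little surrounding context with each hit.
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, "\n".join(lines[start:end])))
    return hits


# A few lines standing in for a full plain-text ebook.
sample = """Alice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do: once or twice she had
peeped into the book her sister was reading, but it had no
pictures or conversations in it."""

for line_number, snippet in find_passage(sample, "peeped into the book"):
    print(f"line {line_number}:")
    print(snippet)
```

The point is the one Greg makes: once the words are plain text, finding a half-remembered passage takes a few lines of code rather than paging through a physical book.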

So I think what we felt was needed of ebooks was to see how the utility of an ebook matched, in most ways, that of a printed book, with the limitations that were a big deal back then, like you’re not going to have a network and a computer on the beach or in the bathtub or something. So you’re limited in some ways, but you match a printed book in terms of the words on the page and, more importantly, exceed what you could do with a printed book.

Aaron Dinin:

That was Greg Newby extolling the value of ebooks. It’s not a difficult argument to make these days. After all, while traditional print books are still ahead of ebooks in terms of total sales, the ebook market is closing that gap fast. So clearly, plenty of people seem to recognize their value. But that wasn’t always the case. In fact, considering the first ebook was published in 1971, it took the better part of a half century for ebook adoption to become mainstream. And it took a team of volunteers to lead the charge. They were and still are the volunteers behind Project Gutenberg. And the project’s current leader, with a tenure that began back in 2000, is, well, Greg Newby. Are you ready to hear the story? Let’s get dialed in.

[INTRO]

Aaron Dinin:

Hi there, and welcome to Web Masters. This is the podcast that explores entrepreneurship by talking with some of the digital world’s most impactful early innovators. My name is Aaron Dinin. I teach innovation and entrepreneurship at Duke University. I study the history of the internet. And, oh, by the way, I’ve got a doctorate in English literature. I bring that up not because it has much of anything to do with studying and teaching entrepreneurship, but because it means I probably like books more than the average person, and that means I get to geek out about this episode’s guest and topic more than a lot of you listeners. But don’t worry, you’re still going to enjoy the story of Project Gutenberg.

Just in case you’re not familiar with Project Gutenberg, it’s a huge collection of over 60,000 free ebooks. Anyone can access them. The archive is at www.gutenberg.org. Be sure to check it out if you haven’t already, or even if it’s been a while; the collection there is constantly growing and is pretty impressive. In a minute we’re going to dig into why Project Gutenberg exists and why all of us should care. But first, I want to take a moment to thank our sponsor.

Web Masters is being brought to you with the help of Latonas. Latonas is a boutique mergers and acquisitions company that helps people buy and sell cash-flow-positive internet businesses and digital assets. That includes things like Amazon FBAs, Shopify stores, SaaS apps, content websites, domain portfolios, and any other type of online, work-from-everywhere internet business. If you’ve got a profitable internet business and are thinking of selling it, contact Latonas, as their team of experts can help you get it sold for a great price. Or if you’re interested in buying an internet business, be sure to visit the Latonas website, where you’ll find a huge collection of listings featuring businesses you can purchase right now. That website is latonas.com, L-A-T-O-N-A-S.com.

Time for a confession. Yes, I love books, but I personally avoid reading anything I can’t get on Kindle. In fact, I don’t remember the last non-Kindle book I read. I know, that’s basically heretical for someone who spent more than a decade of his life studying literature. But, well, I’m a practical person, and ebooks are just more practical than physical books. We can download new ebooks instantly, we can take them with us wherever we bring our phones, they’re easily searchable, they take up less space, and the list goes on and on and on. But those benefits of ebooks come at the cost of the physical form of the book. And people who prefer physical books tend to argue that the physical form is part of what makes a book, well, a book. And that’s the core debate of ebooks: are the words on the page the book, or is the book the physical object? Whatever you believe, the people behind Project Gutenberg are mostly interested in serving the former.

Greg Newby:

We’re not making these books for scholars, if you’re a scholar, go into a scholarly software library and find a transcription that meets your needs. We’re doing this for readers. And readers don’t care about the exact page layout in some particular print edition. What they care about is enjoying the story.

Aaron Dinin:

That, again, was Greg Newby. He’s our guest for this episode. And he’s a little different than our other Web Masters guests. Usually, we talk with the founders or creators of whatever venture the episode focuses on. Unfortunately, that’s not possible with Project Gutenberg because its founder, Michael Hart, passed away in 2011. Michael is widely regarded as the inventor of the ebook because he digitized a copy of the US Declaration of Independence in 1971 and then distributed it on what was then ARPANET, a precursor of the internet. In doing so, he effectively started Project Gutenberg. We’re going to hear Michael’s story, but we’re going to hear it through his good friend Greg, who also has a great story. Greg has been continuing Michael’s legacy as the CEO of the not-for-profit Project Gutenberg project, a position he’s held since it was officially established as a nonprofit back in 2000.

Aaron Dinin:

So yeah, Greg isn’t technically the founder of Project Gutenberg. But with over two decades at its helm and another decade working on the project before that, I think it’s safe to say a lot of the credit for the project’s immense success belongs to Greg, and even Michael recognized that.

Greg Newby:

Something Michael told me once is ringing true. He told me, it might have been somewhere around the year 2000, around the time when I was getting started as the CEO of PGLAF. He said, “All the work you’re doing as a faculty member, all the work you’re doing as an individual, this is great, and there’s nothing wrong with it. But given what we’re doing with Project Gutenberg, that’s where you’re more likely to be remembered. That’s where you’re more likely to have a long-term impact.” And looking back at all my career changes and sort of the pathway I’ve taken through life, I’d say that he was right, that the work that I do, as someone that’s keeping Gutenberg running, is, through a series of good choices, absolutely going to have enduring value long into the future. And that’s its own reward.

Aaron Dinin:

So let’s learn how Greg found himself running Project Gutenberg and to do that, we’ll start with a little background on how Greg first got interested in digital technologies.

Greg Newby:

Yeah, I remember back in junior high school, the Commodore PET arrived in a computer lab, and I would mess around a little bit. My buddies and I would go out to the mall and we would always stop by RadioShack and do a little BASIC programming on the Tandy TRS-80, or similar computers that were on the shelves, or get them to do some fun displays or something like that, “Shall we play a game?” That was very early. When I got to high school they had a 16-bit computer. I don’t remember the model number anymore. But again, that was on a terminal. These were the yellow paper, typewriter-type devices, teletype devices. And so I would use a text editor to write programs in BASIC and send text messages to my buddies. And we were doing something like chat at that time. This was before something like IRC had been invented. But we were able to send messages back and forth to the terminals in the room. They also had some other computers, microcomputers of the day, that were out in the adjoining lab, but I was on the mainframe.

Aaron Dinin:

So did you study computers in college?

Greg Newby:

I wasn’t actually trained as a computer guy, even though I was using them all throughout my life. I was trained going into college as a social scientist. My majors were in communication and psychology. In my college career in the early-to-mid 1980s, I was working with a professor who was doing essentially text analysis. He was sort of pursuing a theory of human cognition and human communication in which messages would be treated a little bit like items with mass and items with force would be in a physical model. This is called the Galileo method, by the way, but the point there is that it was all computer-based. He was slurping in text from surveys that were given to people about opinions and attitudes. In addition, the analysis consisted of looking at what they call paired comparison data. And I’m going into a little detail here because it ends up being relevant to Project Gutenberg. The paired comparison data is how words relate or are close to each other and are perceived as similar or different by the people that are giving their opinions.

So I was helping out with this as an undergraduate student, and then continued on as a graduate student, and then actually continued on as a PhD student working on my own, further developing this theory of how you could treat words and numeric characteristics of words as concepts and sort of figure out what that meant. I was working in artificial intelligence areas, and I was thinking that this might be sort of an answer to how machines could have their own self-concept, how computers could have their own self-concept, just like people. So sort of an aspect of artificial intelligence. That was all through the 1980s.

Aaron Dinin:

Okay, I see. So you’re interested in studying language in relation to computers, that seems like a logical connection to Project Gutenberg. How did you first get introduced to the project itself?

Greg Newby:

Again, it was in the later 80s. The internet came along to the same computer systems that were on BITNET, which was basically a store-and-forward system, good for text messages like email. Then they got on the internet, which, of course, is a packet-switched system.

Aaron Dinin:

And by the way, if you want to learn more about BITNET, don’t forget to check out our episode with BITNET creator Ira Fuchs, that’s Web Masters episode number 32.

Greg Newby:

And so suddenly, we’re able to have interactive communication. So, for example, the instant messaging that I was doing on a local system with people that were in the same room or on the same campus, I could use that type of system across the wide-area network, and from that, of course, later emerged things like Internet Relay Chat and some other protocols. So this was all predecessor work to the modern internet. What I think is interesting and relevant to Project Gutenberg is that during this period, I got an email which was the Millennium Fulcrum Edition of Alice’s Adventures in Wonderland. This was an ebook. It was around 1987. I had never thought about ebooks. I was a big reader, I was always a big reader since I was very young. But I never thought about ebooks, never realized that such a thing might be possible. But there was a book in my inbox that had been going around the internet. And it was forwarded to me by a buddy. So I knew at that time that people were creating books on the internet.

Aaron Dinin:

And what about digital books interested you? Why do you think they mattered?

Greg Newby:

My master’s thesis was looking at how to augment artificial intelligence systems with a self-concept. What I was thinking of there was how we take systems and make them more similar to people, not because we want to have a robot apocalypse or something like that, but because we want to augment our own human intelligence. And that was actually my doctoral degree, which was focused on what we call exosomatic memory, which is when you hold information outside of yourself. This was a science fiction concept when I was working on it in 1990-1991. These days, it’s what we all do, right? We have this little phone device that we wander around with, with our memories and our contacts and our information on it, or it’s a gateway to other information. We store documents, we store other materials on the computer so we can find them later and, of course, most importantly, perhaps, we use these massive search engines as an augmentation to our own intelligence, to our own information processing capability.

Greg Newby:

And this is incredibly cool because, and this is getting a little far away from ebooks, but consider what ebooks do for you. They make it so that your human brain, which is no better than brains were 20 or 30 or 50 or even probably 100 years ago or longer, with its limited capability, is better able to find information that you need, better able to process information that you need. So ebooks are a part of that. But the bigger part is the broader exosomatic memory. And the exosomatic memory today is not just the stuff I mentioned. It’s also the libraries. So everyone can be a scholar on Moby Dick or someone that can recite Tennyson’s poetry because they can call it up right there.

Greg Newby:

So to me, there’s not a very explicit overlap but a very broad implicit overlap between having not only electronic books, but a digital library, and all these other things that the technology of the day can do for us to augment our own intelligence, expand our own intellectual reach, add capabilities and capacity to our ability to think.

Aaron Dinin:

For what it’s worth, I couldn’t agree more with Greg. In fact, a lot of what he’s describing here overlaps with my own doctoral work, which focused on language as a technology of information storage and dissemination. By that I mean, even though language is something very natural to all of us, it’s still a human innovation. It’s not like we came out of the primordial ooze being able to create the kinds of stories you’re listening to in this podcast. Language had to be invented. And it was invented as a way of transferring information across generations, which is a huge evolutionary advantage. Put another way, it’s better for the species to be able to tell our children, “Hey, watch out for the pointy thing on that animal’s head. If it stabs you in the stomach, you might die,” than for our children to have to learn that lesson on their own. So anyway, for me, that’s why the core of books is in their content, not their form. You can agree or disagree, and that’s fine.

Aaron Dinin:

Regardless, what makes Project Gutenberg compelling is that it leverages the internet to help distribute the content inside of books. In other words, it gives books even more reach. That value proposition is what got Greg interested in working with Michael.

Greg Newby:

By that time, I was working towards a PhD, and I ended up with a faculty position at the University of Illinois, where I met Michael. That was in ’91. And in my first semester there as a young faculty member, I saw a newspaper article. It was an article in the local newspaper with Michael Hart, in section two, and he was holding up, I think, a CD of ebooks. And it was basically an interview with Michael, and I thought back, and I realized, wait a minute, I heard about this. I got Alice’s Adventures in Wonderland maybe about five years ago. And in the intervening time, I had spent an awful lot of time keeping track of what was going on on the internet. Before something like Google, you didn’t have a good way of searching. You had to do browsing, you had to look for stuff and do word of mouth and communicate on email lists with people that might have interesting things to share. So I had, for example, lists of FTP, File Transfer Protocol, sites that I would use to show my class all the cool stuff you could get to on the internet.

Greg Newby:

We were using some earlier protocols, like something called Gopher. There was a search system called Archie, as a matter of fact, at about the same time. But in 1991, at Illinois, I met Michael and started becoming involved with Project Gutenberg.

Aaron Dinin:

So Greg got involved with Project Gutenberg in 1991, which was 20 years after Michael Hart created the first ebook. By the way, you heard Greg mention using a pre-Google search system called Archie. Quick plug: we actually talked to Alan Emtage, the founder of Archie, here on Web Masters, episode number 21. Be sure to check that out. Now, even though Greg wasn’t there from the beginning of Project Gutenberg, he was kind enough to share the somewhat famous story of how it all got started with Michael.

Greg Newby:

The origin story is well-documented. Michael Hart was someone that had affiliations with the University of Illinois, and someone gave him a username on one of the very earliest computers connected to the internet, or at that time, I think it was called ARPANET. But it was the internet of the day. And Michael did not have programming abilities. He was someone whose father was a Shakespeare scholar, but his father and mother actually met when they were both code breakers during World War II. They were both cryptologists, very mathematical folks but who also worked with words. So Michael had been given this access. It was leading up to July 4th of 1971, and he was wandering around Urbana-Champaign, Illinois, where the University of Illinois was, and went into a Kwik-E-Mart to get a snack of some sort. And because it was the 4th of July, or the day before the 4th of July, he was given in his little bag a printout of the United States Declaration of Independence, sort of a commemoration of independence in the US.

And sort of like a light bulb, he said, “Oh, I can make a book, I can make an electronic book, which would be this text, this Declaration of Independence.” He went back to the lab, and he stayed up all night until actually the early hours of July 5th, 1971. And during that time, he used one of those old clunky teletypes, like the ones I was talking about, which I was still using years later, in the lab at Illinois, and typed in the US Declaration of Independence. At that time, there was no lowercase, there was only uppercase. So we had a US Declaration of Independence all in uppercase. And what did he do? Well, of course, he sent it by email to all of the other people that he knew about on the ARPANET of the day. So that was how electronic books were born.

Aaron Dinin:

And so that was the first ebook?

Greg Newby:

There were other people that had done textual analysis using computers, including at least one, whose history I traced back about five or six years earlier, who was studying Moby Dick. And there were other people at the time that were looking at what you’d call hermeneutics, which is the study of words in the Bible. So there were other electronic uses of books going on. But no one before then that we are aware of, that we’ve heard about, had come to the realization that this was a way of actually reading a book, of enjoying a book, of experiencing a book. So Michael did that in 1971.

Aaron Dinin:

So he launched the first ebook, the Declaration of Independence. And then what? How long did it take after that for Michael to turn Project Gutenberg into, for lack of a better phrase, a real thing beyond just one text?

Greg Newby:

About 20 years of work, to get another few dozen books. He spent an awful lot of time working with people on the King James Bible. He spent a number of years, actually over 10 years, working with his father, who, as I said, was a Shakespeare scholar, on the complete Shakespeare. And there was a bunch of other interesting stuff: Alice’s Adventures in Wonderland, which is number 11, which we’ve mentioned, and they did things like the Federalist Papers and the inaugural address of Abraham Lincoln, things like this. These are all in the first 100 or so. He also got donations of some books during that time. So this is the birth of Project Gutenberg. The birth is a pretty neat story. And it’s one of someone that was throughout his life able to see things that other people did not see, or that they saw but didn’t quite grasp the meaning of. So Michael was someone that saw people sending emails back and forth and maybe knew something about using computers to count words and do some text processing, but made the leap to electronic books back in 1971. That’s sort of the origin story.

Aaron Dinin:

I know Project Gutenberg has over 60,000 books now. But you’re saying it took 20 years to get the first few dozen done? Why do you think it was so slow to get started?

Greg Newby:

During the intervening years, Michael fought constantly to legitimize electronic books, not that they existed or could exist, but that they should exist. Librarians were particularly on his case that ebooks are just not useful. You can’t take them to bed, you can’t know for sure whether the words match the given printed edition that’s on a shelf, you’re getting rid of typography and drop capitals and images, other things that make up the book. These were all legitimate complaints. But Michael was being essentially attacked, sometimes personally, as someone that was destroying the book, destroying literature, simply by digitizing books. There were just a lot of detractors. And he spent a lot of time during those 20 or so years defending the value or the eventual value of ebooks.

During that time, of course, technology was growing up. Windows 3.1 came out, and the Macintosh came out before then. And suddenly, people were dealing with documents as documents, not just as text. In other words, you could have a graphical view, like WYSIWYG, where you could see what your documents look like and incorporate graphics and stuff like that. But you’ve got to realize that that was well over 10 years after Michael had been working on Shakespeare and other stuff and being criticized for it. He had another 10 years. And suddenly, the idea of ebooks was not so crazy because we had ways of distributing them now. We had the Gopher sites and the FTP sites and eventually the World Wide Web sites of the day. We also had better methods of creating them.

So obviously, lowercase came around during that period. So we could have mixed case, and we could include graphics files, even if they weren’t embedded, because we weren’t doing HTML until the mid 90s. But even if the graphics weren’t embedded, we could at least distribute the graphics alongside the books. So they got more usable, just as computers overall were getting more usable at that time. So these sort of middle years of Project Gutenberg, into the 80s and into the early 90s, were very much times of still trying to produce books, but also defending that ebooks were even worthwhile to do.

Aaron Dinin:

So then you came along in 1991. And it sounds like that was also a turning point, not necessarily because of you, I mean, but it sounds like that’s when the project really started taking off. Why do you think that was?

Greg Newby:

Sometime around the time I met him, maybe a year or so before, Michael had an ambition of doubling the production of ebooks, doubling the number of ebooks that Project Gutenberg released every year. And so when I joined, I think we were doing something like eight or so books per year, and then we wanted to double it the next year. So he had this notion of getting to 10,000 by some time in the middle of the 1990s, and then this just continuing on. Since then, actually, we made that goal. We made the goal of 10,000, I think it was 1994. And I think equally as importantly, ebooks became easier to use, easier to find, because of other stuff that your podcast gets to deal with. So what came along during that period? Well, AltaVista and internet search engines came along. Mosaic and subsequent web browsers came along. One of the biggest things that I credit here is Windows 95. Suddenly, not only did you have a graphical interface, and a much better graphical interface, but Windows 95 came with networking. And you had things like AOL and other internet service providers.

Greg Newby:

So suddenly, the stuff that was really kind of a hobbyist thing, getting on the internet, putting together a modem, figuring out how to dial your modem, and terminal programs, and all that stuff, all of a sudden, with Windows 95, it just worked. You get yourself on the internet with your computer and your modem and your phone line, and so forth. And so suddenly, the presentation mechanisms that we’ve talked about, HTML and so forth, coming along, were much friendlier, much more usable ways of experiencing an electronic book, whether or not it was in plain text or HTML or something else.

That’s sort of the middle period of Project Gutenberg, the longest period of time, when Michael was fighting for legitimacy. He was at the same time continuing to produce. And, equally importantly, the whole zeitgeist of what’s now the internet started to emerge. Everything was coming together in those middle 1990s.

Aaron Dinin:

Mid 90s? So we’re talking a while ago. Why are you still doing this? That’s a long time for any project, let alone a volunteer project.

Greg Newby:

Now, Michael died in 2011. I had become the CEO of the corporation that we set up to handle donations and stuff like that back in 2000, after becoming part of Project Gutenberg in 1991. So we had a little bit of an organizational structure going. We were working with a group that we helped encourage and found in 2002, called Distributed Proofreaders. Distributed Proofreaders is an early instance of crowdsourcing. This is where people work together to each work on an element of a task. Now, in this case, the task is to make an ebook. We continued on. We did not continue to double production every year. But we do continue to add about 200 new books per year.

Greg Newby:

And we continue on because we have a core of volunteers, a number of volunteers that are producing the ebooks or posting the ebooks. Obviously, I’m still involved in giving general project oversight, helping to maintain the technologies, we have some great technology hosting partners with the iBiblio at the University of North Carolina. And I think most importantly, well, probably two things are most important. One is that there’s still work to be done. There are at least a million or a couple of million books out there that are eligible to be done.

And the second is that now we get rewarded. We get told that these are valuable resources, that the work that Project Gutenberg does to digitize, in a very high-quality way, the printed works of the past adds value. They’re enjoyable to read, they’re nicely formatted. They’re not all capitals, they are not without their images. We solved all those technology problems over the years, so they’re really quite enjoyable to read. And of course, you can put them on your ebook reader, tablet, phone, and the other types of devices that so many people are using to read these days. So the growth of ebooks as an industry, as the generally accepted way of interacting with the printed word, has helped to maintain Project Gutenberg’s relevancy.

Aaron Dinin:

In my mind, it’s this last part that really speaks to what’s so fascinating about Project Gutenberg as an idea. Personally, I remember encountering Project Gutenberg sometime in the mid 90s. And thinking, this is sort of interesting, I guess. But why would I read a book on a computer? That’s not very practical. And to be fair, it wasn’t practical to read an 800-page George Eliot novel on my big old desktop PC in the mid 90s. But fast forward a few decades, and I’ll happily pull up my phone to read a few pages of that same book, while I’m waiting in line at the grocery store. Michael Hart saw that future all the way back in 1971, which is kind of crazy. And almost equally impressive is the work Greg has been doing to continue fulfilling that vision since he became the CEO of the project in 2000.

Greg Newby:

The Project Gutenberg Literary Archive Foundation is the organization that I’m the Director and CEO of. We have a small board for the nonprofit. And really, I’m the only one of the board members who is actively involved in the day-to-day operation. But I’m actually very actively involved. It’s something that picked up over the years. Back when I was first working with Michael, I was helping with FTP sites and made a book or two, but these days I’m sort of the top of the pyramid of this sort of benevolent dictatorship that I inherited from Michael, where we try to work with some other people and not be entirely a dictator. So I work with a small group of people to try to figure out how we’re going to do stuff over time.

So in Project Gutenberg, we have a number of volunteers. Everyone’s a volunteer, including me. And we have a production team, which is about a half dozen or so people that take the uploads, that take stuff that people submit, and post those as new ebooks to www.gutenberg.org. We have some technology that’s sort of antiquated, and there are some elements that we’ve been modernizing. But more or less what we’re doing with Project Gutenberg is pushing files around, and making sure that those files, usually a text file, an HTML file, and some associated images and some metadata, meet our production quality criteria. So we have automated tools for looking for spelling errors and the types of errors that scanners make when they’re going from a picture of a page to words. And we do some formatting, and we make sure the HTML is valid. There’s a bit of a description of this on a website called upload.pglaf.org. So if anyone wants to know some of the details of these tools that we use, and the process and the steps, that’s the place to go.
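To make the idea of an automated check for scanner errors concrete, here is a minimal Python sketch. It is not Project Gutenberg’s actual tooling; the word list `SUSPECT_WORDS` and the function `flag_suspects` are hypothetical names invented for this illustration, and a real checker would use a far larger list and smarter rules.

```python
import re

# Hypothetical examples of common OCR misreadings and their likely fixes.
# This is an illustrative list, not the one Project Gutenberg's tools use.
SUSPECT_WORDS = {"tbe": "the", "tlie": "the", "wbich": "which", "aud": "and"}

# A word here is any run of letters, digits, or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']+")


def flag_suspects(text):
    """Return (line_number, word, suggestion) for each suspicious word."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in WORD.findall(line):
            lowered = word.lower()
            if lowered in SUSPECT_WORDS:
                findings.append((line_number, word, SUSPECT_WORDS[lowered]))
            elif any(c.isdigit() for c in word) and any(c.isalpha() for c in word):
                # Letters mixed with digits (like "w0rst") often mean a
                # misread character. Legitimate tokens such as "1st" are
                # flagged too, so a human still makes the final call.
                findings.append((line_number, word, None))
    return findings


page = "It was tbe best of times,\nit was the w0rst of times."
for finding in flag_suspects(page):
    print(finding)
```

A check like this only points a volunteer at likely trouble spots; deciding whether a flagged word is really an error is still a job for a person comparing against the page image.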

Aaron Dinin:

How do you choose what books get digitized?

Greg Newby:

This has actually evolved a lot over the years. And we have a collection development policy at www.gutenberg.org, which goes into detail on what we accept. And the short answer is we only accept items that are in the public domain in the United States. Historically, we have all kinds of things that got submitted. And part of this was that in the early days, there weren’t websites and ebook sites all over the place, and you couldn’t like self-publish or very easily get something that you created online, it wasn’t that easy to do. You have a bunch of things in the collection which these days we wouldn’t add to the collection. And we love that stuff. It’s just that we’re focused entirely these days on items that are in the public domain in the US.

Aaron Dinin:

I was wondering about copyright, that has to be a huge issue. How do you navigate it?

Greg Newby:

Copyright is something that Michael Hart didn’t think he would need to be an expert in. And then it turned out he did. And I followed on, and now I’m an expert in copyright, including international copyright, but very much so in the US, where Project Gutenberg is based, along with all of our servers, all of our infrastructure. The bottom line is that Project Gutenberg has always been very, very diligent about following applicable copyright law. Now, laws have evolved over that period. We had major revisions in 1976 and 1998 to the US copyright law, and we’ve continued to follow it. What that means is that when we were making an ebook, we had to make sure that it was legal to do so. And in a small number of cases, this was by permission. But in most cases, it’s due to there being no copyright.

Either its copyright term expired and it entered the public domain, or in some cases, like with government publications (I mentioned the CIA Factbook), they never had a copyright. But mostly it’s stuff whose copyright has expired. So Jane Austen’s Pride and Prejudice, published in the 1800s, entered the public domain no later than 56 years after that time. So it was long in the public domain by the time we digitized it.

These days, due to an extension of copyright to 95 years in most cases, we’re working on items that are almost 100 years old. They’re 95 years old or older, of course. So we actually had a 20-year hiatus between 1998 and 2018, when nothing new had its copyright expire in the United States. We continued to work on older materials, but couldn’t get an extra year of stuff to work on. Anyway, that’s moving forward.
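
The arithmetic behind that 95-year figure can be sketched in a few lines of Python. This is a simplification, not legal guidance: it assumes a flat 95-year term counted from the publication year and running through the end of that calendar year, and it ignores renewals, unpublished works, and other special cases. The function names are hypothetical.

```python
TERM_YEARS = 95  # term length quoted in the interview ("95 years in most cases")

def public_domain_year(publication_year, term_years=TERM_YEARS):
    """First calendar year in which the work is out of copyright.

    Assumes the term runs through December 31 of the final year,
    so the work becomes free to use on January 1 of the following year.
    """
    return publication_year + term_years + 1

def is_public_domain(publication_year, current_year):
    """True if a work published in publication_year is out of copyright in current_year."""
    return current_year >= public_domain_year(publication_year)

if __name__ == "__main__":
    # A work published in 1923 stays protected through the end of 2018.
    print(public_domain_year(1923))
    print(is_public_domain(1923, 2018))
    print(is_public_domain(1923, 2019))
```

Under these assumptions, works published in 1923 were the first to come out of copyright once the hiatus ended, which lines up with the 1998 to 2018 gap Greg mentions.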

Aaron Dinin:

And can I ask since you’ve had to become an expert on copyright law, what do you think about copyright? Here on Web Masters I’ve had an opportunity to speak with lots of innovators who are some of the early digital file sharing pioneers. And needless to say, they aren’t huge fans. So I’m curious to get your take.

Greg Newby:

So we’re all copyright experts. We are not against copyright. The founders of the United States had, in Article I of the Constitution, the notion that copyrights were important to help expand and sustain the useful arts, that they would encourage people to write books and do other things because they could have a limited-term copyright, a limited-term monopoly through copyright, on what these days we call intellectual property and the ownership of that work. That has a long history going back to the Statute of Anne in England, where they were worried about people stealing works and republishing them themselves. This is a real concern. I don’t have any gripe with the notion of there being a copyright. Unfortunately, in the United States, like most of the rest of the world, copyright is way out of control, with a 95-year copyright term. And that’s one of the longest in the world; it’s 70 years in most other places in the world.

Greg Newby:

With that very long copyright term, what that means is that the author, the creator, isn’t getting any benefit, because they’re dead. The creator is probably long gone by the time a 95-year term expires. And in fact, the publisher might be gone; there might be nobody trying to sell this work or benefiting from it. So the founders’ original intentions are not being met by the official law and copyright term. Instead, the beneficiaries are either the heirs, so people that didn’t actually create the book but happened to be lucky enough to be progeny of the author or creator, or, more so, the corporations that are locking up that book and selling it decades and decades after the first creator is in the ground. So it’s unfortunate that the copyright term is so long. But going back to what I said earlier, Project Gutenberg is very, very diligent and stringent about making sure that anything that we put into our collection is legal to put there. And our new collection development policy is focused only on items that are public domain in the United States.

Aaron Dinin:

That was a well-organized answer. I appreciate it. You’re respectful of copyright. Okay, fair enough. So the other thing I’m curious to know is what kinds of trends do you see about what books are getting read?

Greg Newby:

So every day, if you go to the top 100 page at www.gutenberg.org, you can see the things that were downloaded the most yesterday or the day before, and we can do a trend over a week or a month, and also keep track of authors. What’s interesting is that the top books don’t change that much over time. They only change very slowly, but one or two books will shift quickly based on popular media, especially movies. So when there’s a movie out, then suddenly something will become a top book. A good example of this is Pride and Prejudice by Jane Austen. This is almost always in our top five. And it wasn’t there until the movie Pride and Prejudice came out, and then suddenly it rocketed up to the top. And it stayed there, and it’s been there for years. This is, of course, a great book to read.

Sherlock Holmes tends to be towards the very top. In the top 50, usually, is Franz Kafka’s The Metamorphosis, which is a fantastic book, and one which is not the subject of a movie that I’m aware of, yet it seems to persist. In the top 20, we see Bram Stoker’s Dracula tends to stay up there, and Mary Shelley’s Frankenstein also tends to stay at the top. So it does evolve somewhat over time. But for the most part, the top books are the types of things that you would guess, or they include the types of things that you would guess would be in the most popular works.

Aaron Dinin:

Okay, people are reading Jane Austen and Kafka. As a literature guy, I’m happy to hear that. And to all those people using Project Gutenberg, or even people who aren’t, is there anything you wish they’d read more of?

Greg Newby:

Well, I think it’s all good, really. I don’t want to encourage people to read particular types of things. I just want to encourage people to read. And I don’t want to encourage people to only read for enjoyment or only read for reference or only read because you’re looking at a phone but not use that same book, when you’re in the bathtub or something like that. I don’t want to tell people how to use these materials. I just want to make sure they’re as available and as usable as possible.

Aaron Dinin:

Okay, so you won’t commit to a preferred text on there. Fine. How about more broadly, what do you wish people knew or understood about Project Gutenberg in general?

Greg Newby:

I think the main thing that people should know about Project Gutenberg is these are unlocked books. They’re just files, and you can do with them as you will. So if you want to print something, or you want to make the font larger, or you want to grab an extract and send it to somebody, or take an image and put it into a montage or use it as your desktop wallpaper, you can. These are entirely unencumbered files. There’s no digital rights management, there’s no impediment. There’s no custom software, none of that stuff. So we work really, really hard to make these as widely usable for any intended purpose as possible. Now, we do have the ebook formats. There’s something called EPUB, and there’s mobi, which is the Kindle format. These are not as easy to sort of unpack and work with. But these are derived automatically from HTML, or from plain text and HTML. HTML is the internet markup language, Hypertext Markup Language. You can save that to your computer, put it into a Word document, post to social media, anything that you want.

So I think that’s the main thing: unfortunately, people are used to being limited in what they can do with digital content. And Project Gutenberg seeks to be unlimited: unlimited use, unlimited redistribution, and, of course, unlimited enjoyment that you can have of these books. So I think that’s something that’s important to share as one of the main characteristics that differentiates, say, the Pride and Prejudice that you can get from www.gutenberg.org from the Pride and Prejudice you can buy in the Kindle Store for $12.99. That Kindle Store book is going to be digital rights managed. It’s going to be locked. It’s not going to be something that you can share with your family. It’s not going to be something that you can forward on to other people who might be interested in it, unless you forward a link and they buy a copy.

Ours, on the other hand, has the same words, same story, same text, but none of those limitations: unlimited redistribution, unlimited ability to make derivatives, and no limitations on how you can enjoy, work with, and make use of them.

Aaron Dinin:

What Greg is saying here is definitely worth highlighting. One of the most underrated aspects of Project Gutenberg and other resources like it is the freedom they offer. Project Gutenberg has 60,000 completely unlocked texts. And, oh, by the way, many of them happen to be some of the most important texts in human history. And anyone with an internet connection can access, enjoy, and above all, share them however they’d like. You can’t do that with a Kindle book. Heck, you technically can’t even do that with this podcast episode you’re streaming. And at least in my mind, that will always make the resources on Project Gutenberg more valuable than their printed counterparts.

I’m not arguing printing doesn’t have benefits. But mass instant shareability certainly isn’t one of them. And if, like me, the thing you prioritize about books is what’s written inside of them, then being able to share that content with anyone anywhere in the world at any time is immensely valuable. Michael Hart believed that when he launched his first ebook back in the summer of 1971, and Greg Newby continues that legacy today.

Greg Newby:

These days, it’s obvious Michael was right. Back in 1971, he was seeing decades into the future, that people would be using computer-type devices to experience the printed word. So he was right. And it took at least 25 years of him devoting his life to making this reality come about before, finally, the shift began to occur. And suddenly people weren’t ridiculing ebooks or criticizing ebooks. They were actually reading ebooks and enjoying ebooks and buying ebooks and giving away books and so forth. So this was, to me, quite a story. And from my point of view, it was sort of a bit of serendipity that I ended up meeting Michael, being in the same university in the town where he was based back in 1991. And yet, also from my point of view, this is a very tightly aligned story with my own story of trying to empower and enable human intellect through technology, and ebooks are, obviously, a big piece of that.

Aaron Dinin:

Well, I’d like to thank Greg Newby for taking the time to tell us about his work with Project Gutenberg. And hey, if you’d like to volunteer, I’m also going to let Greg explain how to do that.

Greg Newby:

These days, most Project Gutenberg volunteers are actually Distributed Proofreaders volunteers. So if you go to www.pgdp.net, Distributed Proofreaders, you’ll find a huge and wonderful community of people that are working on the various stages of producing an electronic book before it gets submitted to the upload portal that I mentioned earlier. This is where people can just do proofreading, which is essentially comparing a scan of a page against the words that have been extracted from that page. There are other people that like to do the formatting, things like the de-hyphenation, the markup, and the final assembly. There’s also something called smooth reading. I love smooth reading. Smooth reading is where the book has been through all the rounds of proofreading, and all you want to do is have someone enjoy it, someone read it, because if you’re reading it, you’re likely to find some errors that were missed at earlier stages, because this proofreading is what we call asymptotic.

In other words, you can make it better and better and better, but it’s tough to get to really perfect. So we try to have multiple rounds. So Distributed Proofreaders is the place where most Project Gutenberg volunteers are spending their time, and the books from DP make up the largest portion of new books to Project Gutenberg, meaning Distributed Proofreaders provides most of the new books. So that’s really the place to start. Most of the roles that we have within the PGLAF side of Project Gutenberg tend to be somewhat specialized, and there are not that many of us. So we have a small production crew and a small technology crew. We have a few people that are working on the backend technologies, for example, to convert HTML to EPUB or mobi format. So if people are interested in something that’s more on the technical production side, surely they can get in touch with me. My contact info is on the page at gutenberg.org. But for most people, the place to get started is with Distributed Proofreaders at pgdp.net.
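
As an illustration of the de-hyphenation step Greg mentions, the following Python sketch rejoins words that a printed page split across a line break. It is an assumption-laden simplification rather than the Distributed Proofreaders workflow: the `dehyphenate` function and its single regular expression are hypothetical.

```python
import re

# A word fragment, a hyphen at the very end of a line, then the rest of the
# word at the start of the next line.
LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\n(\w+)")

def dehyphenate(text):
    """Join words that were split across a line break with a hyphen."""
    return LINE_BREAK_HYPHEN.sub(r"\1\2", text)

if __name__ == "__main__":
    page = "reading a book on a com-\nputer screen"
    print(dehyphenate(page))
```

The pattern cannot tell a line-break hyphen from a real one, so a word like "well-known" split across two lines would be wrongly joined. That ambiguity is one reason this kind of cleanup still goes through human review.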

Aaron Dinin:

So there you have it. You can help continue the legacy of Michael Hart by becoming a volunteer for Project Gutenberg and/or Distributed Proofreaders. Also, you can help contribute to the legacy of this podcast by making sure you’re subscribed, by leaving a great review, and by sharing this episode with a friend.

I’d like to thank our audio engineer Ryan Higgs for helping make everything sound great. And I’d like to thank our sponsor Latonas for their support. Remember, if you’re interested in buying or selling an internet business, be sure to check out latonas.com. If you’re interested in sharing thoughts or feedback, find us online.

We’re on Twitter @WebMastersPod. You can also find me on Twitter. I’m @AaronDinin, that’s A-A-R-O-N-D-I-N-I-N. And you can read lots of articles from me about startups and entrepreneurship over on medium.com. Just search for my name to find me. And that should keep you busy until we release our next episode, coming soon, I promise. But for now, well, it’s time for me to sign off.
