Extended Interview: Mark Graham on Internet Archive’s Work Preserving the Web as Gov’t Sites Go Dark

Web Exclusive, February 28, 2025

Extended interview with Mark Graham, director of the Wayback Machine at the Internet Archive. He is also part of the End of Term Archive for federal websites.

Transcript
This is a rush transcript. Copy may not be in its final form.

AMY GOODMAN: This is Democracy Now!, democracynow.org, The War and Peace Report. I’m Amy Goodman, with this web exclusive Part 2 of our conversation with Mark Graham, director of the Wayback Machine.

This month, President Trump fired the head of the National Archives, the agency that alerted the Justice Department of his alleged mishandling of classified documents after his first term, bringing them to Mar-a-Lago, which led to the criminal case against him. The person Trump fired, Colleen Shogan, was not director at the time of the case.

This comes as thousands of government websites with information on diversity initiatives, hate crimes, vaccines, environmental policies, veterans’ care and major scientific research are being taken down. Gone, according to The New York Times, are, quote, “More than 180 pages from the Department of Justice, including all state-level hate crime data and seven pages discussing anti-LGBTQ hate crimes,” and, quote, “Eight pages from the Department of the Interior, including several detailing environmental policy initiatives.” Other removed pages include information on vaccines, environmental policies, veterans’ care, scientific research and more.

For more on the major effort to preserve all of this information, called the End of Term Archive for federal websites, we continue in San Francisco with Mark Graham, director of the Wayback Machine at the Internet Archive, a nonprofit library of millions of free texts, movies, software, music and websites. They also host all the episodes of Democracy Now!

Welcome back to Democracy Now! Thank you so much for staying with us, Mark. In this part of the conversation, if you could go back? I mean, your bio alone is truly amazing, and it really sets the context for the Internet Archive, internet.org, and the Wayback Machine. Talk about IGC and everything.

MARK GRAHAM: Oh my gosh, you’re really going back here now, back to the ’80s. Yeah, well, in the early part of the ’80s, we didn’t have the web as we know. We didn’t really have the internet. But there were a lot of people working to make online services available to help what I referred to then as purposeful networking. And so, a number of us in Berkeley, California, had this idea to build a computer network for peace activists. And so we created PeaceNet, and later EcoNet, and evolved it into the Institute for Global Communications, IGC.org. So, that’s something I spent — I spent most of the ’80s working on that, that kind of work.

AMY GOODMAN: And then, talk about what the Internet Archive is, and specifically — 

MARK GRAHAM: Yeah.

AMY GOODMAN: — the Wayback Machine.

MARK GRAHAM: Right, right. Archive.org, the Internet Archive, is — as you said, it’s a nonprofit digital library. It was founded by a man named Brewster Kahle about 28 years ago. And he had this appreciation that this new media, this digital media, was incredibly powerful, and it was going to become very important, but at the same time, it was somewhat fragile and ephemeral. And I think, you know, that intuition back then has proven to be prescient. And so, he set about basically pressing record, if you will, on the web, on websites and related services, and then, over time, just getting better and better and better at it.

Nowadays, for example, on any given day, we add more than a billion URLs to the Wayback Machine. And we don’t just archive this material and collect it and store it away. It’s all available to people by just going to web.archive.org. One of our guiding philosophies is that use drives preservation. And as millions of people every day that use our service, you know, experience what’s there and give us feedback, then that helps us to understand how we can continue to make the services better.

AMY GOODMAN: Now, are you concerned? I mean, as thousands of webpages of the government are being taken down —

MARK GRAHAM: Yeah.

AMY GOODMAN: — or erased, if you can talk about how the Wayback Machine and Archive.org, the Internet Archive, store so much data and don’t rely on cloud services from companies like Amazon, Microsoft or Google —

MARK GRAHAM: True.

AMY GOODMAN: — which controls storing so much information on the internet?

MARK GRAHAM: Yeah.

AMY GOODMAN: And as much as we have access to now, it could be nothing.

MARK GRAHAM: Well, yeah, I mean, it’s true. We do own and operate our own data centers. They are in multiple locations, both in this country, the United States, and outside of this country. And so, the material itself is physically located, copied, mirrored in multiple locations. So, that’s just — it’s one of several measures that we take to help ensure the long-term integrity and availability of the material.

AMY GOODMAN: Internet Archive is Archive.org.

MARK GRAHAM: Archive.org, right.

AMY GOODMAN: But what is Internet.org?

MARK GRAHAM: Well, Internet.org is this thing that — back before Meta was called Meta, it was Facebook. It was something that Facebook had created, and it was part of their program to help bring the internet to people all over the world by providing free access through cellphone deals. But it kind of came with a catch, and that was you got boxed into getting the view of the internet through the lens of Facebook.

AMY GOODMAN: So, Wayback Machine, Internet Archive. What is NARA? And, you know, there have been major layoffs there.

MARK GRAHAM: Yeah.

AMY GOODMAN: It’s the National Archives and Records Administration.

MARK GRAHAM: Right.

AMY GOODMAN: But if you can talk about why it is so important and what’s happening there?

MARK GRAHAM: Well, I mean, NARA is one of the storied institutions in the United States, that’s responsible for, you know, helping to preserve our cultural heritage. And so, it plays a critical role. It obviously has much more resources than the Internet Archive does. We’re a relatively small nonprofit, only about 120 staff members, and we survive through the kindness of the patrons that donate money to power what we do. And as you said, you know, NARA, as in almost every government agency that I’m aware of, is suffering fairly significant cuts at this time. So it remains to be seen what the impact of that will be. Be that as it may —

AMY GOODMAN: Let me ask you something.

MARK GRAHAM: Yeah, sure.

AMY GOODMAN: How does it compare? How does the removal — 

MARK GRAHAM: Right.

AMY GOODMAN: — of thousands of pages from the federal government websites — how does Trump administration efforts to remove them compare to previous administrations?

MARK GRAHAM: Right. Oh, well, I mean, that’s something that we will get a better understanding of over time. But anecdotally, and just maybe obviously, based upon what we can see, the number of websites and webpages that have been removed in the last five weeks is significantly more than has ever happened in any change of administrations. And there’s an organization, as I noted earlier, Environmental [Data and Governance] Initiative, which will be coming out with some reports quantifying that. But, you know, I mean, just, for example, if you go to tech.ed.gov, which is a section of the Department of Education’s website, and look there for PDF files — these are U.S. government-produced reports; our tax dollars paid for these reports, about various dimensions of education — I counted more than 300 that have been removed over the past five weeks. So, you know, it’s — we don’t exactly know the extent of what has been removed, but by all accounts, it’s fairly significant.

AMY GOODMAN: I mean, one thing we do know is that the Justice Department removed the first nationwide database tracking misconduct by federal police. It also removed — 

MARK GRAHAM: Yeah.

AMY GOODMAN: — a database detailing criminal charges and convictions related to the January 6, 2021, insurrection.

MARK GRAHAM: Yeah.

AMY GOODMAN: Will these databases be preserved by the Wayback Machine?

MARK GRAHAM: Possibly. So, the databases which might be driven by back-end interactive services may not have been preserved in the Wayback Machine, but there were many other efforts, underway and that are actually still ongoing, to preserve much of this material. For example, Harvard University’s Library Innovation Lab collected, I think, about 300,000 data sets, and they’re working to make those available. Multiple people took it upon themselves to archive material in bulk, at scale, and upload it to Archive.org.

So, actually, it’s important to note, anyone can upload material to Archive.org. You just go to Archive.org, and on the top right-hand side, there’s an upload button. Press it, and then follow the directions. In addition to that, anyone can save a webpage to the Wayback Machine through a service that we have called Save Page Now. You get to that from the right-hand side of the Wayback Machine, or simply web.archive.org/save. And we made it even easier. If you install the Wayback Machine browser extension, which is available for all the major browsers, then you can simply press a button, and you can archive any webpage that you happen to be looking at.
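
As a rough illustration of the Save Page Now address mentioned above, the following Python sketch builds a capture link for a given page. It assumes that appending the target address to web.archive.org/save/ requests a capture. The function name save_page_now_url and the example address are hypothetical, and the sketch only constructs the link without contacting the service.

```python
from urllib.parse import quote

# Assumed endpoint, based on the "web.archive.org/save" address given in the interview.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Return a Save Page Now link for page_url (hypothetical helper)."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("page_url must start with http:// or https://")
    # Keep the usual URL punctuation intact; percent-encode anything else (e.g. spaces).
    return SAVE_ENDPOINT + quote(page_url, safe=":/?&=%")


if __name__ == "__main__":
    # Example address is a placeholder, not a real government page.
    print(save_page_now_url("https://example.gov/reports/annual report.pdf"))
```

Opening the resulting link in a browser is one way to request a capture; the browser extension and the upload button described above do not require any code.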

AMY GOODMAN: So, NARA, the National — 

MARK GRAHAM: Yeah.

AMY GOODMAN: — Archives and Records Administration, stores not only websites, but films, books, other content. Can you talk about Internet Archive’s efforts to digitize physical books and old media, such as 78 RPM, old 78 records, the legal ramifications of all of this, and where the National Archives keeps all of this, the actual physical data? And is that threatened right now?

MARK GRAHAM: I really can’t speak to where the National Archives keeps all their treasures. That’s a great question. I should probably go try to research that a little bit.

But I can tell you that the Internet Archive, with a mission, as I said earlier, of universal access to all knowledge, works diligently to acquire and digitize, if necessary, to preserve, to organize and make available material across a whole variety of media. And you named a few of them.

So, for example, books, we archive — actually, we acquire and digitize somewhere around a million books a year. We’ve digitized, I think, close to 8 million of them so far. And we make these available in a variety of ways, depending upon some rights issues associated with those books.

We archive academic papers at scale. We have a service called scholar.archive.org, and you can search for papers by their DOI, digital object identifier, by their title, etc.

We archive television news from around the world. For example, when Russia attacked Ukraine, we decided to begin archiving four Russian television news channels 24/7. We archive Iranian news, Belarusian, Ukrainian, etc.

We’ve been donated massive quantities of microfiche and microfilm material by universities and others, and have — for example, some of that included the FBIS, the Foreign Broadcast Information Service, or DTIC, Defense Technical Information Center, material.

You mentioned 78s. The Boston Public Library donated more than 400,000 78s to us. Seventy-eights, in case people don’t know, that’s what happened — that’s what came along before vinyl. And they were these long-playing records made out of shellac that spun around at 78 revolutions per minute. They were popularized initially, I think, in the 1890s and fell out of favor in like the 1950s. After we acquired this material, we digitized all of it and make some of it available.

You did ask me about some of the legal challenges. And it’s been fairly well reported that we were sued by Hachette, et al., four — sort of four of the world’s largest publishers, over our practice of controlled digital lending, which is a library practice of making a digital copy of a paper book that it owns available, one copy at a time, with digital rights management. We lost that case. We appealed it. We lost that. And as has been well documented, we decided not to take that to the Supreme Court. You can read about all of the things that I’m talking about through blogs that we publish at blog.archive.org.

AMY GOODMAN: And I should also say you have all 29 years of Democracy Now!, as we celebrate — 

MARK GRAHAM: Excellent!

AMY GOODMAN: — our 29th anniversary, at Archive.org. If you could also talk about how the internet began? Tell us what DARPA is. Why, what, years ago, the joke about Al Gore saying he invented the internet?

MARK GRAHAM: Well, OK, I mean, different people tell different stories about that. The DARPA, Defense Advanced Research Projects Agency, was instrumental in helping to fund some of the initial work that helped to bring about the internet as we know it, and through the development of standards protocols like TCP/IP and others. And then, later, it became much more useful when people like Tim Berners-Lee, for example, created HTML and HTTP and the standards that we use to publish on this platform called the internet.

And, you know, people like to make fun of Al Gore. Al actually did have a fairly significant role, I think, in that time behind the scenes and helping to bring about government support. You know, in the early days of the internet, I remember I had a 56-kilobit connection that I paid for out of Stanford University, and I had to sign a contract. It was called, I think, an acceptable use policy, an AUP. And at that time, you had to actually sign a document that said that you would not use the internet for commercial purposes, in order to get connected to the internet. So, that was back then. Things have changed a little bit since then.

AMY GOODMAN: Let me ask you about archiving social media posts. Earlier this week —

MARK GRAHAM: Yeah.

AMY GOODMAN: — Senator Elizabeth Warren asked X’s CEO to recover more than 24,000 deleted posts from William Pulte — I don’t know if I’m saying his name right — who Trump —

MARK GRAHAM: Yeah.

AMY GOODMAN: — has picked to head the Federal Housing Finance Agency.

MARK GRAHAM: Interesting. I didn’t know that.

So, yeah, well, we archive much of the public web, as I’ve said, and that includes a lot of social media. So, we don’t get everything. We don’t pretend to get everything. There’s far, far too much. But, you know, we do get a lot. So, I can’t comment specifically on whether or not we would have that particular material, but the — you know, it’s a constant battle and effort to try to preserve this material, to try to even understand what’s there. And I’m not just talking about within the United States, right? I’m talking globally, so in all the different languages and all the different platforms that exist. And it’s extremely challenging to do that, for a whole variety of reasons. For example, the rise of paywalls and login prompts make it very challenging. More and more news, for example, is behind paywalls, and so that makes it inaccessible to us. We don’t consider that to be public. We consider things that you can get to without a paywall or a login to be part of the public web.

And at the same time, this media is very fragile, and it’s — much of it just simply disappears. In 2024, Pew Research came out with a report, and they looked at a particular data set of webpages from 2013, and then looked at that in 2023, and they found that fully 38% of the pages in that particular data set were no longer available on the public web. Now, we then — we got in contact with Pew. They shared their data with us. We worked to replicate much of their efforts, and we found that more than half of that material, we had, in fact, archived in the Wayback Machine. So, yeah, that — on the other hand, there was a fair amount that we simply hadn’t gotten.

AMY GOODMAN: That’s really interesting. I mean, I remember before the internet. And then, as the internet developed, librarians around the country, really considered the freedom fighters of our time, really concerned about access to information, were saying everything should be downloaded, because who knows how it will be changed or deleted? Now, that was way back then.

MARK GRAHAM: Yeah, yeah.

AMY GOODMAN: What about the role of librarians and archivists today? Because people are working —

MARK GRAHAM: Yeah.

AMY GOODMAN: — at universities and all over — 

MARK GRAHAM: Sure.

AMY GOODMAN: — the country right now to ensure that we have access to information and it’s not subject to the whims of the ideology of a particular administration, you know, or social platform owner or, you know, cloud owner.

MARK GRAHAM: Right. No, no, that’s important. I mean, this, you know, web archiving and archiving in general, it’s a team sport. And there’s room and need and opportunity for many, many different people to contribute to this effort in a whole variety of ways. I’ve already mentioned, for example, anyone can use the Save Page Now feature. So, if you see something, save something. You may be the first person to see a particular webpage, web resource, and save it to the Wayback Machine, and then it might disappear moments later, like what happened just a couple of weeks ago with a particular webpage on the CDC website related to bird flu.

You know, you asked about the role of librarians. There was a — is a librarian down at Stanford, Quinn Dombrowski, and she started an effort called SUCHO, Saving Ukrainian Cultural Heritage Online. And I remember when she started it. It started out as a call to join a Zoom session one day, and there were maybe 10 of us on the Zoom. And within a week or two, she had more than a thousand volunteers organized, gathering information about material in Ukraine representative of the cultural heritage of Ukraine in museums and libraries and other places that was at risk, and began efforts to digitize that material, if necessary, and also to acquire the born-digital material that was available on the web, literally as the bombs were dropping and buildings were being taken out and web services were being taken down. So, because of that librarian’s effort with SUCHO, much material was preserved that is now simply lost.

AMY GOODMAN: And, Mark Graham, can you talk about your interventions in, for example, the European Union talking about artificial intelligence, talking about AI?

MARK GRAHAM: Sure, yeah. Well, you know, the EU is considering regulation relative — new regulation relative to AI. And that’s a topic of great interest to libraries in general, right? I mean, you know, the role of a library is to help — one of the roles, at least, is to help a society be educated and to have citizens be armed with information necessary for them to fully participate and to use — I think, to use tools, available tools, to maximum efficiency. And these days, that includes artificial intelligence. Frankly, I think we need all the help that we can get to help address some of the critical issues of our time, whether they be climate change or pandemics or the threat of nuclear war, etc.

And so, the Internet Archive is very interested in helping to bring about ways in which AIs can be levered with the best that humankind has come up with, the best books, the best material. But this has to operate within a rights-respecting environment, and it’s got to be something that has regulations associated with it, so that people understand what can be done and what can’t be done. That’s not the case, for example, now in the United States. This is, by and large, being played out in the court. So, because things are — these conditions are not clear in the U.S., you have various players that are taking actions, and then they may or may not be sued. In the EU, at least, there’s efforts underway to bring about some regulations to establish what the ground rules — right? — might be.

So, the Internet Archive, just this week, published a paper with some of our recommendations about this. And I won’t go into the details. It’s about an eight-page paper. But if anyone was interested in learning more about our position on this topic, they could go to blog.archive.org, and they could read a blog post and then link to our position paper.

AMY GOODMAN: And finally, if you can talk about what you think are the biggest threats to the Internet Archive today?

MARK GRAHAM: You know, I think I tend to think more about opportunities. You know, you see, like, the Bread and Puppet poster behind me with “yes.” I think part of our culture here at the Internet Archive is to try to be helpful, to try to be the best library that we can be. So, I think maybe, you know, a threat, which is really more like an opportunity, is that we maybe are not able to fully realize our potential and to meet the need that people have to be able to collectively come together to preserve our cultural heritage, not just for these times, but truly for all times. But it’s something we’re committed to, and we’re grateful for the 150,000-plus patrons that help support our efforts. And we’re grateful for the people that simply use our service and get benefit from it, and then maybe, I don’t know, maybe tell other people, “Hey, there’s this thing called the Internet Archive, and you can get Democracy Now! episodes there or old-time radio or missing U.S. government webpages.”

AMY GOODMAN: And I just visited Bread and Puppet up in Vermont, so I appreciate —

MARK GRAHAM: Indeed.

AMY GOODMAN: — what’s behind you, the “yes” and the flower.

MARK GRAHAM: Yes.

AMY GOODMAN: Mark Graham, I want to thank you so much for being with us, director of the Wayback Machine at the Internet Archive. And folks can go to democracynow.org to see Part 1 of our conversation. I’m Amy Goodman. Thanks so much for joining us.

The original content of this program is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. Please attribute legal copies of this work to democracynow.org. Some of the work(s) that this program incorporates, however, may be separately licensed. For further information or additional permissions, contact us.
