EFF Warns IETF Drafts Could Let Sites Gate Automated Access to Web Content

By ChatGPT — AI-generated · Published:

The Electronic Frontier Foundation on Wednesday put a spotlight on two little-known internet standards efforts that could shape how websites deal with crawlers, AI systems and other automated visitors.

In a Deeplinks post titled “The Free and Open Web Is Under Attack at the IETF,” EFF staff technologist Tori Noble warned about active work at the Internet Engineering Task Force, the standards body that develops many of the technical rules behind the web. The focus is not a new law or a binding internet policy. It is two sets of draft specifications that would give websites more standardized ways to signal how their content may be used and to verify who, exactly, is sending automated traffic.

One effort, known as AI Preferences, or AIPREF, is aimed at making website preferences machine-readable. Its drafts — “Associating AI Usage Preferences With Content” and “A Vocabulary For Expressing AI Usage Preferences” — describe a new Content-Usage HTTP header and a Content-Usage directive for robots.txt, the long-used text file websites publish to communicate with crawlers. The vocabulary includes categories such as bots, search, train-ai and train-genai.

As one AIPREF draft puts it, “This document describes two mechanisms for associating preferences with content: A Content-Usage header field for HTTP; A Content-Usage directive for the Robots Exclusion Protocol (colloquially known as "robots.txt").”
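To make the header mechanism concrete, here is a minimal Python sketch of how a crawler might read such a signal. The comma-separated `category=y` / `category=n` syntax and the function names `parse_content_usage` and `preference_for` are assumptions made for illustration. The drafts define the authoritative syntax, which may change between revisions.

```python
from typing import Optional


def parse_content_usage(value: str) -> dict[str, bool]:
    """Parse an assumed 'train-ai=n, search=y' style value.

    Returns a mapping of category name to True (allowed) or False (disallowed).
    Items that are malformed or use an unknown value are ignored.
    """
    prefs: dict[str, bool] = {}
    for item in value.split(","):
        key, sep, raw = item.strip().partition("=")
        if not sep or not key:
            continue  # no '=' or empty category name: skip the item
        raw = raw.strip().lower()
        if raw == "y":
            prefs[key.strip().lower()] = True
        elif raw == "n":
            prefs[key.strip().lower()] = False
    return prefs


def preference_for(prefs: dict[str, bool], category: str) -> Optional[bool]:
    """Return the stated preference, or None when the site expressed none."""
    return prefs.get(category)


if __name__ == "__main__":
    prefs = parse_content_usage("train-ai=n, search=y")
    print(preference_for(prefs, "train-ai"))
    print(preference_for(prefs, "bots"))
```

Returning `None` for an unstated category keeps "no preference expressed" distinct from an explicit yes or no, which matters because the signal is a stated preference and not an access control.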

The second effort, Web Bot Auth, addresses identity rather than content preferences. Its charter says the group aims to “standardize methods for cryptographically authenticating automated clients and providing additional information about their operators to Web sites.” In practice, that means automated HTTP clients — software agents that fetch pages or use web services — could cryptographically sign outbound requests so a server can verify who runs the bot. The draft architecture discusses signature headers, key distribution, public lists and verification flows.
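The sign-then-verify idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Web Bot Auth wire format: it borrows the `Signature-Input` and `Signature` header layout from HTTP Message Signatures (RFC 9421), covers only the request's authority, and uses a shared-secret HMAC as a stand-in because Python's standard library has no public-key signature primitive. A real deployment would use asymmetric keys, so that sites can verify a bot without being able to impersonate it. The key ID, the secret and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import re


def signature_base(authority: str, created: int, key_id: str) -> tuple[str, str]:
    """Build the string that gets signed, plus the signature parameters."""
    params = f'("@authority");created={created};keyid="{key_id}"'
    base = f'"@authority": {authority}\n"@signature-params": {params}'
    return base, params


def sign_request(authority: str, created: int, key_id: str, secret: bytes) -> dict[str, str]:
    """Return the two headers a bot would attach to its outbound request."""
    base, params = signature_base(authority, created, key_id)
    mac = hmac.new(secret, base.encode("ascii"), hashlib.sha256).digest()
    return {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{base64.b64encode(mac).decode('ascii')}:",
    }


def verify_request(authority: str, headers: dict[str, str], keys: dict[str, bytes]) -> bool:
    """Server side: rebuild the signed string and compare signatures."""
    sig_input = re.fullmatch(
        r'sig1=\("@authority"\);created=(\d+);keyid="([^"]+)"',
        headers.get("Signature-Input", ""),
    )
    sig = re.fullmatch(r"sig1=:([A-Za-z0-9+/=]+):", headers.get("Signature", ""))
    if not sig_input or not sig:
        return False  # missing or malformed headers: treat as unidentified
    created, key_id = int(sig_input.group(1)), sig_input.group(2)
    secret = keys.get(key_id)
    if secret is None:
        return False  # unknown operator key
    base, _ = signature_base(authority, created, key_id)
    expected = hmac.new(secret, base.encode("ascii"), hashlib.sha256).digest()
    try:
        received = base64.b64decode(sig.group(1), validate=True)
    except ValueError:
        return False
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    keys = {"bot-key-1": b"demo-secret"}  # hypothetical key registry
    headers = sign_request("example.com", 1700000000, "bot-key-1", keys["bot-key-1"])
    print(verify_request("example.com", headers, keys))
```

The sketch leaves out things a real verifier would need, such as rejecting stale `created` timestamps and fetching operator keys from a published location.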

Taken together, the two efforts would formalize practices that have so far been largely informal: how sites state their wishes about automated use, and how they tell one bot from another. AIPREF would let sites publish structured rules about AI-related uses of content. Web Bot Auth would give sites a way to distinguish between identified bots and unidentified ones.

That combination is what worries critics. EFF argues the drafts could make it easier for site operators to build allowlists, denylists and other forms of gated access to public web content. In the group’s view, the effect would not be limited to large AI companies scraping the web for training data. It could also affect journalists, academic researchers, watchdog groups, comparison-shopping tools, accessibility services, startups and archives such as the Internet Archive, all of which rely in some way on automated access.

The drafts themselves do not require websites to charge for access, and they do not create legal penalties for ignoring a signal. They are Internet-Drafts: works in progress, not published RFCs (Requests for Comments), the document series in which finished IETF standards appear. That distinction matters. The IETF process often involves substantial revision, and some proposals never become formal standards at all.

Still, the standards fight is not purely theoretical. Cloudflare, the web infrastructure company that sits in front of a large share of the internet, has been a prominent participant in Web Bot Auth development and has published materials about Web Bot Auth support and “agentic commerce.” Industry outlets have also reported that Google has been testing Web Bot Auth for some automated traffic, suggesting that major platform companies see practical value in the work.

AIPREF also carries a notable legal wrinkle. Its vocabulary draft says, “The vocabulary is intended to be used in jurisdictions where expressing preferences results in legal obligations, as well as where there are no associated legal obligations.” That does not mean the draft itself creates enforceable rights. But it does show the authors expect the same technical signal could be used in places where law gives it more force.

That matters because the legal status of web scraping in the United States remains unsettled and fact-specific. Court fights over access to public web data have been shaped by cases including hiQ Labs v. LinkedIn and Van Buren v. United States. Whether a robots.txt file, a header field or an authentication system becomes legally meaningful depends on courts and lawmakers, not on the draft alone.

For years, robots.txt has been a mostly voluntary convention: a website can ask crawlers to stay out, but compliance depends largely on the crawler operator. What is new in these IETF efforts is the push toward a more formal, machine-readable system for AI-use preferences, alongside a separate mechanism for cryptographically verified bot identity.
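The voluntary nature of robots.txt is easy to see in code. The short Python sketch below uses the standard library's `urllib.robotparser` to check a made-up robots.txt file; the bot name `ExampleBot` and the example.com URLs are hypothetical. Nothing forces a crawler to run such a check or to honor its result.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one named bot is asked to stay out of /private/,
# and every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        allowed = may_fetch(ROBOTS_TXT, agent, "https://example.com/private/report")
        print(agent, allowed)
```

A well-behaved crawler calls something like `may_fetch` before each request and skips disallowed URLs. A crawler that never makes the call faces no technical barrier, which is the gap the newer proposals are responding to.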

That is why the dispute now reaching public view is bigger than a niche argument over web plumbing. It is a standards debate with real policy consequences for who gets to access public web content, under what terms, and with what proof of identity — but it is not, at least yet, a settled change to how the internet works.

Tags: #ietf, #ai, #web, #privacy