Block the Bots in Enhance Projects

by Simon MacDonald
@macdonst@mastodon.online
An all ways stop sign. Photo by John Matychuk

The backlash against Artificial Intelligence bots scraping the web seems to be growing. Web luminaries like Ethan Marcotte have written about how and why they are opting out of having their work hoovered up to train “AI” data models. Sites like Read the Docs are stating that AI crawlers need to be more respectful, having seen their bandwidth usage drop 75% after blocking AI bots. Cloud providers like Cloudflare have made it much easier to block bots.

With Enhance applications you’ve always been able to block AI crawlers by providing your own robots.txt file, but today we are introducing a new plugin called @enhance/arc-plugin-block-bots.

Functionality

The plugin adds a new route to your application at /robots.txt. This route tells web crawlers and bots which parts of your website they are allowed to access. By default, the response generated by the plugin looks like this:

User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: Diffbot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: Meta-ExternalAgent
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: PerplexityBot
User-agent: YouBot
Disallow: /
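
To make the shape of that response concrete, here is a minimal sketch of how such a file could be assembled from a list of user agents. It is not the plugin's actual source: the names buildRobotsTxt and handler are hypothetical, and the handler only assumes the usual Architect-style response object (statusCode, headers, body).

```javascript
// Hypothetical helper, not taken from the plugin: turns a list of
// user-agent names into a robots.txt body that disallows everything.
function buildRobotsTxt (agents) {
  const lines = agents.map(agent => `User-agent: ${agent}`)
  // A single Disallow rule applies to the whole group of agents above it.
  lines.push('Disallow: /')
  return lines.join('\n') + '\n'
}

// Hypothetical HTTP handler that serves the generated file as plain text.
async function handler () {
  return {
    statusCode: 200,
    headers: { 'content-type': 'text/plain; charset=utf-8' },
    body: buildRobotsTxt(['GPTBot', 'CCBot']) // illustrative subset of the list
  }
}
```

Grouping every User-agent line above one Disallow rule keeps the file short, and it matches the default output shown above.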

Once a day, the plugin checks the well-maintained ai.robots.txt list for new user agents to block. If the list has changed, your site’s robots.txt file is updated accordingly. You don’t need to keep the file current yourself, as the plugin takes care of that chore for you.
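
The daily check can be pictured roughly as follows. This is a sketch under stated assumptions, not the plugin's implementation: UPSTREAM_URL and the idea that the upstream project publishes a JSON file keyed by user-agent name are assumptions you should verify, and diffAgents and checkForUpdates are hypothetical names.

```javascript
// ASSUMPTION: the location and shape of the upstream list (a JSON object
// whose keys are user-agent names). Verify both before relying on this.
const UPSTREAM_URL = 'https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/main/robots.json'

// Returns the sorted, de-duplicated upstream list when it differs from the
// current list, or null when nothing has changed.
function diffAgents (current, upstream) {
  const prev = [...new Set(current)].sort()
  const next = [...new Set(upstream)].sort()
  const same = prev.length === next.length && prev.every((name, i) => name === next[i])
  return same ? null : next
}

// Hypothetical scheduled task: fetch the upstream list and report changes.
// fetchImpl is injectable so the function can be exercised without a network.
async function checkForUpdates (current, fetchImpl = fetch) {
  const res = await fetchImpl(UPSTREAM_URL)
  if (!res.ok) return null // keep serving the existing list on failure
  const data = await res.json()
  return diffAgents(current, Object.keys(data))
}
```

Returning null for "no change" lets the caller skip rewriting the file on most days, and a failed fetch falls back to the list already in place.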

Setup

To add @enhance/arc-plugin-block-bots to your Enhance application, first install the package:

npm i @enhance/arc-plugin-block-bots

Then edit your .arc file to add the plugin.

@plugins
enhance/arc-plugin-block-bots

Then all you need to do is deploy your application and the /robots.txt route will be available.

Future Plans

This is just the first release of our bot blocking plugin. We’ve noticed that not all bots are well-behaved citizens of the interwebs, as some will ignore your robots.txt directives. We are looking at ways to protect each and every route of your application from bots, either with Enhance middleware or by automatically configuring AWS WAF Bot Control.
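
Until then, a route-level check is something you can experiment with yourself. The sketch below is only an illustration of the middleware idea, not a released feature: BLOCKED, isBlockedAgent, and blockBots are hypothetical names, and it assumes a handler chain in which a function that returns a response short-circuits the request while one that returns nothing passes control onward.

```javascript
// Illustrative subset; a real list would mirror the one served at /robots.txt.
const BLOCKED = ['GPTBot', 'CCBot', 'ClaudeBot']

// Case-insensitive substring match of the User-Agent header against the list.
function isBlockedAgent (userAgent = '', blocked = BLOCKED) {
  const ua = userAgent.toLowerCase()
  return blocked.some(name => ua.includes(name.toLowerCase()))
}

// Hypothetical middleware: respond with 403 for blocked agents, otherwise
// return nothing so the next function in the chain handles the request.
async function blockBots (req) {
  const headers = req.headers || {}
  const ua = headers['user-agent'] || headers['User-Agent'] || ''
  if (isBlockedAgent(ua)) {
    return {
      statusCode: 403,
      headers: { 'content-type': 'text/plain; charset=utf-8' },
      body: 'Forbidden'
    }
  }
}
```

User-agent strings are trivially spoofed, so a check like this only stops bots that identify themselves honestly. That limitation is part of why a WAF-level option is also being considered.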

Next Steps

  • Try out the plugin in your project, and let us know if you have any issues.
  • Let us know what feature you want to see next in the plugin. Better yet, send us a PR!
  • Follow Axol, the Enhance mascot, on Mastodon.
  • Join the Enhance Discord and share what you’ve built, or ask for help.