AI Bot Mitigation

The growing proliferation of AI-powered bots calls for granular control over web content and over the access permissions granted to automated agents.

Some of these bots perform functions necessary for indexing and ranking web content, while others operate indiscriminately, consuming and collecting information without consent.

The challenge of unwanted AI bots

Content creators and businesses are seeing their websites crawled and used to train AI models or to generate derivative content, without their consent and without any traffic being sent back in return. Besides threatening business models based on content monetization, this activity can also generate an unwanted increase in traffic, degrading site performance and raising operating costs.

A recent article in The Register analyzes the case of the Wikimedia Foundation and highlights how its infrastructure, designed for peaks of human traffic and built to cache the most in-demand content closer to users, is suffering excessive visits from AI scraper bots. These bots do not follow popularity patterns: they crawl less popular pages that are not in cache and must be served from the origin infrastructure, which consumes more resources and drives costs up significantly. Wikimedia estimates that 65% of its most expensive traffic comes from bots, even though bots account for only about 35% of page views. Its infrastructure managers aim to reduce scraper traffic by 20% measured in requests and by 30% measured in bandwidth.

The good old robots.txt

Traditional methods for granting permissions and establishing rules, such as robots.txt, have proven insufficient against AI bots that ignore the established protocols and access websites indiscriminately in search of content. Mitigating the activity of hyper-aggressive LLM crawlers is imperative. The key lies in the ability to discern between “good” and “bad” bots and in applying specific control policies to each.
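
For reference, this is roughly what a robots.txt policy aimed at AI crawlers looks like. The user agents shown (GPTBot for OpenAI, CCBot for Common Crawl) are publicly documented crawlers, but compliance with these directives is entirely voluntary, which is exactly why robots.txt alone is not enough:

    # Ask documented AI crawlers to stay away (voluntary compliance only)
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Everyone else, including search engine crawlers, remains allowed
    User-agent: *
    Disallow: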

Identify the source of a bot and manage its traffic

To effectively manage bot traffic, the first critical step is to properly identify it. A bot is, essentially, an automated program that interacts with websites. Its origin and purpose can be inferred through several mechanisms:

  • User-Agent String: although easily forged, the User-Agent string is the first indicator of the type of bot accessing a site.
  • IP and Reverse DNS Analysis: a more robust technique is to verify the bot’s source IP address. Legitimate bots from large companies such as Google or Microsoft operate from known IP ranges and can be verified with a reverse DNS lookup followed by a confirming forward lookup (see the sketch after this list).
  • Browsing behavior: access patterns (speed, sequence of requests, type of content requested) can reveal whether a bot is acting maliciously or following a predictable, legitimate pattern.
  • HTTP Header Analysis: the presence of specific HTTP headers, implemented by the bot provider, can serve as an additional verification method.
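
A minimal sketch of the reverse DNS technique referenced in the list above, using only Python’s standard library: it resolves the source IP to a hostname, checks that the hostname belongs to a Google crawler domain, and then resolves that hostname forward to confirm it maps back to the same IP. This is a generic illustration, not Perimetrical’s implementation; the domain suffixes are the ones Google documents for its crawlers.

    import socket

    # Hostname suffixes that Google documents for its crawlers
    GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

    def is_verified_googlebot(ip: str) -> bool:
        """Reverse-then-forward DNS check for a client IP claiming to be Googlebot."""
        try:
            # Step 1: reverse lookup of the source IP
            hostname, _, _ = socket.gethostbyaddr(ip)
            # Step 2: the hostname must belong to a Google crawler domain
            if not hostname.endswith(GOOGLE_SUFFIXES):
                return False
            # Step 3: the forward lookup must resolve back to the original IP
            _, _, forward_ips = socket.gethostbyname_ex(hostname)
            return ip in forward_ips
        except (socket.herror, socket.gaierror):
            # No PTR record or failed forward resolution: identity cannot be verified
            return False

In production this result would be cached, since performing two DNS lookups per request adds latency and load.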

The decision to allow or block certain bots is strategic and depends on each organization’s objectives (a minimal policy sketch follows this list):

  • Allow search engine crawlers: this is essential for indexing and visibility in search results, which translates into organic traffic to websites. Indiscriminately blocking these bots can negatively impact SEO rankings.
  • Allow analytics or monitoring bots: certain bots are used by web analytics, performance monitoring, or security tools, and blocking them would prevent the collection of valuable data.
  • Block unauthorized scraper bots: these bots seek to extract large volumes of data, consume bandwidth and server resources, and, in many cases, exploit intellectual property without permission or attribution. Blocking them can reduce operational costs.
  • Block malicious bots: this includes bots involved in DDoS attacks, spam, ad fraud, or credential stuffing, all of which pose a direct threat to the security and availability of the site.
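
To make these decisions concrete, the sketch below expresses them as a small policy table mapping bot categories to actions. The category names and actions are hypothetical placeholders; in practice the category would be derived from the identification signals described earlier (User-Agent, verified IP, behavior).

    from enum import Enum

    class Action(Enum):
        ALLOW = "allow"
        BLOCK = "block"
        CHALLENGE = "challenge"  # e.g. rate-limit or require verification

    # Hypothetical policy table: bot category -> action
    BOT_POLICY = {
        "search_engine_crawler": Action.ALLOW,    # keeps indexing and SEO intact
        "monitoring_or_analytics": Action.ALLOW,  # preserves observability data
        "unauthorized_scraper": Action.BLOCK,     # protects content and bandwidth
        "malicious_bot": Action.BLOCK,            # DDoS, spam, ad fraud, credential stuffing
    }

    def decide(category: str) -> Action:
        """Return the configured action for a bot category, challenging anything unknown."""
        return BOT_POLICY.get(category, Action.CHALLENGE)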

Verified bot control

Perimetrical expands the capabilities of its Bot Mitigation with a new feature that provides greater granularity in controlling AI bots. It places particular emphasis on Google bot verification and is designed for customers who want more sophisticated traffic management.

The key to this improvement lies in verifying the origin of AI bots that identify themselves as Google, checking against Google’s own data whether those requests really originate from its crawlers. Perimetrical exposes a specific HTTP header so our clients can better manage this traffic and can:

  1. Accurately identify: determine whether a bot posing as Google actually originates from Google infrastructure by validating its IP address against Google-owned IP ranges (see the sketch after this list).
  2. Filter traffic: distinguish between legitimate Google bots (e.g., those that power Google’s advanced search results or AI features) and those that pretend to be legitimate to perform unauthorized scraping or malicious activities.
  3. Enforce granular policies: offer users the option to allow verified Google bots through, while automatically blocking traffic from unverified bots attempting to exploit Google’s identity. This prevents bots impersonating Google from consuming resources and extracting data without consent.
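
As an illustration of the IP validation in point 1, the sketch below checks a client IP against a set of CIDR ranges such as the crawler ranges Google publishes. The ranges shown are documentation placeholders, not real Google ranges, and the sketch is a generic example rather than Perimetrical’s implementation; the specific HTTP header Perimetrical exposes is a configuration detail not reproduced here.

    import ipaddress

    # Placeholder CIDR blocks: a real deployment would load and periodically refresh
    # the crawler IP ranges that Google publishes.
    GOOGLEBOT_RANGES = [
        ipaddress.ip_network("192.0.2.0/24"),    # documentation range, stand-in only
        ipaddress.ip_network("203.0.113.0/24"),  # documentation range, stand-in only
    ]

    def ip_in_published_ranges(client_ip: str) -> bool:
        """True if the client IP falls inside any of the published crawler ranges."""
        ip = ipaddress.ip_address(client_ip)
        return any(ip in network for network in GOOGLEBOT_RANGES)

    def classify_google_claim(client_ip: str, user_agent: str) -> str:
        """Combine the User-Agent claim with IP validation to label the request."""
        claims_google = "Googlebot" in user_agent
        if claims_google and ip_in_published_ranges(client_ip):
            return "verified-googlebot"  # let it through
        if claims_google:
            return "spoofed-googlebot"   # block or challenge
        return "other"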

This precise tool provides more possibilities for protecting digital content, ensuring that only authorized and verified bots can access the information.

What can Transparent Edge do?

Transparent Edge is positioned as a strategic partner for organizations seeking to protect and optimize the delivery of their digital content. Our solutions are designed to:

  • Mitigate bot attacks: block malicious bots, including scrapers, spammers, and fraud bots, using advanced machine learning and heuristic techniques.
  • Optimize performance: ensure legitimate traffic accesses your content quickly and efficiently, even during peak demand or DDoS attacks.
  • Protect intellectual property: implement granular access control policies, such as Perimetrical’s new verified Google bot control, to preserve the value of your content.
  • Reduce operational costs: cut bandwidth consumption and requests to the origin by blocking unwanted traffic.
  • Provide visibility and control: all the analytics tools that enable technology teams to monitor traffic and adjust bot permissiveness policies are available in a single dashboard.

The internet never stops evolving; today, AI is redefining web interactions and the landscape keeps shifting. That’s why Perimetrical offers the tools companies need to maintain control over their websites, differentiating between traffic that adds value and traffic that detracts from it.