The AI & bot traffic
reality check
The industry conversation about bot traffic has become binary and fear-driven. We analyzed over 10 billion requests, combined data from the world's largest infrastructure networks, and spoke with engineers and experts to give you the full picture.
Six things you need to know
before reading the rest
The data across this report tells a consistent story that should change how you think about bot traffic on every site you manage.
Bot traffic has become a behavior problem, not a volume problem. Broken crawlers hitting expensive endpoints. Standard defenses being bypassed. Automatic mitigation making the wrong call for your specific context. The sites managing this well are not the ones that found a perfect solution. They are the ones that made a conscious choice about which trade-offs fit their situation.
More bots isn't the problem.
What changed is how they behave.
For years, the conversation about bot traffic focused on volume. Security teams tracked what percentage of requests were automated, blocked the obviously malicious ones, and called it a day. That model worked reasonably well when bots mostly crawled static HTML pages, the kind of lightweight content a server could handle millions of times without breaking a sweat.
Then AI changed everything. Within the past two years, the web has been flooded by bots designed not just to index content for search results but to ingest it wholesale: for model training, retrieval-augmented generation, and user-triggered queries. These crawlers are hungrier, faster, and fundamentally less well-behaved than anything that came before.
According to Cloudflare's 2025 Year in Review, AI bots averaged 4.2% of all HTML requests in 2025. When Googlebot is included, the combined share jumps to 8.5%. By December 2025, humans made up just 47% of HTML traffic, while bots made up the rest.
Akamai measured an even starker shift as AI bot activity surged 300% in a single year, with the commerce sector recording over 25 billion bot requests in just two months. Publishing felt the sharpest impact, with 63% of AI bot activity hitting media sites.
Not all of these bots are bad, and not all of them are harmless. The mix includes:
- Googlebot, which you cannot block without destroying your SEO.
- AI discovery crawlers that might surface your content in AI-powered search results.
- Legitimate automation tools your own team uses.
- Genuinely misbehaving crawlers: running in loops, hammering uncached endpoints, burning through server resources with no useful output.
Managing this well requires understanding which category you are dealing with. The rest of this report breaks that down.
What 10 billion requests actually tells us
Raw traffic percentages can be misleading. A site that gets a million bot requests a month sounds alarming, until you realize 95% of those are Googlebot indexing your pages and moving on. The number that matters isn't the total count. It's what each category of traffic is actually doing when it arrives.
Across Kinsta's infrastructure, tracking over 10.2 billion requests in a 30-day window, the breakdown of automated traffic reveals a nuanced picture. The majority of requests are handled normally. A meaningful portion are challenged or filtered. A smaller slice are blocked outright. And a critical category, requests from well-known bots like Googlebot, are deliberately skipped over so they can do their jobs without interference.
What happened to each request across 10.2 billion total over 30 days. The largest single action is simply allowing traffic through, meaning the majority of automated requests are handled without intervention. The second largest category is challenges, which are not blocks: they present a verification step that real humans pass easily.
* Kinsta data aggregated from a 30-day analysis window.
The bots are not attacking.
They are just stuck.
Most AI crawlers are designed to follow every link they find and record every unique page address. That works fine on simple sites. But modern websites, especially WooCommerce stores, generate slightly different URLs for essentially the same page.
For example, a product link with a color filter, a cart link with a quantity, a calendar page with a sort order. To a human, these all look like the same page, but to a bot following URLs, each one looks brand new.
So the bot follows the first link. That page generates another variation. The bot follows that. And another. And another. It has no way to recognize it is going in circles, and some of these loops ran undetected for multiple days before infrastructure rules caught them.
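The loop dynamic above can be sketched in a few lines. A crawler that treats every query-string variation as a new page never terminates; one that canonicalizes URLs spots the repeat immediately. The `canonical` helper, the parameter names, and the threshold below are illustrative assumptions, not any vendor's actual detection logic:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit
from collections import Counter

def canonical(url: str, volatile=("add-to-cart", "color1", "color2", "orderby")) -> str:
    """Strip volatile query parameters and sort the rest, so filter and
    cart variations of the same page collapse to one canonical form."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in volatile)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def looks_like_loop(requests: list[str], threshold: int = 100) -> bool:
    """Flag a client whose requests collapse to a few canonical pages but
    arrive in huge numbers: the signature of a crawler chasing URL variants."""
    counts = Counter(canonical(u) for u in requests)
    return any(n >= threshold for n in counts.values())

# A bot cycling through filter combinations on a single product page:
hits = [f"https://shop.example/product?color1={a}&color2={b}"
        for a in ("red", "blue", "green") for b in ("red", "blue", "green")] * 20
print(looks_like_loop(hits))  # 180 requests, all one canonical page -> True
```

To a per-URL deduplicator, those 180 requests are nine distinct pages visited forever; after canonicalization they are one page visited 180 times, which is trivially flaggable.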
The real damage is not the volume itself, but where these requests land. A homepage can be served a million times from cache at almost zero cost, but a shopping cart page cannot. Every cart, wishlist, or checkout request bypasses the cache entirely and forces the server to run PHP, query the database, and handle a session. When you multiply that by millions of looping requests, a well-intentioned crawler becomes an infrastructure event.
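A back-of-the-envelope model makes the asymmetry concrete. The per-request times below are illustrative assumptions, not measurements from any platform:

```python
# Rough cost model for the cache-bypass effect described above.
# Per-request times are illustrative assumptions, not measurements.
CACHED_MS = 0.2      # edge cache hit: no PHP, no database
UNCACHED_MS = 120.0  # cart/checkout: PHP boot, DB queries, session handling

def server_seconds(requests: int, cache_hit_ratio: float) -> float:
    """Total server compute time for a given traffic mix."""
    cached = requests * cache_hit_ratio
    uncached = requests - cached
    return (cached * CACHED_MS + uncached * UNCACHED_MS) / 1000

# One million homepage hits served from cache, versus one million
# add-to-cart hits that bypass the cache entirely:
print(server_seconds(1_000_000, 1.0))  # ~200 seconds of compute
print(server_seconds(1_000_000, 0.0))  # 120,000 seconds, roughly 33 hours
```

Same request count, roughly a 600x difference in server work under these assumptions. That is why a looping crawler on uncached endpoints registers as an infrastructure event while far larger volumes on cached pages go unnoticed.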
Adding something to a cart gets a bot absolutely nothing useful. There is no content to index, no data to train on. Yet five major crawlers were caught doing exactly this, simultaneously, suggesting the same bug exists across completely independent systems.
We had a bot hitting a product comparison page and just not stopping. The URL kept growing — color1=red&color2=blue, then color1=blue&color2=blue, then more combinations stacked on top. Millions of requests. In one case it ran for multiple days before a WAF rule killed it. These aren't attacks. They're bugs running at full speed with nothing to stop them.
The support queue tells a different story
Infrastructure telemetry and industry reports frame bot traffic as a systems problem; support conversations reveal the human side of it. When site owners notice something is wrong, they usually see the symptoms: a sudden spike in visits, a performance drop, or unexpected resource usage. What they rarely see immediately is the cause.
Across support tickets and Intercom conversations, a consistent pattern appears. Customers expect the platform to automatically catch and stop bot traffic spikes. They often struggle to understand what kind of traffic they are actually seeing. And once they realize what is happening, they start asking for clearer visibility and tools that let them manage it themselves.
When a bot spike causes a performance problem, the first question from site owners is rarely technical. It is almost always a version of:
Customers expect bot management to work like spam filtering — detected and stopped before it affects the site. Many assume that abnormal traffic patterns should always be caught and mitigated by the platform. When they are not, the gap feels like a failure, not a trade-off.
When visits spike unexpectedly, customers often cannot tell what they are looking at. The same high request volume could be:
- Legitimate search crawlers doing their job
- AI training bots harvesting content
- A hacking attempt or vulnerability scan
- A well-intentioned crawler stuck in a loop
Without visibility into server logs or request behavior, these scenarios look identical. That uncertainty shapes every question site owners ask when they open a support ticket.
What the broader research confirms
Kinsta's infrastructure data does not exist in isolation. The patterns we observe across our platform are being independently documented by some of the largest traffic analysis companies in the world, often with even starker findings. Understanding those reports in context helps explain why bot traffic suddenly feels different to everyone managing a WordPress site.
TollBit, which tracks AI crawler activity across roughly 400 partner publisher websites, published one of the most striking data points of 2025: by Q4 2025, there was one AI bot visit for every 31 human visits, up from just one in 200 at the start of the year. That is not gradual change. It is a step change in the nature of web traffic. And it is likely an undercount. TollBit's own researchers noted that modern AI scrapers are increasingly indistinguishable from human visitors in server logs, loading pages, solving CAPTCHAs, and even respecting cookies while ignoring the intent of those protections.
The scale of what is being ignored is significant. In March 2025 alone, TollBit documented over 26 million AI scrapes that bypassed robots.txt, the files websites use to tell crawlers what not to touch. The share of bots ignoring these files jumped from 3.3% to 12.9% in a single quarter. The tools most commonly recommended for managing crawlers, robots.txt and noindex meta tags, are becoming unreliable as enforcement mechanisms.
For WordPress site owners managing WooCommerce stores, WP Engine's report offers perhaps the most directly relevant finding: AI-driven bot traffic was found to consume up to 70% of costly dynamic resources on affected sites, including hosting environment capacity, PHP workers, and database connections. This is the mechanism behind those support tickets asking why a site suddenly slowed down with no apparent cause. Despite this, WP Engine found that only 38% of the web teams they analyzed were using any dedicated bot mitigation solution.
Cloudflare's data adds another dimension the industry is only beginning to grapple with: the purpose of crawling has changed. Training-related crawling now represents nearly 80% of all AI bot activity, up from 72% a year earlier. This crawling returns almost nothing to the sites being scraped. Anthropic's crawl-to-refer ratio reached as high as 100,000:1 at certain points in 2025, meaning for every visitor referred back, the crawlers had visited up to a hundred thousand pages. For content-driven businesses, this is a fundamentally different relationship with bots than anything that existed five years ago.
There is no perfect bot strategy.
There are only trade-offs.
Every bot management decision involves giving something up. The sites that manage this well are not the ones that found the right answer. They are the ones that made a conscious choice about which trade-off fits their situation.
Block AI crawlers and you protect your content. Allow them and you might appear in the next generation of AI-powered search. There's no objectively correct answer. It depends entirely on whether AI referral traffic matters for your business model.
- Allow AI search crawlers (PerplexityBot, Google-Extended).
- Block pure training crawlers (CCBot, GPTBot, anthropic-ai).
- Evaluate referral data before committing.
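Expressed as robots.txt directives, that split looks like the sketch below. Treat it as a starting point, not an enforcement mechanism: the TollBit data above shows a growing share of crawlers ignore robots.txt, so pair it with server-side rules.

```txt
# Allow AI search and discovery crawlers
User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Block pure training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
```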
The decision framework in the next section is built around trade-offs like these. Select your site type and your primary concern, and it will tell you exactly which crawlers to allow, challenge, or block, and why.
Stop guessing.
Build the right strategy for your site.
The right approach depends entirely on what your site does and what you are trying to protect. Answer two questions and get a specific, actionable bot strategy, including exactly which crawlers to allow, which to challenge, and which to block outright.
If you are an agency advising a client: run through this for each site type in your portfolio. The 20 combinations cover the full range of WordPress deployments, and the outputs are written so you can present them to clients, not just apply them yourself. Each recommendation includes the trade-off your client needs to understand before you make the change.
Save this page as a reference for client conversations. The framework covers ecommerce, content, business, SaaS, and dev environments, the five site types that account for the vast majority of WordPress agency work. Each recommendation is specific enough to act on and honest enough about trade-offs that you can present it to a client without overpromising.
Your performance issues are almost certainly bots hitting WooCommerce's add-to-cart and checkout endpoints. These bypass the page cache entirely and force PHP execution and database queries on every single request. The fix isn't blocking everything; it's protecting specific high-cost paths.
Add `Disallow: /shop?add-to-cart=` and `Disallow: /checkout` to robots.txt for all crawlers.

How hosting providers actually handle this
Most hosting providers offer far less bot management capability than the scale of this problem would suggest. After analyzing 11 major WordPress hosting providers, the pattern is consistent: most either handle everything automatically with no user input, or charge extra for meaningful control. Very few offer included tools with genuine user-level configuration.
The problem with automatic-only approaches is that they make decisions that should be context-dependent on your behalf. WP Engine's Global Edge Security is a capable enterprise-grade system, but it offers no user input. If the system decides a particular type of traffic should be blocked or challenged, you cannot override it for your specific situation. For an agency managing dozens of client sites with different needs, that is a significant limitation.
The paid upgrade model is perhaps even more frustrating. Cloudways charges an additional monthly fee for the Cloudflare add-on that provides user-level bot controls. GoDaddy requires a security add-on starting at $7 per month. These are not unreasonable prices in isolation, but they create a two-tier environment where protection is theoretically available to everyone yet practically implemented by fewer than four in ten, according to WP Engine's own research.
| Provider | Approach | User control | Included | AI crawlers |
|---|---|---|---|---|
| Kinsta* (this report) | Platform baseline + environment-level control | Yes, granular | All plans | Block toggle |
| WP Engine | Global Edge Security (WAF + bot management) | Automatic only | Paid add-on | Not documented |
| SiteGround | AI-based anti-bot system with CAPTCHA | Automatic only | Included | Limited |
| Cloudways | Imunify360 WAF + optional Cloudflare add-on | Paid add-on only | $4.99/mo add-on | Add-on required |
| Hostinger | AI Audit dashboard via CDN feature | Yes | Business plan+ | Allow/block toggle |
| GoDaddy | WAF and DDoS protection | Paid only | $7/mo+ add-on | Not documented |
| Nexcess | Performance Shield (Cloudflare Enterprise) | Automatic only | All tiers | Not documented |
| DreamHost | Bot analytics only, no active mitigation | Analytics only | Included | No mitigation |
| Bluehost / IONOS | Standard SSL and basic security | None | Not available | Not available |
You now understand the problem.
Here is where to start.
The right first step depends on where you are. Pick the track that matches your situation.
Start with visibility, then make one decision
Pull your server logs or use Kinsta Analytics to separate bot traffic from human traffic. Look specifically for requests to /cart, /checkout, and any URL with query parameters that vary. You are looking for volume and pattern, not individual bots.
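That audit can be sketched in a few lines of Python against combined-format access logs. The sample lines, the bot user-agent hints, and the path list below are placeholders; Kinsta Analytics gives you the same breakdown without scripting:

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Two sample lines in combined log format; in practice, iterate over
# your real access log instead of this placeholder list.
SAMPLE_LOG = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET /checkout?add-to-cart=7 HTTP/1.1" 200 512 "-" "ExampleBot/1.0 (+crawler)"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /blog/post HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
]

LINE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+".*"([^"]*)"$')
BOT_HINTS = ("bot", "crawler", "spider")  # crude user-agent heuristic

bot_hits = Counter()         # bot requests per path
variants = defaultdict(set)  # path -> distinct query strings seen

for line in SAMPLE_LOG:
    m = LINE.search(line)
    if not m:
        continue
    url, agent = m.groups()
    parts = urlsplit(url)
    if any(h in agent.lower() for h in BOT_HINTS):
        bot_hits[parts.path] += 1
    if parts.query:
        variants[parts.path].add(parts.query)

print(bot_hits["/checkout"])       # bot volume on an uncached, high-cost path
print(len(variants["/checkout"]))  # distinct query variants: a loop signature
```

You are looking for two signals: bot hits concentrated on uncached paths like /cart and /checkout, and individual paths accumulating hundreds of distinct query-string variants.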
Kinsta's platform-level rules already filter the most clearly broken behavior: the loop detection and ASN mitigation that blocked 550M requests in 30 days. Make sure these are active before adding anything site-specific. They are conservative by design and will not break legitimate traffic.
If your logs show bot traffic on WooCommerce endpoints: add robots.txt Disallow for /cart and ?add-to-cart= and back it up with a WAF rule. If you found AI training crawlers scraping content: enable the AI bot block toggle. Do not try to fix everything at once.
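The robots.txt side of that fix can be as small as the sketch below. Note that wildcard matching in Disallow rules is an extension honored by major crawlers rather than a guarantee, and misbehaving bots ignore robots.txt entirely, which is why the WAF rule is the backstop:

```txt
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /*?*add-to-cart=
```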
For agencies: run this audit across three client sites with different profiles (ecommerce, content, and business/lead gen). The patterns will be consistent enough to build a reusable recommendation framework for your client conversations.
Environment-level bot controls with multiple protection presets. Included on all Kinsta plans.
Path-specific blocking and challenge rules. Surgical control without blanket bans that break SEO.
Separate bot traffic from human traffic. Understand what is actually hitting your site before making decisions.