HTTP as it is seen in the wild
Out of a somewhat idle curiosity, I decided to do up some numbers for actual HTTP requests against one of the servers here. All of this is using the past 28 days of old logs (plus today's):
289160  total requests
277323  GET
  5722  PROPFIND
  3665  OPTIONS
  2215  POST
   178  HEAD
    39  CONNECT
    18  garbled
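A tally like this can be sketched in a few lines of Python. This is only an illustration: the post does not say what log format or tools were used, so the sketch assumes Apache-style Common Log Format lines (with the request line as the first double-quoted field), and the names `count_methods` and `KNOWN_METHODS` are made up for the example. Anything without a recognizable method is lumped in as "garbled".

```python
import re
from collections import Counter

# Assumption: Common Log Format, where the request line is the first
# double-quoted field on each log line.
REQUEST_RE = re.compile(r'"([^"]*)"')

# Hypothetical whitelist; anything else is counted as "garbled".
KNOWN_METHODS = {
    "GET", "HEAD", "POST", "PUT", "DELETE",
    "OPTIONS", "TRACE", "CONNECT", "PROPFIND",
}


def count_methods(lines):
    """Count requests per HTTP method across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        parts = match.group(1).split() if match else []
        if parts and parts[0] in KNOWN_METHODS:
            counts[parts[0]] += 1
        else:
            counts["garbled"] += 1
    return counts


# Made-up sample lines using documentation-only IP addresses.
sample = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0400] "GET / HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2005:13:55:37 -0400] "PROPFIND /LJF4100 HTTP/1.1" 405 300',
    '192.0.2.3 - - [10/Oct/2005:13:55:38 -0400] "\\x16\\x03" 400 0',
]

for method, n in count_methods(sample).most_common():
    print(f"{n:6d}  {method}")
```

On a real log you would pass an open file object instead of `sample`.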
Most of the requests were successful; 90% got a 2xx or a 3xx response. Of the 256,392 successful GETs, 55,246 (21%) were successful conditional GETs (answered with a 304); I'm not sure whether to consider this good or bad.
(Unfortunately I don't have enough information to find out how many requests were willing to accept gzip'd results.)
The popularity of PROPFIND and OPTIONS surprised me, but almost all of them turn out to be from just three external IPs, with the lion's share coming from just one. Most of the OPTIONS requests were to /, and most of the PROPFIND requests were to the (nonexistent) /LJF4100, so I suspect that someone's machine is badly misconfigured.
The majority of the HEAD requests were for /, with my Atom syndication feed being the somewhat distant runner-up. Requests came from all over with nothing clearly dominating the results.
(From this I conclude that optimizing HEAD is not really a high priority, which is good because DWiki doesn't.)
HTTP/1.0 dominated over HTTP/1.1, about 67% to 33%; no one is still making pre-HTTP/1.0 requests. (Apart from our very primitive monitoring system, which I am ignoring for this.)
A small number of apparently legitimate people made requests with full 'http://...' URLs (theoretically only usable against proxies; 396 requests in total). To my surprise, a full third of them used HTTP/1.0; the rest used HTTP/1.1.
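To make the two distinctions above concrete, here is a rough sketch of classifying a logged request line by protocol version and by whether the target is a full URL. It assumes the request line has already been pulled out of the log entry; `classify_request_line` is a name invented for this example. A request line with no version at all is treated as a pre-HTTP/1.0 (HTTP/0.9 style) request.

```python
def classify_request_line(request_line):
    """Return (version, is_absolute_url) for a request line, or None if garbled.

    A two-part line ("GET /path") has no protocol version, which is how
    pre-HTTP/1.0 requests look; it is reported here as "HTTP/0.9".
    """
    parts = request_line.split()
    if len(parts) == 2:
        _method, target = parts
        version = "HTTP/0.9"
    elif len(parts) == 3:
        _method, target, version = parts
        if not version.startswith("HTTP/"):
            return None
    else:
        return None
    # Full URLs as request targets are normally only sent to proxies.
    is_absolute = target.lower().startswith(("http://", "https://"))
    return version, is_absolute


print(classify_request_line("GET / HTTP/1.0"))
print(classify_request_line("GET http://example.org/ HTTP/1.1"))
print(classify_request_line("GET /"))
```

Counting the returned tuples with a `collections.Counter` would give both the HTTP/1.0 versus HTTP/1.1 split and the number of full-URL requests in one pass.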
Requests came from 11,745 different IP addresses. The average number of requests per IP was 24.6, but the median was only 3 (and the mode was 1 request, which does not surprise me). A surprisingly large number of the IPs that made only one request asked for robots.txt (although it was not the most popular such request). As usual, the most active visitor was our internal search engine.
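The gap between the average and the median is what you get when a few very busy clients sit on top of a long tail of one-request visitors. A small sketch of the per-IP arithmetic, assuming you already have one client IP per request; the function name `per_ip_stats` and the sample addresses are invented for illustration:

```python
from collections import Counter
from statistics import mean, median, mode


def per_ip_stats(client_ips):
    """Summarize requests per client IP from an iterable of IP strings."""
    per_ip = Counter(client_ips)
    counts = list(per_ip.values())
    return {
        "ips": len(counts),
        "mean": mean(counts),
        "median": median(counts),
        "mode": mode(counts),
        "busiest": per_ip.most_common(1)[0],
    }


# Made-up traffic: one heavy client and several light ones.
ips = (
    ["192.0.2.1"] * 20
    + ["192.0.2.2"] * 3
    + ["192.0.2.3", "192.0.2.4", "192.0.2.5"]
    + ["192.0.2.6"] * 4
    + ["192.0.2.7"] * 5
)

stats = per_ip_stats(ips)
print(stats)

# The average quoted above is just total requests over distinct IPs.
print(round(289160 / 11745, 1))
```

In the made-up sample a single client making 20 requests pulls the mean well above the median, which is the same shape as the real numbers.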
Sidebar: POST targets
This server (currently) hosts CSpace (and thus WanderingThoughts), which is what the majority of the POST requests were directed against (1,299 out of the 2,215; I get a fair number of comment spam attempts). A small number of the remainder (126) were legitimate; the rest were bad in various ways, ranging from repeatedly poking nonexistent URLs to various XML RPC exploit attempts (and one mysterious POST to /).
The most popular POST target was the nonexistent URL path /officescan/cgi/cgiRecvFile.exe, followed by my Recent Comments page.
Sidebar: the breakdown of responses
Distribution of HTTP response codes:
201807  2xx
        199273  200
          2534  206
 59814  3xx
         55246  304
          4106  301
           459  302
 27506  4xx
         13494  404
          8018  403
          5756  405
           234  400
             2  401
             1  416
             1  414
    30  5xx
Some of the 404'd URLs are fairly popular, but I'm not going to try to read the tea leaves about that.
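For anyone wanting to produce the same kind of two-level breakdown from their own logs, a minimal sketch follows. It assumes the status codes have already been extracted as integers (or numeric strings); `status_breakdown` and `format_breakdown` are hypothetical helper names, and the sample data is made up.

```python
from collections import Counter


def status_breakdown(statuses):
    """Return (per-class counts, per-code counts) for HTTP status codes."""
    by_code = Counter(int(s) for s in statuses)
    by_class = Counter()
    for code, n in by_code.items():
        by_class[f"{code // 100}xx"] += n
    return by_class, by_code


def format_breakdown(by_class, by_code):
    """Render each class total followed by its codes, most common first."""
    lines = []
    for cls in sorted(by_class):
        lines.append(f"{by_class[cls]:6d}  {cls}")
        codes = [c for c in by_code if f"{c // 100}xx" == cls]
        for code in sorted(codes, key=lambda c: -by_code[c]):
            lines.append(f"        {by_code[code]:6d}  {code}")
    return lines


# Made-up status codes, not taken from the real logs.
sample = [200, 200, 200, 206, 304, 304, 301, 404, 403, 404]
by_class, by_code = status_breakdown(sample)
report = format_breakdown(by_class, by_code)
print("\n".join(report))
```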