HTTP as it is seen in the wild
Out of a somewhat idle curiosity, I decided to do up some numbers for actual HTTP requests against one of the servers here. All of this is using the past 28 days of old logs (plus today's):
289160 total requests
277323 GET
5722 PROPFIND
3665 OPTIONS
2215 POST
178 HEAD
39 CONNECT
18 garbled
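A tally like the one above can be sketched in a few lines of Python. This is only an illustration, not the script actually used: it assumes Apache-style common or combined log lines (the post does not say which log format the server writes), and `count_methods` and `KNOWN_METHODS` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: Apache-style log lines, where the request line is the first
# double-quoted field, e.g. "GET /path HTTP/1.0".
REQUEST_RE = re.compile(r'"([^"]*)"')

# Assumption: anything that does not start with one of these is "garbled".
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS",
                 "CONNECT", "TRACE", "PROPFIND"}

def count_methods(lines):
    """Tally request methods; unparseable requests are counted as 'garbled'."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        parts = match.group(1).split() if match else []
        if parts and parts[0] in KNOWN_METHODS:
            counts[parts[0]] += 1
        else:
            counts["garbled"] += 1
    return counts
```

Calling `count_methods(open(path))` on each log file and adding the resulting counters together would give the per-method totals.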
Most of the requests were successful; 90% got a 2xx or a 3xx response. Of the 256,392 successful GETs, 55,246 (about 21%) were successful conditional GETs; I'm not sure whether to consider this good or bad.
(Unfortunately I don't have enough information to find out how many requests were willing to accept gzip'd results.)
The popularity of PROPFIND and OPTIONS surprised me, but almost all of
them turn out to be from just three external IPs, with the lion's share
coming from just one. Most of the OPTIONS requests were to /, and most
of the PROPFIND requests were to the (nonexistent) /LJF4100, so I
suspect that someone's machine is badly misconfigured.
The majority of the HEAD requests were for /, with my
Atom syndication feed being the somewhat distant runner-up. Requests
came from all over with nothing clearly dominating the results.
(From this I conclude that optimizing HEAD is not really a high priority, which is good because DWiki doesn't optimize it.)
HTTP/1.0 dominated over HTTP/1.1, about 67% to 33%; no one is still making pre-HTTP/1.0 requests. (Apart from our very primitive monitoring system, which I am ignoring for this.)
A small number of apparently legitimate people made requests with full 'http://...' URLs (theoretically only usable against proxies; 396 requests in total). To my surprise, a full third of them used HTTP/1.0; the rest used HTTP/1.1.
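The protocol version and the full-URL cases can both be read straight off the request line. The following is a rough sketch rather than the method actually used; `classify_request_line` is a hypothetical helper, and it assumes the request line has already been pulled out of the log entry. A request line with no version token at all is treated as a pre-HTTP/1.0 request.

```python
def classify_request_line(request_line):
    """Return (version, uses_full_url) for one HTTP request line."""
    parts = request_line.split()
    if len(parts) == 2:
        # No version token: an old-style simple request ("GET /path").
        version = "pre-1.0"
    elif len(parts) == 3 and parts[2].startswith("HTTP/"):
        version = parts[2]
    else:
        return ("garbled", False)
    # A target starting with a scheme is the full-URL form that is
    # normally only sent to proxies.
    target = parts[1].lower()
    uses_full_url = target.startswith(("http://", "https://"))
    return (version, uses_full_url)
```

Counting the returned tuples with `collections.Counter` gives both the version split and the number of full-URL requests per version.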
Requests came from 11,745 different IP addresses. The average number
of requests per IP was 24.6, but the median was only 3 (and the mode was 1 request,
which does not surprise me). A surprisingly large number of the IPs that
made only one request asked for robots.txt (although it was not the
most popular such request). As usual, the most active visitor was our
internal search engine.
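The gap between the average and the median is what a few very busy clients produce. A minimal sketch of the per-IP summary, assuming the client IP of every request has already been extracted into a list (`per_ip_stats` is a hypothetical name, and the sample data in the usage note is made up):

```python
from collections import Counter
from statistics import mean, median, mode

def per_ip_stats(ips):
    """Summarize how many requests each client IP made."""
    per_ip = Counter(ips)
    counts = list(per_ip.values())
    return {
        "ips": len(per_ip),
        "mean": mean(counts),
        "median": median(counts),
        # On Python 3.8+ mode() returns the first of several equally
        # common values; older versions raise StatisticsError instead.
        "mode": mode(counts),
        "busiest": per_ip.most_common(1)[0],
    }
```

With one client making ten requests and four clients making one each, the mean is 2.8 while the median and mode are both 1, which is the same shape as the numbers above on a much smaller scale.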
Sidebar: POST targets
This server (currently) hosts CSpace (and thus WanderingThoughts), which
is what the majority of the POST requests were directed against (1,299
out of the 2,215; I get a fair number of comment spam attempts). A small
number of the remainder (126) were legitimate; the rest were bad in
various ways, ranging from repeatedly poking nonexistent URLs to various
XML RPC exploit attempts (and one mysterious POST to /).
The most popular POST target was the nonexistent URL path /officescan/cgi/cgiRecvFile.exe, followed by my Recent Comments page.
Sidebar: the breakdown of responses
Distribution of HTTP response codes:
201807 2xx
    199273 200
    2534 206
59814 3xx
    55246 304
    4106 301
    459 302
27506 4xx
    13494 404
    8018 403
    5756 405
    234 400
    2 401
    1 416
    1 414
30 5xx
Some of the 404'd URLs are fairly popular, but I'm not going to try to read the tea leaves about that.
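A breakdown like the one above, with each response class followed by its individual codes, can be produced roughly as follows. This is a sketch under the assumption that the status codes have already been parsed into integers; `status_breakdown` and `format_breakdown` are hypothetical names.

```python
from collections import Counter

def status_breakdown(statuses):
    """Count responses per individual code and per class (2xx, 3xx, ...)."""
    by_code = Counter(statuses)
    by_class = Counter()
    for code, n in by_code.items():
        by_class[f"{code // 100}xx"] += n
    return by_class, by_code

def format_breakdown(statuses):
    """Return report lines: each class total, then its codes, indented."""
    by_class, by_code = status_breakdown(statuses)
    lines = []
    for cls, total in by_class.most_common():
        lines.append(f"{total} {cls}")
        for code, n in by_code.most_common():
            if f"{code // 100}xx" == cls:
                lines.append(f"    {n} {code}")
    return lines
```

Printing the result with `print("\n".join(format_breakdown(statuses)))` lists the classes from most to least common.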