2009-12-20
Some things about getting useful output from time
No version of time has ever had what you could call a genuinely useful
default output format, at least not for benchmarking and testing. The
POSIX (and System V) standard format has all of the information you want
in the right format (the times in simple seconds), but spreads it across
multiple lines; the BSD format used by csh puts the times in 'human
readable' format of hours, minutes, and seconds, and includes a bunch
of additional information (much of which is inaccurate, useless, or
outright made up by the kernel).
Fortunately, GNU time lets you set a different format. Since I keep looking this up, here is what I want to use:
time -f '%e real %U user %S kernel' ....
I want the times in pure seconds, not in 'human readable' converted form, because I am almost invariably performing math on them in order to find out the information that I'm really interested in.
(Perhaps somewhere there is a convenient Python module for doing this
with human readable numbers, but frankly bc works fine with seconds so
my motivation to go looking is low.)
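As a rough illustration of why plain seconds are convenient, here is a minimal Python sketch that parses a line in the format above and does some arithmetic on it. The function names (parse_time_line, cpu_fraction) are made up for this example, and it assumes the line looks exactly like what the -f string above produces.

```python
def parse_time_line(line):
    """Parse '<secs> real <secs> user <secs> kernel' into a dict of floats."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("unexpected time output: %r" % line)
    # Fields alternate value, label, value, label, ...
    return {label: float(value) for value, label in zip(fields[0::2], fields[1::2])}


def cpu_fraction(times):
    """Fraction of wall clock time spent on the CPU (user plus kernel)."""
    return (times["user"] + times["kernel"]) / times["real"]


if __name__ == "__main__":
    t = parse_time_line("2.00 real 1.50 user 0.25 kernel")
    print(t, cpu_fraction(t))
```

GNU time writes its report to standard error, so that is the stream you would capture to feed something like this.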
FreeBSD's time program has no format specifier option, but defaults
to essentially this format. Solaris's time program also has no format
specifier option and defaults to a 'human readable' multi-line format;
the best you can do is make it use the POSIX format with 'time -p'.
On basically all systems, csh and tcsh have their own time builtin
which uses the BSD csh format or some variant of it (and can be
customized to some extent).
Now, let's talk about why I say that time makes up some of its numbers.
time generally gets its numbers from getrusage() (okay, technically it
gets them from wait4(), which returns the same information). However,
various versions of getrusage() explicitly document that some
information is plain not returned or is not what you think it is. The
Linux manpage for getrusage() says, for example:
[...] Not all fields are meaningful under Linux. In linux 2.4 only the fields ru_utime, ru_stime, ru_minflt, and ru_majflt are maintained. Since Linux 2.6, ru_nvcsw and ru_nivcsw are also maintained.
(Note that time determines the wall clock time itself.)
This corresponds to time's user and system time, major and minor page
faults, and voluntary and involuntary context switches (%w and %c, not
printed by default), and it assumes that the Linux kernel does a good
job of keeping track of this information, which I would be dubious of.
The kernel has good intentions, but doing completely accurate accounting
can be hard and is not necessarily a high priority.
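To make the wait4() point concrete, here is a minimal Python sketch of how a time-like tool can collect its numbers: the CPU times come straight from the rusage structure the kernel hands back, while the wall clock time is measured by the tool itself. The function name run_and_time is invented for this example; os.posix_spawnp() and os.wait4() are the standard library wrappers and need a Unix system and Python 3.8 or later.

```python
import os
import sys
import time


def run_and_time(argv):
    """Run argv to completion and return (real, user, kernel) in seconds."""
    start = time.monotonic()  # wall clock is measured by us, not the kernel
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4() reaps the child and returns its resource usage in one call.
    _, _status, rusage = os.wait4(pid, 0)
    real = time.monotonic() - start
    return real, rusage.ru_utime, rusage.ru_stime


if __name__ == "__main__":
    real, user, kernel = run_and_time([sys.executable, "-c", "pass"])
    print("%.2f real %.2f user %.2f kernel" % (real, user, kernel))
```

Everything beyond the two CPU times (page faults, context switches, block I/O) is just more fields on the same rusage object, with all of the caveats the manpage lists.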
(Current Linux 2.6 kernels seem to also produce numbers for IO input and output, provided that you have the right kernel configuration options turned on. Again, the trustworthiness of those numbers is unknown.)
Currently, FreeBSD 8.0 says:
The numbers ru_inblock and ru_oublock account only for real I/O; data supplied by the caching mechanism is charged only to the first process to read or write the data.
I'm not going to quote from the Solaris 10 getrusage() manpage, but it
has a similar long list of caveats and cautions.
Now, it's possible that time will return real and useful information
for things you are interested in (beyond the basic time information).
But before you rely on its numbers for performance measurement, you
certainly need to check to make sure that they are accurate and useful.
This should include not just reading the getrusage() manpage for your
OS, but actually constructing synthetic programs that perform known
operations and making sure that time reports accurate information for
what they do.
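As a hedged sketch of what such a synthetic check might look like, the Python below starts a child that burns a known amount of CPU time (as measured by the child itself with time.process_time()) and then looks at what wait4() reports for it. The names BURN_SECONDS, CHILD_CODE, and check_cpu_accounting are made up for this example, the 0.2 second target is arbitrary, and it only exercises CPU time; page faults, context switches, and I/O counts would each need their own synthetic program.

```python
import os
import sys

BURN_SECONDS = 0.2  # arbitrary amount of CPU time for the child to use

# The child spins until its own CPU clock says it has used BURN_SECONDS.
CHILD_CODE = (
    "import time\n"
    "while time.process_time() < %r:\n"
    "    pass\n" % BURN_SECONDS
)


def check_cpu_accounting():
    """Return the total CPU seconds (user + kernel) reported for the child."""
    pid = os.posix_spawn(sys.executable, [sys.executable, "-c", CHILD_CODE], os.environ)
    _, status, rusage = os.wait4(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError("child did not exit cleanly")
    return rusage.ru_utime + rusage.ru_stime


if __name__ == "__main__":
    reported = check_cpu_accounting()
    print("asked for %.2fs of CPU, kernel reported %.2fs" % (BURN_SECONDS, reported))
```

If the reported figure is wildly different from what you asked for, that tells you how much to trust the same field when time prints it for a real program.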
Some thoughts on intercepting https traffic
It's been pointed out to me that there are legitimate reasons to intercept and inspect https traffic, and this can even be a primary purpose of having a local certificate authority. For example, breaking open https traffic can be vital for being able to see and possibly analyze malware downloads.
(Note that you should really not do this covertly, without admitting that you're inspecting https traffic. Sooner or later someone will notice that the SSL certificate authority for some outside site is your own internal CA, and things go rapidly downhill from there.)
If you are going to do this, you should do it selectively, for both policy and technical reasons. The policy reasons should be obvious, including that the less you intercept the less that you can inadvertently leak if something goes wrong. The technical reason is that unless you build a quite complicated https interception system, you only really want to intercept things that have valid certificates.
With simple interception schemes, you set up SSL with the internal client, including giving it a valid signed certificate, before you've necessarily connected to the remote server, gotten its certificate, and validated it. If the remote server cert fails to validate, pretty much the only thing you can do is break the connection. Even with a more complicated scheme, you can't pass through the invalid server cert while still being able to intercept the traffic, and without seeing the real server cert there is no way for the user to make a sensible decision about whether or not to continue.
I can think of two ways to do such selective https interception. The best way is to use a https proxy, because this gives you access to the actual hostname the client is trying to connect to; this lets you make the most fine-grained decisions about what traffic to intercept. In this approach, the https proxy selectively diverts some connections to your special https inspection system, while proxying all of the rest as usual.
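The hostname-based decision can be sketched roughly as follows. This assumes the proxy sees a standard HTTP CONNECT request line (for example 'CONNECT www.example.com:443 HTTP/1.1'); the function names and the pattern list are hypothetical, and a real proxy would do this inside its own configuration or ACL system rather than in standalone code.

```python
from fnmatch import fnmatch

# Hypothetical list of hostname patterns whose https traffic should be inspected.
INTERCEPT_PATTERNS = ["downloads.example.com", "*.example.net"]


def connect_target(request_line):
    """Return (host, port) from an HTTP CONNECT request line."""
    method, target, _version = request_line.split()
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request: %r" % request_line)
    host, _, port = target.rpartition(":")
    return host.lower(), int(port)


def should_intercept(request_line, patterns=INTERCEPT_PATTERNS):
    """True if the CONNECT target's hostname matches an intercept pattern."""
    host, _port = connect_target(request_line)
    return any(fnmatch(host, pattern) for pattern in patterns)
```

Anything that does not match is simply proxied through untouched, which keeps the set of intercepted sites as small as your policy allows.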
The more brute force approach is to use firewall redirection to divert https traffic for some IP addresses off to your https inspection system. This has the twin flaws that you have to get all of the IP addresses of the websites you want to intercept traffic for, and that you may intercept too much traffic by using IPs instead of hostnames (although until SNI catches on this probably won't be much of a worry, since shared-host https is basically impossible right now).
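The first flaw is easy to see if you sketch the lookup step. The Python below gathers the addresses for a list of hostnames using the standard socket.getaddrinfo(); the function name addresses_for and the injectable resolve argument are inventions for this example. Note that it only sees the addresses your resolver returns at the moment you ask, which is exactly why such a list is hard to keep complete.

```python
import socket


def addresses_for(hostnames, port=443, resolve=socket.getaddrinfo):
    """Map each hostname to the sorted list of IP addresses it resolves to."""
    result = {}
    for name in hostnames:
        infos = resolve(name, port, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        result[name] = sorted({info[4][0] for info in infos})
    return result
```

The resulting addresses are what you would feed into your firewall's redirection rules, and you would have to regenerate them whenever the sites change their DNS.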