2013-04-15
The basics of 4K sector hard drives (aka 'Advanced Format' drives)
Modern hard drives have two sector sizes, the physical sector size and the logical one. The physical sector size is what the hard drive actually reads and writes in; the logical sector size is what you can ask it to read or write (and I believe what logical block addresses are in). The physical block size is always equal to or larger than the logical one. Writing to only part of a physical sector requires the drive to do a read-modify-write cycle.
In the beginning, basically all drives had a 512 byte sector size (for both physical and logical, which weren't really split back then). Today it's difficult or impossible to find a current SATA drive that is not an 'Advanced Format' drive with 4096 byte physical sectors. To date I believe that all 4k drives have a 512 byte logical sector size (call this 4k/512), but in the future that may change so that we see 4k/4k drives.
(At this point I have no idea if vendors want to move to a 4k logical sector size. If they don't move, life gets simpler for a lot of people, us included.)
The main issue for 4k/512 drives is partial writes. If you're waiting for the write to complete, a partial write apparently costs you one rotational latency in extra time. If you're not waiting, e.g. if you're just writing to the drive's write cache (at a volume where it doesn't fill up), you're probably still going to lose overall IOPS.
(The other problem with partial writes is that if things go wrong they can corrupt the data in the rest of the physical sector, data which you didn't think you were writing.)
There are two ways to get partial writes. The first is that your OS simply writes things smaller than the physical block size (perhaps it uses the logical block size for something or just assumes that sectors are 512 bytes and that it can write single ones). The other is unaligned large writes, where you may be issuing writes that are multiples of the physical block size but the starting position is not lined up with the start of physical blocks. Since most filesystems today normally write in 4k blocks or larger, unaligned writes are the most common problem. The extra bonus for unaligned writes is that they give you two partial writes, one at the start and a second at the end, both of which cost you time, IOPs, or both.
(Aligned large writes that are not multiples of the physical block size will also cause partial writes at the end, but I think that this is relatively uncommon today.)
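To make the arithmetic concrete, here is a minimal sketch that counts how many physical sectors a given write only partly covers. The function name and the byte-offset interface are made up for illustration; real drives and kernels don't expose anything like this.

```python
def partial_sector_writes(offset, length, physical=4096):
    """Count the physical sectors that a write only partly covers.

    offset and length are in bytes; physical is the assumed physical
    sector size. Each partly covered sector implies a read-modify-write.
    """
    if length <= 0:
        return 0
    end = offset + length
    first = offset // physical          # first physical sector touched
    last = (end - 1) // physical        # last physical sector touched
    head_partial = offset % physical != 0
    tail_partial = end % physical != 0
    if first == last:
        # The whole write sits inside one physical sector.
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)
```

Under these assumptions, an aligned 4k write gives 0, a single 512 byte write gives 1, an 8k write that starts 512 bytes into a physical sector gives 2, and an aligned 6k write gives 1 (the partial write at the end).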
Getting writes to be aligned requires that everything in the chain from basic partitioning (BIOS or GPT, take your pick) up through internal OS partitioning and on-disk filesystem data structures be on 4k (or larger) boundaries. This is often not the case for existing legacy partitioning. Frequently the original (and existing) partitioning tools rounded things up (or down) to essentially arbitrary 'cylinder' boundaries using nominal disk geometries that were entirely imaginary and generally arbitrary.
(There was a day when disk geometries were real and meaningful, but that was more than a decade ago for most machines.)
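As a rough illustration of the first link in that chain, this sketch checks whether a partition's starting sector lands on a physical sector boundary. The two start values are my examples, not something measured here: sector 63 is the traditional start for old 'cylinder'-aligned BIOS partitioning, and sector 2048 (1 MiB) is a common default in newer tools.

```python
def partition_is_aligned(start_lba, logical=512, physical=4096):
    """True if a partition starting at start_lba (counted in logical
    sectors) begins exactly on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


for start in (63, 2048):
    print(start, partition_is_aligned(start))
# 63 False
# 2048 True
```

This only checks the partition table. Anything layered on top (internal OS partitioning, filesystem data structures) needs its own offsets to be multiples of 4k as well.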
Modern disk drives advertise both their physical and logical block sizes (in disk inquiry data). Unfortunately this information may or may not properly propagate up through a complex storage stack (which may involve hardware or software RAID, SAN controllers, logical volume management, virtualization, and so on). The good news is that most modern software aligns things on 4k or larger boundaries regardless of what block size the underlying storage claims to have, so you have at least some chance of having everything work out. The bad news is that you're probably not using all-modern software.
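On Linux, one place to see what block sizes have made it up to the kernel is sysfs. This is a small sketch that assumes a Linux system and a whole-disk device name such as `sda`; the function name and the `sysfs_root` parameter are mine, added so it can be pointed at a test directory.

```python
from pathlib import Path


def advertised_sector_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) block sizes in bytes as the Linux
    kernel reports them for a block device such as 'sda'."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical
```

Note that this reports what the kernel was told, not necessarily what the disk really does. If something in the storage stack hides or rewrites the drive's answer, these numbers will be wrong in the same way.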
(This is the kind of thing that I write to get everything fixed in my head, since we're now seriously looking into how badly 4k sector drives are going to impact our fileserver environment.)
Note that some vendors make drives with the same model number that can have different physical block sizes. I have a pair of Seagate 500 GB SATA drives (with the same model number, ST500DM002), bought at the same time from the same vendor, one of which turns out to have 4k sectors and one of which has 512 byte sectors as I expected. Fortunately the difference is basically harmless for what I'm using them for.
(Seagate documents this possibility in a footnote on their technical PDF for the drive series, if you read the small print.)
Go's friction points for me (and a comparison to Python)
A commentator on my entry on Python's data structures problem asked in part:
So, what's next, if anything? I take it Go wasn't a revolution in the way the migration from C to Python was. [...]
This brings up the complex issue of my views on Go. Part of the issue is that Go has a bunch of friction points right now. Some of them are intrinsic in the language and some of them are simply artifacts of the current situation and will hopefully change.
(I wrote more about where I think Go fits into my programming back in GoInterest.)
In general I don't think that Go will ever be as fast to program in as Python is (in the sense of how long it takes me to write a program, not in how fast it runs). Go goes to a lot of work to reduce the amount of bureaucracy involved through various features, but Python is simply at a higher level in terms of eliminating make-work and as a result it's significantly more flexible and adaptable. The tradeoffs involved are sensible for both languages and their goals; as discussed Go has a strong emphasis on large scale software engineering and Python doesn't.
(To put it one way, Go is a great language for large scale software projects but I almost never write those. As a sysadmin I'm generally a small scale programmer.)
I'm going to split this into current and intrinsic friction points, then do this in point form to keep the size of this entry from exploding. First, the current friction points:
- Go is not pervasively available in the way that things like Python,
Perl, and awk are. This is especially true of current versions
of the native Go toolchain, which is really what you want to be
working with. This is merely a pain for personal development (I
can always build the toolchain myself) but a relative killer for
work programming in our environment.
(To put it one way, 'first you download and build the compiler' does not make Go sound attractive to my co-workers.)
- Go's standard library is limited and portions of it are crazy. This
can be (somewhat) fixed with external packages but then I have to
find them and evaluate them and so on, which is a hassle. It would
be less of a hassle if people started making OS packages for various
good add on Go packages, the way many Perl and Python add-on modules
are only an apt-get or yum command away on most Linuxes.
(Part of why this matters to me is that $GOPATH makes me grind my teeth. It strikes me as such a bad fit for working with multiple projects under version control that it's painful.)
- The state of web frameworks for Go seems unclear right now. I
especially care about form handling and validation, especially
for database-backed forms (because this aspect is generally the
largest pain in the rear to code by hand; it's what drove me to
Django for my Python web app).
- Debugging is less friendly with Go than with Python, because if you screw up in Python it will dump out a great big verbose stack backtrace; often this points me to exactly the mistake I made. Go is a lot terser and thus less helpful.
(There are also pragmatic issues with using Go in production.)
I thought that I had several intrinsic language issues but at this point all I can think of is the general extra annoyance of explicit error handling as opposed to Python's tacit exceptions. I understand why Go makes the choice it does but Python's exception-based approach is just plain convenient for quick coding and it means that you can write much less code (you can aggregate error checks and even skip writing explicit ones and your program will still abort on errors).
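Here is a small sketch of what I mean by aggregating error checks in Python. The function names are hypothetical and the task (summing file sizes) is just a stand-in. There is no error handling at all in the inner function, and one `except` at the top covers every `open()` and `read()`.

```python
import sys


def total_size(paths):
    """Sum the sizes of some files. Nothing here checks for errors;
    any OSError from open() or read() simply propagates upward."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())
    return total


def main(argv):
    # One aggregated error check for every I/O operation above.
    try:
        print(total_size(argv[1:]))
    except OSError as e:
        print(f"error: {e}", file=sys.stderr)
        return 1
    return 0
```

If I left out the `try` entirely, the program would still stop on the first failure, just with a backtrace instead of a tidy message. The Go equivalent needs an explicit `err` check after each call that can fail.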
(I consider things like Go type assertions to be part of the general price paid for static typing. I can't really describe static typing as a friction point, although to be honest it sort of is.)
Also, as I've written before I maintain that Go's obsessive focus on goroutines with basically no support for select() et al is ultimately a mistake. Goroutines cannot do everything and there are real situations that they don't cope with (not unless you allow them to be canceled from outside while they are in nominally blocking routines).
(If I use Go more I may find some additional irritations. Python is a relatively featureful language as compared to Go, so I may find myself missing things like function decorators at some point.)