Wandering Thoughts archives

2019-07-12

Browsers can't feasibly stop web pages from talking to private (local) IP addresses

I recently read Jeff Johnson's A problem worse than Zoom (via), in which Johnson says:

[...] The major browsers I've tested — Safari, Chrome, Firefox — all allow web pages to send requests not only to localhost but also to any IP address on your Local Area Network! Can you believe that? I'm both astonished and horrified.

(Johnson mostly means things with private IP addresses, which is the only sense of 'on your local and private network' that can be usefully determined.)

This is a tempting and natural viewpoint, but unfortunately this can't be done in practice without breaking things. To understand this, I'll outline a series of approaches and then explain why they fail or cause problems.

To start with, a browser can't refuse to connect to private IP addresses (except perhaps for URLs typed directly into the URL bar), because there are plenty of organizations that use private IP addresses for their internal web sites. Their websites may link to each other, load resources from each other, put each other in iframes, and in general do all the things you don't want an outside website to be able to do to your local network, and it's far too late to tell everyone that they suddenly can't do this.

It's not sufficient for a browser to just block access by explicit IP address, to stop web pages from poking URLs like 'http://192.168.10.10/...'. If you control a domain name, you can make hosts in it resolve to arbitrary IP addresses, including private IP addresses and 127.0.0.1. Some DNS resolvers will screen these out except for 'internal' domains where you've pre-approved them, but a browser can't assume that it's always going to be behind such a DNS resolver.

(Nor can the browser implement such a resolver itself, because it doesn't know what the valid internal domains even are.)

To avoid this sort of DNS injection, let's say that the browser will only accept private IP addresses if they're the result of looking up hosts in top level domains that don't actually exist. If the browser looks up 'nasty.evil.com' and gets a private IP address, it's discarded; the browser only accepts it if it comes from 'good.nosuchtld'. Unfortunately for this idea, various organizations like to put their internal web sites into private subdomains under their normal domain name, like '<host>.corp.us.com' or '<host>.internal.whoever.net'. Among other reasons to do this, this avoids problems when your private top level domain turns into a real top level domain.

So let's use a security zone model. The browser will divide websites and URLs into 'inside' and 'outside' zones, based on what IP address the URL is loaded from (something that the browser necessarily knows at the time it fetches the contents). An 'inside' page or resource may refer to outside things and include outside links, but an outside page or resource cannot do this with inside resources; going outside is a one-way gate. This looks like it will keep internal organizational websites on private IP addresses working, no matter what DNS names they use. (Let's generously assume that the browser manages to get all of this right and there are no tricky cases that slip by.)
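
(To make the classification side of this concrete, here's a minimal Python sketch of the 'which zone does this IP address fall into' step; it's purely illustrative, and assumes that the ipaddress module's idea of 'private' is close enough to what a browser would consider 'inside'.)

  import ipaddress

  def zone_of(ip_str):
      # Classify by the IP address the URL was actually fetched from.
      # is_private covers RFC 1918 space, loopback, link-local, and so on.
      addr = ipaddress.ip_address(ip_str)
      return "inside" if addr.is_private else "outside"

  print(zone_of("192.168.10.10"))  # inside
  print(zone_of("8.8.8.8"))        # outside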

Unfortunately this isn't sufficient to keep places like us working. We have a 'split horizon' DNS setup, where the same DNS name resolves to different IP addresses depending on whether you're inside or outside our network perimeter, and we also have a number of public websites that actually live in private IP address space but that are NAT'd to public IPs by our external firewall. These websites are publicly accessible, get linked to by outside things, and may even have their resources loaded by outside public websites, but if you're inside our network perimeter and you look up their name, you get a private IP address and you have to use this IP address to talk to them. This is exactly an 'outside' host referring to an 'inside' resource, which would be blocked by the security zone model.

If browsers were starting from scratch today, there would probably be a lot of things done differently (hopefully more securely). But they aren't, and so we're pretty much stuck with this situation.

web/BrowsersAndLocalIPs written at 21:49:48; Add Comment

Reflections on almost entirely stopping using my (work) Yubikey

Several years ago (back in 2016), work got Yubikeys for a number of us for reasons beyond the scope of this entry. I got designated as the person to figure out how to work with them, and in my usual way with new shiny things, I started using my Yubikey's SSH key for lots of additional things over and above their initial purpose (and I added things to my environment to make that work well). For a long time since then, I've had a routine of plugging my Yubikey in when I got in to work, before I unlocked my screen the first time. The last time I did that was almost exactly a week ago. At first, I just forgot to plug in the Yubikey when I got in and didn't notice all day. But after I noticed that had happened, I decided that I was more or less done with the whole thing. I'm not throwing the Yubikey away (I still need it for some things), but the days when I defaulted to authenticating SSH with the Yubikey SSH key are over. In fact, I should probably go through and take that key out of various authorized_keys files.

The direct trigger for not needing the Yubikey as much any more and walking away from it is that I used it to authenticate to our OmniOS fileservers, and we took the last one out of service a few weeks ago. But my dissatisfaction has been building for some time for an assortment of reasons. Certainly one part of it is that the big Yubikey security issue significantly dented my trust in the whole security magic of a hardware key, since using a Yubikey actually made me more vulnerable instead of less (well, theoretically more vulnerable).

Another part of it is that for whatever reason, every so often the Fedora SSH agent and the Yubikey would stop talking to each other. When this happened, various things would start failing and I would have to manually reset everything, which obviously made relying on Yubikey-based SSH authentication far from the transparent experience of things just working that I wanted. At some point, I adopted a ritual of locking and then un-locking my screen before I did anything that I knew required the Yubikey.

Another surprising factor is that I had to change where I plugged in my Yubikey, and the new location made it less convenient. When I first started using my Yubikey I could plug it directly into my keyboard at the time, in a position that made it very easy to see it blinking when it was asking me to touch it to authenticate something. However I wound up having to replace that keyboard (cf) and my new keyboard has no USB ports, so now I have to plug the Yubikey into the USB port at the edge of one of my Dell monitors. This is more awkward to do, makes it harder to reach and touch the Yubikey's touchpad, and makes it harder to even see it blinking. The shift in where I had to plug it in made everything about dealing with the Yubikey just a bit more annoying, and some bits much more annoying.

(I have a few places where I currently use a touch authenticated SSH key, and these days they almost always require two attempts, with a Yubikey reset in the middle because one of the reliable ways to have the SSH agent stop talking to the Yubikey is not to complete the touch authentication stuff in time. You can imagine how enthused I am about this.)

On the whole, the most important factor has been that using the Yubikey for anything has increasingly felt like a series of hassles. I think Yubikeys are still reasonably secure (although I'm less confident and trusting of them than I used to be), but I'm no longer interested in dealing with the problems of using one unless I absolutely have to. Nifty shiny things are nice when they work transparently; they are not so nice when they don't, and it has surprised me how little it took to tip me over that particular edge.

(It's also surprised me how much happier I feel after having made the decision and carrying it out. There's all sorts of things I don't have to do and deal with and worry about any more, at least until the next occasion when I really need the Yubikey for something.)

sysadmin/YubikeyMostlyDropped written at 01:27:37; Add Comment

2019-07-10

I brought our Django app up using Python 3 and it mostly just worked

I have been worrying for some time about the need to eventually get our Django web application running under Python 3; most recently I wrote about being realistic about our future plans, which mostly amounted to not doing anything until we had to. Well, guess what happened since then.

For reasons beyond the scope of this entry, last Friday I ended up working on moving our app from Django 1.10.7 to 1.11.x, which was enlivened by the usual problem. After I had it working under 1.11.22, I decided to try running it (in development mode, not in production) using Python 3 instead of Python 2, since Django 1.11.22 is itself fully compatible with Python 3. To my surprise, it took only a little bit of cleanup and additional changes beyond basic modernization to get it running, and the result is so far fully compatible with Python 2 as well (I committed the changes as part of the 1.11 move, and since Monday they're running in production).

I don't think this is particularly due to anything I've done in our app's code; instead, I think it's mostly due to the work that Django has done to make everything work more or less transparently. As the intermediate layer between your app and the web (and the database), Django is already the place that has to worry about character set conversion issues, so it can spare you from most of those. And generally that's the big difference between Python 2 and Python 3.

(The other difference is the print statement versus 'print()', but you can make Python 2.7 work in the same way as Python 3 with 'from __future__ import print_function', which is what I did.)
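
(Concretely, the shim is just a __future__ import at the top of each affected file; the diagnostic message below is made up:)

  from __future__ import print_function
  import sys

  # With the future import, print is a function under Python 2.7 as well,
  # so this line behaves identically under Python 2.7 and Python 3.
  print("some diagnostic message", file=sys.stderr)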

I haven't thoroughly tested our web app under Python 3, of course, but I did test a number of the basics and everything looks good. I'm fairly confident that there are no major issues left, only relatively small corner cases (and then the lurking issue of how well the Python 3 version of mod_wsgi works and if there are any traps there). I'm still planning to keep us on Python 2 and Django 1.11 through at least the end of this year, but if we needed to I could probably switch over to a current Django and Python 3 with not very much additional work (and most of the work would be updating to a new version of Django).

There was one interesting and amusing change I had to make, which is that I had to add a bunch of __str__ methods to various Django models that previously only had __unicode__ methods. When building HTML for things like form <select> fields, Django string-izes the names of model instances to determine what to put in there, but in Python 2 it actually generates the Unicode version and so ends up invoking __unicode__, while in Python 3 str is Unicode already and so Django was using __str__, which didn't exist. This is an interesting little incompatibility.
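
As a minimal sketch of the fix (the model and field names here are made up for illustration), the models now look something like this:

  from django.db import models

  class Account(models.Model):
      name = models.CharField(max_length=64)

      def __unicode__(self):
          # Python 2: Django builds the Unicode form of the instance,
          # which ends up calling __unicode__.
          return self.name

      def __str__(self):
          # Python 3: str is already Unicode, so Django calls __str__.
          return self.name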

Sidebar: The specific changes I needed to make

I'm going to write these down partly because I want a coherent record, and partly because some of them are interesting.

  • When generating a random key to embed in a URL, read from /dev/urandom using binary mode instead of text mode and switch from an ad-hoc implementation of base64.urlsafe_b64encode to using the real thing. I don't know why I didn't use the base64 module in the first place; perhaps I just didn't look for it, since I already knew about Python 2's special purpose encodings.

  • Add __str__ methods to various Django model classes that previously only had __unicode__ ones.

  • Switch from print statements to print() as a function in some administrative tools the app has. The main app code doesn't use print, but some of the administrative commands report diagnostics and so on.

  • Fix mismatched tabs versus spaces indentation, which snuck in because my usual editor for Python used to use all-tabs and now uses all-spaces. At some point I should mass-convert all of the existing code files to use all-spaces, perhaps with four-space indentation.

  • Change a bunch of old style exception syntax, 'except Thing, e:', to 'except Thing as e:'. I wound up finding all of these with grep.

  • Fix one instance of sorting a dictionary's .keys(), since Python 3 returns a view object here instead of a sortable list.

Many of these changes were good ideas in general, and none of them are ones that I find objectionable. Certainly switching to just using base64.urlsafe_b64encode makes the code better (and it makes me feel silly for not using it to start with).
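
(As a sketch of roughly what the first change amounts to, with an arbitrary key length and details that don't necessarily match our real code:)

  import base64

  def make_url_key(nbytes=18):
      # Binary mode gives us bytes under both Python 2 and Python 3.
      with open("/dev/urandom", "rb") as f:
          raw = f.read(nbytes)
      # urlsafe_b64encode returns bytes; decode them for use in a URL.
      return base64.urlsafe_b64encode(raw).decode("ascii")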

python/DjangoAppPython3Surprise written at 21:46:22; Add Comment

2019-07-09

Systemd services that always restart should probably set a restart delay too

Ubuntu 18.04's package of the Prometheus host agent comes with a systemd .service unit that is set with 'Restart=always' (something that comes from the Debian package, cf). This is a perfectly sensible setting for the host agent for a metrics and monitoring system, because if you have it set to run at all, you almost always want it to be running all the time if at all possible. When we set up a local version of the host agent, I started with the Ubuntu .service file and kept this setting.

In practice, pretty much the only reason the Prometheus host agent aborts and exits on our machines is that the machine has run out of memory and everything is failing. When this happens with 'Restart=always' and the default systemd settings, systemd will wait its default of 100 milliseconds (the normal DefaultRestartSec value) and then try to restart the host agent again. Since the out of memory condition has probably not gone away in 100 ms, this restart is almost certain to fail. Systemd will repeat this until the restart has failed five times in ten seconds, and then, well, let me quote the documentation:

[...] Note that units which are configured for Restart= and which reach the start limit are not attempted to be restarted anymore; [...]

With the default restart interval, this takes approximately half a second (five attempts spaced 100 milliseconds apart). Our systems do not clear up out of memory situations in half a second, and so the net result was that when machines ran out of memory sufficiently badly that the host agent died, it was dead until we restarted it manually.

(I can't blame systemd for this, because it's doing exactly what we told it to do. It is just that what we told it to do isn't the right thing under the circumstances.)

The ideal thing to do would be to try restarting once or twice very rapidly, just in case the host agent died due to an internal error, and then to back off to much slower restarts, say once every 30 to 60 seconds, as we wait out the out of memory situation that is the most likely cause of problems. Unfortunately systemd only offers a single restart delay, so the necessary setting is the slower one; in the unlikely event that we trigger an internal error, we'll accept that the host agent has a delay before it comes back. As a result we've now revised our .service file to have 'RestartSec=50s' as well as 'Restart=always'.
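
For concreteness, the relevant part of the [Service] section is now just these two directives:

  [Service]
  # Restart on any exit, but wait long enough that an out of memory
  # situation has a chance to clear up before we try again.
  Restart=always
  RestartSec=50s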

(We don't need to disable StartLimitBurst's rate limiting, because systemd will never try to restart the host agent more than once in any ten second period.)

There are probably situations where the dominant reason for a service failing and needing to be restarted is an internal error, in which case an almost immediate restart minimizes downtime and is the right thing to do. But if that's not the case, then you definitely want to have enough of a delay to let the overall situation change. Otherwise, you might as well not set a 'Restart=' at all, because it's probably not going to work and will just run you into the (re)start limit.

My personal feeling is that most of the time, your services are not going to be falling over because of their own bugs, and as a result you should almost always set a RestartSec delay and consider what sort of (extended) restart limit you want to set, if any.

Sidebar: The other hazard of always restarting with a low delay

The other big reason for a service to fail to start is if you have an error in a configuration file or the command line (eg a bad argument or option). In this case, restarting in general does you no good (since the situation will only be cleared up with manual attention and changes), and immediately restarting will flood the system with futile restart attempts until systemd hits the rate limits and shuts things off.

It would be handy to be able to tell systemd that it should not restart the service if it immediately fails during a 'systemctl start', or at least to tell it that the failure of an ExecStartPre program should not trigger the restarting, only a failure of the main ExecStart program (since ExecStartPre is sometimes used to check configuration files and so on). Possibly systemd already behaves this way, but if so it's not documented.

linux/SystemdRestartUseDelay written at 23:45:39; Add Comment

2019-07-08

SMART drive self-tests seem potentially useful, but not too much

I've historically ignored all aspects of hard drive SMART apart from, perhaps, how smartd would occasionally email us to complain about things and sometimes those things would even be useful. There is good reason to be a SMART sceptic, seeing as many of the SMART attributes are underdocumented, SMART itself is peculiar and obscure, hard drive vendors have periodically had their drives outright lie about SMART things, and SMART attributes are not necessarily good predictors of drive failures (plenty of drives die abruptly with no SMART warnings, which can be unnerving). Certain sorts of SMART warnings are usually indicators of problems (but not always), but the absence of SMART warnings is no guarantee of safety (see eg, and also Backblaze from 2016). Also, the smartctl manpage is very long.

But, in the wake of our flaky SMART errors and some other events with Crucial SSDs here, I wound up digging deeper into the smartctl manpage and experimenting with SMART self-tests, where the hard drive tries to test itself, and SMART logs, where the hard drive may record various useful things like read errors or other problems, and may even include the sector number involved (which can be useful for various things). Like much of the rest of SMART, what SMART self-tests do is not precisely specified or documented by drive vendors, but generally it seems that the 'long' self-test will read or scan much of the drive.

By itself, this probably isn't much different than what you could do with dd or a software RAID scan. From my perspective, what's convenient about SMART self-tests is that you can kick them off in the background regardless of what the drive is being used for (if anything), they probably won't get too much in the way of your regular IO, and after they're done they automatically leave a record in the SMART log, which will probably persist for a fair while (depending on how frequently you run self-tests and so on).
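
(For the record, with smartmontools all of this is a couple of smartctl invocations; '/dev/sdX' here is a placeholder for the actual drive:)

  # Kick off a long self-test; it runs inside the drive, in the background.
  smartctl -t long /dev/sdX

  # Later, look at the self-test log (and the overall health assessment).
  smartctl -l selftest /dev/sdX
  smartctl -H /dev/sdX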

On the flipside, SMART self-tests have the disadvantage that you don't really know what they're doing. If they report a problem, it's real, but if they don't report a problem you may or may not have one. A SMART self-test is better than nothing for things like testing your spare disks, but it's not the same as actually using them for real.

On the whole, my experimentation with SMART self-tests leaves me feeling that they're useful enough that I should run them more often. If I'm wondering about a disk and it's not being used in a way where all of it gets scanned routinely, I might as well throw a self-test at it to see what happens.

(They probably aren't useful and trustworthy enough to be worth scripting something so that we routinely run self-tests on drives that aren't already in software RAID arrays.)

PS: Much but not all of my experimentation so far has been on hard drives, not SSDs. I don't know if the 'long' SMART self-test on a SSD tests more thoroughly and reaches more bits of the drive internals than you can with just an external read test like dd, or conversely if it's less thorough than a full read scan.

tech/SMARTSelfTestsMaybe written at 21:07:18; Add Comment

2019-07-07

Straightforward web applications are now very likely to be stable in browsers

In response to my entry on how our goals for our web application are to not have to touch it, Ross Hartshorn left a comment noting:

Hi! Nice post, and I sympathize. However, I can't help thinking that, for web apps in particular, it is risky to have the idea of software you don't have to touch anymore (except for security updates). The browsers which are used to access it also change. [...]

I don't think these are one-off changes, I think it's part of a general trend. If it's software that runs on your computer, you can just leave it be. If it's a web app, a big part of it is running on someone else's computer, using their web browser (a piece of software you don't control). You will need to update it from time to time. [...]

This is definitely true in a general, abstract sense, and in the past it has been true in a concrete sense, in that some old web applications could break over time due to the evolution of browsers. However, this hasn't really been an issue for simple web applications (ones just based around straight HTML forms), and these days I think that even straightforward web applications are going to be stable over browser evolution.

The reality of the web is that there is a huge overhang of old straightforward HTML, and there has been for some time; in fact, for a long time now, at any given point in time most of the HTML in existence is 'old' to some degree. Browsers go to great effort to not break this HTML, for the obvious reason, and so any web application built around basic HTML, basic forms, and the like has been stable (in browsers) for a long time now. The same is true for basic CSS, which has long since stopped being in flux and full of quirks. If you stick to HTML and CSS that is at least, say, five years old, everything just works. And you can do a great deal with that level of HTML and CSS.

(One exhibit for this browser stability is DWiki, the very software behind this blog, which has HTML and CSS that mostly fossilized more than a decade ago. This includes the HTML form for leaving comments.)

Augmenting your HTML and CSS with Javascript has historically been a bit more uncertain and unstable, but my belief is that even that has now stopped. Just as with HTML and CSS, there is a vast amount of old(er) Javascript on the web and no one wants to break it by introducing incompatible language changes in browsers. Complex Javascript that lives on the bleeding edge of browsers is still something that needs active maintenance, but if you just use some simple Javascript to do straightforward progressive augmentation, I think that you've been perfectly safe for some time and are going to be safe well into the future.

(This is certainly our experience with our web application.)

Another way to put this is that the web has always had some stable core, and this stable core has steadily expanded over time. For some time now, that stable core has been big enough to build straightforward web applications. It's extremely unlikely that future browsers will roll back very much of this stable core, if anything; it would be very disruptive and unpopular.

(You don't have to build straightforward web applications using the stable core; you can make your life as complicated as you want to. But you're probably not going to do that if you want an app that you can stop paying much attention to.)

web/WebAppsAndBrowserStability written at 23:23:22; Add Comment

2019-07-06

Clearing disk errors (or SMART complaints) for Linux software RAID arrays

I've written in the past about clearing SMART disk complaints for ZFS pools, which involved an intricate dance with hdparm and various other things. Due to having a probably failing HD on my home machine, I've now had a chance to deal with much the same issue for a software RAID mirror (with LVM on top). It turns out that fixing up disk errors for software RAID levels with redundancy is embarrassingly simple: to fix disk read errors, you have the software RAID check the array. Because this reads all of every component of the array, it will hopefully discover any read errors and then automatically try to fix them by rewriting the sectors.

(This doesn't happen with ZFS's self-checks, because ZFS optimizes to only read and check used space. If the disk read errors are in currently unused space, they don't get hit.)

To start a check of the array, you write either 'repair' or 'check' to /sys/block/md<N>/md/sync_action. Based on the description in the Linux kernel's software RAID documentation, using 'repair' is better.

Sometimes you know where the error is, for example because the kernel has told you with a message like:

md/raid1:md53: read error corrected (8 sectors at 3416896 on sdd4)
md/raid1:md53: redirecting sector 3416896 to other mirror: sdd4

If you're using mirrored RAID and you want to speed up the repair process (or not take the IO performance hit of re-scanning your entire array while you're trying to do other things), you can limit the portion of the array that 'repair' scans by writing limits to the sync_min and sync_max files in /sys/block/md<N>/md. These are normally '0' and 'max' respectively, which you'll want to remember because you need to reset them to those values after your check is done.

As the documentation more or less covers, the process to do this is (there's a concrete command sketch after the list):

  1. Echo appropriate values to sync_min and sync_max, perhaps 100 sectors before and after the value reported in the kernel messages.
  2. Start a check by echoing 'repair' to sync_action.
  3. Watch sync_completed until it says that the repair has reached your sync_max value.
  4. Echo 'idle' to sync_action to officially stop the repair.
  5. Reset sync_min and sync_max to their defaults of '0' and 'max'.
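
As a concrete sketch of these steps, using md53 and the sector number from the example kernel messages above (the 100-sector margin is arbitrary):

  cd /sys/block/md53/md
  echo 3416796 > sync_min     # roughly 100 sectors before the reported sector
  echo 3416996 > sync_max     # roughly 100 sectors after it
  echo repair > sync_action
  cat sync_completed          # repeat until it reaches your sync_max value
  echo idle > sync_action
  echo 0 > sync_min
  echo max > sync_max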

If you also have kernel messages logged that report the raw HD sector numbers of your problem sectors, you can also use 'hdparm --read-sector' afterward to verify that sectors no longer have read errors and have identical contents to the version on the good drives.

Of course, all of this is a good reason to make sure that your system automatically does a 'check' (or 'repair') of all of your software RAID arrays on a regular basis. I believe that most current Linux distributions have this already set up, but sometimes these things can get turned off.

Note that you should never clear read errors on software RAID array components by using 'hdparm --write-sector'. With software RAID, it's absolutely crucial that either the sector has the correct contents or that it returns an error. If it reads but returns different data, you have a software RAID inconsistency and may corrupt your data.

PS: As we found out the hard way once, you should keep an eye on how many read errors your software RAID arrays are seeing on their components. Unfortunately this information doesn't seem to currently be captured by things like the Prometheus host agent, so we're probably going to add a script for it. You may also want to keep an eye on the count of sector content mismatches in /sys/block/md<N>/md/mismatch_cnt. Although this is too potentially noisy to alert on and gets reset periodically, it's useful and important enough to be tracked somehow.

linux/SoftwareRaidClearingDiskErrors written at 23:29:58; Add Comment

2019-07-05

My plan for two-stage usage of Certbot when installing web server hosts

Let me start with our problem. When you request TLS certificates through Certbot, you must choose between standalone authentication, where Certbot runs an internal web server to handle the Let's Encrypt HTTP challenge, and webroot authentication, where Certbot puts files in a magic location under the web server's webroot. You can only choose one, which is awkward if you want a single universal process that works on all your hosts, and this choice is saved in the certificate's configuration; it will automatically be used on renewal by default. The final piece is that Apache refuses to start up if there are missing TLS certificates.

All of this creates a problem when installing a host that runs Apache. What you would like to do is perform the install (including your specific Apache configuration), request the TLS certificates using standalone authentication since Apache can't start yet, and then start Apache and switch to webroot authentication for certificate renewals (so that Certbot can actually renew things now that Apache is using port 80). This would be trivial if Certbot provided a command to change the configured renewal method for a certificate, but as far as I can see they don't. While you can specify the authentication method when you ask for a certificate renewal, this doesn't by itself update the configuration; instead, Certbot only changes the renewal method when you actually renew the certificate.

This means that one way around this would be to request our TLS certificates with standalone authentication, then once Apache was up and running, immediately renew them using webroot authentication purely for the side effect of updating the certificate's configuration. The problem with this (at least in our environment) is that we risk running into Let's Encrypt rate limits, although perhaps not as much as I thought. However, there is a trick we can play to avoid that, because we don't need the first certificate to be trusted. It only exists to bootstrap Apache, and Apache doesn't validate the certificate chain of your certificates. This means that we can ask Certbot to get test certificates instead of real Let's Encrypt ones (using standalone authentication), start Apache, then immediately 'renew' them as real Let's Encrypt certificates using webroot authentication, which will as a side effect update the certificate's configuration.
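
A sketch of the resulting two-stage process (the hostname and webroot path are made up, and the exact options may need adjusting for your setup):

  # Stage one: Apache isn't running yet, so get a throwaway test certificate
  # using standalone authentication.
  certbot certonly --standalone --test-cert -d www.example.org

  # ... start Apache, which now has a certificate it can load ...

  # Stage two: immediately 're-issue' it as a real certificate using webroot
  # authentication, which as a side effect updates the saved renewal method.
  certbot certonly --webroot -w /var/www/html --force-renewal -d www.example.org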

(Of course in many real situations the actual procedure is 'restore or copy /etc/letsencrypt from the current production machine'.)

This is not as smooth and fluid a process as acmetool offers, and you have to ask for the certificates twice, with different magic command line options. I'm not certain it's worth writing a cover script to simplify this a bit, but perhaps it is, since we also need magic options for registration.

(With appropriate work in the script, you wouldn't even need to list all of the hostnames a second time, just tell it to renew everything as a real certificate now.)

PS: Realizing this trick and working this out makes me feel a fair bit happier about using Certbot. This particular problem was the largest, most tangled obstacle I could see, so I'm glad to have gotten past it.

sysadmin/CertbotTwoStageDeploys written at 22:23:40; Add Comment

2019-07-04

Django's goals are probably not our goals for our web application

Django bills itself as "the web framework for perfectionists with deadlines". As a logical part of that, Django is always working to improve itself, as are probably almost all frameworks. For people with actively developed applications (perfectionists or otherwise), this is fine. They are working on their app anyway, constantly making other changes and improvements and adjustments, so time and Django updates will deliver a continual stream of improvements (along with a certain amount of changes they have to make to keep up, but again they're already making changes).

This does not describe our goals or what we do with our web application. What we want is to write our app, reach a point where it's essentially complete (which we pretty much achieved a while ago), and then touch it only on the rare occasions when there are changes in the requirements. Django provides what we need in terms of features (and someone has to write that code), but it doesn't and never will provide the stability that we also want. Neither sharks nor frameworks for perfectionists ever stand still.

This creates an awkward mismatch between what Django wants us to do and what we want to do, one that I have unfortunately spent years not realizing and understanding. In particular, from our perspective the work of keeping up with Django's changes and evolution is almost pure overhead. Our web application is running fine as it is, but every so often we need to go change it in order to nominally have security fixes available, and in completely unsurprising news I'm not very enthusiastic or active about doing this (not any more, at least; I was in the beginning). The latest change we need is an especially large amount of work, as we will have to move from Python 2 to Python 3.

(We don't need bug fixes because we aren't running into bugs. If we were, we probably would have to work around them anyway rather than wait for a new Django release.)

I don't know what the solution is, or even if there is a solution (especially at this point, with our application already written for Django). I expect that other frameworks (in any language) would have the same bias towards evolution and change that Django does; most users of them, especially big active ones, are likely people who have applications that are being actively developed on a regular basis. I suspect that 'web frameworks for people who want to write their app and then walk away from it' is not a very big niche, and it's not likely to be very satisfying for open source developers to work on.

(Among other structural issues, as a developer you don't get to do anything. You write your framework, fix the bugs, and then people like me want you to stop improving things.)

PS: I don't think this necessarily means that we made a bad choice when we picked Django way back when, because I'm not sure there was a better choice to be made. Writing our web app was clearly the right choice (it has saved us so much time and effort over the years), and using a framework made that feasible.

python/DjangoGoalsNotOurGoals written at 21:30:02; Add Comment

2019-07-03

Converting a variable to a single-element slice in Go via unsafe

I was recently reading Chris Wellons' Go Slices are Fat Pointers. At the end of the article, Wellons says:

Slices aren’t as universal as pointers, at least at the moment. You can take the address of any variable using &, but you can’t take a slice of any variable, even if it would be logically sound.

[...] However, if you really wanted to do this, the unsafe package can accomplish it. I believe the resulting slice would be perfectly safe to use:

// Convert to one-element array, then slice
fooslice = (*[1]int)(unsafe.Pointer(&foo))[:]

I had to read this carefully before I understood what it was doing, but after reading the documentation for unsafe.Pointer() carefully, I believe that this is fully safe. So let's start with what it's doing. The important thing is this portion of the expression:

(*[1]int)(unsafe.Pointer(&foo))

This is essentially reinterpreting foo from an integer to a one-element array of integers, by taking a pointer to it and then converting that to a pointer to a one-element array. I believe that this use of unsafe.Pointer() is probably valid, because it seems like it falls under the first valid use in the documentation:

(1) Conversion of a *T1 to Pointer to *T2.

Provided that T2 is no larger than T1 and that the two share an equivalent memory layout, this conversion allows reinterpreting data of one type as data of another type. [...]

In Go today, an integer and a one-element array of integers are the same size, making the first clause true and pretty much requiring that the second one is true as well. I don't think that Go requires this in the language specification, but in practice it's very likely to be the case in any implementation that wants to adhere to Go's ethos of efficiency and minimalism. Once we have a valid pointer to a (valid) one-element array of int, it's perfectly legal to create a slice from it, which is what the '[:]' does. So if this use of unsafe is valid, the resulting slice is fully safe and valid.

Now we get to the interesting question of why Go doesn't allow this without the use of unsafe.Pointer(). One possible answer is that this is not allowed simply because it would require extra work in the language specification and the compiler. This may well be the case (and it's certainly a very Go-style reason), but another possible reason is that Go doesn't want to require that all implementations make a one-element array have exactly the same memory layout and implementation as a single variable. By confining this to the limited assurances of unsafe and not making it part of the guaranteed language specification, Go keeps people's options open.

(Of course this is only theoretical, because in practice a new implementation will likely want to reuse as much of the standard library as possible and the current standard library uses unsafe in various places. If you don't match what works with unsafe today in mainline Go, you're going to have to rewrite some of that code. Also, see how unsafe type conversions are still garbage collection safe for some more discussion of this area.)

programming/GoVariableToArrayConversion written at 22:00:33; Add Comment

