I think it's still reasonable to run personal servers on the Internet
In his comment on yesterday's entry, Pete Zaitcev showed me that I should clarify my opinion on running your own personal servers today on the Internet (to the extent that I have an opinion at all). To summarize the rest of this entry, I don't think there's any compelling reason why you shouldn't run a personal server if you want to and you more or less know what you're getting yourself into. At the same time, it's not trivial to do so; it's very much the DIY choice, with all that that implies.
First off, I definitely think that you should have a personal presence on the Internet that's not tied to your (current) employer; in other words, don't make my university sysadmin's email mistake. Having your own domain name is optional and will cost you some money and effort but it probably pays off in the long run, at least for websites (in today's email spam environment, changing email addresses every few years may actually be a feature). However, none of this requires you to have your own servers; plenty of places support you pointing some aspect of your domain at their infrastructure, at least for common things like websites, email, and DNS. Taking advantage of this (either for free or paying people) is definitely the easy way to go.
However, I think that it's still reasonable to have your own server or servers instead, especially now that you can get inexpensive virtual machines that you genuinely run yourself (your choices used to be 'shared hosting' or paying for actual physical hardware and rack space). Modern Unix server software is not full of holes and is generally relatively straightforward to administer, the Internet is not an intrinsically hostile place of DDoS and hate, and most people are still willing to talk to random machines for things like websites (your mileage may vary for things like sending email from your server to GMail). Generally if you put a modern Unix on the Internet for personal use and operate it with decent competence, you'll be okay at one level.
(My impression is that modern VPS providers have done a lot of work to make it very easy for you to bring up a new generic Ubuntu, CentOS, or whatever server that will come up in a sane and operable condition and probably automatically apply security updates and so on. I don't know what Amazon AWS is like, though.)
At another level, by running your own server you're making tradeoffs and accepting limitations. The broad downside is that you've chosen the DIY approach and DIY is always more work and requires more knowledge than getting someone else to do it for you. If you're already a sysadmin it can feel like a busman's holiday, and if you're not a sysadmin or an experienced Unix person you're going to have to turn yourself into one. One dangerous side of this is that it's easy to make mistakes through ignorance, for example not making sure you have some sort of backups. For a personal server, you don't necessarily need everything you'd want when running one in a company, but there are still a lot of things that may bite you some day. System administration is unfortunately a field so full of trivia that people keep having to rediscover pieces of it the hard way.
Another limitation is that, to put it one way, you're not going to get your own personal GMail, either in its interface or probably in its resilience against spam and other problems. The open source world has produced great marvels and there are things that can come close to some parts of the big company services, but on the whole the DIY approach is going to get you results that are objectively inferior in some ways. It's up to you to decide if you care for your usage; if you read all your email through an IMAP client, for example, the lack of a sophisticated GMail web interface is not an issue.
Judged purely by the end results, this can make running your own server a bad choice. You spend more time, have to learn more things and worry about more issues, and you get an inferior result. If you're going to run your own server anyway, you should have an answer to the question of why, or what you get out of it. One perfectly good answer is 'I want to play around with my own Unix server'; another is 'I don't like having so much of my Internet life at the mercy of big indifferent companies'.
Further, my current broad view is that you shouldn't run anything critical on a personal server unless you're extremely confident that you know what you're doing and that you have working backups (on another provider). Casually operated personal servers are best used for things that you can afford to be down for a few days while you patch things back together from an upgrade, a security problem, an accident, or your VPS provider screwing something up. If you need a highly resilient personal server environment, you're probably looking at a significant amount of work unless you're already an expert in the field and can put together a solid Puppet, Kubernetes, or AWS environment in your sleep.
On the flipside, this is caution speaking. Most of the time you're going to be fine, especially if you pay your VPS provider for some form of backups (and then keep your own offsite copies). Just make sure to apply security updates and as part of this, upgrade or build a new version of the VPS when your Unix or Linux distribution reaches its end of life.
(My personal plan is to use at least two completely separate VPS providers, but that requires getting over my inertia and lack of desire to run my own infrastructure.)
By the way, all of this assumes that you aren't someone who is going to be actively and specifically targeted by attackers. If this is not true, you really need to know what you're doing as far as security goes and you're probably better off in the tender arms of GMail and so on. GMail has a very good security team with a lot of resources, far more than you or I have.
Running servers (and services) well is not trivial
It all started with a Reddit comment thread called The "mass exodus" from Github to GitLab: 10 days later. In it, someone commented that they didn't understand why there was a need for cloud Git services in the first place, since running your own Git server for your company was easy enough. I think that part of this view is due to the difference between 'on premise' and 'off premise' approaches to environments, but as a sysadmin I had a twitchy reaction to the 'it's easy' part.
These days, it's often relatively easy to 'just set up a server' or a service, especially if you already work in the cloud. Spin up a VM or a Docker image, install some stuff, done, right? Well, not if you want this to be reliable infrastructure. So let's run down what you or I would have to do to set up a general equivalent of Github for internal company use:
- Figure out the Git server software you want to use, including whether you want the full Github experience or a web Git thing that people can pull from and push to. Or you could go very old school and demand that people use SSH logins to a Unix machine where they do filesystem level git operations, although I'm not sure that would work well for very long.
- Possibly figure out how to configure this Git server software
for your particular requirements, setup, and whatever else you
have available.
- Figure out how you're going to handle authentication for this
Git service. Do you have a company authentication system? How
will you tie this service to any 2FA that you use (and you probably
want to consider 2FA)?
If you can't outsource all authentication to some other system in your company, you've just taken on an ongoing maintenance task of adding, removing, and updating users and groups (and any other things that have to authenticate to the Git server). Failure to do this well, or failure to be hooked into other aspects of the company's authentication and HR systems, can result in fun things like fired employees still retaining high-privilege access to your Git server (ie the company's Git server). You probably don't want that.
(Non-integrated authentication causes sufficiently many problems that it's featured in a sysadmin test.)
- Install a 'server' to run all of the necessary components on. In
the best case, you use Docker or something that uses Docker images
and there's a pre-packaged Docker image that you can throw on.
In the worst case, you get to find and install a physical server
for this, including hooking it into any fleet-wide management
systems so that it automatically gets kept up to date on security
patches, and then install a host of un-packaged software from
source (or install random binaries you download from the Internet,
if you feel like doing that).
If you're using Docker or a VPS in the cloud, don't forget to figure out how you're going to get persistent storage.
- Figure out how to back up the important data on the system. Even
if you have persistent cloud storage, you want some form of backups
or ability to roll back in time, because sooner or later someone
will accidentally do something destructive to a bit of the system
(eg 'oops, I mass-deleted a bunch of open issues by mistake') and
you'll need to fix it.
Once you have backups set up, make sure that you're monitoring them on an ongoing basis so that you can find out if they break.
- If your company has continuous integration systems and similar development automation, or has production servers that pull from your Git repos (or that get pushed to by them), you're going to need to figure out how to connect all of this to the Git server software. This includes things like how to authenticate various parties to each other, unless everyone can pull anything from your Git server.
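As one illustration of the backup monitoring point in the list above, here is a minimal Python sketch that flags backups as stale when the newest file in a backup directory is too old. It assumes your backups land as ordinary files in a single directory; the 26 hour threshold and the function names are made up for illustration and are not anything the Git server software provides.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the newest file in backup_dir.

    Returns None if the directory is missing or holds no regular files.
    """
    now = time.time() if now is None else now
    try:
        mtimes = [p.stat().st_mtime
                  for p in Path(backup_dir).iterdir() if p.is_file()]
    except FileNotFoundError:
        return None
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backups_are_stale(backup_dir, max_age_hours=26.0, now=None):
    """True if there is no backup at all or the newest one is too old."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

You would run something like this from cron or your monitoring system and alert when backups_are_stale() returns True. It only tells you that something recent exists, not that the backup can actually be restored, which you still need to test separately.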
I'm going to generously assume that the system never has performance problems (which you'd have to troubleshoot) and never experiences software issues with your chosen Git server and any databases and the like that it may depend on (which you'd have to troubleshoot too). Once set up, it stays quietly running in the corner and doesn't actively require your attention. This is, shall we say, not always the experience that you actually get.
(I'm also assuming that you can just get TLS certificates from Let's Encrypt and that you know how to do this. Or perhaps the Git server stuff you picked does this all for you.)
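Even when certificate issuance is automated, renewals can quietly stop working, so it can be worth checking how long the live certificate has left. The following is a rough Python sketch using only the standard library; the function names are invented for illustration and any host name you pass in is your own.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days from now until a certificate's 'notAfter' time.

    not_after uses the format returned by getpeercert(), for example
    "Jan  5 09:34:43 2018 GMT". A negative result means already expired.
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect to host:port over TLS and return the cert's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A check might call days_until_expiry(fetch_not_after("git.example.com")) (a placeholder host name) and complain if the result drops below a couple of weeks. Because the default context verifies the certificate, an already expired certificate shows up as a failed connection rather than a negative number.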
Unfortunately we're not done, because the world is not a nice place. Even with the service just working, we still have more things to worry about:
- Figure out how to get notified if there are security issues or
important bugfixes in either the Git server software or the
underlying environment that it runs on. You can be confident
that there will be some, sooner or later.
- Even without security problems, someday you're going to have to
update to a new version of the Git server software. Will it be
fully compatible as a drop-in replacement? If your Git server is
important to your company, you don't really want to just drop the
new version in and hope for the best; you're going to have to
spend time building out an environment to test the new version in
(with something like your data).
New versions may require changes to other bits of the system, any local customizations, or to things you integrated the old version with. Updates are often a pain but at the same time you have to do them sooner or later.
- In general you need to worry about how to secure both the Git
server software and the underlying environment it runs on. The
defaults are not necessarily secure and are not necessarily
appropriate for your company.
- You may want to set up some degree of monitoring for things like disk space usage. If this Git server is important, people will notice right away if it goes down, but they may not notice in time if the disk space is just quietly getting closer and closer to running out because more and more people in the company are using it for more stuff.
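The disk space monitoring in the last item can start out very simple. This is a minimal Python sketch, assuming a plain percentage threshold is good enough for you; the 80% default and the function names are arbitrary choices for illustration.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Percentage of the filesystem that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def check_disk(path="/", warn_at_percent=80.0):
    """Return (should_warn, percent_used) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct >= warn_at_percent, pct
```

Run periodically against wherever the repositories live, this catches the slow fill-up case. A fuller setup would also record the numbers over time so you can see how fast usage is growing.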
If this is something that matters to the company and the company is more than a few people, it's also not just 'you' (a single person) who will be looking after the server. The company needs at least a few people involved so that you can go on vacation or even just get sick without the company running the risk of the central Git server that all the developers use just falling over and no one knowing how to bring it back.
In some environments this Git server will either be exposed to the Internet (even if it's only used by company people) or at least available across multiple 'internal network' locations because, say, your developers are not really on the same network as your production servers in the cloud. This will likely raise additional network security issues and perhaps authentication issues. This is especially the case if you have to expose this Git service to the Internet. Mere obscurity and use only by company insiders is not enough any more these days; there are systems that mass scan the entire IPv4 Internet and keep world-accessible databases of what they find. If you have open ports or services, you have problems, and that means you're going to have to do the work to close things down.
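One way to see what the scanners see is to probe your own server from outside and compare the open ports against what you intended to expose. Here is a small Python sketch of that idea; the function names and any port lists are illustrative assumptions, and you should only point it at machines you operate.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the ports from `ports` that accept a TCP connection on host."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass  # refused, filtered, or timed out: treat as closed
    return found


def unexpected_ports(found, allowed):
    """Ports that are open but not on the list you meant to expose."""
    return sorted(set(found) - set(allowed))
```

For example, you might probe a handful of common ports and treat anything other than 22 and 443 as a surprise worth investigating. This is a plain TCP connect check and is not a substitute for a real scanner or for reviewing your firewall rules.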
Basically all of this applies to pretty much any service, not just Git. If you want to run a mailer, or an IMAP server, or a web server hosting a blog or CMS, or a DNS server, or whatever, you will have to face all of these issues and often more. Under the right circumstances, the initial setup can appear extremely trivial; you grab a Docker image, you start it somewhere, you make your internal DNS point a name to it, and you can call it up in your browser or otherwise start poking it. But that's just the start of a long and tiresome journey to a reliable, secure service that can be sustained over the long haul and won't blow up in your face someday.
I'm naturally inclined toward the 'on premise' view of things where we do stuff internally. But all of this is why, if someone approached me at work about setting up a Github-like service locally, I would immediately ask 'is there some reason we can't pay Github to do this for us?' I'm pretty confident that Github'll do it better, and if staff time isn't free they'll probably do it cheaper.
PS: I've probably missed some things you'd need to think about and tackle, which just goes to show how non-trivial this really is.
The 'on premise' versus 'off premise' approach to environments
As a result of thinking about why some people run their own servers and other people don't, it struck me today that on the modern Internet, things have evolved to the point where we can draw a division between two approaches or patterns for running systems and services. I will call these the on premise and off premise patterns.
In the on premise approach, you do most everything within a self contained and closed environment of your own systems (a 'premise'). One obvious version of this is when you have a physical premise and everything you work with is located in it. This describes my department, for example, and many similar sysadmin setups; since we operate physical networks, have printers, and so on, we have no real choice but to do things on premise with physical hardware, firewall servers, and so on. However, the on premise approach doesn't require you to be doing internally focused work or for you to have physical servers. You can take the on premise approach in a cloud environment where you're running a web business.
(You can have a rousing debate over whether you can truly have a single on premise environment if you're split across multiple physical locations, or a physical office plus a cloud.)
In the off premise approach, you don't try to have a closed and self contained environment of your own systems and services, a 'premise' that stands alone by itself. Instead you have a more permeable boundary that you reach across to use and even depend on outside things, up to and including things from entirely separate companies (where all you can really do if there's a problem is wait and hope). The stereotypical modern Silicon Valley startup follows an off premise and outsourced approach for as many things as it can, and as a result works with and relies on a whole host of Software as a Service companies, including for important functions such as holding its source code repositories and coordinating development (often on Github).
An off premise approach doesn't necessarily require outsourcing to other companies. Instead I see it as fundamentally an issue of how self contained (and complete) your service environments are. If you're trying to do most everything yourself within an environment, or within a closely connected cluster of them, you're on premise. If you have loosely connected services that you group into different security domains and talk across the Internet to, you're probably off premise. I would say that running your own DNS servers completely outside and independently of the rest of your infrastructure is an off premise kind of thing (having someone else run them for you is definitely off premise).
While there's clearly a spectrum in practice, my impression is that on premise and off premise are also mindsets and these mindsets are generally sticky. If you're in the on premise mindset, you're reflexively inclined to keep things on premise, under your control; 'letting go' to an outside service is a stretch and you can think of all sorts of reasons that it'd be a problem. I suspect that people in the off premise mindset experience similar things in the other direction.
(As you might guess, I'm mostly an on premise mindset person, although I've been irradiated by the off premise mindset to a certain extent. For example, even though I'm in no hurry to run my own infrastructure for email, I'm even less likely to outsource it to a provider, whether GMail or anyone else.)