Chris's Wiki :: blog/sysadmin Commentshttps://utcc.utoronto.ca/~cks/space/blog/sysadmin/?atomcommentsDWiki2024-03-27T03:11:17ZRecent comments in Chris's Wiki :: blog/sysadmin.By Chris Siebenmann on /blog/sysadmin/PrometheusDNSMonitoringProblemtag:CSpace:blog/sysadmin/PrometheusDNSMonitoringProblem:15ef31a906c382371ac9aad58b70ce53cc337103Chris Siebenmann<div class="wikitext"><p>We aren't automatically generating any configurations so far, for an
assortment of reasons that include the extra complexity and the lack of
clear need. Your question did get me to think about and write up <a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/PrometheusAutomatingDNSChecks">how
I might do this</a>, but I don't think this
is something we want badly enough or would use often enough to be worth
the extra complexity.</p>
<p>(Would I write an entire exporter to do this? In theory, maybe, because
the exporter at least confines the complexity to one place instead of
spreading it out across Blackbox, Prometheus, and so on.)</p>
</div>2024-03-27T03:11:17ZBy Mike Kohne on /blog/sysadmin/PrometheusDNSMonitoringProblemtag:CSpace:blog/sysadmin/PrometheusDNSMonitoringProblem:e692c030d7bbc0d5a7b201f365ad5758085db256Mike Kohne<div class="wikitext"><p>Have you considered a script to generate the various configs? It'd be easier than writing a whole exporter, and probably a lot quicker.</p>
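A generation script of the sort suggested here could be quite small. The sketch below is purely illustrative (the function name, label scheme, and target layout are invented, not taken from the actual setup): it expands a list of zones and nameservers into Blackbox-style file-SD scrape targets.

```python
import json

def dns_targets(zones, nameservers):
    """Build file-SD style target groups for every zone/nameserver pair.

    The "module" label naming convention here is a made-up example of
    how a per-zone Blackbox DNS probe module might be referenced.
    """
    targets = []
    for zone in zones:
        for ns in nameservers:
            targets.append({
                "targets": [ns],
                "labels": {"zone": zone,
                           "module": "dns_" + zone.replace(".", "_")},
            })
    return targets

if __name__ == "__main__":
    # Hypothetical zone and nameserver, for illustration only.
    print(json.dumps(dns_targets(["example.org"], ["127.0.0.1:53"]), indent=2))
```

The output of such a script would then be written to a file that Prometheus picks up via `file_sd_configs`, which is the usual low-effort way to feed generated target lists in.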
</div>2024-03-16T14:15:13ZBy Walex on /blog/sysadmin/MyDesktopTourtag:CSpace:blog/sysadmin/MyDesktopTour:5f11a020ff12a61d365adf4fb0f1e36d83f7509fWalex<div class="wikitext"><p>«<em>Keeping a file for this long is something I hope to achieve, but finding it years later is the real problem. I guess I'll research a good folder structure.</em>»</p>
<p>I have files going back to 1982 in my home directory and to 1976 on paper.
I use both year-based directories and topic-based ones, and not too many.
I also use a very good local spider/indexer called "recoll" (though it takes 10-15% extra space), which I have found works a lot better than most others:</p>
<p><a href="http://recoll.org/pages/index-recoll.html">http://recoll.org/pages/index-recoll.html</a></p>
</div>2024-03-10T21:39:37ZBy TheSameWayTheBricksDont on /blog/sysadmin/MyDesktopTourtag:CSpace:blog/sysadmin/MyDesktopTour:09d0f574cbd95ff23f2548f681a75663bbe4ab01TheSameWayTheBricksDont<div class="wikitext"><p>Incredible, thank you very much.
Keeping a file for this long is something I hope to achieve, but finding it years later is the real problem.
I guess I'll research a good folder structure.</p>
</div>2024-03-10T15:06:57ZBy Chris Siebenmann on /blog/sysadmin/MyDesktopTourtag:CSpace:blog/sysadmin/MyDesktopTour:c4cc96f7a32561297d92e23992c1c1813a971607Chris Siebenmann<div class="wikitext"><p>I'm a data packrat so I have indeed kept the image. Doing some searches,
it appears to be a piece of concept art for Kiki by Katsuya Kondō. You
can see versions of it eg <a href="https://ghibli.fandom.com/wiki/Kiki">here</a>,
<a href="https://www.zerochan.net/3720670">here</a>, and <a href="https://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/tv+movies/anime-manga/new/">here</a>,
the last of which may be my original source (no warranties implied
by any link). I suspect all of these versions were scanned from an art
book of the movie at some point many years ago and have circulated on
the Internet ever since.</p>
</div>2024-03-08T19:18:06ZBy TheSameWayTheBricksDont on /blog/sysadmin/MyDesktopTourtag:CSpace:blog/sysadmin/MyDesktopTour:d6d9530b9056976e878de9fa65dc26a6441d03b7TheSameWayTheBricksDont<div class="wikitext"><p>I know this is a (very) long shot but do you happen to have kept the Kiki wallpaper from your old desktop writeup? It was linked to on your old site but the FTP server serving the file doesn't exist anymore.
I love the movie, and the drawing seems fantastic.</p>
<p>I just discovered your blog yesterday and I love it, by the way; I've been looking around and following links ever since.</p>
</div>2024-03-08T17:41:07ZBy Lars Windolf on /blog/sysadmin/PrometheusAbsentMetricsAndLabelstag:CSpace:blog/sysadmin/PrometheusAbsentMetricsAndLabels:6206cf28b650aed33d345e4025dde9b0afe1a586Lars Windolf<div class="wikitext"><p>This really helped me this week! Was looking for a solution of exactly this problem. Thanks!!!</p>
</div>2024-03-02T11:33:54ZBy edgewood on /blog/sysadmin/RsyncRecentDirectoryContentstag:CSpace:blog/sysadmin/RsyncRecentDirectoryContents:1357252b8b3214fa1ee924038a8a39806283f6a9edgewood<div class="wikitext"><p>Late comment, but I almost always use the rsync options <code>--itemize-changes</code> and <code>--dry-run</code> to ensure that the rest of the options do what I'm expecting, then drop <code>--dry-run</code> when I'm satisfied.</p>
</div>2024-03-01T00:57:27ZBy Anonymous on /blog/sysadmin/WhyNoMachineInventorytag:CSpace:blog/sysadmin/WhyNoMachineInventory:62ff7c7aef09245b04902cc9b50174e126d490f3Anonymous<div class="wikitext"><p>Just two words: Configuration Management.</p>
</div>2024-02-28T21:43:25ZBy Mike Tancsa on /blog/sysadmin/SSHBruteForceAttacksAbruptlyDowntag:CSpace:blog/sysadmin/SSHBruteForceAttacksAbruptlyDown:179a541f0728148712dca9abedb6225559867f9fMike Tancsa<div class="wikitext"><p>My network (couple of /18s worth of ip space) gets hit in fits and starts as well. I also wonder if companies like Shodan have created a secondary market for players to constantly scan the internet and sell their results.
Anyways,
<a href="https://www.dshield.org/data/port/22">https://www.dshield.org/data/port/22</a>
doesn't show any overall trend changes as of late.</p>
</div>2024-02-24T23:16:41ZBy Anonymous on /blog/sysadmin/SSHBruteForceAttacksAbruptlyDowntag:CSpace:blog/sysadmin/SSHBruteForceAttacksAbruptlyDown:f1a51b55597b80079c1c0f8f2bf1c12d8cf0754fAnonymous<div class="wikitext"><p>Perhaps someone - yet unannounced - took a (mayor) botnet offline ?</p>
</div>2024-02-23T21:56:59ZBy jlin on /blog/sysadmin/SSHBruteForceAttacksAbruptlyDowntag:CSpace:blog/sysadmin/SSHBruteForceAttacksAbruptlyDown:f17e211db730158ed8f4e1ddd1dc8afb96825bfdjlin<div class="wikitext"><p>Could it be something like packet filtering at the upstream?</p>
</div>2024-02-23T09:36:48ZBy Verisimilitude on /blog/sysadmin/CustomizationSensibleLimitstag:CSpace:blog/sysadmin/CustomizationSensibleLimits:1ab7d49e69b96a25eed5277ea3c7f5de9413b8b1Verisimilitudehttp://verisimilitudes.net<div class="wikitext"><p>Ideally, programs should be small enough they can be understood and directly customized without the need for a special configuration format or language. While not what I mean, I've taken an Emacs mode and directly modified it, to load my modified version instead, because I couldn't figure out how to make the particular change otherwise. An ultimate form of this is designing and creating one's own tools, and slowly replacing Emacs piecewise like this; it's still my main writing tool, however.</p>
</div>2024-02-16T18:00:46ZBy Chris Siebenmann on /blog/sysadmin/GrafanaLokiStartupWALReplayIssuetag:CSpace:blog/sysadmin/GrafanaLokiStartupWALReplayIssue:7d9560b74b04416c1979372116a27feb222e696dChris Siebenmann<div class="wikitext"><p>There are a number of issues with VictoriaLogs today, starting with how
they explicitly say "it isn’t recommended to migrate from existing
logging solutions to VictoriaLogs Preview in general cases yet". Beyond
that it still relies on Loki's Promtail for shipping logs from systemd
and syslog, so we would only be half moving away from Loki, and it also
doesn't appear to have any integration with Grafana. Their current
documentation also says it's missing features from LogQL, some of which
we make significant use of in current queries.</p>
<p>VictoriaLogs may someday be a complete, self-contained replacement
for Loki, but it's not currently one.</p>
</div>2024-02-16T17:18:51ZBy valyala on /blog/sysadmin/GrafanaLokiLogcliNotestag:CSpace:blog/sysadmin/GrafanaLokiLogcliNotes:0c6f98bde5b0c64b66e262664138e9a7aae0a619valyala<div class="wikitext"><p>You definitely need to read <a href="https://docs.victoriametrics.com/victorialogs/querying/#command-line">https://docs.victoriametrics.com/victorialogs/querying/#command-line</a> and stop hurting yourself with logcli :)</p>
</div>2024-02-16T00:21:07ZBy valyala on /blog/sysadmin/GrafanaLokiStartupWALReplayIssuetag:CSpace:blog/sysadmin/GrafanaLokiStartupWALReplayIssue:0bcd58b73329aeb53c067f2b0aa64793b7c1cbc5valyala<div class="wikitext"><p>Why do you still use Loki instead of switching to something more reliable and resource-efficient? For instance, VictoriaLogs. It is open source, and it is easy to set up and operate - just a single small statically linked binary without external dependencies. It is much faster at querying than Loki. And it integrates well with traditional Unix commands - see <a href="https://docs.victoriametrics.com/victorialogs/querying/#command-line">https://docs.victoriametrics.com/victorialogs/querying/#command-line</a></p>
</div>2024-02-16T00:10:59ZBy Verisimilitude on /blog/sysadmin/ReportConfigFileLocationstag:CSpace:blog/sysadmin/ReportConfigFileLocations:7ead682dd24ca48bfa0b1ed1398d9349ea02a886Verisimilitudehttp://verisimilitudes.net<div class="wikitext"><blockquote><p>if a program uses a configuration file (or several), it should have an obvious command line way to find out where it expects to find that configuration file.</p>
</blockquote>
<p>This sounds an awful lot like a standard mechanism for configuration, which UNIX lacks.</p>
<blockquote><p>Ideally I'd like to avoid scanning all the way through the manual page or other documentation for the program to find out, because that's slow and annoying.</p>
</blockquote>
<p>I don't think Microsoft Windows has this problem, but I regularly read people bitching about the central configuration mechanism it has, whatever its name is. Anyway, this is the fun of using <em>convention</em> for everything: nothing actually works.</p>
<p>Personally, I've grown to believe configuration files are stupid, and everything I design lacks them. Ideally, every program is so small that it can be modified directly for such configurations.</p>
</div>2024-02-15T21:37:11ZBy Milo on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:a14cabb7a8465b1889cb79d4f522210199823b6bMilo<div class="wikitext"><p>Chapter 6 ("The Checklist Factory") of The Checklist Manifesto makes similar points: they shouldn't be static, they're definitely not free (to create or use, and thus can't cover everything), and testing is needed. If you're unfamiliar, it might be worth a trip to your excellent university library.</p>
<blockquote><p>There are good checklists and bad, Boorman explained. Bad checklists are vague and imprecise. They are too long; they are hard to use; they are impractical. They are made by desk jockeys with no awareness of the situations in which they are to be deployed. They treat the people using the tools as dumb and try to spell out every single step. They turn people’s brains off rather than turn them on.</p>
</blockquote>
<blockquote><p>Good checklists, on the other hand, are precise. They are efficient, to the point, and easy to use even in the most difficult situations. They do not try to spell out everything—a checklist cannot fly a plane. Instead, they provide reminders of only the most critical and important steps—the ones that even the highly skilled professionals using them could miss. Good checklists are, above all, practical.</p>
</blockquote>
<blockquote><p>The power of checklists is limited, Boorman emphasized. […]</p>
</blockquote>
<blockquote><p>[Testing] is not easy to do in surgery, I pointed out. Not in aviation, either, he countered. You can’t unlatch a cargo door in mid-flight and observe how a crew handles the consequences. But that’s why they have flight simulators, and he offered to show me one. […]</p>
</blockquote>
<blockquote><p>The three checklists took no time at all—maybe thirty seconds each—plus maybe a minute for the briefing. The brevity was no accident, Boorman said. People had spent hours watching pilots try out early versions in simulators, timing them, refining them, paring them down to their most efficient essentials.</p>
</blockquote>
<p>I doubt it's practical for you to spend weeks observing rookies in sysadmin simulators. But when I helped with on-boarding of new employees, I found it helpful to refer them to a "new employee" wiki page and ask them to edit it or ask questions as necessary; when they came to me with a question I thought should have been covered there, but wasn't, I could add it while we were speaking. Same for employees with more tenure: if you spent a bunch of time figuring something out, and it'll take less time to document it, write it on the wiki. (We also had a very bureaucratic "official document" process—so bureaucratic that, quite frankly, most of us didn't know what the process for revising a document was... so we didn't do that, and hence few people who weren't ISO auditors ever looked at them.)</p>
<p>As for the general idea of simulation: simulating a network of servers would take significant time to set up, and would never have 100% fidelity (there's always some dusty old machine in a closet that no remaining employee knows is important), but maybe should be something to aspire to. If done very well, (almost) the whole environment could be deployed into virtual machines for (limited) failure testing; if done extraordinarily well, with budgets unattainable to most sysadmins, testing could involve unplugging production machines randomly (cf. Netflix's "Chaos Monkey").</p>
</div>2024-02-07T21:04:09ZBy Chris Siebenmann on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:af8cb9fc0fd486650feb2def14ee1fb3850a2a5eChris Siebenmann<div class="wikitext"><p><a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/UseAChecklist">I'm a big fan of checklists</a>, but at the same time
I think there are real issues. In general, <a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/DocumentationIsNotFree">documentation isn't free</a> and checklists are a form of documentation.
For checklists related to failures, there's the additional issue that
<a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/DocumentationNeedsTesting">documentation needs testing</a>, which can be
hard to do if you need an actual failure or a sufficiently accurately
simulated one to test your checklist with (and it also takes time).
In system administration, checklists generally can't be static things
that are created once, because the environment is constantly changing;
this means not just updating but re-checking and so on.</p>
<p>System environments are often sufficiently complicated that it's very
hard to foresee all effects of a failure or all interactions that your
systems have (some would say it's impossible). It's a classic story in
the field that 'we thought we understood everything and had mitigated
everything, except surprise, we hadn't'.</p>
<p>(Our checklists work best for routine things like installing machines
and for exceptional events that we can consider carefully in advance,
like planned power shutdowns.)</p>
</div>2024-02-07T19:10:32ZBy Milo on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:09dee5335b55e27127bdcf860b0800596cfdff47Milo<div class="wikitext"><p>In aviation, there are checklists for nearly everything. Regular maintenance, pre-boarding inspections, power-up, take-off, and of course the abnormal ones like "engine failure" and "rapid decompression". There have been some efforts to bring this mindset into other fields such as surgery. One of the major difficulties is trying to keep these short and helpful, rather than something that's accumulated "cruft" and is perceived as a clock-gobbling chore.</p>
<p>I wonder if that would be useful for system administrators. Like, a cooling failure checklist, post-power-outage checklist, DNS failure checklist, as so on. A big benefit in this field would be that many of the tasks, such as "new server bringup" and most things related to monitoring, could be scripted.</p>
</div>2024-02-07T16:26:17ZBy Miksa on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:cfb6903ee19324365d11d7479f09dcbf5522e168Miksa<div class="wikitext"><p>We had a similar experience a few years back, but for a sillier reason. On a Sunday morning one of our datacenters started approaching 60°C; a bunch of servers had already turned themselves off preemptively, and a few of us showed up to investigate and open doors for extra cooling. Sitting at the door, we started pondering what could be the reason that 3 out of 4 cooling units were turned off. We surmised that some kind of building automation controls the cooling based on temperature sensors, and one way or another the data from the sensors needs to be transmitted out. One of us got a recollection or hunch that the cabinet with a door in the datacenter wall might have something to do with it, and we decided to take a look.</p>
<p>Inside we found a small router-looking device and a small Eaton UPS. The device was off, an indication of a dead UPS, so we decided to see what would happen if we unplugged the power cords from the UPS and connected them to each other. The device came alive, and soon after the cooling units started turning back on.</p>
<p>The datacenter has a UPS the size of a large room, and it all came crashing down because of a little UPS no one even remembered existed.</p>
<p>This experience was a big incentive to go through the process you are considering. Our goal was to produce lists with the startup order for the servers in case a datacenter had gone down. The first phase was documenting the role (dev/test/prod/administration) and priority tier (1-5) for all servers. We already had this information, but it was quite spotty. An annoying ordeal, but not too bad in the end: create lists, have a couple of ops staff go through them and add their opinions, then as a full group comb the list and negotiate an educated guess for all of them. It takes a few hours but is worth it. A parallel task was to modify our orchestration tool to ask for this info for all new servers.</p>
<p>The next phase was to create a script that generates the lists based on this information and the rack location of the servers. A great help for this was our scripted maintenance windows: many frontend servers had scripts that make them wait until some other server has finished booting and some service, usually a database, is back online. The backend server would automatically get at least .5 higher priority than the frontend. Then it was just a matter of uploading these lists to a website. The biggest remaining obstacle is regularly printing these lists for the datacenters.</p>
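The priority scheme described here (tiers 1-5, a backend bumped at least .5 ahead of the frontend that waits on it) can be sketched roughly as follows. All names, the data shapes, and the "lower tier boots first" convention are assumptions for illustration; real dependency chains would need iterating to a fixed point.

```python
def startup_order(tiers, waits_on):
    """tiers: {server: priority tier}; waits_on: {frontend: backend}.

    Returns server names in boot order (lowest effective tier first).
    Handles only one level of dependency, as a simplification.
    """
    effective = dict(tiers)
    for frontend, backend in waits_on.items():
        # The backend must be up before its frontend boots, so give it
        # at least a .5 head start over the frontend's tier.
        effective[backend] = min(effective[backend],
                                 effective[frontend] - 0.5)
    return sorted(effective, key=effective.get)

# Invented example: web1 waits on db1, so db1 jumps ahead of it.
print(startup_order({"mon": 1, "web1": 2, "db1": 3}, {"web1": "db1"}))
```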
</div>2024-02-07T14:54:03ZBy Milo on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:0af4d566261fab89eba0be41beac2df7d5a629b0Milo<div class="wikitext"><p>We also don't want to end up like <a href="https://web.archive.org/web/20230610235249/http://bash.org/?5273">this person quoted on bash.org</a> (which seems to have gone offline recently):</p>
<p><erno> hm. I've lost a machine.. literally _lost_. it responds to ping, it works completely, I just can't figure out where in my apartment it is.</p>
</div>2024-02-06T17:45:16ZBy Anonymous on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:f9fbc75c7279dd1bf9f4f78551a256124d7d5ac9Anonymous<div class="wikitext"><p>Perhaps (slightly) related: the first documented time people thought about this formally (as far as I can remember), was in this paper at Usenix/LISA '98: "Bootstrapping an Infrastructure" (<a href="https://www.usenix.org/legacy/event/lisa98/traugott.html">https://www.usenix.org/legacy/event/lisa98/traugott.html</a>) which might or might not still be relevant today.</p>
</div>2024-02-06T14:22:31ZBy goatops on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:ff78a76446d58c4850de362dc19ff113ad07edbagoatops<div class="wikitext"><p>We set up a simple color-coded sticker scheme for our ops team to follow in the event of a power failure where we wanted to maximise available power from the UPS. Each server was affixed with a green (can be shut down immediately), orange (can be shut down with notice) or red (try not to shut down at all) sticker on the front. Worked pretty well for us.</p>
</div>2024-02-06T12:25:48ZBy David Magda on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:ea8119e63bf30ae8e2c7cc7b79f4933fb8b08788David Magdahttp://www.magda.ca/<div class="wikitext"><p>If it is decided to try to document hardware, Netbox is pretty good and worth looking at:</p>
<ul><li><a href="https://netbox.dev">https://netbox.dev</a></li>
<li><a href="https://docs.netbox.dev/en/stable/">https://docs.netbox.dev/en/stable/</a></li>
<li><a href="https://github.com/netbox-community/netbox">https://github.com/netbox-community/netbox</a></li>
<li><a href="https://demo.netbox.dev">https://demo.netbox.dev</a> (just click on "Log In": use any username/password, or click on "Sign In" for auto-generation)</li>
</ul>
</div>2024-02-06T12:02:29ZBy Arnaud Gomes on /blog/sysadmin/TrackingMachineImportancetag:CSpace:blog/sysadmin/TrackingMachineImportance:068721b1070c655d0bf70d55a4d21e73cdac8e6dArnaud Gomes<div class="wikitext"><p>We had the same kind of issues at a previous workplace, we ended up writing a "reboot the machine room HOWTO". It forced us to identify dependencies, and anything in the bottom half of the list was probably not essential.</p>
<pre>
-- A
</pre>
</div>2024-02-06T08:25:03ZBy -dsr- on /blog/sysadmin/ServersDroppingSerialPortstag:CSpace:blog/sysadmin/ServersDroppingSerialPorts:3b711944eb460c300394632a65afa96b0858b78d-dsr-https://blog.randomstring.org<div class="wikitext"><p>WTI ( <a href="https://www.wti.com/collections/console-servers">https://www.wti.com/collections/console-servers</a> ) makes reliable serial terminal servers. Fairly expensive, but also very long-lived.</p>
<p>Nevertheless, we've stopped putting them in datacenters in favor of IPMI.</p>
</div>2024-01-29T14:45:33ZFrom 193.219.181.219 on /blog/sysadmin/ServersDroppingSerialPortstag:CSpace:blog/sysadmin/ServersDroppingSerialPorts:a5dde821cf552bd0a9cdb80393c280417137a5f3From 193.219.181.219<div class="wikitext"><blockquote><p>The physical hardware for a serial port does add some cost and take up some space</p>
</blockquote>
<p>Routers and switches have long had a solution for that – the RJ45 Cisco-style serial port. I'm surprised that servers never adopted it.</p>
<p>(Then again, I've just recently installed an iLO RJ45 addon module to one of our servers because the manufacturer included the whole BMC but for some reason decided that the dedicated Ethernet port should be something you buy as an addon...</p>
<p>On the other end of the spectrum, several of our switches have RJ45 <em>and micro-USB</em> for the serial console, with a PL2303 already inside the switch.)</p>
</div>2024-01-29T06:14:41ZBy Chris Siebenmann on /blog/sysadmin/VMServerQuiteUsefultag:CSpace:blog/sysadmin/VMServerQuiteUseful:72a3381757fe588a8da260108e6c6dcf082906caChris Siebenmann<div class="wikitext"><p>If we were to do a full scale VM environment, perhaps Proxmox or something
would make sense. In this case, the VM environment is being used as a way
to get low-effort test (virtual) servers, which means that the combination
of Ubuntu (which we run on regular servers) and libvirt (which I run on my
Fedora desktops) is the best low-effort approach; it requires us to learn
and operate basically nothing new.</p>
</div>2024-01-17T16:01:00ZBy Chris Siebenmann on /blog/sysadmin/OurWifiStatusMonitoringtag:CSpace:blog/sysadmin/OurWifiStatusMonitoring:a3d6939e773b680889ee29a8e1d7953a55eb3637Chris Siebenmann<div class="wikitext"><p>We have our own SSID for a combination of history (<a href="https://support.cs.toronto.edu/">we</a> started doing wireless fairly early)
and <a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/WhyMultipleWirelessNetworks">having our own wireless network lets us support things that aren't
possible with the university-wide network</a>.</p>
<p>We are our own departmental networking team (we do everything around here
from (some) physical wiring on up), but most of the hardware and network
is actually run by the university wireless network people. I'm not sure
how much visibility they have into their (enterprise) wireless access
points and what's going on on them, but for structural reasons it would
be difficult for them to fully check for problems in our wifi network;
for end to end checks they'd need devices on the network (to verify,
for example, that they could get DHCP leases).</p>
<p>(Since the university network people run the hardware, we get things
fixed by getting in touch with them about the problem.)</p>
</div>2024-01-16T23:06:16ZBy a joe on /blog/sysadmin/OurWifiStatusMonitoringtag:CSpace:blog/sysadmin/OurWifiStatusMonitoring:88bfb4df30d78f4f2eacb241429ee7a90a91aa01a joe<div class="wikitext"><p>Interesting solution!</p>
<p>Curious, what prevents your "network" team/dept from monitoring your wifi network? Are you required to fix the problem yourself?</p>
<p>Why do you need your own SSID? Is this a trusted network that provides additional network access?</p>
</div>2024-01-16T22:17:31ZBy Chris Siebenmann on /blog/sysadmin/MetricsHowFarBackDependstag:CSpace:blog/sysadmin/MetricsHowFarBackDepends:4bff6636e5bc65e82bbe522c2474dc97990d7310Chris Siebenmann<div class="wikitext"><p>Our retention is done with beefy disks, which is easy for us because
we're running Prometheus on a physical server. We started with two
mirrored 4 TB HDDs and moved to two mirrored 20 TB HDDs when the 4 TB
ones got full enough. We haven't considered running multiple servers at
different sampling intervals, partly because that would mean finding a
second server and a second set of data disks for it.</p>
<p>(Prometheus also can't scrape things too slowly; you need to scrape
faster than every five minutes to keep samples from going stale.)</p>
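As a concrete illustration of that constraint: Prometheus's default staleness lookback is five minutes, so any scrape interval comfortably under that keeps samples fresh. The job name and target in this fragment are placeholders, not a real configuration.

```yaml
# Illustrative only: any interval well under 5m avoids staleness.
scrape_configs:
  - job_name: "node"
    scrape_interval: 1m   # safely under Prometheus's 5m lookback delta
    static_configs:
      - targets: ["host.example.org:9100"]
```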
</div>2024-01-10T17:21:52ZBy anarcat on /blog/sysadmin/MetricsHowFarBackDependstag:CSpace:blog/sysadmin/MetricsHowFarBackDepends:34e9320645677b6534f1833d92b03a4f32e2a22fanarcat<div class="wikitext"><p>I wonder: how do you actually implement retention now? beefy disks?</p>
<p>We keep only a year of samples here, and it's growing quite a bit. We only have two VMs for prometheus right now, so we could dedicate more resources to it, but we're at 100G right now. I guess we could just shift that by an order of magnitude and hit a terabyte, but it seems rather stiff for metrics we're rarely going to look at, especially in the level of details our current scrape interval provides (1 minute).</p>
<p>My current thinking is that we might have a primary Prometheus server that handles alerting and short-term, high-frequency scraping (say 15-30s) and a secondary server that would extract those metrics at a much lower frequency (say 5-10m) and keep those potentially eternally... have you considered such a setup?</p>
</div>2024-01-10T15:52:16ZBy Barry on /blog/sysadmin/TenYearsNotLongEnoughtag:CSpace:blog/sysadmin/TenYearsNotLongEnough:d69144df4feceeaf26694781a83f2c818e416ce5Barry<div class="wikitext"><p>Where I'm working, many applications were set up with undocumented certificates that expire within a small multiple of five years, long enough for anyone in the know to have moved on and causing unnecessary outages. It's infuriating.</p>
</div>2024-01-07T22:18:26ZBy Elbert Boone on /blog/sysadmin/TenYearsNotLongEnoughtag:CSpace:blog/sysadmin/TenYearsNotLongEnough:8bd7ae89562efa49e229370577654c0bead1347aElbert Boone<div class="wikitext"><p>Verisimilitude, while the notBefore and notAfter dates do seem to be required, <a href="https://web.archive.org/web/20220709045951if_/https://www.ietf.org/rfc/rfc5280.html#section-4.1.2.5">RFC 5280</a> says: </p>
<blockquote><p>Both notBefore and notAfter may be encoded as UTCTime or GeneralizedTime.
CAs conforming to this profile MUST always encode certificate validity dates through the year 2049 as UTCTime; certificate validity dates in 2050 or later MUST be encoded as GeneralizedTime. Conforming applications MUST be able to process validity dates that are encoded in either UTCTime or GeneralizedTime.
To indicate that a certificate has no well-defined expiration date, the notAfter SHOULD be assigned the GeneralizedTime value of 99991231235959Z.</p>
</blockquote>
<p>However, it doesn't mandate that anyone avoid using that as a real (calculated) expiry date, nor that it be handled specially on parsing—thus contributing to the "Y10K problem" <a href="https://web.archive.org/web/20230528100546if_/https://cr.yp.to/y2k.html">and annoying Daniel J. Bernstein</a>.</p>
<p>While most people worry about 2038, RFC 5280 worries about 2050. Why? 4.1.2.5.1 explains it:</p>
<blockquote><p>The universal time type, UTCTime, is a standard ASN.1 type intended for representation of dates and time. UTCTime specifies the year through the two low-order digits ... (i.e., times are YYMMDDHHMMSSZ) ... Conforming systems MUST interpret the year field (YY) as follows:
Where YY is greater than or equal to 50, the year SHALL be interpreted as 19YY; and
Where YY is less than 50, the year SHALL be interpreted as 20YY.</p>
</blockquote>
<p>Yuck. Anyway, it's not hard to find interfaces such as <a href="https://manpages.debian.org/testing/libtls-dev/tls_peer_cert_notbefore.3.en.html">tls_peer_cert_notafter</a> that use time_t and might therefore be affected by year-2038 problems. And I don't see any OpenSSL option to force the above "9999" expiry, which makes me doubt people are using or testing it. So I'm not optimistic about post-2038 dates, but it might be worth passing "-days 99999", seeing if that works, and using Ewen's suggestion if not.</p>
</div>2024-01-07T15:38:34ZBy Verisimilitude on /blog/sysadmin/AreNegativeAccessRulesNeededtag:CSpace:blog/sysadmin/AreNegativeAccessRulesNeeded:e96671a8cc471c63df749e7bcc02008f494f8796Verisimilitudehttp://verisimilitudes.net<div class="wikitext"><p>It's a simple matter of aesthetics that a whitelist should always have a blacklist. It's trivial to think of rules that are simple to specify with one and not the other. It's obvious.</p>
<blockquote><p>And I suspect that you can always mechanically translate a set of positive and negative rules into a set of positive only rules, although possibly a quite verbose one.</p>
</blockquote>
<p>Yes. They're just different ways of specifying whether something be allowed or not. For the IP address example, one can imagine a table with a bit for every address or some larger unit, and it's clear that whitelists and blacklists have similar effects on the table's values.</p>
</div>2024-01-07T04:19:59ZBy Verisimilitude on /blog/sysadmin/TenYearsNotLongEnoughtag:CSpace:blog/sysadmin/TenYearsNotLongEnough:843501f9b632022418039ce917c0faceafd468d7Verisimilitudehttp://verisimilitudes.net<div class="wikitext"><p>This is an argument against a mandatory limit at all, and against TLS because it probably doesn't make that an option, does it?</p>
</div>2024-01-07T04:07:11ZBy David Magda on /blog/sysadmin/VMServerQuiteUsefultag:CSpace:blog/sysadmin/VMServerQuiteUseful:e2649473233fe16846c1b6b730b4c4bad37c23c2David Magdahttp://www.magda.ca<div class="wikitext"><p>Instead of rolling your own 'VM host', DIY-style, I would recommend looking at Proxmox. It uses Debian as a base and adds a layer of software that makes management easier (one can still access the CLI, and there are CLI utilities for the software). It's open source, with "enterprise" support available, but it can run with full functionality without it.</p>
<p>You can have stand-alone server(s), or link multiple ones together in a cluster (with live migration available, even if you use local storage (and not something like Ceph)).</p>
</div>2024-01-06T13:11:40ZBy Ewen McNeill on /blog/sysadmin/TenYearsNotLongEnoughtag:CSpace:blog/sysadmin/TenYearsNotLongEnough:fbf7532542e326b0443df432a20718029772b0a7Ewen McNeill<div class="wikitext"><p>Last time I dealt with this, I ended up deciding to make the replacement expire in 2037 (for the 2038 problem). It’s not <em>that</em> much longer than 10 years, but it seems very likely a bunch of things will be refreshed in anticipation of 2038 rollover issues. So it seemed the expiry time that was least likely to be an <em>unprepared</em> surprise.</p>
<p>But other than that I agree with you: either you want “max 6 months” and automated rollover, or you want multiple decades (and only rollover for breach / crypto algorithm change). In between is just asking for recurring surprises. Even manual 1 year rollover seems very likely to be a periodic surprise, especially if The One Person that was looking after it is no longer around.</p>
<p>Ewen</p>
</div>2024-01-04T06:41:25ZBy Marcos Dione on /blog/sysadmin/PrometheusBlackboxHTTPDurationstag:CSpace:blog/sysadmin/PrometheusBlackboxHTTPDurations:20e1d64f02e07262d5065a76190f87f81e8fbcffMarcos Dionehttps://www.grulic.org.ar/~mdione/glob/<div class="wikitext"><p>I wrote about monitoring Apache with custom logs and the grok exporter here:</p>
<p><a href="https://www.grulic.org.ar/~mdione/glob/posts/monitoring-apache-with-prometheus-and-grafana/">https://www.grulic.org.ar/~mdione/glob/posts/monitoring-apache-with-prometheus-and-grafana/</a></p>
<p>Corrections welcome!</p>
</div>2023-12-29T11:58:35ZBy Ivan on /blog/sysadmin/AreNegativeAccessRulesNeededtag:CSpace:blog/sysadmin/AreNegativeAccessRulesNeeded:015e2ab6f1f3f3c91d428f458100bc30dbd401bcIvan<div class="wikitext"><p>The force field should fail open if the control panel is shot from the inside, but fail closed if the control panel is shot from the outside. It's in the Evil Overlord List.</p>
<p>What happens if both control panels are shot is undefined behaviour and can be made into a major plot point later in the series.</p>
</div>2023-12-26T07:55:58ZBy Yildo on /blog/sysadmin/AreNegativeAccessRulesNeededtag:CSpace:blog/sysadmin/AreNegativeAccessRulesNeeded:acb83d8e6bccd759f292dcdeeeb7d41ae250568dYildohttps://eozygodon.com/yildo<div class="wikitext"><p>I think it's a case of fail open or fail closed. Do you want the forcefield to drop or stay up if someone shoots the control panel with a blaster? Failing closed is more secure. Failing open is more user friendly and easier to troubleshoot.</p>
</div>2023-12-25T21:58:35ZBy Chris Siebenmann on /blog/sysadmin/OIDCThreeEmailAddressestag:CSpace:blog/sysadmin/OIDCThreeEmailAddresses:f73dbf89531e54f22fdf164e05b1f3c01e56816aChris Siebenmann<div class="wikitext"><p>WebFinger isn't normally part of the OIDC IdP, since it's used for
discovery of all sorts of things (many of them not at all related to
OIDC), so I think its lack of a separate account identifier is sensible.
You query WebFinger for 'information about this email' and it gives
you back something that assures you 'this is information about this
email'. The OIDC IdP itself normally returns some separate identifier
for the account as well as the account's email in the OIDC authentication
information, although whether any OIDC application uses that is another
question.</p>
</div>2023-12-18T14:40:43ZFrom 193.219.181.219 on /blog/sysadmin/OIDCThreeEmailAddressestag:CSpace:blog/sysadmin/OIDCThreeEmailAddresses:134610d90f88b9fb6f80f87acb7e76f071cec1f1From 193.219.181.219<div class="wikitext"><blockquote><p>Some or many OIDC applications will expect the 'subject' field of this JSON to match the email address that the person entered, so they can be sure they're getting accurate OIDC IdP information for this account.</p>
</blockquote>
<p>This is kind of surprising as a requirement; I would have expected the IdP to always be able to return a different account name that could then act as a permanent unchanging identifier. (Like how in traditional OpenID, version 2.x added the ability to log in using a generic URL like <code>http://launchpad.net</code> and the IdP would indicate <code>http://login.launchpad.net/+id/abcdef</code> to the webapp, or how in SAML2 you have the <code>subject-id</code> attribute...)</p>
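<p>(For reference, OIDC does define such an identifier: the <code>sub</code> claim, which is meant to be a stable, locally unique account identifier separate from <code>email</code>. A minimal sketch of an ID token's claims, with invented values:)</p>

```json
{
  "iss": "https://idp.example.org",
  "sub": "248289761001",
  "email": "pat@example.org",
  "email_verified": true
}
```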
</div>2023-12-18T10:38:05ZBy Martin Czygan on /blog/sysadmin/GrafanaLokiNoChunkCompactiontag:CSpace:blog/sysadmin/GrafanaLokiNoChunkCompaction:309376c1d7f8caab121b091427f0093d638bcbccMartin Czyganhttps://github.com/miku<div class="wikitext"><p>With loki schema v12 (as per <a href="https://github.com/grafana/loki/releases/tag/v2.5.0">loki v2.5.0</a>), chunk files go into subdirectories (for filesystem store), which mitigates the problem of too many files in a single directory a bit.</p>
<p>It is possible to have data in two schema formats side by side: <a href="https://grafana.com/docs/loki/latest/operations/storage/schema/">https://grafana.com/docs/loki/latest/operations/storage/schema/</a></p>
</div>2023-12-15T13:10:55ZBy Sotiris Tsimbonis on /blog/sysadmin/UsingBindNowForResolverstag:CSpace:blog/sysadmin/UsingBindNowForResolvers:b8df29174c67c910b1fdcf8dfe965a4678075f1dSotiris Tsimbonishttps://stsimb.irc.gr/<div class="wikitext"><p>We have been running bind since the beginning of time (bind 4 in the 90s) for both recursive and authoritative purposes, with multiple split views. After many many years (around 2012) we switched to unbound for recursive queries, and this worked out pretty well (and additionally we easily enabled dnssec validation by default, when it was difficult to do in bind).</p>
<p>But recently, when we had to tackle a situation that required split dns in our recursors, we opted to use bind again. We know that recent unbound versions offer this sort of functionality nowadays, but we felt exactly the same way you felt about the secondary feature (we were comfortable with bind and didn't want our heads to hurt).</p>
</div>2023-12-14T18:30:26ZBy Chris Siebenmann on /blog/sysadmin/UsingBindNowForResolverstag:CSpace:blog/sysadmin/UsingBindNowForResolvers:968db7f31463c320b52af60eca6c966811935470Chris Siebenmann<div class="wikitext"><p>We're locked into a requirement for split-horizon DNS partly because
various machines have different internal and external IP addresses.
Internally they're spread across various RFC 1918 subnets, while
externally they all have addresses in one public /24 (currently) that we
use to make them accessible to the outside world through (firewall) NAT.</p>
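<p>(A minimal BIND <code>named.conf</code> sketch of this sort of split-horizon setup; the zone name, file names, and client networks are illustrative:)</p>

```
// Two answers for the same zone, selected by who is asking.
view "internal" {
    match-clients { 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; };
    zone "example.net" { type master; file "internal/example.net.db"; };
};
view "external" {
    match-clients { any; };
    zone "example.net" { type master; file "external/example.net.db"; };
};
```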
</div>2023-12-14T01:01:48ZBy Emory O on /blog/sysadmin/UsingBindNowForResolverstag:CSpace:blog/sysadmin/UsingBindNowForResolvers:b838de801bc5f4a3f216dcd9c81cf5afde9fa4a7Emory O<div class="wikitext"><p>Personally, I eventually found split-horizon DNS to be too much of a pain in the ass to justify continuing with it. Especially once DNSSEC got involved. In my opinion, it's better to make each "horizon" its own delegated zone, possibly with its own DNS servers that don't respond to external queries.</p>
<p>In other words, don't have a public www.example.net record while trying to pretend that foo.example.net exists internally but is NXDOMAIN externally. Instead, make the latter foo.internal.example.net, where "internal" is a delegated zone. That's my advice for new zones, anyway; switching a large existing zone could involve an unreasonable amount of work.</p>
</div>2023-12-13T18:18:41ZBy Jelle on /blog/sysadmin/UsingBindNowForResolverstag:CSpace:blog/sysadmin/UsingBindNowForResolvers:352edf36559441b9b48d9ef647304dc98da6f0c2Jelle<div class="wikitext"><p>It might already have been suggested. But you might want to look into putting dnsdist in front of your resolvers.</p>
<p>The configuration takes some getting used to but it allows you to do some interesting things like failover and advanced rate limiting.</p>
</div>2023-12-13T07:26:58ZBy Chris Siebenmann on /blog/sysadmin/CentrexToVoIPSysadminViewtag:CSpace:blog/sysadmin/CentrexToVoIPSysadminView:dd40a4946e8880131e32adc4f7fff9b3291474f7Chris Siebenmann<div class="wikitext"><p>One part of the rollout apparently going without problems is that our
networks were already 1G with a 10G core, and my impression is that VoIP
isn't too high bandwidth. However, another part is probably that we don't
have all that many phones and people mostly don't use them. If we had
a significant number of people spending significant time on VoIP phones
all at once, I don't know what the bandwidth impact would have been.</p>
<p>(Our network is non-PoE so all of the physical phones had their own
power. Reading about it now I see that PoE is up to 1G+ speeds, too,
although we'd have had to buy new switches to get that; all of our PoE
capable switches are old ones that have 100 Mbit data ports and maybe
a 1G uplink.)</p>
</div>2023-12-05T16:21:24ZBy Chris Siebenmann on /blog/sysadmin/SimpleRemoteURLServertag:CSpace:blog/sysadmin/SimpleRemoteURLServer:16fc00861ed571d2cefc1f100de93ae3a3841d22Chris Siebenmann<div class="wikitext"><p>I don't think there's inspiration in either direction; I didn't know
about <a href="https://github.com/superbrothers/opener">opener</a> before now, and
since it's much older than my take, it clearly wasn't inspired by me.
The approach used by opener has the advantage that you can forward one
local daemon to multiple destinations (and it can use nc because it uses
a conventional filesystem based Unix socket), but I think my version
has simpler cleanup if (or when) the connection goes away.</p>
</div>2023-12-04T15:01:21ZBy Ian Z on /blog/sysadmin/SimpleRemoteURLServertag:CSpace:blog/sysadmin/SimpleRemoteURLServer:7303c4e2a622afb420f3dc47ba5d61e15394bf23Ian Z<div class="wikitext"><p>I need something like this myself now, and the lovely Duck found:</p>
<p><a href="https://github.com/superbrothers/opener">https://github.com/superbrothers/opener</a></p>
<p>which seems remarkably similar to your hack! Was there any actual inspiration either way?</p>
</div>2023-12-04T03:37:19ZBy pkern on /blog/sysadmin/CentrexToVoIPSysadminViewtag:CSpace:blog/sysadmin/CentrexToVoIPSysadminView:53c08a1c60e56663a9453d8d9728bcef8a39ce19pkern<div class="wikitext"><p>About Android: you've probably already noticed, but it is possible to make use of an Android phone without signing it in to a Google account. But it takes extra effort -- e.g. disabling the Google Play Store among other built-in apps; finding alternatives to default Google apps and side-loading all those app APKs; installing apps from F-Droid, etc.</p>
</div>2023-12-01T17:28:44ZBy sapphirepaw on /blog/sysadmin/CentrexToVoIPSysadminViewtag:CSpace:blog/sysadmin/CentrexToVoIPSysadminView:84d443f4176dd6f9ba69a0f4751e1375df5af2cbsapphirepawhttps://www.sapphirepaw.org/<div class="wikitext"><p>Sounds like your VoIP rollout went well. I got to watch a PBX conversion from the sidelines once. The sales team promised "nothing will change," then the VoIP rollout overwhelmed the network. Core business and customer service were both disrupted.</p>
<p>I think they already had wiring to support physically segmenting the phones (and making them PoE), so they did that to favor robustness. But it was a rough week for them.</p>
</div>2023-12-01T14:55:51ZBy Chris Siebenmann on /blog/sysadmin/AmandaServerVsClientCompressiontag:CSpace:blog/sysadmin/AmandaServerVsClientCompression:9a00a52bf8990c2f6f3a1c45f0cbedb5cc8f6cf0Chris Siebenmann<div class="wikitext"><p>The tar (or whatever) that's doing the restore always runs on the
client and it's what scans ('searches') through the archive to restore
things. One way or another, it appears that tar has to read the entire
archive, which means the entire archive has to be sent to the client
for tar to read it. The difference is whether the archive is sent in
compressed form (with decompression happening on the client) or in
uncompressed form (with decompression happening on the server).</p>
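<p>(A runnable sketch of the trade-off, using Python's <code>tarfile</code> in streaming mode; the archive contents and member names are invented for the demo:)</p>

```python
import gzip
import io
import tarfile

# Build a tiny gzip-compressed tar archive in memory; this stands in
# for a compressed dump file (the member names are made up).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    for name, data in [("tree/wanted.txt", b"hello\n"),
                       ("tree/other.txt", b"bulk data\n" * 100)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
compressed = buf.getvalue()

# Server-side decompression sends the uncompressed archive over the
# network; client-side decompression sends the (smaller) compressed one.
uncompressed = gzip.decompress(compressed)

# Either way, tar reads sequentially through the archive to find the
# one member being restored ('r|gz' is pure streaming, no seeking).
with tarfile.open(fileobj=io.BytesIO(compressed), mode="r|gz") as tf:
    for member in tf:
        if member.name == "tree/wanted.txt":
            restored = tf.extractfile(member).read()
            break

print(restored)                              # the restored file content
print(len(compressed) < len(uncompressed))   # compressed copy is smaller
```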
</div>2023-11-19T22:03:50ZBy Barry on /blog/sysadmin/AmandaServerVsClientCompressiontag:CSpace:blog/sysadmin/AmandaServerVsClientCompression:69d5856349dbc722e87be8a5b632e33dedb679cbBarry<div class="wikitext"><blockquote><p>In an Amanda restore of only a few things from a compressed backup, Amanda will spend most of its time reading through your compressed archive</p>
</blockquote>
<blockquote><p>As a side effect we reduced the amount of data flowing over the network during both restores and backups, since we're now sending the compressed backup back and forth instead of the uncompressed one</p>
</blockquote>
<p>If searching the compressed archive for a few things to restore now takes place on the client, wouldn't that result in <em>more</em> network traffic?</p>
</div>2023-11-19T21:04:43ZBy vasi on /blog/sysadmin/AmandaReadsTarRestoresToEndtag:CSpace:blog/sysadmin/AmandaReadsTarRestoresToEnd:4b9f4c0e2a227272161237c6228b7e44ad37c4a7vasi<div class="wikitext"><p>There are some tools that can index tar files for use cases like this, such as my pixz (available in Debian main). Not sure how easy it would be to adapt to Amanda.</p>
</div>2023-11-13T04:02:53ZBy sapphirepaw on /blog/sysadmin/CustomizationSensibleLimitstag:CSpace:blog/sysadmin/CustomizationSensibleLimits:04940c4f4d72decc2d494eee644b30f0f2e50e25sapphirepawhttps://www.sapphirepaw.org/<div class="wikitext"><p>Upon reflection, I think there are definitely <em>types</em> of customization. Cosmetic changes are fine, and solving a problem is fine (our AWS instance build scripts, for instance.)</p>
<p>However, stuff like focus-follows-mouse or editing a program's keybindings just create friction when changing computers. (Which doesn't happen as much as it used to when I was in college, but I carry the scars nonetheless.)</p>
</div>2023-11-01T20:40:34ZBy dmv on /blog/sysadmin/CustomizationSensibleLimitstag:CSpace:blog/sysadmin/CustomizationSensibleLimits:57e5b7a71c4dc4aec5c1af77d6a1b4328adb453fdmv<div class="wikitext"><p>Long time listener, first time caller.</p>
<p>Two things.</p>
<p>First is a broader issue that you tackled in the post, but you did so from a slightly different angle than what came into my head as I started reading. The issue is that you encounter this "sensible limits" question both in your role as sysadmin and in your role as human being wandering the earth. It seems to me to be the same question in form but not in substance. I think the applicability to (likely) only one of those roles of the "doing things that scale" stuff shows the distinction. For example, I can't even imagine why you'd want to take scaling into account when you're concerned with your own personal tools and workflows. What matters there just is (but not only) your own idiosyncrasies. Whether you agree with the strong version of that POV or a watered-down one, I suspect you'd at least agree that the "sensible limits" question is more contextual than not, so I don't think you're going to find any one particular answer.</p>
<p>The second issue is related to that first one, in a way, and you hit it in your last paragraph, but again (for me) glanced off it a little. You use "should" a bunch of times in relation to customizing your personal tools. I suspect you are looking at it in terms of diminishing returns and/or what is justifiable for time sunk into customization. But you acknowledge that, frankly, it's just fun for you, too. So that really makes the personal tools customization question almost different in kind than the sysadmin customization question, because now we're in recreational territory. If you have the time and space and energy to mess around on some weird Emacs Lisp issue, why <em>shouldn't</em> you spend your time on it? (Obviously, if it's compulsive, there are other factors to consider, like your own mental health and whatnot.) I don't think there's a calculus you can apply here, whereas there might be in the other, professional case. At some level, even that professional case will simply reduce, at the limit, to differences of experience and opinion among your fellow professionals, but you at least have fairly common ground for discussion. What amount of joy it's sensible to allow yourself to have isn't amenable to that, I think.</p>
<p>Long story, short: both cases involve technology and programming. Both cases involve using them as tools to get things done. Only one of those cases involves <em>playing</em> with your tools for the sheer fun of it (and let's not kid each other: hackery and Emacs crimes are fun to commit).</p>
</div>2023-10-29T13:30:44ZBy Aglossa on /blog/sysadmin/WireGuardEasySmartphoneSetuptag:CSpace:blog/sysadmin/WireGuardEasySmartphoneSetup:f36918e7aa9598df55eabf23c81cd3121c0d6a7bAglossa<div class="wikitext"><p>You should look at <a href="https://www.firezone.dev/">https://www.firezone.dev/</a></p>
</div>2023-10-19T19:15:15ZFrom 193.219.181.219 on /blog/sysadmin/WireGuardEasySmartphoneSetuptag:CSpace:blog/sysadmin/WireGuardEasySmartphoneSetup:cc8f2dc5d0bc1f4ad327ba2cf01cc0641b436222From 193.219.181.219<div class="wikitext"><blockquote><p>It wouldn't be too hard to build a 'WireGuard registration' web application</p>
</blockquote>
<p>I've done so in the past (took probably a day or two, with config downloads, multiple devices, and all) but ultimately decided against using it.</p>
<p>One downside of WireGuard is that the standard tools have no means to attach labels to clients, so if you run <code>wg show</code> you have no idea which users are active, so you end up needing to write your own wrapper for that. More importantly, since WireGuard doesn't integrate with external authentication in any way, locking a user account doesn't remove their WG access, so you need to write your own cronjob for that, too. In the end, it just wasn't worth it compared to running IKEv2 (built-in or strongSwan app) or ocserv (Cisco Anyconnect app, even if that's somewhat of a license violation, but it's what we historically use).</p>
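<p>(A sketch of the sort of wrapper described, assuming a locally maintained public-key-to-owner map, since WireGuard itself stores no labels; the <code>wg show wg0 dump</code> text below is invented but follows the real tab-separated format:)</p>

```python
# Hypothetical local registry mapping peer public keys to owners.
LABELS = {
    "AbCdPeerOneKey=": "alice-laptop",
    "EfGhPeerTwoKey=": "bob-phone",
}

def annotate_peers(dump_text, now, labels=LABELS):
    """Parse `wg show <iface> dump` output (tab-separated; the first
    line describes the interface itself) into labelled peer rows."""
    rows = []
    for line in dump_text.strip().splitlines()[1:]:
        pub, _psk, endpoint, _ips, handshake = line.split("\t")[:5]
        age = (now - int(handshake)) if int(handshake) else None
        rows.append((labels.get(pub, pub), endpoint, age))
    return rows

# Made-up dump output: interface line, then one line per peer with
# pubkey, psk, endpoint, allowed-ips, last handshake, rx, tx, keepalive.
SAMPLE = (
    "privkey\tsrvpub\t51820\toff\n"
    "AbCdPeerOneKey=\t(none)\t198.51.100.7:60123\t10.8.0.2/32\t1700000000\t1000\t2000\toff\n"
    "EfGhPeerTwoKey=\t(none)\t(none)\t10.8.0.3/32\t0\t0\t0\toff\n"
)

for label, endpoint, age in annotate_peers(SAMPLE, now=1700000100):
    if age is not None:
        print(f"{label:12} {endpoint:22} handshake {age}s ago")
    else:
        print(f"{label:12} {endpoint:22} never connected")
```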
<p>If we were to switch away from ocserv for the student/faculty VPN, it would be either IKEv2 or OpenVPN – I already have a working reverse-engineered implementation of the RPC API backend for OpenVPN Connect (the one where it automatically fetches configuration from a URL).</p>
</div>2023-10-19T05:56:18ZBy Erik Auerswald on /blog/sysadmin/FirewallNATTwoPlacestag:CSpace:blog/sysadmin/FirewallNATTwoPlaces:ee076eb8b5d48ae95bdfad1c3ef884c61aa2c645Erik Auerswaldhttps://www.unix-ag.uni-kl.de/~auerswal/<div class="wikitext"><p>Regarding</p>
<blockquote><p>[…]leaking RFC 1918 addresses out to the world[…]</p>
</blockquote>
<p>In order to prevent your network from being used to send traffic with spoofed source addresses, you should consider to only allow packets with your official IPv4 and IPv6 addresses to leave your network towards the Internet, e.g., using outbound ACLs on your edge routers.</p>
<p>This would automatically prevent leaking RFC1918 addresses.</p>
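<p>(A small sketch of the check such an egress ACL encodes; the "official" prefixes here are illustrative documentation ranges, not anyone's real allocation:)</p>

```python
import ipaddress

# Illustrative "official" prefixes; a real deployment would list the
# site's actual public IPv4 /24 and IPv6 allocation here.
OFFICIAL = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def may_egress(src):
    """The rule an outbound edge ACL encodes: only packets sourced
    from our official prefixes may leave toward the Internet."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in OFFICIAL)

print(may_egress("203.0.113.45"))  # legitimate public source: True
print(may_egress("10.1.2.3"))      # RFC 1918 source, blocked: False
# The rule catches RFC 1918 leaks automatically, because those ranges
# are private by definition and can never be in the official list.
print(ipaddress.ip_address("10.1.2.3").is_private)  # True
```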
</div>2023-10-17T11:29:07ZBy A Human on /blog/sysadmin/SplittingDNSResolverstag:CSpace:blog/sysadmin/SplittingDNSResolvers:6584de474b1765f9d01671a78a9c9b1703dafe09A Human<div class="wikitext"><p>You really need to discover dnsdist (<a href="https://dnsdist.org/">https://dnsdist.org/</a>).</p>
<p>Most of the problems you complain about (e.g. resolv.conf stickyness) are fixed transparently and easily by dnsdist.</p>
</div>2023-10-15T11:39:20ZBy A Human on /blog/sysadmin/DNSResolversAndIPRatelimitstag:CSpace:blog/sysadmin/DNSResolversAndIPRatelimits:35e4483fd77def000551d0b9e912debf85f82649A Human<div class="wikitext"><p>I mean sure, you could do all that.</p>
<p>Or you could just use dnsdist (<a href="https://dnsdist.org/">https://dnsdist.org/</a>) which does all that and much, much more.</p>
</div>2023-10-15T11:36:48ZBy Chris Siebenmann on /blog/sysadmin/EmailToolsAffectMyBehaviortag:CSpace:blog/sysadmin/EmailToolsAffectMyBehavior:c8c9bd6b6b28723c81adbe7b050e988b88bcc480Chris Siebenmann<div class="wikitext"><p>In theory (N)MH has <a href="https://manpages.org/burst"><code>burst</code></a> to burst
apart digests into individual messages, but I've never used it so I
don't know how well it works in practice (especially today, where
people may be inventing their own digest formats left and right).</p>
</div>2023-10-10T13:36:59ZBy Silvan on /blog/sysadmin/EmailToolsAffectMyBehaviortag:CSpace:blog/sysadmin/EmailToolsAffectMyBehavior:a0680bcecfb2499aee5b9ad410cc09dcbec42b3dSilvanhttps://sillymon.ch<div class="wikitext"><p>If you enjoy using MH, maybe the more modern <a href="https://github.com/leahneukirchen/mblaze">https://github.com/leahneukirchen/mblaze</a> would be interesting to you as well.</p>
<p>It uses Maildir and handles mime quite well from what I can tell.</p>
</div>2023-10-10T09:47:12ZBy Yeechang Lee on /blog/sysadmin/EmailToolsAffectMyBehaviortag:CSpace:blog/sysadmin/EmailToolsAffectMyBehavior:fe68765802c62fd54a3f9f2a6eb9845d98a59de8Yeechang Lee<div class="wikitext"><p>I have used Emacs VM to read mail for almost three decades now, and until your comment today had never considered whether and how its behavior (similarly varying between just a list of messages, just one message, or two buffers showing both) has affected my reading habits. I do believe that <a href="https://news.ycombinator.com/item?id=31699432">using VM overall is a superpower for email handling</a>.</p>
<p>VM also has a threaded view, but I do not turn it on by default and do not often invoke it. I do, however, use a keyboard macro that enhances the built-in <code>vm-kill-thread-subtree</code>.</p>
<p>Does MH, or the various clients, support breaking digests? VM has that (sadly, not working with an increasing number of lists), and that is definitely useful.</p>
</div>2023-10-10T08:06:32ZBy Aristotle Pagaltzis on /blog/sysadmin/EmailToolsAffectMyBehaviortag:CSpace:blog/sysadmin/EmailToolsAffectMyBehavior:86f6fbc628a5088c42fabb60f76f694acdf8e578Aristotle Pagaltzishttp://plasmasturm.org/<div class="wikitext"><p>I just about remember my switch from Windows to Linux, almost 20 years ago. The two big main blockers were finding and getting used to how to edit text files and how to do mail. Getting mail set up was far more complicated than what I was used to on Windows where machine-local email is not a thing, so the MTA/MDA/MUA split just doesn’t exist, and every client can and must assume you have an upstream SMTP/POP/IMAP server that you can just configure in the settings and you’re good to go. But I had the fortune of being told to go with mutt as my MUA, and oh my goodness, it had threads. The GUI clients I was used to did not. It was a night and day difference in keeping on top of mailing lists. I really can’t overstate it. And it got even better <em>still</em> when I had the bright idea to throw out the inbox/outbox split that every client I had used before had, by having mutt Fcc my own mail to the inbox instead of a separate folder: then I could use threading for my non-list correspondence too – threading in all folders, for everything.</p>
<p>Once I got to that point, I’d go on and on about how great this was whenever there was a conversation to which it was relevant. And then GMail came out a year or two later and made something very similar available to everyone, without having to set up mail on a Unixoid system. Today all graphical clients have at least a conversation view for mail threads, if not full threading. So I long since stopped talking about it.</p>
<p>I’ll be surprised if you find the benefit of this change wearing off with the novelty. I expect it to stick.</p>
</div>2023-10-09T16:31:28ZBy Fazal Majid on /blog/sysadmin/EmailThreeForwardingFormatstag:CSpace:blog/sysadmin/EmailThreeForwardingFormats:8cc66bf8c83c6abe92e5bd95bb10bc07b0bf50f0Fazal Majidhttps://majid.info/<div class="wikitext"><p>You forgot the obnoxious TNEF format generated by Microsoft Outlook, that no other mail client knows how to decode, and needs specialized tools to parse, but is still sent by luddites wallowing in their Windows backwater.</p>
</div>2023-10-08T11:34:59ZBy mg@fork.pl on /blog/sysadmin/IMAPSentFolderIssuestag:CSpace:blog/sysadmin/IMAPSentFolderIssues:24384d1cbf1cb78527cc4cdcd2b9a8ef335871eamg@fork.pl<div class="wikitext"><p>Trojita was a bright star in the MUA sky; it's sad to see it dead.</p>
<p>Anyway - regarding the main subject - I've seen Thunderbird submit a message to SMTP multiple times because of an IMAP timeout (TB shows a message box about the problem and lets you choose "retry", which apparently restarts the whole transaction, SMTP submission included)... I think I have to check this Dovecot submission proxy - although I guess I wouldn't want to go through it for system accounts, mass mailings, etc. (which don't require Sent)</p>
</div>2023-10-04T12:43:37ZBy Mike on /blog/sysadmin/FailingAtTLSRootRolloverIItag:CSpace:blog/sysadmin/FailingAtTLSRootRolloverII:46022c50850c797824c8474847adeaf48eb7a3b4Mike<div class="wikitext"><p>Hey, thanks very much for the write up!</p>
</div>2023-10-03T12:53:25ZBy Chris Siebenmann on /blog/sysadmin/FailingAtTLSRootRollovertag:CSpace:blog/sysadmin/FailingAtTLSRootRollover:b1649c53a28a2054302c2f3811d900a4420e64b1Chris Siebenmann<div class="wikitext"><p>This is a good question, so I wrote up <a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/FailingAtTLSRootRolloverII">what we did when we
couldn't gracefully roll over our OpenVPN TLS root certificate</a>. The short version is that we wound up replacing
the OpenVPN server with a new one under a new name (with a new TLS root
certificate). This wasn't at all transparent to people using the OpenVPN
server, but at that point it didn't look like anything would be.</p>
</div>2023-10-03T03:02:05ZBy Mike on /blog/sysadmin/FailingAtTLSRootRollovertag:CSpace:blog/sysadmin/FailingAtTLSRootRollover:89182ebad652757183fd75707eedaf4f5440601cMike<div class="wikitext"><p>Hi there, I am approaching the same situation and just wondering what you ended up doing?</p>
</div>2023-10-02T19:09:49ZBy jonys on /blog/sysadmin/IMAPSentFolderIssuestag:CSpace:blog/sysadmin/IMAPSentFolderIssues:3f7cee103980c32188d9a136a8d2c472ff59652ejonys<div class="wikitext"><p>QtWebKit, not QtWebEngine, I got them mixed up. The latter is still supported, although it lags behind Chromium on updates. Oh well, that's the modern web.</p>
</div>2023-09-29T17:04:12ZBy jonys on /blog/sysadmin/IMAPSentFolderIssuestag:CSpace:blog/sysadmin/IMAPSentFolderIssues:ee5702b9f5215c4af0e2b713405d29522f73eae0jonys<div class="wikitext"><p>Yup, the Dovecot server (and some others) implements RFC 4468 (BURL), which feeds the message to SMTP right from the IMAP Sent folder, so only one copy is performed between the e-mail client and the two servers. See <a href="https://doc.dovecot.org/admin_manual/submission_server/">https://doc.dovecot.org/admin_manual/submission_server/</a></p>
<p>It doesn't avoid some of the issues mentioned, such as the lack of DKIM signatures, but at least it prevents the two copies from getting out of sync or one going through and the other staying stuck somewhere between the client and the server.</p>
<p>Unfortunately the only client I know of that has BURL capability was Trojitá, which is no longer developed and ceased being usable after QtWebEngine was deprecated.</p>
</div>2023-09-29T16:59:31ZFrom 193.219.181.219 on /blog/sysadmin/IMAPSentFolderIssuestag:CSpace:blog/sysadmin/IMAPSentFolderIssues:e017dbd9fa4763af4be5d9ae1e2b9575228bfb79From 193.219.181.219<div class="wikitext"><blockquote><p>(Even with the two services tightly coupled, people could also be in a network environment that lets them talk to our authenticated SMTP server but not to our IMAP server.)</p>
</blockquote>
<p>One possible workaround might be to let Dovecot handle the SMTP submission as well – it has a Submission proxy nowadays, which I believe can store the sent messages server-side before forwarding them to your actual SMTP server. (Although that means you'd end up with two copies...)</p>
</div>2023-09-29T05:26:55ZBy David Magda on /blog/sysadmin/SplittingDNSResolverstag:CSpace:blog/sysadmin/SplittingDNSResolvers:4ea0b979e78d3743de3f120b9f059af8d58ac79fDavid Magdahttp://www.magda.ca/<div class="wikitext"><p>Is using the (per-IP) rate-limiting feature of either BIND (<code>named</code>) or Unbound a possible solution to certain systems DoSing the service?</p>
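<p>(Unbound, at least, has per-client-IP rate limiting; a minimal <code>unbound.conf</code> sketch, with illustrative values:)</p>

```
server:
    # Maximum queries per second allowed from any single client IP.
    ip-ratelimit: 100
    # Memory reserved for the rate-tracking cache.
    ip-ratelimit-size: 4m
    # Let 1 in N over-limit queries through anyway (0 blocks them all).
    ip-ratelimit-factor: 10
```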
</div>2023-09-27T21:40:53ZFrom 193.219.181.219 on /blog/sysadmin/UnixShellsNoMoreAccessControltag:CSpace:blog/sysadmin/UnixShellsNoMoreAccessControl:edf58220849a20c5befc6dc0ecc7b3c504d399d6From 193.219.181.219<div class="wikitext"><blockquote><p>and I believe that Samba doesn't normally (it has its own separate database, which is normally synchronized to your Unix passwords).</p>
</blockquote>
<p>It still has the ability to call PAM for <em>authorization</em>, though (just like e.g. sshd still does call PAM for the authorization functions even if you log in via public-key and bypass PAM authentication).</p>
</div>2023-09-20T03:58:47ZBy Aristotle Pagaltzis on /blog/sysadmin/UnixShellsNoMoreAccessControltag:CSpace:blog/sysadmin/UnixShellsNoMoreAccessControl:1678828f7f8316a9cefd8be152c21b7dc2574cf5Aristotle Pagaltzishttp://plasmasturm.org/<div class="wikitext"><p>Apache trying to interpret the shell field seems like a huge conflation of concerns and the wrong kind of magic to me. Even <code>login(1)</code> doesn’t actually attempt that: all it does is <em>run</em> the specified executable. It is the executable which then implements some policy – not <code>login(1)</code>. Of course this is not an option for Apache; I don’t think it’s feasible for it to somehow defer to the specified executable in the same way. “Then it should assign meaning based on the path name itself” seems entirely the wrong reaction, though.</p>
<p>What is the requirement lurking in the background here?</p>
<p>A UID is good for two things in Unix: owning files and owning processes.</p>
<p>Is the idea that you want an easy way to say “this UID should never be allowed to own a process”? That seems a reasonable desire, just the shell <code>passwd</code> field not the right layer to do it with.</p>
<p>Is the idea that the UID may own processes but you want to be able to restrict it to a fixed set of what kind of process? A more complicated want, but again: reasonable but not for <code>passwd</code>.</p>
<p>Or is the idea that the UID may own arbitrary processes, but you just don’t want it to be able to log in? In that case I’m not sure I can picture why you’d want that. What makes logging in special? A shell is just a process, and can be used to create other processes; if it’s OK for the UID to own other processes that may do whatever, what makes shell especially undesirable? What even constitutes “logging in” really?</p>
<p>(As a sidebar, running through all the cases, but for the other thing a UID is good for: being able to limit a UID to owning processes but not files seems an interesting idea that might possibly be useful but I can’t actually imagine how. More obviously useful would be to be able to restrict file ownership to a certain set of files; on one hand, d’uh, this already exists and is called permissions; OTOH in the existing mechanism the file is the object on which access control is specified, and it might potentially be useful and allow for better clarity to be able to specify policy based on the UID (“in the FS root only these accounts have access to create files” (with implicit MAC based on what is then created there) vs “this UID is prohibited from owning files anywhere but this directory” (which would apply regardless of what sort of things anyone puts anywhere else in the FS)). For the login special case restriction, I can’t even think of what the file ownership parallel could be.)</p>
</div>2023-09-19T03:50:05ZBy Chris Siebenmann on /blog/sysadmin/UnixShellsNoMoreAccessControltag:CSpace:blog/sysadmin/UnixShellsNoMoreAccessControl:46d7a24265b69d522857a64342257998dc71d79bChris Siebenmann<div class="wikitext"><p>Unfortunately, a variety of common authentication systems don't use
PAM. Apache htpasswd definitely doesn't, and I believe that Samba doesn't
normally (it has its own separate database, which is normally synchronized
to your Unix passwords). I'm not sure if common LDAP servers use PAM
when you authenticate through them, but there you have the problem that
often what matters is the specific LDAP client the user is authenticating
to, not the LDAP server. And various things that authenticate via LDAP
aren't even thinking about Unix users, so they definitely don't use PAM;
they just consider LDAP an authoritative source of users, maybe user
attributes, and authentication.</p>
</div>2023-09-18T18:24:39ZFrom 141.52.248.1 on /blog/sysadmin/UnixShellsNoMoreAccessControltag:CSpace:blog/sysadmin/UnixShellsNoMoreAccessControl:fbd20d8db80eb9625dbac6f22c5a13224321194dFrom 141.52.248.1<div class="wikitext"><p>And additionally if you have PAM you can use /etc/security/access.conf to limit login to a specific user group or lock out other users/groups.</p>
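<p>As a concrete sketch of the kind of pam_access policy meant here (the <code>staff</code> group and the deny-all tail are illustrative assumptions, not a recommendation):</p>

```
# /etc/security/access.conf — consulted by pam_access (account phase)
# Allow members of a hypothetical "staff" group from anywhere,
# allow root only on local (console) logins, and deny everyone else.
+ : (staff) : ALL
+ : root : LOCAL
- : ALL : ALL
```

<p>Any service that runs the account stack with pam_access enabled will then enforce this, regardless of what shell the account has.</p>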
</div>2023-09-18T13:20:53ZFrom 193.219.181.219 on /blog/sysadmin/UnixShellsNoMoreAccessControltag:CSpace:blog/sysadmin/UnixShellsNoMoreAccessControl:ffffd91c29b906c4b6e3eaef915112ae8d110f07From 193.219.181.219<div class="wikitext"><p>You still have a "close enough" equivalent – if your accounts are stored in /etc/passwd, then <em>hopefully</em> all services using them do so through PAM and not through direct file access. So in that case you'll still have PAM honor the "lockout" via <code>chage -l</code> (or wherever it was) due to account/pam_unix enforcing the check, and you should still have it honor an invalid shell due to pam_shells.so enforcing that.</p>
<p>(And if the services directly access /etc/shadow, Slackware style, I would honestly say that's a self-inflicted problem...)</p>
</div>2023-09-18T08:35:49ZBy Chris Siebenmann on /blog/sysadmin/SimpleRemoteURLServertag:CSpace:blog/sysadmin/SimpleRemoteURLServer:01dafd19c4e52eb1b954ffd0495449f45d3eed59Chris Siebenmann<div class="wikitext"><p>Oops, I left that detail out. This is for use with programs (like GUI mail
readers) that have URLs and want to open them in 'my browser', and thus
have an 'open URL' command or mouse click or whatever. Generally these
route through <a href="https://utcc.utoronto.ca/~cks/space/blog/linux/XdgOpenWhichBrowser">xdg-open</a>, where I've
arranged for the special client script to be hooked in.</p>
<p>So the flow is I'm reading email in exmh on our login server (with
X forwarded over SSH), I click a URL, exmh runs (my) xdg-open,
that runs a script that sends the URL to the 'client' side, the
'client' side sends it to the server side, and the server side opens
the URL in my desktop Firefox. Previously, xdg-open would have run
a script that used direct X property based Firefox remote control
to open the URL on my desktop Firefox (<a href="https://github.com/siebenmann/ffox-remote">via a little program</a>), but that X property
manipulation is now too slow for me.</p>
</div>2023-09-13T19:39:43ZBy Ian Z aka nonbrowser on /blog/sysadmin/SimpleRemoteURLServertag:CSpace:blog/sysadmin/SimpleRemoteURLServer:8b8828f0e5d342723e3406c5ebed29fb080865d4Ian Z aka nonbrowser<div class="wikitext"><p>Isn't there a missing link here? How does the "client program" (running on your terminal server machine) know where to snarf the URL from?</p>
</div>2023-09-13T19:21:47ZBy Simon on /blog/sysadmin/AlertsOnUserStoriesIssuestag:CSpace:blog/sysadmin/AlertsOnUserStoriesIssues:8a70fb57cd25d5ad72cc2d2a0762639348e85a6cSimon<div class="wikitext"><p>Don't get me wrong, I'm sure you know what works well for you. I'm not arguing you should change your monitoring. So this is just a comment about two details you mentioned:</p>
<blockquote><p>[...] This often describes web applications [...]</p>
</blockquote>
<p>I'm surprised that you take a web app as an example, because for them it's pretty tricky to do proper end-to-end tests of the user interface. Consider for example a mail service. Testing IMAP/POP/SMTP with some example user is rather easy to implement. On the other hand, testing a webmailer is much more work (assuming you actually test the user interface, not the API the UI is using). Especially if you expect significant churn in the user interface, as you mention.</p>
<blockquote><p>The coverage matrix for SSH login hosts times separate fileservers gets pretty large, [...]</p>
</blockquote>
<p>So what? It's the same test just invoked with different variables. It's the same as for simpler "ICMP ping" or "SSH port is open" tests you mention.</p>
</div>2023-09-05T03:10:34ZBy Joseph on /blog/sysadmin/AlertsOnUserStoriesIssuestag:CSpace:blog/sysadmin/AlertsOnUserStoriesIssues:88184be3a34e2583d5463796b5c1b36d63e5c1d4Joseph<div class="wikitext"><p>So i think of alerting on symptoms as a way to reduce the risk of false positives. In other words this:</p>
<p><a href="https://paulbellamy.com/2017/08/symptoms-not-causes">https://paulbellamy.com/2017/08/symptoms-not-causes</a></p>
<p>Alerting on causes is easy, but all too frequently an engineer doesn't put the appropriate level of thought into whether an alert could trigger when everything is fine.</p>
<p>I think of this as a principle or guideline; pragmatism matters. If I am managing a server, a disk running out of space has always been a bad sign.
On the other hand, CPU utilization alerts are a mass producer of false positives.</p>
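<p>For instance, a cause-based disk alert of this kind might look like the following Prometheus rule (a sketch: the node_exporter metric names are standard, but the threshold, filters, and rule names are illustrative):</p>

```yaml
groups:
  - name: disk-space
    rules:
      - alert: FilesystemAlmostFull
        # Standard node_exporter metrics; the 10% threshold and the
        # fstype exclusions are illustrative choices.
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes < 0.10
        for: 15m
        annotations:
          summary: "{{ $labels.instance }}:{{ $labels.mountpoint }} is over 90% full"
```

<p>The <code>for: 15m</code> clause is one way to keep even a cause-based alert from firing on a transient blip.</p>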
</div>2023-09-02T13:16:35ZBy Verisimilitude on /blog/sysadmin/MaybeNotAllowLoginstag:CSpace:blog/sysadmin/MaybeNotAllowLogins:864b4dc453d4d2aa3dd662f707d10c06081640acVerisimilitudehttp://verisimilitudes.net<div class="wikitext"><p>What's the point of having this Swiss cheese security model at all? I'm not aware of a single UNIX system that has ever properly protected users from each other, so why use UNIX at all? It would be more efficient obviously to have less code doing more things, without this dead weight. At what point will the broken abstractions pile up and be replaced, finally?</p>
</div>2023-08-18T21:58:54ZBy andyjpb on /blog/sysadmin/LogMonitoringTarpittag:CSpace:blog/sysadmin/LogMonitoringTarpit:55c63a708dab30990aee7709f1ec8ba8be7d9a1eandyjpbhttp://www.ashurst.eu.org/<div class="wikitext"><p>I've been using Marcus J. Ranum's "Artificial Ignorance" method ( <a href="https://www.ranum.com/security/computer_security/papers/ai/">https://www.ranum.com/security/computer_security/papers/ai/</a> ) on a small scale for a long time (wow! decades now! :-) ).</p>
<p>The idea is to write regexes for everything you're not interested in, leaving, by definition, only the interesting things.</p>
<p>It's sometimes a pain to keep the regexes up-to-date as it requires a little discipline but perhaps no more than you have to do to set up any kind of log monitoring system?</p>
<p>One of the most painful (but most useful) properties is that the regexes tend to under-match rather than over-match, which means silencing uninteresting things can take a couple of attempts. ...but perhaps that's just a problem of matching logs with regexes, and it's better to have it fail by under-matching (letting something uninteresting through) than by over-matching (hiding something interesting)?</p>
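<p>The mechanics of the method are simple enough to sketch (the ignore patterns below are invented examples, not a real ruleset):</p>

```python
import re

# "Artificial ignorance": patterns for everything known to be boring.
# Whatever no pattern matches is, by definition, interesting.
IGNORE_PATTERNS = [
    r"sshd\[\d+\]: Connection closed by ",
    r"CRON\[\d+\]: \(root\) CMD ",
    r"systemd\[\d+\]: (Started|Stopped) ",
]

def interesting_lines(lines, patterns=IGNORE_PATTERNS):
    """Return only the log lines that no ignore pattern matches."""
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines if not any(c.search(ln) for c in compiled)]
```

<p>The under-matching failure mode shows up naturally here: a pattern that is slightly too narrow just means a boring line survives to the report, where you notice it and tighten the ruleset.</p>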
<p>Have you had any experience with anything like this and do you think it'd work at the kinds of scales you operate at (both in terms of team members and machines)?</p>
</div>2023-08-10T16:01:43ZBy phein4242 on /blog/sysadmin/LogMonitoringTarpittag:CSpace:blog/sysadmin/LogMonitoringTarpit:ac438f8d1d840b00d6da8b05bddacdcaaee6699ephein4242<div class="wikitext"><p>Hi Chris,</p>
<p>While in general I agree that logs can be a tarpit, they are also one of the most valuable sources of information during troubleshooting (topped by things like debug/verbose and call tracing/debuggers).</p>
<p>Thing is though, theory aside, systems do need to be maintained, and that sometimes requires hacks. In this specific case, since you know beforehand that you roll out your services with autorestart, having a restart check is a crucial piece of insight into your application.</p>
<p>And given that this is about systemd units and logging, which have a well-defined API, implementing this check should not be much more than a log tailer/parser, one to three regexps, some control logic, and something to send alerts (or it could be reworked into a dedicated exporter).</p>
<p>On the more general level, yes, directly parsing all logging is useless (unless you do it programmatically). Knowing what to look for, and being able to transform that into a richer context about your applications, is very valuable knowledge nevertheless. This is even mandatory if there are no exporters for the app you want to run.</p>
<p>Will parsing logs ever become painless? I highly doubt it, but structured logs do make the pain a bit less :)</p>
</div>2023-08-08T18:20:25ZBy nolist_policy on /blog/sysadmin/LogMonitoringTarpittag:CSpace:blog/sysadmin/LogMonitoringTarpit:acab90e4cdb013a3da483ff73df1fff4afbcf926nolist_policy<div class="wikitext"><blockquote><p>The first problem with monitoring this way is that there's no guarantee that the message you're monitoring for won't change.</p>
</blockquote>
<p>This is a solved problem (at least for the linux kernel) with <a href="https://docs.kernel.org/core-api/printk-index.html">printk indexing</a>. So you can query the running kernel what log messages it can output. With that your monitoring tool can check when an expected message was changed and you get the messages in printf format so you can parse them if you want.</p>
</div>2023-08-08T11:48:14ZFrom 109.37.131.94 on /blog/sysadmin/LogMonitoringTarpittag:CSpace:blog/sysadmin/LogMonitoringTarpit:9172627b195b411eb8ead0f519d68e0b1ae2bbf1From 109.37.131.94<div class="wikitext"><p>I like to take the opposite approach, all logs are compared against a list of known patterns and lines that do not match are reported.</p>
<p>I've used logcheck in the past, and github.com/fd0/erpel nowadays</p>
</div>2023-08-08T10:12:58ZBy Verisimilitude on /blog/sysadmin/LogMonitoringTarpittag:CSpace:blog/sysadmin/LogMonitoringTarpit:1f3801c989f40cad2bf570dc42d8ce2260b01814Verisimilitudehttp://verisimilitudes.net<div class="wikitext"><blockquote><p>As it happens, one of my potentially unpopular views is that monitoring your logs is generally a tarpit that isn't worth it.</p>
</blockquote>
<p>Under UNIX, it certainly is. For some reason, IBM has no issues with usable logs, and their machines even call them for repairs.</p>
<blockquote><p>The fundamental problem with general log monitoring is that logs are functionally unstructured.</p>
</blockquote>
<p>I would like an elaboration on this part. Does this merely refer to the line-by-line nature of UNIX logs, or to the wide space of possible values? There certainly can be issues with this, but I see not why it should happen with a properly-written program. If a program produces useless logs, then it is worthless.</p>
<p>A proper way to handle it, and the only proper way to handle it I think, is to have a log be a machine-readable and ordered set of values, real values, and not lines of supposed text. There then is the required structure and a real constraint on nonsense.</p>
<blockquote><p>It's easy to say that we'll monitor for the Prometheus host agent crashing and systemd restarting it, but it's much harder to be sure that we have properly identified the complete collection of log messages that signal this happening.</p>
</blockquote>
<p>I find the very thought of such a poorly-documented program to be ridiculous.</p>
<blockquote><p>Remember, log messages are unstructured, which means it's hard to get a complete inventory of what to look for short of things like reading the (current) program source code to see what it logs.</p>
</blockquote>
<p>This is why a log should be entirely composed of records, which should be documented comprehensively.</p>
</div>2023-08-08T02:24:38ZBy Scott on /blog/sysadmin/AmandaWhereSpeedLimitsIItag:CSpace:blog/sysadmin/AmandaWhereSpeedLimitsII:be557cc3cafef3715376b12609a002a596939a72Scott<div class="wikitext"><p>It’s impossible to fully optimize. All you can do is reshuffle the deck and shift the bottleneck…
Check out Guerrilla Capacity Planning by Neil Gunther.</p>
</div>2023-07-29T02:08:03ZBy MikeP on /blog/sysadmin/SecurityScannersTwoViewstag:CSpace:blog/sysadmin/SecurityScannersTwoViews:d00adf9933b260de635131e817d7509b26817f94MikePhttps://snowcrash.ca<div class="wikitext"><p>Opk, I'd be a bit cautious, what Chris is talking about is not an intrusion detection system. I think you know that, but I wanted to make sure. :) There's different kinds of IDS as well, host-based and network-based. Same as with vulnerability scanners, there's a few open-source/free types, and a whole lot of commercial ones that will cost you anything from peanuts to your entire IT budget. So which is best depends on your environment, budget, which boxes bosses want ticked, and as you said, willingness/ability to look at logs and do anything based on what you find.</p>
<p>Rather than fill up Chris's blog here with discussion tangential to his actual post, feel free to drop me a line, mike at the domain in the URL on my post here. Always happy to talk network detection. :)</p>
</div>2023-07-27T19:23:34ZBy Anonymous on /blog/sysadmin/DoAnEndOfServiceWriteuptag:CSpace:blog/sysadmin/DoAnEndOfServiceWriteup:d29a6a835563251cc159fc05661a78c9deeb7ba9Anonymous<div class="wikitext"><p>At my former job, we used to maintain a formal document for every piece of software/infrastructure we were responsible for (operating/maintaining/upgrading), that described all the design and configuration choices we made for that thing, and the rationale/reasons we made for those choices. We kept it up to date as things changed throughout the entire lifetime of the thing. It worked pretty well for us.</p>
</div>2023-07-22T17:57:33ZBy Chris Siebenmann on /blog/sysadmin/SecurityScannersTwoViewstag:CSpace:blog/sysadmin/SecurityScannersTwoViews:d0e9066603e41ba34559bd5b61da1d859dd2dc8fChris Siebenmann<div class="wikitext"><p>I haven't named what we're using because I'm not all that familiar
with it and the alternatives; it was chosen and set up by a co-worker,
so I can't make any sort of informed comments on the options. What we
currently use is OpenVAS ('Greenbone OpenVAS', which Wikipedia tells me
descends through a long line of open source happenings from Nessus).
It's not perfect but it seems to work okay and I don't deal with it
much, except looking at the reports it generates.</p>
</div>2023-07-14T15:29:55ZBy Opk on /blog/sysadmin/SecurityScannersTwoViewstag:CSpace:blog/sysadmin/SecurityScannersTwoViews:f5bc3ecd3f87da7c088efc3265d0b94e42d362c3Opk<div class="wikitext"><p>You seem reluctant to name your Open Source security and vulnerability scanner. Does that betray that you're not especially enamored with the thing and don't want to give it publicity? At my work, managers seem to have recently come across the existence of intrusion detection systems and want one installed so they can tick a box. While it'd be tempting to install literally anything and ignore the logs just to keep them happy, I am curious to know what works well and what doesn't.</p>
</div>2023-07-14T10:21:41ZBy Fazal Majid on /blog/sysadmin/SecurityScannersTwoViewstag:CSpace:blog/sysadmin/SecurityScannersTwoViews:02b52dbda432be1b7cd4d55475a22803aaa67f30Fazal Majidhttps://majid.info/<div class="wikitext"><p>Vulnerability scanners primarily work at the application layer, catching misconfigurations and app-level vulnerabilities where most of the risk lies, and are thus valuable even for orgs with locked-down networks.</p>
</div>2023-07-13T07:55:14ZBy valyala on /blog/sysadmin/GrafanaLokiSimpleNotRecommendedtag:CSpace:blog/sysadmin/GrafanaLokiSimpleNotRecommended:232e510d5525d87b354a7c50ad05cd4c7e6d5264valyalahttps://github.com/valyala<div class="wikitext"><p>Give a try to <a href="https://docs.victoriametrics.com/VictoriaLogs/">VictoriaLogs</a> then! It supports the same <a href="https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields">log stream concept</a> as Grafana Loki does, but it provides the following features missing in Loki:</p>
<p>- Easy to set up and operate. It is just a single, relatively small, statically linked binary without any external dependencies. It <a href="https://docs.victoriametrics.com/VictoriaLogs/QuickStart.html">needs close to zero configuration and zero tuning</a> to achieve optimal performance.</p>
<p>- It stores the ingested logs on the local filesystem and compresses them by 50x-80x. This is 15x better than Elasticsearch and up to a few times better than Loki.</p>
<p>- It allows ingesting high-cardinality fields such as request_id, trace_id or ip without worrying about high memory usage or a high number of open files.</p>
<p>- It provides query language with full-text search - <a href="https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html">LogsQL</a>, which is much easier to use than <a href="https://grafana.com/docs/loki/latest/logql/">LogQL from Loki</a>. It also performs typical search queries much faster than Loki.</p>
<p>- It <a href="https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line">provides an excellent integration with command-line tools</a>, which are traditionally used for log analysis - grep, head, less, awk, jq, sort, uniq, wc, cut, etc.</p>
<p>See <a href="https://docs.victoriametrics.com/VictoriaLogs/FAQ.html#how-does-victorialogs-work">these docs</a> on how VictoriaLogs works.</p>
</div>2023-07-13T03:36:39ZBy valyala on /blog/sysadmin/PrometheusExportersFixedPortstag:CSpace:blog/sysadmin/PrometheusExportersFixedPorts:9896d4a31a5b8db8f3c09c7a5b8bf2ac503dbf8cvalyalahttps://github.com/valyala<div class="wikitext"><p>Did you ever try monitoring Kubernetes pods with Prometheus? Kubernetes service discovery puts the pod IP into the `instance` label by default. When a pod is restarted, Kubernetes usually gives it a new IP. This changes the `instance` label, which, in turn, creates a set of new time series for all the metrics exported by the pod.</p>
<p>On top of this, Prometheus is usually configured to add a dozen various labels to every metric scraped from the pod. These labels include the container name, pod name, node name, Kubernetes namespace, and other pod-level labels set in the deployment config. The pod name is usually a unique string generated by Kubernetes for each running pod, and it changes on pod restart. The pod can also migrate to another node during the restart, which changes the `node` label. This worsens the time series churn, especially when pods restart frequently (for instance, during new deployments or when horizontal pod auto-scaling (HPA) is enabled).</p>
<p>This is a common problem for Prometheus monitoring in Kubernetes, which leads to "high churn rate" issues, where Prometheus has to store and index a large number of time series over time.</p>
<p>This is a sad story about the current state of Kubernetes monitoring with Prometheus :(</p>
</div>2023-07-13T02:43:51ZBy Simon on /blog/sysadmin/MyMovingURLsBetweenBrowserstag:CSpace:blog/sysadmin/MyMovingURLsBetweenBrowsers:d60e9a67684027fdf317c2b092c6b873d752424cSimon<div class="wikitext"><blockquote><p>I could speed up some of it by creating a fvwm keyboard binding that ran my 'get X selection and run JavaScript Firefox with it' cover script, but I'd still have to select the URL one way or another in my normal Firefox, and there's probably no good way to speed that up.</p>
</blockquote>
<p>Ctrl-L in Firefox focuses and selects everything in the URL bar. I use it often to edit the URL. But based on a quick try locally, it seems the selection doesn't actually end up in X11's selection buffer. So that would still need 3 (instead of 2) shortcuts (Ctrl-L; Ctrl-C; <your open browser from clipboard>).</p>
</div>2023-07-12T21:40:59Z