Wandering Thoughts

2024-04-21

Thoughts on potentially realistic temperature trip limit for hardware

Today one of the machine rooms that we have network switches in experienced some kind of air conditioning issue. During the issue, one of our temperature monitors recorded a high temperature of 44.1 C (it normally sees the temperature as consistently below 20C). The internal temperatures of our network switches undoubtedly got much higher than that, seeing as the one that I can readily check currently reports an internal temperature of 41 C while our temperature monitor says the room temperature is just under 20 C. Despite likely reaching very high internal temperatures, this switch (and probably others) did not shut down to protect themselves.
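
(As an aside on how to check these internal temperatures: on servers they generally come from lm-sensors or from the BMC via IPMI, so something like the following, assuming the usual tools are installed and you're running as root with the IPMI drivers loaded:

    sensors                          # lm-sensors: motherboard and CPU sensors
    ipmitool sdr type Temperature    # temperature sensors as seen by the BMC

For network switches it's usually SNMP or the switch's own CLI.)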

It's not news to system administrators that when hardware has temperature limits at all, those limits are generally set absurdly high. We know from painful experience that our switches experience failures and other problems when they get sufficiently hot during AC issues such as this, but I don't think we've ever seen a switch (or a server) shut down because of too-high temperatures. I'm sure that some of them will power themselves off if cooked sufficiently, but by that point a lot of damage will already be done.

So hardware vendors should set realistic temperature limits and we're done, right? Well, maybe not so fast. First off, there's some evidence that what we think of as typical ambient and internal air temperatures are too conservative. Google says they run data centers at 80 F or up to 95 F, depending on where you look, although this is with Google's custom hardware instead of off the shelf servers. Second, excess temperature in general is usually an exercise in probabilities and probable lifetimes; often the hotter you run systems, the sooner they will fail (or become more likely to fail). This gives you a trade off between intended system lifetime and operating temperature, where the faster you expect to replace hardware (eg in N years) the hotter you can probably run it (because you don't care if it starts dying after N+1 instead of N+2 years, in either case it'll be replaced by then).

And on the third hand, hardware vendors probably don't want to try to make tables and charts that explain all of this and, more importantly, more or less promise certain results from running their hardware at certain temperatures. It's much simpler and safer to promise less and then leave it up to (large) customers to conduct their own experiments and come up with their own results.

Even if a hardware vendor took the potential risk of setting 'realistic' temperature limits on their hardware, those limits might still be far too high for us, because we want to run our hardware much longer than the hardware vendor expects; or they might be too conservative and too low, because we would rather take a certain amount of risk with our hardware than have everything aggressively shut itself down in the face of air conditioning problems (ones that aren't yet what we consider too severe) and take us entirely off the air.

(And of course we haven't even considered modifying any firmware temperature limits on systems where we could potentially do that. We lack the necessary data to do anything sensible, so we just stick with whatever the vendor has set.)

TemperatureLimitTripThoughts written at 22:46:10

2024-04-15

Having IPv6 for public servers is almost always merely nice, not essential

Today on lobste.rs I saw a story about another 'shame people who don't have IPv6' website. People have made these sites before and they will make them again and as people in the comments note, it will have next to no effect. One of the reasons for that is a variant on how IPv6 has often had low user benefits.

As a practical matter, almost all servers that people want to be generally accessible need to be accessible via IPv4, because there are still a lot of places and people that are IPv4 only (including us, for various reasons). And as the inverse version of this, practically everyone needs to be able to talk to public servers that are IPv4 only, even if this requires IPv6-to-IPv4 carrier grade NAT somewhere in the network. So people operating generally accessible public servers can almost never go IPv6 only, and since they have to be reachable through IPv4 and approximately everyone can talk to them over IPv4, adding IPv6 support has only a moderate benefit. Maybe some people can avoid going through carrier grade NAT; maybe some people will get to feel nicer.

(You can choose to operate a website or a service as IPv6 only, but in that case you're cutting off a potentially significant amount of your general audience. This is not something that many site and service operators are enthusiastic about. Being IPv4 only has much less effect on your audience. This is related to how IPv6 mostly benefits new people on the Internet, not incumbents. Of course IPv6 only can make sense if your target audience is narrower and you happen to know that they all have working IPv6.)

When you have a service feature that is merely nice instead of essential and which potentially involves some significant engineering complexity, is it any surprise that many organizations put it rather far down their priority list? In my view, it's basically what one would expect from both an engineering and business perspective.

(In my view the corollary to this is that general server side IPv6 adoption could be best helped by some combination of making it easier to add IPv6 and making it more useful to have IPv6. Unfortunately a whole raft of historical decisions make it hard to do much about the former, cf.)

IPv6PublicServersMerelyNice written at 22:22:36

2024-04-06

Solving the hairpin NAT problem with policy based routing and plain NAT

One use of Network Address Translation (NAT) is to let servers on your internal networks be reached by clients on the public internet. You publish public IP addresses for your servers in DNS, and then have your firewall translate those public IPs to their internal IPs as the traffic passes through. If you do this with straightforward NAT rules, someone on the same internal network as those servers may show up with a report that they can't talk to those public servers. This is because you've run into what I call the problem of 'triangular' NAT, where only part of the traffic is flowing through the firewall.

The ability to successfully NAT traffic to a machine that is actually on the same network is normally called hairpin NAT (after the hairpin turn packets make as they turn around to head out the same firewall interface they arrived on). Not every firewall likes hairpin NAT or makes it easy to set up, and even if you do set it up through cleverness, using hairpin NAT necessarily means that the server won't see the real client IP address; it will instead see some IP address associated with the firewall, as the firewall has to NAT the client IP to force the server's replies to flow back through it.

However, it recently struck me that there is another way to solve this problem, by using policy based routing. If you add an additional IP address on the server, set a routing policy so that outgoing traffic from that IP can never be sent to the local network but is always sent to the firewall, and then make that IP the internal IP that the firewall NATs to, you avoid the triangular NAT problem without the firewall having to change the client IP (which means that the internal server gets to see the true client IP for its logs or other purposes). This sort of routing policy is possible with at least some policy based routing frameworks, because at one point I accidentally did this on Linux.

(You almost certainly don't want to set up this routing policy for the internal server's primary IP address, the one it will use when making its own connections to machines. I'd expect various problems to come up.)
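
To make this concrete, here's a minimal sketch of what this looks like with Linux policy based routing. The addresses are invented for illustration: the server's extra IP is 192.168.1.11 on eth0, the firewall's inside interface is 192.168.1.1, and routing table 100 is otherwise unused.

    # Add the extra IP that the firewall will NAT the public IP to.
    ip addr add 192.168.1.11/32 dev eth0

    # Traffic with that source IP consults its own routing table.
    ip rule add from 192.168.1.11 lookup 100

    # That table has only a default route via the firewall and no route
    # for the local subnet, so replies always go back through the
    # firewall, even when the client is on the same network.
    ip route add default via 192.168.1.1 dev eth0 table 100

(Replies for connections made to the NAT'd public IP will have 192.168.1.11 as their source address, so they're caught by the rule and forced through the firewall rather than being sent directly over the local network.)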

You still need a firewall that will send NAT'd packets back out the same interface they came in on. Generally, routers will do this for ordinary traffic, but firewall rules on routers may come with additional requirements. However, it should be possible on any routing firewall that can do full hairpin NAT, since that also requires sending packets back out the same interface after firewall rules. I believe this is generally going to be challenging on a bridging firewall, or outright impossible (we once ran into issues with changing the destination on a bridging firewall, although I haven't checked the state of affairs today).

HairpinNATAndPolicyBasedRouting written at 00:06:59

2024-04-05

Why I think you shouldn't digitally sign things casually

Over on the Fediverse, I said:

My standard attitude on digital signatures for anything, Git commits included, is that you should not sign anything unless you understand what you're committing to when you do so. This usually includes "what people expect from you when you sign things". Signing things creates social and/or legal liability. Do not blindly assume that liability without thought, especially if people want you to.

In re: (a Fediverse post encouraging signing Git commits)

If people are asking you to sign something, they are attributing a different meaning to an unsigned thing from you than to a signed thing from you. Before you go along with this and sign, you want to understand what that difference in meaning is and whether you're prepared to actually deliver that difference in practice. Are people assuming that you have your signing key in a hardware token that you keep careful custody of? Are people assuming you take some sort of active responsibility for commits you digitally sign? What is going to happen (even just socially) if your signing key is compromised?

For a very long time, I've felt that people's likely expectations of the security of my potential digital signatures did not match up with the actual security I was prepared to provide (for example, my old entry on why I don't have a GPG key). Nothing in the modern world of security has changed my views, especially as I've become more aware of my personal limits on how much I care about security. And while it's true that a certain amount of modern security practice makes things not what they're labeled, the actual reality doesn't necessarily change people's expectations.

If you understand what people are really asking you for and expecting, and you feel that you can live up to that, then sure, sign away. Or if you feel that actual problems are unlikely enough and the social benefits of signing are high enough. But don't do it blindly.

(And if you have no choice about it because some organization is insisting that you sign things if you want to publish software packages, push changes, or whatever, then you mostly have no choice. Either you can sign or you can drop out. Just remember that sometimes dropping out is the right (or the only) answer.)

PS: There is also a tangle of issues around non-repudiation that I'm not going to try to get into.

OnNotSigningThings written at 00:11:43

2024-03-23

The many possible results of turning an IP address into a 'hostname'

One of the things that you can do with the DNS is ask it to give you the DNS name for an IP address, in what is called a reverse DNS lookup. A full and careful reverse DNS lookup is more complex than it looks and has more possible results than you might expect. As a result, it's common for system administrators to talk about validated reverse DNS lookups versus plain or unvalidated reverse DNS lookups. If you care about the results of the reverse DNS lookup, you want to validate it, and this validation is where most of the extra results come into play.

(To put the answer first, a validated reverse DNS lookup is one where the name you got from the reverse DNS lookup also exists in DNS and lists your initial IP address as one of its IP addresses. This means that the organization responsible for the name agrees that this IP is one of the IPs for that name.)

The result of a plain reverse DNS lookup can be zero, one, or even many names, or a timeout (which is in effect zero results but which takes much longer). Returning more than one name from a reverse DNS lookup is uncommon and some APIs for doing this don't support it at all, although DNS does. However, you cannot trust the name or names that result from reverse DNS, because reverse DNS lookups are done using a completely different set of DNS zones than domain names use, and as a result can be controlled by a completely different person or organization. I am not Google, but I can make reverse DNS for an IP address here claim to be a Google hostname.

(Even within an organization, people can make mistakes with their reverse DNS information, precisely because it's less used than the normal (forward) DNS information. If you have a hostname that resolves to the wrong IP address, people will notice right away; if you have an IP address that resolves to the wrong name, people may not notice for some time.)

So for each name you get in the initial reverse DNS lookup, there are a number of possibilities:

  • The name is actually an (IPv4, generally) IP address in text form. People really do this even if they're not supposed to, and your DNS software probably won't screen these out.

  • The name is the special DNS name used for that IP address's reverse DNS lookup (or at least some IP's lookup). It's possible for such names to also have IP addresses, and so you may want to explicitly screen them out and not consider them to be validated names.

  • The name is for a private or non-global name or zone. People do sometimes leak internal DNS names into reverse DNS records for public IPs.
  • The name is for what should be a public name but it doesn't exist in the DNS, or it doesn't have any IP addresses associated with it in a forward lookup.

    In both of these cases we can say the name is unknown. If you don't treat 'the name is an IP address' specially, such a name will also turn up as unknown here if you make a genuine DNS query.

  • The name exists in DNS with IP addresses, but the IP address you started with is not among the IP addresses returned for it in a forward lookup. We can say that the name is inconsistent.

  • The name exists in DNS with IP addresses, and one of those IP addresses is the IP address you started with. The name is consistent and the reverse DNS lookup is valid; the IP address you started with is really called that name.

(There may be a slight bit of complexity in doing the forward DNS lookup.)

If a reverse DNS lookup for an IP address gave you more than one name, you may only care whether there is one valid name (which gives you a name for the IP), you may want to know all of the valid names, or you may want to check that all names are valid and consider it an error if any of them aren't. It depends on why you're doing the reverse DNS lookup and validation. And you might also care about why a name doesn't validate for an IP address, or that an IP address has no reverse DNS lookup information.

Of course if you're trying to find the name for an IP address, you don't necessarily have to use a reverse DNS lookup. In some sense, the 'name' or 'names' for an IP address are whatever DNS names point to it as (one of) their IP address(es). If you have an idea what those names might be, you can just directly check them all to see if you find the IP you're curious about.

If you're writing code that validates IP address reverse DNS lookups, one reason to specifically check for and care about a name that is an IP address is that some languages have 'name to IP address' APIs that will helpfully give you back an IP address if you give them one in text form. If you don't check explicitly, you can look up an IP address, get the IP address in text form, feed it into such an API, get the IP address back again, and conclude that this is a validated (DNS) name for the IP.
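
To make this concrete, here's a hedged sketch of this sort of validation in Python, using only the standard library. It's illustrative rather than something battle-tested; real code would want timeouts, smarter handling of IPv6 address formatting, and probably a real DNS library, since socket.gethostbyaddr() doesn't necessarily surface multiple PTR records the way a raw DNS query can.

    import ipaddress
    import socket

    def validated_names(ip):
        """Return the reverse DNS names that validate for this IP address."""
        try:
            name, aliases, _ = socket.gethostbyaddr(ip)
        except OSError:
            return []       # no reverse DNS at all, or the lookup failed

        valid = []
        for cand in [name] + aliases:
            # Screen out 'names' that are really IP addresses in text form;
            # a forward 'lookup' of them would trivially succeed.
            try:
                ipaddress.ip_address(cand)
                continue
            except ValueError:
                pass
            # Forward resolve the name and see if our IP is among its
            # addresses; if not, the name is unknown or inconsistent.
            try:
                addrs = {ai[4][0] for ai in socket.getaddrinfo(cand, None)}
            except OSError:
                continue
            if ip in addrs:
                valid.append(cand)
        return valid

Whether you want the first valid name, all of the valid names, or to insist that every returned name validates is then up to the caller, as covered above.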

It's extremely common for IP addresses to have names that are unknown or inconsistent. It's also pretty common for IP addresses to not have any names, and not uncommon for reverse DNS lookups to time out because the people involved don't operate DNS servers that return timely answers (for one reason or another).

PS: It's also possible to find out who an IP address theoretically belongs to, but that's an entirely different discussion (or several of them). Who an IP address belongs to can be entirely separate from what its proper name is. For example, in common colocation setups and VPS services, the colocation provider or VPS service will own the IP, but its proper name may be a hostname in the organization that is renting use of the provider's services.

DNSIpLookupsManyPossibilities written at 23:07:31

2024-03-19

About DRAM-less SSDs and whether that matters to us

Over on the Fediverse, I grumbled about trying to find SATA SSDs for server OS drives:

Trends I do not like: apparently approximately everyone is making their non-Enterprise ($$$) SATA SSDs be kind of terrible these days, while everyone's eyes are on NVMe. We still use plenty of SATA SSDs in our servers and we don't want to get stuck with terrible slow 'DRAM-less' (QLC) designs. But even reputable manufacturers are nerfing their SATA SSDs into these monsters.

(By the '(QLC)' bit I meant SATA SSDs that were both DRAM-less and used QLC flash, which is generally not as good as other flash cell technology but is apparently cheaper. The two don't have to go together, but if you're trying to make a cheap design you might as well go all the way.)

In a reply to that post, @cesarb noted that the SSD DRAM is most important for caching internal metadata, and shared links to Sabrent's "DRAM & HMB" and Phison's "NAND Flash 101: Host Memory Buffer", both of which cover this issue from the perspective of NVMe SSDs.

All SSDs need to use (and maintain) metadata that tracks things like where logical blocks are in the physical flash, what parts of physical flash can be written to right now, and how many writes each chunk of flash has had for wear leveling (since flash can only be written to so many times). The master version of this information must be maintained in flash or other durable storage, but an old fashioned conventional SSD with DRAM had some amount of DRAM that was used in large part to cache this information for fast access and perhaps fast bulk updating before it was flushed to flash. A DRAMless SSD still needs to access and use this metadata, but it can only hold a small amount of it in the controller's internal memory, which means it must spend more time reading and re-reading bits of metadata from flash and may not have as comprehensive a view of things like wear leveling or the best ready to write flash space.

Because they're PCIe devices, DRAMless NVMe SSDs can borrow some amount of host RAM from the host (your computer), much like some or perhaps all integrated graphics 'cards' (which are also nominally PCIe devices) borrow host RAM to use for GPU purposes (the NVMe "Host Memory Buffer (HMB)" of the links). This option isn't available to SATA (or SAS) SSDs, which are entirely on their own. The operating system generally caches data read from disk and will often buffer data written before sending it to the disk in bulk, but it can't help with the SSD's internal metadata.

(DRAMless NVMe drives with a HMB aren't out of the woods, since I believe the HMB size is typically much smaller than the amount of DRAM that would be on a good NVMe drive. There's an interesting looking academic article from 2020, HMB in DRAM-less NVMe SSDs: Their usage and effects on performance (also).)
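
As a related aside, on Linux you can get some idea of an NVMe drive's HMB appetite from its Identify Controller data. If I'm reading nvme-cli's output correctly, it reports the preferred and minimum host memory buffer sizes as 'hmpre' and 'hmmin' (in units of 4 KiB), and both are zero on drives that don't ask for an HMB. Assuming the drive is /dev/nvme0, something like this will show them:

    nvme id-ctrl /dev/nvme0 | grep -Ei 'hmpre|hmmin'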

How much the limited amount of metadata affects the drive's performance depends on what you're doing, based on both anecdotes and Sabrent's and Phison's articles. It seems that the more internal metadata whatever you're doing needs, the worse off you are. The easily visible case is widely distributed random reads, where a DRAMless controller will apparently spend a visible amount of time pulling metadata off the flash in order to find where those random logical blocks are (enough so that it clearly affects SATA SSD latency, per the Sabrent article). Anecdotally, some DRAMless SATA SSDs can experience terrible write performance under the right (or wrong) circumstances and actually wind up performing worse than HDDs.

Our typical server doesn't need much disk space for its system disk (well, the mirrored pair that we almost always use); even a generous Ubuntu install barely reaches 30 GBytes. With automatic weekly TRIMs of all unused space (cf), the SSDs will hopefully easily be able to find free space during writes and not feel too much metadata pressure then, and random reads will hopefully mostly be handled by Linux's in RAM disk cache. So I'm willing to believe that a competently implemented DRAMless SATA SSD could perform reasonably for us. One of the problems with this theory is finding such a 'competently implemented' SATA SSD, since the reason that SSD vendors are going DRAMless on SATA SSDs (and even NVMe drives) is to cut costs and corners. A competent, well performing implementation is a cost too.

PS: I suspect there's no theoretical obstacle to a U.2 form factor NVMe drive being DRAMless and using a Host Memory Buffer over its PCIe connection. In practice U.2 drives are explicitly supposed to be hot-swappable and I wouldn't really want to do that with a HMB, so I suspect DRAM-less NVMe drives with HMB are all M.2 in practice.

(I also have worries about how well the HMB is protected from stray host writes to that RAM, and how much the NVMe disk is just trusting that it hasn't gotten corrupted. Corrupting internal flash metadata through OS faults or other problems seems like a great way to have a very bad day.)

SSDsUnderstandingDramless written at 23:15:41

2024-03-17

Disk write buffering and its interactions with write flushes

Pretty much every modern system defaults to having data you write to filesystems be buffered by the operating system and only written out asynchronously or when you explicitly request that it be flushed to disk, which gives you general questions about how much write buffering you want. Now suppose, not hypothetically, that you're doing write IO that is pretty much always going to be specifically flushed to disk (with fsync() or the equivalent) before the programs doing it consider this write IO 'done'. You might get into this situation when you're writing and rewriting mail folders, or when the dominant write source is updating a write-ahead log.

In this situation where the data being written is almost always going to be flushed to disk, I believe the tradeoffs are a bit different than in the general write case. Broadly, you can never actually write at a rate faster than the write rate of the underlying storage, since in the end you have to wait for your write data to actually get to disk before you can proceed. I think this means that you want the OS to start writing data out to disk almost as soon as your process writes it; delaying the writeout will only take more time in the long run, unless for some reason the OS can write data faster when you ask for the flush than before then. In theory and in isolation, you may want these writes to be asynchronous (up until the process asks for the disk flush, at which point you have to synchronously wait for them), because the process may be able to generate data faster if it's not stalling waiting for individual writes to make it to disk.

(In OS tuning jargon, we'd say that you want writeback to start almost immediately.)

However, journaling filesystems and concurrency add some extra complications. Many journaling filesystems have the journal as a central synchronization point, where only one disk flush can be in progress at once and if several processes ask for disk flushes at more or less the same time they can't proceed independently. If you have multiple processes all doing write IO that they will eventually flush and you want to minimize the latency that processes experience, you have a potential problem if different processes write different amounts of IO. A process that asynchronously writes a lot of IO and then flushes it to disk will obviously have a potentially long flush, and this flush will delay the flushes done by other processes writing less data, because everything is running through the chokepoint that is the filesystem's journal.

In this situation I think you want the process that's writing a lot of data to be forced to delay, to turn its potentially asynchronous writes into more synchronous ones that are restricted to the true disk write data rate. This avoids having a large overhang of pending writes when it finally flushes, which hopefully avoids other processes getting stuck with a big delay as they try to flush. Although it might be ideal if processes with less write volume could write asynchronously, I think it's probably okay if all of them are forced down to relatively synchronous writes with all processes getting an equal fair share of the disk write bandwidth. Even in this situation the processes with less data to write and flush will finish faster, lowering their latency.

To translate this to typical system settings, I believe that you want to aggressively trigger disk writeback and perhaps deliberately restrict the total amount of buffered writes that the system can have. Rather than allowing multiple gigabytes of outstanding buffered writes and deferring writeback until a gigabyte or more has accumulated, you'd set things to trigger writebacks almost immediately and then force processes doing write IO to wait for disk writes to complete once you have more than a relatively small volume of outstanding writes.
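
On Linux, as I understand it, the knobs for this are the vm.dirty_* sysctls. A hedged sketch of settings in this direction (the specific byte values are invented and would need testing against your real workload and storage):

    # Start background writeback almost immediately, after ~16 MB of
    # dirty data, instead of waiting for a percentage of RAM to fill.
    vm.dirty_background_bytes = 16777216
    # Force processes doing writes to block and wait for writeback once
    # there is more than ~64 MB of outstanding dirty data.
    vm.dirty_bytes = 67108864

(Setting the '_bytes' versions takes over from the default percentage based vm.dirty_background_ratio and vm.dirty_ratio knobs.)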

(This is in contrast to typical operating system settings, which will often allow you to use a relatively large amount of system RAM for asynchronous writes and not aggressively start writeback. This especially would make a difference on systems with a lot of RAM.)

WriteBufferingAndSyncs written at 21:59:25

2024-03-02

Something I don't know: How server core count interacts with RAM latency

When I wrote about how the speed of improvement in servers may have slowed down, I didn't address CPU core counts, which is one area where the numbers have been going up significantly. Of course you have to keep those cores busy, but if you have a bunch of CPU-bound workloads, the increased core count is good for you. Well, it's good for you if your workload is genuinely CPU bound, which generally means it fits within per-core caches. One of the areas I don't know much about is how the increasing CPU core counts interact with RAM latency.

RAM latency (for random requests) has been relatively flat for a while (it's been flat in time, which means that it's been going up in cycles as CPUs got faster). Total memory access latency has apparently been 90 to 100 nanoseconds for several memory generations (although individual DDR5 memory module access is apparently only part of this, also). Memory bandwidth has been going up steadily between the DDR generations, so per-core bandwidth has gone up nicely, but this is only nice if you have the kind of sequential workloads that benefit from it. As far as I know, the kind of random access that you get from things like pointer chasing is all dependent on latency.

(If the total latency has been basically flat, this seems to imply that bandwidth improvements don't help too much. Presumably they help for successive non-random reads, and my vague impression is that reading data from successive addresses from RAM is faster than reading random addresses (and not just because RAM typically transfers an entire cache line to the CPU at once).)

So now we get to the big question: how many memory reads can you have in flight at once with modern DDR4 or DDR5 memory, especially on servers? Where the limit is presumably matters since if you have a bunch of pointer-chasing workloads that are limited by 'memory latency' and you run them on a high core count system, at some point it seems that they'll run out of simultaneous RAM read capacity. I've tried to do some reading and gotten confused, which may be partly because modern DRAM is a pretty complex thing.

(I believe that individual processors and multi-socket systems have some number of memory channels, each of which can be in action simultaneously, and then there are memory ranks (also) and memory banks. How many memory channels you have depends partly on the processor you're using (well, its memory controller) and partly on the motherboard design. For example, 4th generation AMD Epyc processors apparently support 12 memory channels, although not all of them may be populated in a given memory configuration (cf). I think you need at least N (or maybe 2N) DIMMs for N channels. And here's a look at AMD Zen4 memory stuff, which doesn't seem to say much on multi-core random access latency.)

ServerCPUDensityAndRAMLatency written at 22:54:58

2024-02-29

The speed of improvement in servers may have slowed down

One of the bits of technology news that I saw recently was that AWS was changing how long it ran servers, from five years to six years. Obviously one large motivation for this is that it will save Amazon a nice chunk of money. However, I suspect that one enabling factor for this is that old servers are more similar to new servers than they used to be, as part of what could be called the great slowdown in computer performance improvement.

New CPUs and to a lesser extent memory are somewhat better than they used to be, both on an absolute measure and on a performance per watt basis, but the changes aren't huge the way they used to be. SATA SSD performance has been more or less stagnant for years; NVMe performance has improved, but from a baseline that was already very high, perhaps higher than many workloads could take advantage of. Network speeds are potentially better but it's already hard to truly take advantage of 10G speeds, especially with ordinary workloads and software.

(I don't know if SAS SSD bandwidth and performance has improved, although raw SAS bandwidth has and is above what SATA can provide.)

For both AWS and people running physical servers (like us) there's also the question of how many people need faster CPUs and more memory, and related to that, how much they're willing to pay for them. It's long been observed that a lot of what people run on servers is not a voracious consumer of CPU and memory (and IO bandwidth). If your VPS runs at 5% or 10% CPU load most of the time, you're probably not very enthused about paying more for a VPS with a faster CPU that will run at 2.5% almost all of the time.

(Now that I've written this it strikes me that this is one possible motivation for cloud providers to push 'function as a service' computing, because it potentially allows them to use those faster CPUs more effectively. If they're renting you CPU by the second and only when you use it, faster CPUs likely mean more people can be packed on to the same number of CPUs and machines.)

We have a few uses for very fast single-core CPU performance, but other than those cases (and our compute cluster) it's hard to identify machines that could make much use of faster CPUs than they already have. It would be nice if our fileservers had U.2 NVMe drives instead of SATA SSDs but I'm not sure we'd really notice; the fileservers only rarely see high IO loads.

PS: It's possible that I've missed important improvements here because I'm not all that tuned in to this stuff. One possible area is PCIe lanes directly supported by the system's CPU(s), which enable all of those fast NVMe drives, multiple 10G or faster network connections, and so on.

ServersSpeedOfChangeDown written at 22:43:13

2024-02-25

Open source culture and the valorization of public work

A while back I wrote about how doing work that scales requires being able to scale your work, which in the open source world requires time, energy, and the willingness to engage in the public sphere of open source regardless of the other people there and your reception. Not everyone has this sort of time and energy, and not everyone gets a positive reception by open source projects even if they have it.

This view runs deep in open source culture, which valorizes public work even at the cost of stress and time. Open source culture on the one hand tacitly assumes that everyone has those available, and on the other hand assumes that if you don't do public work (for whatever reason), you are less virtuous or not virtuous at all. To be a virtuous person in open source is to contribute publicly at the cost of your time, energy, stress, and perhaps money, and to not do so is to not be virtuous (sometimes this is phrased as 'not being dedicated enough').

(Often the most virtuous public contribution is 'code', so people who don't program are already intrinsically not entirely virtuous and lesser no matter what they do.)

Open source culture has some reason to praise and value 'doing work that scales', public work; if this work does not get done, nothing happens. But it also has a tendency to demand that everyone do it and to judge them harshly when they don't. This is the meta-cultural issue behind things like the cultural expectations that people will file bug reports, often no matter what the bug reporting environment is like or if filing bug reports does any good (cf).

I feel that this view is dangerous for various reasons, including because it blinds people to other explanations for a lack of public contributions. If you can say 'people are not contributing because they're not virtuous' (or not dedicated, or not serious), then you don't have to take a cold, hard look at what else might be getting in the way of contributions. Sometimes such a cold hard look might turn up rather uncomfortable things to think about.

(Not every project wants or can handle contributions, because they generally require work from existing project members. But not all such projects will admit up front in the open that they either don't want contributions at all or they gatekeep contributions heavily to reduce time burdens on existing project members. And part of that is probably because openly refusing contributions is in itself often seen as 'non-virtuous' in open source culture.)

OpenSourceCultureAndPublicWork written at 23:21:12
