Wandering Thoughts archives

2023-07-16

Social media posts aren't as small and simple as you might think

A while back, I read Tristan Hume's "Production Twitter on One Machine? 100Gbps NICs and NVMe are fast", which goes through the mental exercise of sketching out such a thing. As part of this, Hume has to assess the size of individual tweets, and if you were doing this for the Fediverse, you'd want to assess the size of individual posts there. Fediverse posts on my current server are limited to 500 characters, so things look simple. But reading Hume's article and thinking about it from a Fediverse perspective made me realize that things aren't actually that simple, and that actual posts are significantly larger than their apparent text.

One of the things that isn't common on the Fediverse is using link shorteners. The reason for this is that they aren't necessary; on the Fediverse, URLs count for only 23 characters or so of your post length, no matter how long the actual URL is. Longer URLs are shortened to roughly this length when displayed. This is a good feature, but it means that Fediverse posts aren't as straightforward as they look; at a minimum, they contain the full URL, even if this puts them over 500 characters.
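As a rough illustration of this counting rule, here is a minimal Python sketch. It is not how any particular Fediverse server actually does it; the URL pattern is deliberately simplistic, and the function name and the per-URL cost of 23 are assumptions you would adjust for your own server.

```python
import re

# Assumption: a deliberately simple URL pattern, for illustration only.
# Real servers use much more careful URL detection.
URL_RE = re.compile(r"https?://\S+")


def counted_length(text: str, url_cost: int = 23) -> int:
    """Return the length of text with every URL counted as url_cost characters.

    url_cost is an assumed, configurable value, not something read
    from any server.
    """
    urls = URL_RE.findall(text)
    raw_url_chars = sum(len(u) for u in urls)
    return len(text) - raw_url_chars + url_cost * len(urls)


# A post whose stored text is well over 500 characters because of one long URL.
post = "Worth reading: https://example.org/" + "a" * 600

print("stored characters: ", len(post))
print("counted characters:", counted_length(post))
```

The gap between the two numbers is the point: the post is comfortably under the limit as counted, but whatever stores it has to hold every character of the full URL.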

(I'm not quite sure how ActivityPub represents post data and handles cases like this.)

Over the years, I believe that both Twitter and the Fediverse have added quiet convenience features to their storage of post data. Some of this is pure metadata; for example, both know if a post is a reply to a previous post, and if so which post (and by whom, and so on). Other features affect the contents of posts themselves. For example, I believe that Twitter has for some time tracked @mentions in tweets using the internal Twitter identifier for the user, so that if the account is renamed things still work (and an account name takeover can't suddenly mis-identify who the tweet was addressed to or who it mentioned). I believe this is in addition to the raw '@<name>' text, which you want to retain in case the account vanishes entirely.
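To make this concrete, here is a hypothetical sketch in Python of what a stored post with this sort of metadata might look like. None of these class or field names come from Twitter, Mastodon, or ActivityPub; they are invented for illustration, and the JSON encoding is only a stand-in for whatever storage format a real system uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Mention:
    # The '@<name>' text as typed, kept in case the account vanishes entirely.
    raw_text: str
    # Hypothetical stable internal identifier, which survives account renames.
    account_id: int


@dataclass
class StoredPost:
    text: str
    # Reply metadata: which post this replies to, and whose post that was.
    in_reply_to_id: Optional[int] = None
    in_reply_to_account_id: Optional[int] = None
    mentions: List[Mention] = field(default_factory=list)


post = StoredPost(
    text="@alice I agree",
    in_reply_to_id=1001,
    in_reply_to_account_id=42,
    mentions=[Mention(raw_text="@alice", account_id=42)],
)

# Serialize the whole record to compare it against the visible text alone.
stored = json.dumps(asdict(post))
print("visible text characters:", len(post.text))
print("stored record characters:", len(stored))
```

Even in this toy version, the stored record is several times the size of the visible text, and a real system presumably tracks a good deal more than this.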

All of this is perfectly reasonable, and obviously it's something that the existing environments deal with fine. But it does mean that the actual storage of posts is more complicated and larger than just '280 (Unicode) characters' or so. As is not unusual, there's more complexity hiding underneath the rock when you turn it over.

tech/SocialMediaPostsNotSimple written at 22:55:36;

