Modern email addresses can be in UTF-8

March 2, 2023

Over on the Fediverse, I noted:

It has been '0' days since someone's email client helpfully let them use a Unicode '‐' instead of an ASCII '-' in dash-separated email addresses. Or perhaps the client automatically used the Unicode character instead of the ASCII dash.

You may not be surprised to hear that email systems, ours included, don't consider the two to be the same. I'm not sure how it even works, although some sending MTAs appear to just send the address as UTF-8.

Specifically, the character in question is Unicode U+2010 Hyphen. The email was sent to us with this character in a destination address whose real form uses the ASCII dash; since the U+2010 version of the address didn't exist, Exim on our external MX gateway rejected it. These days, Exim's logging is in UTF-8, as is pretty much anything you'll use to read the logs, so the two characters looked identical and the result was pretty confusing to disentangle. To all appearances, our email system had temporarily glitched out and decided that some valid local addresses didn't actually exist.
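One way to untangle a log line like this is to print out what any non-ASCII characters in the address actually are. The following is a minimal Python sketch; the function name and the example address are made up for illustration.

```python
import unicodedata

def describe_non_ascii(address):
    """Return (index, codepoint, name) for each non-ASCII character."""
    found = []
    for i, ch in enumerate(address):
        if ord(ch) > 127:
            # Fall back to a placeholder for characters without a Unicode name.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

# Hypothetical address with a U+2010 hyphen where an ASCII '-' was intended.
logged = "some\u2010list@example.org"
for pos, codepoint, name in describe_non_ascii(logged):
    print(pos, codepoint, name)
# prints: 4 U+2010 HYPHEN
```

An address that uses the ordinary ASCII dash produces no output at all, which makes the difference between the two visible even when the terminal renders them identically.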

The answer to my final question, about how this actually works, is RFC 6531: SMTP Extension for Internationalized Email, also known as SMTPUTF8. Exim supports SMTPUTF8 (if built appropriately), and it defaults to advertising this to everyone (per Main configuration and the description of smtputf8_advertise_hosts in it). To simplify, a large part of what SMTPUTF8 does is let the sender use UTF-8 in envelope addresses, both MAIL FROM and RCPT TO. Either or both of the local part and the (sub)domain can be in UTF-8, although the resulting DNS labels need to conform with IDNA.
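To make the split between the local part and the domain concrete, here is a rough Python sketch. It assumes that an address with an ASCII local part and a non-ASCII domain can be handed to a server without SMTPUTF8 by converting the domain to its ASCII "xn--" form, while a non-ASCII local part genuinely needs SMTPUTF8. Both function names are hypothetical, and Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat this as an approximation rather than a conforming implementation.

```python
def needs_smtputf8(address):
    """True if the local part contains non-ASCII characters.

    Assumption: a non-ASCII domain alone does not require SMTPUTF8,
    because it can be converted to ASCII labels first.
    """
    local, _, _domain = address.rpartition("@")
    return not local.isascii()

def domain_to_ascii(address):
    """Return the address with its domain converted to ASCII (xn--) labels."""
    local, _, domain = address.rpartition("@")
    # The built-in 'idna' codec follows IDNA 2003, not the newer IDNA 2008.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain

# Example addresses are made up for illustration.
print(needs_smtputf8("jos\u00e9@example.org"))        # non-ASCII local part
print(needs_smtputf8("info@b\u00fccher.example"))     # only the domain is non-ASCII
print(domain_to_ascii("info@b\u00fccher.example"))
```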

Allowing email addresses to use U+2010 hyphens instead of ASCII ones is a trivial use of SMTPUTF8. A potentially much more important one for genuine internationalization is allowing people to have addresses that aren't written only in ASCII, for example because their name itself is not ASCII. Any number of Europeans have accented characters in their names and so might like to have them in their email addresses, and then there's quite a lot of people who don't write their names in any version of the Latin alphabet. SMTPUTF8 accommodates all of them.

Of course not all mail systems out there in the world support SMTPUTF8, so today anyone using such an email address is taking some degree of risk (unless their system automatically handles the situation of a destination mail server not supporting SMTPUTF8 by, for example, rewriting the envelope address and possibly message headers to a known alternate version). But I suspect that the large email providers all support it, and their support for it (and willingness to generate and use email addresses in UTF-8) will push everyone to support it sooner or later.
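The fallback idea in the parenthetical can be sketched roughly as follows. This is only an illustration in Python: the function name, the notion of a stored ASCII alternate address, and the example addresses are all assumptions. The set of extensions is whatever the destination server advertised in its EHLO response (with Python's smtplib you could check this using has_extn("smtputf8")), and the returned options list is in the shape that smtplib's sendmail() accepts as mail_options. For simplicity it treats any non-ASCII character anywhere in the address as requiring SMTPUTF8.

```python
def pick_recipient(address, ascii_alternate, server_extensions):
    """Choose which envelope recipient to use for one destination server.

    address           -- the preferred (possibly UTF-8) address
    ascii_alternate   -- a known all-ASCII alternate, or None (hypothetical)
    server_extensions -- lowercased EHLO keywords the server advertised
    Returns (recipient, mail_options).
    """
    if address.isascii():
        # Plain ASCII addresses need nothing special.
        return address, []
    if "smtputf8" in server_extensions:
        # The server accepts UTF-8 envelope addresses.
        return address, ["SMTPUTF8"]
    if ascii_alternate is not None:
        # Fall back to the known ASCII version of the address.
        return ascii_alternate, []
    raise ValueError("destination lacks SMTPUTF8 and no ASCII alternate is known")

# Example: the same recipient against two differently capable servers.
print(pick_recipient("jos\u00e9@example.org", "jose@example.org", {"smtputf8", "8bitmime"}))
print(pick_recipient("jos\u00e9@example.org", "jose@example.org", {"8bitmime"}))
```

Without a stored alternate, the only safe choice for a server lacking SMTPUTF8 is to refuse to send, which is the risk described above.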

(I have actually encountered SMTPUTF8 before, but in the time since then I forgot about it.)


Comments on this page:

By Arnaud Gomes at 2023-03-03 14:17:43:

From a quick test, Google and Microsoft support it, Yahoo doesn't. Orange (very big here in France) does not either, but this is no surprise.

For mail providers like us (and, I guess, you), there is no good configuration here: either we enable SMTPUTF8 and risk having our outgoing mail rejected, or we don't and we may refuse legitimate email.

   -- A
By Ian Z aka nobrowser at 2023-03-04 00:24:24:

I18N and UTF8 may be enabled by default in exim by Debian, Ubuntu and their spawn, but it is not so in the template upstream Makefile.

"For mail providers like us (and, I guess, you), there is no good configuration here: either we enable SMTPUTF8 and risk having our outgoing mail rejected, or we don't and we may refuse legitimate email."

Smtputf8 looks like a self-fulfilling prophecy where ESPs want to avoid outbound problems resulting in more mail systems not enabling it...

I'm sure that most email validation tools (aka regex) are stuck on ASCII too.

Written on 02 March 2023.

Last modified: Thu Mar 2 23:06:06 2023