It finally happened to me. A user entered an emoji into an input field, and a production system broke.
I don’t want to get into the weeds of the particular system, except to say that it’s a strictly ordered queue that couldn’t move forward because an entry was being rejected by a database validation check.
Three pieces of background:
- MariaDB implents its JSON column data type by combining the regular LONGTEXT column type with a validation constraint that runs inserted or updated values through
json_valid(). Column definition looks something like this:
column_name longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL CHECK (json_valid('column_name')). I don’t how this differes from MySQL, if at all.
- The JSON RFC states that only a handful of Unicode characters inside strings have to be escaped, but all of them may be escaped.
2.5. Strings [...] All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F). Any character may be escaped.
json_encode()escapes all Unicode characters by default. (This behavior can be disabled by passing the
The issue presented itself as database inserts failing because the JSON validation was failing, but only if the emoji in question (
😏) was escaped (
\ud83d\ude0f). Running the data manually through
json_valid() succeeded though. This seems to indicate that MariaDB was using different code paths for column validation and the query function.
We were already running a version of MariaDB that didn’t exhibit this issue (10.4.11) in another environment, so rather than making a bunch of code changes we added some tests to check for this issue and upgrade the production docker images with the new database server version. I checked the changelogs and saw no mention of this issue, though, so it’s possible this was fixed unintentionally and might resurface in a future build. I’ll look into filing a bug for it.