Custom Emoji for Mastodon

So in glitch-soc recently an idea that's been talked about A Bunch is custom instance emoji for Mastodon. The purpose of this post is to give a rough outline of what that would look like and how it might get implemented.

The Plan

When discussing custom emoji, a number of questions have come up, such as:

  1. Who gets to define them?
  2. How will they be inserted by users?
  3. How will they be internally represented?
  4. Will they be federated and if so how?

For the first of these questions, our rough consensus is to allow new-emoji-definition only at an instance level, ie by admins. This (a) greatly eases implementation, and also (b) helps to prevent abuse (unsavory emojis, emojis stolen without credit, etc). For the second question, shortcodes remain a convenient means of emoji insertion, and it shouldn't be too difficult to expand this functionality for our own purposes.

Regarding the latter questions, we have decided to rely on the following methods for expanding the Unicode emoji set:

Emojification of existing characters

Emoji 5.0 expressly allows conforming implementations to support a single code point outside of the basic emoji set for display, input, or editing as an emoji.

Instances should only emojify characters which are pictographs.

Zero-Width Joiner (ZWJ) sequences

In those cases where a predefined Unicode character is not available, instances may instead elect to support an emoji zero-width joiner (ZWJ) sequence, which combines multiple emoji characters into a single displayed glyph by placing U+200D ZERO WIDTH JOINER in-between them. This mechanism is already in place for characters such as the pride flag (🏳 + ZWJ + 🌈), the eye in speech bubble (👁 + ZWJ + 🗨), and the various gender-variant forms of emojis, as well as emoji professions.

Emoji 5.0 expressly allows conforming implementations to support an emoji zwj sequence that is not in [the RGI emoji ZWJ sequence set] for display, input, or editing as an emoji.
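To make the ZWJ mechanism concrete, here is a minimal sketch of how a sequence is put together and how it falls back to its parts. The helper names `zwj_sequence` and `fallback_parts` are made up for illustration and are not part of Mastodon or the Emoji spec.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def zwj_sequence(*parts: str) -> str:
    """Join emoji characters into one ZWJ sequence (hypothetical helper)."""
    return ZWJ.join(parts)


def fallback_parts(sequence: str) -> list:
    """Split a ZWJ sequence back into the pieces a client without
    support for it would display side by side (hypothetical helper)."""
    return sequence.split(ZWJ)


# The pride flag as commonly encoded: WAVING WHITE FLAG (with U+FE0F), ZWJ, RAINBOW.
pride_flag = zwj_sequence("\U0001F3F3\uFE0F", "\U0001F308")
```

A client that does not recognize `pride_flag` would simply render the two pieces returned by `fallback_parts(pride_flag)`, which is the fallback behaviour this approach relies on.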

Following this method of implementation is advantageous for a number of reasons:

  1. It is well-defined, already in use in other applications, and follows the Emoji spec
  2. It federates easily
  3. It provides recognizable fallbacks for unrecognized emoji (for emojified existing characters, their text representation; for ZWJ sequences, the characters used to compose them)
  4. It provides canonical Unicode sequences for custom emoji
  5. It allows multiple instances to use the same custom emoji without requiring them to do so
  6. It is not tied to any specific language

If a custom emoji gains widespread adoption across instances, it could additionally pave the way for eventual Unicode inclusion.


I've broken the plan for implementing this feature into multiple steps, which should be completed in order.

Step I : Shortcode support

The first thing which needs to be accomplished regarding custom emoji is allowing Mastodon instances to define their own emoji shortcodes. For an instance to support this feature, emoji shortcodes must be replaced with their corresponding Unicode characters in statuses, bios, and usernames. This replacement must take place for all local objects, and must not take place with regard to objects received from other instances (ie, through federation).

For statuses, this replacement should happen prior to the statuses being stored in the database. That way, if an emoji shortcode is removed, the content of the status will not change. For bios, usernames, and other situations, shortcodes may be stored in the database verbatim (to make later editing of these fields easier), but must be converted to emoji prior to federation.
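As a rough illustration of the replacement rule, the sketch below swaps shortcodes for their Unicode sequences in local text only. It assumes shortcodes are written as `:name:` (this post does not pin down the syntax), and the `SHORTCODES` table and `replace_shortcodes` function are hypothetical names, not existing Mastodon code.

```python
import re

# Hypothetical shortcode table; a real instance would build this from its config.
SHORTCODES = {"rainbow_flag": "\U0001F3F3\uFE0F\u200D\U0001F308"}

# Assumption: shortcodes look like :name: with letters, digits, and underscores.
SHORTCODE_RE = re.compile(r":([A-Za-z0-9_]+):")


def replace_shortcodes(text: str, table: dict, local: bool = True) -> str:
    """Replace known shortcodes with their Unicode sequences.

    Objects received through federation (local=False) are returned untouched,
    and unknown shortcodes are left exactly as written.
    """
    if not local:
        return text

    def substitute(match):
        return table.get(match.group(1), match.group(0))

    return SHORTCODE_RE.sub(substitute, text)
```

For a status, the result of `replace_shortcodes` would be what gets stored; for bios and usernames, the same function could run just before federation instead.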

A simple implementation of this might be to include an emoji.yml file in the root directory, which Mastodon then reads to perform the functions above. A sample file is given below:

        - squared_key
        - square_key
    src: /public/custom-emoji/squared_key.png
        - canadian_football
        - cfl
    src: /public/custom-emoji/canadian_football.png
Sample emoji.yml

For simplicity, zero-width joiners should not be present in such a file; they should instead be added during processing, using the following algorithm:

  1. Let emoji_or_sequence be the one or more Unicode characters defining a custom emoji.

  2. If emoji_or_sequence contains any characters other than the following, abort with error:

    • A symbol character. This can be checked using the RegExp /\p{S}/.

    • A U+FE0F VARIATION SELECTOR-16 character.

  3. Abort with error if any of the following conditions are true:

    • There exists a U+FE0F VARIATION SELECTOR-16 character which is not preceded by a symbol.

    • The character sequence contains a skin tone modifier character (U+1F3FB..U+1F3FF EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6). Emoji modifiers are only valid for those characters defined in Unicode, and we have no easy means of testing this here.

  4. Let simplified_emoji_or_sequence be emoji_or_sequence with every U+FE0F VARIATION SELECTOR-16 character removed.

  5. Return simplified_emoji_or_sequence with a U+200D ZERO WIDTH JOINER inserted in-between every two adjacent characters. (If the length of simplified_emoji_or_sequence is 1, no characters are inserted.)
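The steps above can be sketched in a few lines of Python. Two assumptions are worth flagging: Python's `re` module has no `\p{S}`, so the symbol check uses `unicodedata.category` instead, and step 2 is taken to admit U+FE0F as well as symbols, since steps 3 and 4 handle that character. The function name `canonical_sequence` is made up for this example.

```python
import unicodedata

ZWJ = "\u200d"   # U+200D ZERO WIDTH JOINER
VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16
SKIN_TONES = range(0x1F3FB, 0x1F400)  # U+1F3FB..U+1F3FF


def is_symbol(ch: str) -> bool:
    """Stand-in for /\\p{S}/: true for general categories Sm, Sc, Sk, So."""
    return unicodedata.category(ch).startswith("S")


def canonical_sequence(emoji_or_sequence: str) -> str:
    """Turn the characters defining a custom emoji into its ZWJ sequence."""
    # Step 1: the input must be one or more characters.
    if not emoji_or_sequence:
        raise ValueError("empty emoji definition")
    # Step 2: only symbols (and, by assumption, U+FE0F) are allowed.
    for ch in emoji_or_sequence:
        if not (is_symbol(ch) or ch == VS16):
            raise ValueError("disallowed character U+%04X" % ord(ch))
    # Step 3: reject stray variation selectors and skin tone modifiers.
    for i, ch in enumerate(emoji_or_sequence):
        if ch == VS16 and (i == 0 or not is_symbol(emoji_or_sequence[i - 1])):
            raise ValueError("U+FE0F not preceded by a symbol")
        if ord(ch) in SKIN_TONES:
            raise ValueError("skin tone modifiers are not supported")
    # Step 4: drop every U+FE0F.
    simplified = emoji_or_sequence.replace(VS16, "")
    # Step 5: insert a ZWJ between every two adjacent characters.
    return ZWJ.join(simplified)
```

Iterating over a Python string walks code points, so `ZWJ.join(simplified)` inserts nothing when only one character remains.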

A more nuanced implementation might provide an input method through the admin web-view, as opposed to requiring admins to input custom emoji in the source. Implementations might additionally present admins with the option of specifying image files at multiple sizes, as opposed to just one.

Administrators may override existing Unicode emoji using this method as well, for example to provide their own shortcodes or images.

Step II : API interface

Once Mastodon has a means of replacing shortcodes with Unicode characters, we need to implement a means of getting those shortcodes, characters, and associated images to frontends to use. This requires an API access point, which must give the following:

  • The canonical forms of each custom emoji, including any U+200D ZERO WIDTH JOINER characters
  • The image file(s) to use with each custom emoji
  • Shortcodes for custom emoji

This API may additionally provide the following information:

  • Information regarding supported Unicode emoji and/or the shortcodes used to identify them
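One possible shape for what such an access point returns is sketched below. Nothing here is a settled API: the field names `sequence`, `images`, and `shortcodes`, the `CustomEmoji` class, and the `emoji_payload` function are all assumptions for illustration, and the code points paired with the `squared_key` shortcodes are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class CustomEmoji:
    sequence: str          # canonical form, including any U+200D characters
    images: List[str]      # one or more image files for this emoji
    shortcodes: List[str]  # shortcodes users can type to insert it


def emoji_payload(emojis: List[CustomEmoji]) -> str:
    """Serialize the instance's custom emoji as a JSON array."""
    return json.dumps([asdict(e) for e in emojis], ensure_ascii=False)


# Placeholder entry: the code points are illustrative, not a real definition.
example = CustomEmoji(
    sequence="\U0001F532\u200D\U0001F511",
    images=["/public/custom-emoji/squared_key.png"],
    shortcodes=["squared_key", "square_key"],
)
payload = emoji_payload([example])
```

Keeping `images` as a list leaves room for the multiple image sizes mentioned in Step I without changing the shape of the response later.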

Step III : Frontend processing

After the API is finalized, frontend processing of emoji needs to be added. It is recommended that shortcode replacement take place in the backend, such that shortcodes from foreign statuses are not improperly handled. However, at the very least, frontends must:

  • Replace defined custom emoji sequences with their associated images
  • Provide users with a means of discovering and inserting shortcodes for custom emoji

Next Steps

Getting custom instance emoji working is a big task, but a few follow-ups have been proposed for once it is complete:

  1. Providing an about page listing all emoji and their recognized shortcodes
  2. Letting users provide their own images for any defined emoji
  3. Standardizing commonly used custom emoji and emoji sequences (for example, a Mastodon emoji) so that they are available (virtually) everywhere
  4. Getting upstream to adopt these changes?!