The LaTeX Input Method

Motivation and inspiration

As someone who has frequent online conversations about math, I often need mathematical or technical symbols like the right arrow (→), the set intersection symbol (∩), and various Greek letters (π, Σ, ϵ, δ, Δ). In an online discussion, the best way to use these symbols is frequently as Unicode characters[1]. Inserting these characters from a standard American English keyboard is nontrivial. On Linux, some symbols (like arrows, ≠, super- and subscript numerals, °, and accented characters like ô) can be inserted using the Compose key: one key is designated in advance as the Compose key, and pressing it followed by two or three characters that roughly resemble the desired character inserts that character. Compose oo yields °, Compose o^ yields ô, Compose -> yields →, Compose >= yields ≥, and so on. But there is no Compose key support for Greek letters or many math symbols; the Compose key, in fact, was developed more for inserting accented letters than specialized symbols. And many Linux users do not have a Compose key configured.

How the program is used

On my computer, pressing Super+Tab opens a menu into which one can type the \(\LaTeX\) code for a symbol (and other things, which will be explained later). The menu offers completions, which one can accept by pressing Enter. Pressing Shift+Enter accepts the input exactly as typed. The program then looks up the code in a dictionary and uses XDoTool to insert the corresponding symbol into whichever window is currently focused. XDoTool can sometimes be flaky[2], so a facility is provided to insert the previously selected character again quickly by pressing Super+Tab and then Enter (or entering !last at the prompt).

This program is written in ZSH, and stores the code-character mapping in a ZSH associative array. A Julia program pulls Julia's code-character mappings and outputs them as a shell script that adds each into that associative array. User-defined symbols can be added to the array in the first few lines of the source code file. I've also defined ways of entering some emoji, box-drawing characters, and commonly-used blackboard bold characters; these are shown in the tables below. Entering Uxxxxxxxx or Uxxxx at the prompt inserts the Unicode character with hex code U+xxxxxxxx or U+xxxx, respectively. Pressing Super+Shift+Tab invokes the \(\LaTeX\) input method but copies the selected character to the clipboard instead of inserting it using XDoTool—handy for when XDoTool is misbehaving.
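The two raw-code input forms described above (Uxxxx and Uxxxxxxxx) can be checked and brought into a single shape before anything is done with them. What follows is only a minimal sketch, written in portable shell rather than taken from the script: the function name normalize_codepoint is hypothetical, and the decision to reject anything other than exactly four or eight hex digits is an assumption based on the description above.

```shell
# Hypothetical helper (not part of dmenu_latexinput.zsh): turn "Uxxxx" or
# "Uxxxxxxxx" into the 8-digit, upper-case form used as values in latex_syms.
normalize_codepoint() {
  local hex
  hex=${1#U}
  # The input must actually have started with "U" ...
  [ "U$hex" = "$1" ] || return 1
  # ... be followed by exactly 4 or 8 characters ...
  case ${#hex} in
    4|8) ;;
    *)   return 1 ;;
  esac
  # ... and those characters must all be hexadecimal digits.
  case $hex in
    *[!0-9A-Fa-f]*) return 1 ;;
  esac
  # Zero-pad to 8 digits and upper-case the hex letters.
  printf 'U%08X\n' "0x$hex"
}

normalize_codepoint U2717       # prints U00002717
normalize_codepoint U0001F400   # prints U0001F400
normalize_codepoint 'U$(x)' || echo rejected   # prints rejected
```

Normalising first means that everything downstream only ever sees "U" followed by eight hex digits, which is the same shape as the values stored in the associative array.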

How the program works

The program uses dmenu to get user input and present a list of possible completions to the user, XDoTool to insert characters, and xclip to copy characters. I have the i3 window manager configured to run dmenu_latexinput.zsh whenever Super+Tab is pressed and dmenu_latexinput.zsh -c whenever Super+Shift+Tab is pressed. The -c command line option copies the selected character to the clipboard instead of inserting it, -p does the same but to the X11 PRIMARY selection, and -r outputs the selected character to stdout.

Other character mappings

\(\LaTeX\) additions

Code       Hex     Character
\crossoff  U+2717  ✗
\Rls       U+211D  ℝ
\Int       U+2124  ℤ
\Rxi       U+2102  ℂ
\Rat       U+211A  ℚ
\Nat       U+2115  ℕ
---        U+2014  —
--         U+2013  –

Emoji shortcodes

In the process of writing and editing this post, I added the capability to automatically add all emoji shortcodes (using the Joypixels dataset from the Emojibase project). These shortcodes are based on the ones Discord uses.

Box drawing characters

The box-drawing character commands use the following schema: the first character of the command represents the type of box-drawing character ("b" for single light, "d" for double), and then "u", "l", "d", and "r" (in that order) are included if the character in question has a line going in that direction. For instance, blr maps to ─, dldr maps to ╦, and buldr maps to ┼. This schema doesn't cover every box-drawing character in Unicode, but it covers the ones that are most necessary.

Code   Hex     Character
bur    U+2514  └
blr    U+2500  ─
bldr   U+252C  ┬
buldr  U+253C  ┼
bud    U+2502  │
bulr   U+2534  ┴
bul    U+2518  ┘
bld    U+2510  ┐
bdr    U+250C  ┌
budr   U+251C  ├
buld   U+2524  ┤
dlr    U+2550  ═
dud    U+2551  ║
ddr    U+2554  ╔
dld    U+2557  ╗
dur    U+255A  ╚
dul    U+255D  ╝
dudr   U+2560  ╠
duld   U+2563  ╣
dldr   U+2566  ╦
dulr   U+2569  ╩
duldr  U+256C  ╬
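
The naming schema for the box-drawing commands can be written out as a small function. This is just an illustrative sketch in portable shell: box_command is a hypothetical name that does not appear in the script, and the words "light" and "double" for the two styles are my own choice for the example. It builds a command name only; it does not check that the name actually exists in the table above.

```shell
# Hypothetical illustration of the schema: a style prefix ("b" or "d")
# followed by whichever of u, l, d, r apply, always in that fixed order.
box_command() {
  local style dirs name d
  style=$1   # "light" or "double"
  dirs=$2    # direction letters in any order, e.g. "ru"
  case $style in
    light)  name=b ;;
    double) name=d ;;
    *)      return 1 ;;
  esac
  # Walk the directions in canonical order and keep the ones requested.
  for d in u l d r; do
    case $dirs in
      *"$d"*) name=$name$d ;;
    esac
  done
  printf '%s\n' "$name"
}

box_command light ur     # prints bur
box_command double rdl   # prints dldr
box_command light lrud   # prints buldr
```

Because the directions are always emitted in the same order, each supported character has exactly one command name, no matter what order one thinks of its arms in.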

Source code

For dmenu_latexinput.zsh

#!/usr/bin/env zsh
typeset -A latex_syms
# Load the generated LaTeX and emoji mappings (the output of gen_syms.jl).
. $(dirname $(realpath $0))/latex_syms.zsh

# Other symbols
latex_syms[\crossoff]=U00002717
latex_syms[\Rls]=U0000211D
latex_syms[\Int]=U00002124
latex_syms[\Rxi]=U00002102
latex_syms[\Rat]=U0000211A
latex_syms[\Nat]=U00002115
latex_syms[---]=U00002014
latex_syms[--]=U00002013

# Emoji
latex_syms[:rat:]=U0001F400
latex_syms[:mouse:]=U0001F401

# Box-drawing characters
# Explanation: first char is type of character ("b" for single light, "d" for
# double), and then "u", "l", "d", and "r" (in that order) are included if the
# character in question has a line going that direction.
## single light
latex_syms[bur]=U00002514 # "└"
latex_syms[blr]=U00002500 # "─"
latex_syms[bldr]=U0000252C # "┬"
latex_syms[buldr]=U0000253C # "┼"
latex_syms[bud]=U00002502 # "│"
latex_syms[bulr]=U00002534 # "┴"
latex_syms[bul]=U00002518 # "┘"
latex_syms[bld]=U00002510 # "┐"
latex_syms[bdr]=U0000250C # "┌"
latex_syms[budr]=U0000251C # "├"
latex_syms[buld]=U00002524 # "┤"

## double
latex_syms[dlr]=U00002550 # "═"
latex_syms[dud]=U00002551 # "║"
latex_syms[ddr]=U00002554 # "╔"
latex_syms[dld]=U00002557 # "╗"
latex_syms[dur]=U0000255A # "╚"
latex_syms[dul]=U0000255D # "╝"
latex_syms[dudr]=U00002560 # "╠"
latex_syms[duld]=U00002563 # "╣"
latex_syms[dldr]=U00002566 # "╦"
latex_syms[dulr]=U00002569 # "╩"
latex_syms[duldr]=U0000256C # "╬"

local rp=$(realpath $0)
# File that remembers the most recently chosen name, for "!last".
local last=$(dirname $rp)/../logs/$(basename $rp)-last

# Offer "!last" followed by every known name as dmenu completions.
local charname=$(print -rl \!last ${(@k)latex_syms} | dmenu)
local charcode=""

if [[ $charname == "!last" ]]
then
  charname=$(cat $last)
fi

if [[ $charname[1] == "U" ]]
then
  # Raw code point entered as Uxxxx or Uxxxxxxxx
  charcode=$charname
else
  charcode=$latex_syms[$charname]
fi

# Reject anything that is not "U" plus hex digits (this also covers a
# cancelled menu or an unknown name), so that arbitrary text never reaches
# print or xdotool.
if [[ ! $charcode =~ '^U[0-9A-Fa-f]{4,8}$' ]]
then
  exit 1
fi

# print expands the \U escape itself, so no eval is needed.
local char=$(print "\\$charcode")
echo $charname >$last

if [[ $1 == "-c" ]]
then
  # Copy to the X11 CLIPBOARD selection
  echo -n $char | xclip -selection clipboard
elif [[ $1 == "-p" ]]
then
  # Copy to the X11 PRIMARY selection (xclip's default)
  echo -n $char | xclip
elif [[ $1 == "-r" ]]
then
  # Write the character to stdout
  echo $char
else
  # Type the character into the focused window
  xdotool key --clearmodifiers \
    $charcode
fi

For gen_syms.jl

#!/usr/bin/env julia
import REPL, Downloads, JSON

const TARGET        = "latex_syms"
const EMOJIBASE_URL = "https://github.com/milesj/emojibase/raw/master/packages/data/en/shortcodes/joypixels.raw.json"

# Turn one Emojibase entry (hex code point => shortcode or shortcodes) into
# ":shortcode:" => code point pairs.
flip1(k::String, vs::Vector) = map(v->":$v:"=>parse(UInt32, k, base=16), vs)
flip1(k::String, v::String)  = flip1(k, [v])

# Backslash-escape brackets so that they are safe inside a zsh subscript.
escape_paren_brkt(x) = replace(x, "("=>raw"\(", ")"=>raw"\)",
  "["=>raw"\[", "]"=>raw"\]",
  "{"=>raw"\{", "}"=>raw"\}")
# Format a single character (or code point) as the Uxxxxxxxx name xdotool accepts.
to_xdotool(x)        = length(x)!=1 ? @error(x) :
  "U"*uppercase(string(UInt(x[1]), base=16, pad=8))

chars2xdotool1((x,y),) =
  "$TARGET[$(escape_paren_brkt(x))]='$(to_xdotool(y))'"
chars2xdotool(ps)      = join(map(chars2xdotool1, ps), "\n")

iobuf = IOBuffer()

Downloads.download(EMOJIBASE_URL, iobuf)

shortcode_db      = JSON.parse(String(take!(iobuf)))
# Skip multi-code-point sequences, whose keys contain "-".
shortcode_db_proc = vcat(map(kvs->flip1(kvs...),
  filter(((k, v),)->isnothing(match(r"-", k)), collect(shortcode_db)))...)

typeset = "typeset -A $TARGET\n"
latexes = chars2xdotool(collect(REPL.REPLCompletions.latex_symbols))
emoji   = chars2xdotool(shortcode_db_proc)

# The newline keeps the last LaTeX line and the first emoji line separate.
println(typeset*latexes*"\n"*emoji)

For latex_syms.zsh

Generated by gen_syms.jl.

The relevant parts of i3.conf

bindsym $mod+Tab         exec --no-startup-id zsh -c dmenu_latexinput.zsh
bindsym $mod+Shift+Tab   exec --no-startup-id zsh -c 'dmenu_latexinput.zsh -p'

Generating the above tables

This code is hacked-together and kind of disgusting, but it's worth including here to prevent it from being lost forever.

lsre=r"^latex_syms\[(.+)\]=U(....)(....)"
srccode=clipboard()
clipboard(join(map(m->let
    code=m[1]; uh=m[2]; ul=m[3];
    uc="U+"*(uh!="0000" ? uh*" "*ul : ul);
    ch=Char(parse(UInt32, uh*ul, base=16));
    "| ~$code~ | $uc | $ch |"; end,
  filter(!isnothing, match.(lsre, split(srccode, "\n")))), "\n"))

[1] At other times, more advanced means of typesetting math, such as \(\LaTeX\), are required. To incorporate full \(\LaTeX\) output into online discussions that support only text and images, I've developed a similar program that takes the contents of the clipboard, compiles it using \(\LaTeX\), and copies the output as an image to the clipboard.

Julia, a programming language designed for mathematical, scientific, and technical computing, lets its users put symbols in variable names the same way one might use symbols in published math formulae. (For example, sin(θ) is valid and, to an extent, encouraged.) To accommodate this, the Julia REPL has a feature where typing the \(\LaTeX\) code for a symbol (like \theta) and pressing Tab replaces that code with the symbol it represents (θ, here). While I can't hijack the Tab key across the entire operating system, I found the idea interesting and decided to adapt it into a method for entering Unicode symbols across the operating system.

[2] To be fair to the developer of XDoTool, the way my script uses XDoTool may be to blame: the script might call XDoTool while the user is still holding down the Enter key, which would prevent XDoTool from inserting the desired character.