# postprocess.ex
defmodule Makeup.Lexer.Postprocess do
  @moduledoc """
  Often you'll want to run the token list through a postprocessing stage before
  running the formatter.

  Most of what we can do in a postprocessing stage can also be done with more parsing rules,
  but doing it in a postprocessing stage is often easier and faster.
  Never assume one of the options is faster than the other; always measure performance.
  """

  @doc """
  Takes a list of the format `[{key1, [val11, val12, ...]}, {key2, [val22, ...]}]` and
  returns a map of the form `%{val11 => key1, val12 => key1, ..., val22 => key2, ...}`.

  The resulting map may be useful to highlight some tokens in a special way
  in a postprocessing step.

  You can also use pattern matching instead of the inverted map,
  and it will probably be faster, but always benchmark the alternatives before
  committing to an implementation.
  """
  def invert_word_map(pairs) do
    # Build one list of {word, token_type} pairs per token type...
    nested =
      for {ttype, words} <- pairs do
        for word <- words, do: {word, ttype}
      end

    # ...then flatten them into a single list and collect it into a map.
    nested
    |> List.flatten()
    |> Enum.into(%{})
  end

  @doc """
  Converts the value of a token into a binary.

  Token values are usually iolists for performance reasons.
  The BEAM is actually quite fast at printing or concatenating iolists,
  and some of the basic combinators output iolists, so there is no need
  to convert the token values into binaries.

  This function should only be used for testing purposes, when you might
  want to compare the token list with a reference output.

  Converting the tokens into binaries has two advantages:
  1. It's much easier to compare tokens by visual inspection when the value is a binary
  2. When testing, two iolists that print to the same binary should be considered equal.

  This function hasn't been optimized for speed.
  Don't use in production code.
  """
  def token_value_to_binary({ttype, meta, value}) do
    {ttype, meta, to_string([value])}
  end

  @doc """
  Converts the values of the tokens in the list into binaries.

  Token values are usually iolists for performance reasons.
  The BEAM is actually quite fast at printing or concatenating iolists,
  and some of the basic combinators output iolists, so there is no need
  to convert the token values into binaries.

  This function should only be used for testing purposes, when you might
  want to compare the token list with a reference output.

  Converting the tokens into binaries has two advantages:
  1. It's much easier to compare tokens by visual inspection when the value is a binary
  2. When testing, two iolists that print to the same binary should be considered equal.

  ## Example

  ```elixir
  defmodule MyTest do
    use ExUnit.Case
    alias Makeup.Lexers.ElixirLexer
    alias Makeup.Lexer.Postprocess

    test "binaries are much easier on the eyes" do
      naive_tokens = ElixirLexer.lex(":atom")
      # Hard to inspect visually
      assert naive_tokens == [{:string_symbol, %{language: :elixir}, [":", "a", "tom"]}]
      better_tokens =
        ":atom"
        |> ElixirLexer.lex()
        |> Postprocess.token_values_to_binaries()
      # Easy to inspect visually
      assert better_tokens == [{:string_symbol, %{language: :elixir}, ":atom"}]
    end
  end
  ```

  In practice, you'll want to define some kind of helper to make it less verbose.
  For example:

  ```elixir
  defmodule MyTest do
    use ExUnit.Case
    alias Makeup.Lexers.ElixirLexer
    alias Makeup.Lexer.Postprocess

    def lex(text) do
      text
      |> ElixirLexer.lex(group_prefix: "group")
      |> Postprocess.token_values_to_binaries()
    end

    test "even better with our little helper" do
      assert lex(":atom") == [{:string_symbol, %{language: :elixir}, ":atom"}]
    end
  end
  ```
  """
  def token_values_to_binaries(tokens) do
    Enum.map(tokens, &token_value_to_binary/1)
  end
end
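
# A minimal sketch, not part of Makeup itself, of the kind of postprocessing
# step that the docs for `invert_word_map/1` describe. It shows one way an
# inverted word map could be used to re-tag tokens, next to the
# pattern-matching alternative those docs mention.
#
# Assumptions: the module name `PostprocessSketch`, both function names, the
# word list, and the choice of the `:name` and `:keyword` token types are all
# hypothetical and chosen only for illustration. Neither variant has been
# benchmarked, so measure before picking one.
defmodule PostprocessSketch do
  # Re-tags `:name` tokens whose text is a key in `word_map`.
  # `word_map` is expected to have the shape returned by
  # `Makeup.Lexer.Postprocess.invert_word_map/1`: `%{word => token_type}`.
  def retag_names(tokens, word_map) do
    Enum.map(tokens, fn
      {:name, meta, value} = token ->
        # Token values may be iolists, so convert to a binary before the lookup
        # (the same conversion `token_value_to_binary/1` uses).
        case Map.fetch(word_map, to_string([value])) do
          {:ok, ttype} -> {ttype, meta, value}
          :error -> token
        end

      token ->
        token
    end)
  end

  # Pattern-matching alternative. This clause only matches tokens whose value
  # is already a binary, so iolist values pass through unchanged.
  def retag_by_pattern(tokens) do
    Enum.map(tokens, fn
      {:name, meta, value} when value in ["def", "end"] -> {:keyword, meta, value}
      token -> token
    end)
  end
end

# Possible usage, assuming the word map is built with `invert_word_map/1`:
#
#     word_map = Makeup.Lexer.Postprocess.invert_word_map(keyword: ["def", "end"])
#     PostprocessSketch.retag_names([{:name, %{}, ["d", "ef"]}], word_map)
#
# The map-based variant keeps the word list as data, which may be convenient
# when the list is long or shared between lexers.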