# The Earley Algorithm Explained

Let be a grammar and an input string .

A state is an element where is a production in the set of grammar productions , and is a position in the input string .

The set of active states when the input prefix is being analyzed is called .

More precisely, is the set of states whose production rule appears in a derivation from the symbol

Observe that it holds that $ \alpha \overset{*}{\Longrightarrow} a_{j} \ldots a_k$

and so the production can still be used for derivation at position .

The parser is seeded with consisting of only the top-level rule. The parser then repeatedly executes three operations:

  1. prediction,
  2. scanning, and
  3. completion.

These three operations are repeated until no new states can be added to the set

# Prediction

  1. (where j is the start of the substring), and is in the set of non terminals
  2. add states to grammar productions having Y on the left hand side.

Duplicate states are not added to the state set, only new ones.

# Scanning

If is the next terminal in the input stream, then of the form , add to .

# Completion

of the form , find all states in of the form and add to .

# Acceptance

The algorithm accepts if an state with the form ends up in , where is the start symbol of the grammar and is the input length.

If there are several of these states it means the grammaris ambiguous.

If there are none, the input is rejected.

# Pseudocode

We augment the initial grammar with the rule γ → •S


function EarleyParse(words, grammar) {

    function init(words) {
        let S = [];
        for(k =0; k <= words.length; k++) {
          S[k] = new Set();
        }
        return S;     
    }

    let S = init(words);

    function PREDICTOR((A → α•Bβ, j), k, grammar) {
      grammar.rules(B).forEach((B → γ) => S[k].add((B → •γ, k)))
    }

    function SCANNER((A → α•aβ, j), k, words) {
      if (words[k].match(a.regexp)) S[k+1].add((A → αa•β, j))
    }

    function COMPLETER((B → γ•, s), k) {
      S[s].forEach((A → α•Bβ, j) ) => S[k].add((A → αB•β, j))
    }

    S[0].add((γ → •S, 0));
    for(k = 0: k <= words.length; k++) {
        S[k].forEach(state => {  // S[k] can expand during this loop
            if (!state.isFinished()) 
                if (state.NextElement() in grammar.NonTerminal) 
                    PREDICTOR(state, k, grammar)         // non-terminal
                else
                    SCANNER(state, k, words)             // terminal
            else 
                COMPLETER(state, k)
        })
    }
    return S;
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40

# Example

# https://en.wikipedia.org/wiki/Earley_parser#Example

P -> S

S -> S "+" M
   | M

M -> M "*" T
   | T

T -> "1"
   | "2"
   | "3"
   | "4"
1
2
3
4
5
6
7
8
9
10
11
12
13
14
➜  examples git:(main) ✗ nearleyc wikipedia.ne -o wikipedia.js 
➜  examples git:(main) ✗ nearley-test wikipedia.js -i '2+3*4'
Table length: 6
Number of parses: 1
Parse Charts
Chart: 0
0: {P →  ● S}, from: 0
1: {S →  ● S "+" M}, from: 0
2: {S →  ● M}, from: 0
3: {M →  ● M "*" T}, from: 0
4: {M →  ● T}, from: 0
5: {T →  ● "1"}, from: 0
6: {T →  ● "2"}, from: 0
7: {T →  ● "3"}, from: 0
8: {T →  ● "4"}, from: 0

Chart: 1
0: {T → "2" ● }, from: 0
1: {M → T ● }, from: 0
2: {M → M ● "*" T}, from: 0
3: {S → M ● }, from: 0
4: {S → S ● "+" M}, from: 0
5: {P → S ● }, from: 0

Chart: 2
0: {S → S "+" ● M}, from: 0
1: {M →  ● M "*" T}, from: 2
2: {M →  ● T}, from: 2
3: {T →  ● "1"}, from: 2
4: {T →  ● "2"}, from: 2
5: {T →  ● "3"}, from: 2
6: {T →  ● "4"}, from: 2

Chart: 3
0: {T → "3" ● }, from: 2
1: {M → T ● }, from: 2
2: {M → M ● "*" T}, from: 2
3: {S → S "+" M ● }, from: 0
4: {S → S ● "+" M}, from: 0
5: {P → S ● }, from: 0

Chart: 4
0: {M → M "*" ● T}, from: 2
1: {T →  ● "1"}, from: 4
2: {T →  ● "2"}, from: 4
3: {T →  ● "3"}, from: 4
4: {T →  ● "4"}, from: 4

Chart: 5
0: {T → "4" ● }, from: 4
1: {M → M "*" T ● }, from: 2
2: {M → M ● "*" T}, from: 2
3: {S → S "+" M ● }, from: 0
4: {S → S ● "+" M}, from: 0
5: {P → S ● }, from: 0


Parse results: 
[
  [
    [ [ [ [ '2' ] ] ], '+', [ [ [ '3' ] ], '*', [ '4' ] ] ]
  ]
]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63

# Earley Parsing Explained

# Toby Ho on the Earley Parsing Algorithm

# Earley Parsing by Natalie Parde

# Wikipedia

# Optimizing Right Recursion

# NearleyJS

See section NearleyJS

Last Updated: a year ago