# The Earley Algorithm Explained
Let be a grammar
A state is an element
The set of active states when the input prefix
More precisely,
Observe that it holds that $ \alpha \overset{*}{\Longrightarrow} a_{j} \ldots a_k$
and so the production
The parser is seeded with
- prediction,
- scanning, and
- completion.
These three operations are repeated until no new states can be added to the set
# Prediction
(where j is the start of the substring), and is in the set of non terminals - add states
to grammar productions having Y on the left hand side.
Duplicate states are not added to the state set, only new ones.
# Scanning
If
# Completion
# Acceptance
The algorithm accepts if an state with the form
If there are several of these states it means the grammaris ambiguous.
If there are none, the input is rejected.
# Pseudocode
We augment the initial grammar with the rule γ → •S
function EarleyParse(words, grammar) {
function init(words) {
let S = [];
for(k =0; k <= words.length; k++) {
S[k] = new Set();
}
return S;
}
let S = init(words);
function PREDICTOR((A → α•Bβ, j), k, grammar) {
grammar.rules(B).forEach((B → γ) => S[k].add((B → •γ, k)))
}
function SCANNER((A → α•aβ, j), k, words) {
if (words[k].match(a.regexp)) S[k+1].add((A → αa•β, j))
}
function COMPLETER((B → γ•, s), k) {
S[s].forEach((A → α•Bβ, j) ) => S[k].add((A → αB•β, j))
}
S[0].add((γ → •S, 0));
for(k = 0: k <= words.length; k++) {
S[k].forEach(state => { // S[k] can expand during this loop
if (!state.isFinished())
if (state.NextElement() in grammar.NonTerminal)
PREDICTOR(state, k, grammar) // non-terminal
else
SCANNER(state, k, words) // terminal
else
COMPLETER(state, k)
})
}
return S;
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# Example
# https://en.wikipedia.org/wiki/Earley_parser#Example
P -> S
S -> S "+" M
| M
M -> M "*" T
| T
T -> "1"
| "2"
| "3"
| "4"
2
3
4
5
6
7
8
9
10
11
12
13
14
➜ examples git:(main) ✗ nearleyc wikipedia.ne -o wikipedia.js
➜ examples git:(main) ✗ nearley-test wikipedia.js -i '2+3*4'
Table length: 6
Number of parses: 1
Parse Charts
Chart: 0
0: {P → ● S}, from: 0
1: {S → ● S "+" M}, from: 0
2: {S → ● M}, from: 0
3: {M → ● M "*" T}, from: 0
4: {M → ● T}, from: 0
5: {T → ● "1"}, from: 0
6: {T → ● "2"}, from: 0
7: {T → ● "3"}, from: 0
8: {T → ● "4"}, from: 0
Chart: 1
0: {T → "2" ● }, from: 0
1: {M → T ● }, from: 0
2: {M → M ● "*" T}, from: 0
3: {S → M ● }, from: 0
4: {S → S ● "+" M}, from: 0
5: {P → S ● }, from: 0
Chart: 2
0: {S → S "+" ● M}, from: 0
1: {M → ● M "*" T}, from: 2
2: {M → ● T}, from: 2
3: {T → ● "1"}, from: 2
4: {T → ● "2"}, from: 2
5: {T → ● "3"}, from: 2
6: {T → ● "4"}, from: 2
Chart: 3
0: {T → "3" ● }, from: 2
1: {M → T ● }, from: 2
2: {M → M ● "*" T}, from: 2
3: {S → S "+" M ● }, from: 0
4: {S → S ● "+" M}, from: 0
5: {P → S ● }, from: 0
Chart: 4
0: {M → M "*" ● T}, from: 2
1: {T → ● "1"}, from: 4
2: {T → ● "2"}, from: 4
3: {T → ● "3"}, from: 4
4: {T → ● "4"}, from: 4
Chart: 5
0: {T → "4" ● }, from: 4
1: {M → M "*" T ● }, from: 2
2: {M → M ● "*" T}, from: 2
3: {S → S "+" M ● }, from: 0
4: {S → S ● "+" M}, from: 0
5: {P → S ● }, from: 0
Parse results:
[
[
[ [ [ [ '2' ] ] ], '+', [ [ [ '3' ] ], '*', [ '4' ] ] ]
]
]
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
# Earley Parsing Explained
# Toby Ho on the Earley Parsing Algorithm
# Earley Parsing by Natalie Parde
# Wikipedia
# Optimizing Right Recursion
Joop Leo’s optimizations for right-Recursion original paper
# NearleyJS
See section NearleyJS