# Grammars and Generated Languages
# Context-Free Grammars
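Consider a context-free grammar $G = (\Sigma, V, P, S)$, where $\Sigma$ is the set of tokens (terminals), $V$ the set of syntactic variables (nonterminals), $P$ the set of production rules and $S \in V$ the start symbol.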
For example, in the Egg grammar this is the set of production rules:
```
expression: STRING
          | NUMBER
          | WORD apply

apply: /* empty */
     | '(' (expression ',')* expression? ')' apply
```
There are only two syntactic variables: $V = \{expression, apply\}$.
The set of tokens is $\Sigma = \{STRING, NUMBER, WORD, \text{'('}, \text{')'}, \text{','}\}$.
Note that some of these tokens are themselves languages of some complexity, whose definition belongs to a different level of abstraction, the lexical level, and which can be defined with a simpler mechanism such as regular expressions.
For example, in an early definition of Egg we could define what we mean by white space or blanks, that is, which parts of the text are not significant for our program to understand the structure of the sentence:
```
WHITES = /(\s|[#;].*|\/\*(.|\n)*?\*\/)*/
```
as well as the more complex tokens:
```
STRING = /"((?:[^"\\]|\\.)*)"/
NUMBER = /([-+]?\d*\.?\d+([eE][-+]?\d+)?)/
WORD = /([^\s(),"]+)/
```
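To see how these regular expressions cooperate, here is a minimal lexer sketch in JavaScript. It assumes the WHITES, STRING, NUMBER and WORD expressions above, with the sticky flag `y` added so that each match must start exactly at the current position, plus a hypothetical PUNCT expression for the punctuation tokens. The function name lex and the { type, value } token shape are illustrative choices, not Egg's actual API.

```javascript
// Minimal Egg lexer sketch. The regexes are the ones above plus the sticky
// flag `y`; PUNCT, lex and the { type, value } token shape are hypothetical.
const WHITES = /(\s|[#;].*|\/\*(.|\n)*?\*\/)*/y;
const STRING = /"((?:[^"\\]|\\.)*)"/y;
const NUMBER = /([-+]?\d*\.?\d+([eE][-+]?\d+)?)/y;
const WORD = /([^\s(),"]+)/y;
const PUNCT = /[(),]/y;

function lex(src) {
  const tokens = [];
  let pos = 0;
  // Run `re` anchored at `pos` (sticky flag) and return the match or null.
  const at = (re) => {
    re.lastIndex = pos;
    return re.exec(src);
  };

  for (;;) {
    pos += at(WHITES)[0].length; // skip blanks and comments (may match "")
    if (pos >= src.length) break;
    let m;
    if ((m = at(PUNCT))) tokens.push({ type: m[0], value: m[0] });
    else if ((m = at(STRING))) tokens.push({ type: "STRING", value: m[1] });
    else if ((m = at(NUMBER))) tokens.push({ type: "NUMBER", value: Number(m[1]) });
    else if ((m = at(WORD))) tokens.push({ type: "WORD", value: m[1] });
    else throw new SyntaxError(`Unexpected character ${src[pos]} at ${pos}`);
    pos += m[0].length;
  }
  return tokens;
}

// Usage: token types for the sentence print(**(g,f)(8))
console.log(lex("print(**(g,f)(8))").map((t) => t.type).join(" "));
// → WORD ( WORD ( WORD , WORD ) ( NUMBER ) )
```

The order of the alternatives matters: STRING and NUMBER are tried before WORD because WORD would otherwise swallow digits such as 8.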
# Exercise
Build a derivation for the sentence

```
print(**(g,f)(8))
```
Note that the result of the lexical analysis would be a token stream like this one:

```
WORD[print] "(" WORD[**] "(" WORD[g] "," WORD[f] ")" "(" NUMBER[8] ")" ")"
```
Solution:
In the solution that follows, we abbreviate expression as
In graphical form, we have the following concrete syntax tree:
This is the same diagram drawn with mermaid:
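As a textual complement to the diagrams, here is one possible leftmost derivation of the token stream. The abbreviations $e$ for expression and $a$ for apply are our own choice for this sketch. Each step that expands apply uses either its empty alternative or its second alternative with a concrete number of repetitions of (expression ',')* and a concrete choice for expression?; for instance, the fourth step uses one repetition followed by the optional expression.

$$
\begin{aligned}
e &\Longrightarrow \texttt{WORD}\; a \\
  &\Longrightarrow \texttt{WORD (}\; e \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD (}\; e \;\texttt{,}\; e \;\texttt{)}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD}\; a \;\texttt{,}\; e \;\texttt{)}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD ,}\; e \;\texttt{)}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD , WORD}\; a \;\texttt{)}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD , WORD )}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD , WORD ) (}\; e \;\texttt{)}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD , WORD ) ( NUMBER )}\; a \;\texttt{)}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD , WORD ) ( NUMBER ) )}\; a \\
  &\Longrightarrow \texttt{WORD ( WORD ( WORD , WORD ) ( NUMBER ) )}
\end{aligned}
$$

Replacing each token by its lexeme (print, **, g, f and 8) gives back print(**(g,f)(8)).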
# Language Generated by a Grammar
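For each syntactic variable $A \in V$, the language generated by $A$ is defined as

$$L(A) = \{ x \in \Sigma^* : A \stackrel{*}{\Longrightarrow} x \}$$

That is, $L(A)$ is the set of token sequences (sentences) that can be derived from $A$ in zero or more steps.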
The Egg language is the set of sentences that can be derived from expression, that is, $L(expression)$.
The problem to consider is, given a language $L(A)$, to build a function parseA() that recognizes the sentences of that language.
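For Egg, such a parser can be sketched as a recursive-descent recognizer with one function per syntactic variable, in the spirit of parseA(): parseExpression() for expression and parseApply() for apply. This is a minimal sketch, assuming the input is an array of token types (strings such as "WORD", "NUMBER" or "(") as a lexer would produce them; the wrapper recognize() and this token representation are hypothetical.

```javascript
// Recursive-descent recognizer sketch for the Egg grammar. Input: an array of
// token types such as ["WORD", "(", "NUMBER", ")"] (a hypothetical
// representation). Returns true iff the whole input belongs to L(start).
function recognize(tokens, start = "expression") {
  let i = 0; // index of the next unread token
  const peek = () => tokens[i];
  const match = (type) => {
    if (tokens[i] !== type) {
      throw new SyntaxError(`Expected ${type} at token ${i}, found ${tokens[i]}`);
    }
    i++;
  };

  // expression: STRING | NUMBER | WORD apply
  function parseExpression() {
    const t = peek();
    if (t === "STRING" || t === "NUMBER") match(t);
    else if (t === "WORD") {
      match("WORD");
      parseApply();
    } else throw new SyntaxError(`Unexpected ${t} at token ${i}`);
  }

  // apply: /* empty */ | '(' (expression ',')* expression? ')' apply
  function parseApply() {
    if (peek() !== "(") return; // empty alternative
    match("(");
    while (peek() !== ")") {
      parseExpression();
      if (peek() !== ",") break; // an argument without ',' must be the last one
      match(",");
    }
    match(")");
    parseApply();
  }

  const parsers = { expression: parseExpression, apply: parseApply };
  try {
    parsers[start]();
    return i === tokens.length; // the whole input must be consumed
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(recognize(["(", "NUMBER", ",", ")"], "apply")); // (4,)  → true
console.log(recognize(["(", ",", ")"], "apply"));           // (,)   → false
```

Since apply has an empty alternative, the empty token list is accepted with start "apply" but rejected with start "expression".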
Continuing with the Egg example, $L(apply)$ contains sentences such as:
```
()
(4,b)
(4, +(5,c))
(4,)
/* nothing */
```
Remember that:

```
apply: /* empty */
     | '(' (expression ',')* expression? ')' apply
```
and that:
# ECMAScript: A Complex Language Specification
This Ecma Standard defines the ECMAScript 2022 Language.
It is the thirteenth edition of the ECMAScript Language Specification. Since publication of the first edition in 1997, ECMAScript has grown to be one of the world's most widely used general-purpose programming languages. It is best known as the language embedded in web browsers but has also been widely adopted for server and embedded applications.
Although ECMAScript started as a language with a simple design, over the years that design has become more and more complex. The following section is just an illustration of how some design decisions have led to increased complexity in interpreting and implementing the language.
# Lexical Ambiguity Example
The source text of an ECMAScript Script or Module is first converted into a sequence of input elements, which are
- tokens,
- line terminators,
- comments, or
- white space.
The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.
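The longest-match rule (often called maximal munch) can be sketched as follows. This is a toy, assuming a tiny hypothetical set of input elements (a few punctuators, identifiers and white space) rather than the real ECMAScript lexical grammar; the names longestMatch, scan and texts are illustrative. At every position all alternatives are tried and the longest one wins, which is why a+++b is split as a ++ + b.

```javascript
// Toy maximal-munch scanner. The element set is a hypothetical subset of
// ECMAScript: some punctuators, identifiers and white space.
const PUNCTUATORS = ["+", "++", "+=", ">", ">=", ">>", ">>=", ">>>", ">>>=", "="];
const PATTERNS = [
  ["IdentifierName", /[A-Za-z_$][\w$]*/y],
  ["WhiteSpace", /[ \t]+/y],
];

// Return the longest input element starting at `pos`, or null if none matches.
function longestMatch(src, pos) {
  let best = null;
  const consider = (type, text) => {
    if (best === null || text.length > best.text.length) best = { type, text };
  };
  for (const p of PUNCTUATORS) {
    if (src.startsWith(p, pos)) consider("Punctuator", p);
  }
  for (const [type, re] of PATTERNS) {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (m) consider(type, m[0]);
  }
  return best;
}

// Split `src` into input elements, always taking the longest candidate.
function scan(src) {
  const elements = [];
  for (let pos = 0; pos < src.length; ) {
    const el = longestMatch(src, pos);
    if (el === null) throw new SyntaxError(`No input element at position ${pos}`);
    elements.push(el);
    pos += el.text.length;
  }
  return elements;
}

// Texts of the non-white-space elements, joined with spaces.
const texts = (src) => scan(src).filter((e) => e.type !== "WhiteSpace").map((e) => e.text);
console.log(texts("a+++b").join(" "));    // → a ++ + b
console.log(texts("x >>>= y").join(" ")); // → x >>>= y
```

A regular-expression alternation such as /\+|\+\+/ would not give this behavior on its own, because JavaScript regexes return the first alternative that matches, not the longest one.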
In ECMAScript, there are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements.
This requires multiple goal symbols for the lexical grammar. The use of multiple lexical goals ensures that there are no lexical ambiguities that would affect automatic semicolon insertion.
For example, there are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted.
This is not affected by semicolon insertion (see 12.5); in examples such as lines 2 and 3 of the following code:
```
let {a, b, hi, g, c, d} = require('./hidden-amb')
a = b
/hi/g.exec(c).map(d)
console.log(a);
```
where the first non-whitespace, non-comment code point after a LineTerminator is / (U+002F, SOLIDUS) and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator.
That is, the above example is interpreted in the same way as:
```
a = b / hi / g.exec(c).map(d);
```
When we run the code above, we get:
```
➜ prefix-lang git:(master) ✗ node examples/lexical-ambiguity.js
1
```
The contents of the file examples/hidden-amb.js explain why the output is 1:
```
let tutu = { map(_) { return 2}}
let a = 5, b = 8, hi = 4, c = "hello", d =
g = { exec(_) { return tutu; }}
module.exports = {a, b, hi, c, d, g}
```
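To try the example without the two files, the following single-file sketch inlines the values from hidden-amb.js (instead of using require) and compares the statement without semicolons against the explicit one-line version. It is an illustrative reconstruction, not the code in the repository.

```javascript
// Single-file sketch of the example: the values of hidden-amb.js inlined.
const tutu = { map(_) { return 2; } };  // map ignores its argument
const g = { exec(_) { return tutu; } }; // exec ignores its argument
const b = 8, hi = 4, c = "hello";
const d = g; // in hidden-amb.js, "d =" continues on the next line, so d === g

// No semicolon is inserted after `b`: the next line starts with "/" and
// division is allowed here, so both lines form one expression statement.
let a = 5
a = b
/hi/g.exec(c).map(d)

// The same statement written explicitly: (8 / 4) / 2
const explicit = b / hi / g.exec(c).map(d);

console.log(a, explicit); // → 1 1
```

Member access and calls bind tighter than division, so g.exec(c).map(d) is evaluated first (it returns 2), and the two divisions are then applied from left to right.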
See the code in the repo crguezl/js-lexical-ambiguity
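Behind this example lies the use of multiple lexical goal symbols described earlier: the scanner cannot decide on its own what a "/" means, so the syntactic context tells it which goal to use. The following sketch only handles the "/" case. The goal names mirror the specification's InputElementDiv and InputElementRegExp, and the token names DivPunctuator and RegularExpressionLiteral come from the specification too, but the regular expressions are simplified (for instance, a "/" inside a character class such as [/] is not supported), and nextSlashToken is a hypothetical name.

```javascript
// Goal-dependent lexing sketch for a "/" at position `pos`. The regexes are
// simplified; in particular, character classes like [/] are not supported.
const DIV_PUNCTUATOR = /\/=?/y;                        // "/" or "/="
const REGEX_LITERAL = /\/(?:[^\/\\\n]|\\.)+\/[a-z]*/y; // /body/flags

function nextSlashToken(src, pos, goal) {
  const isRegExpGoal = goal === "InputElementRegExp";
  const re = isRegExpGoal ? REGEX_LITERAL : DIV_PUNCTUATOR;
  re.lastIndex = pos;
  const m = re.exec(src);
  if (m === null) throw new SyntaxError(`Cannot lex "/" at ${pos} with goal ${goal}`);
  return { type: isRegExpGoal ? "RegularExpressionLiteral" : "DivPunctuator", text: m[0] };
}

const line = "/hi/g.exec(c).map(d)";
// After `a = b` an operator may follow, so the parser asks for InputElementDiv:
console.log(nextSlashToken(line, 0, "InputElementDiv").text);    // → /
// At the start of a statement, a RegularExpressionLiteral is allowed instead:
console.log(nextSlashToken(line, 0, "InputElementRegExp").text); // → /hi/g
```

The same characters yield different tokens, which is why the lexer and the parser cannot be fully separated in ECMAScript.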
# ECMAScript Language: Grammar
- A Grammar Summary (Appendix with the whole grammar)
- [A.2] Expressions
- [A.3] Statements
- [A.4] Functions and Classes
- [A.5] Scripts and Modules
- [A.6] Number Conversions
- [A.7] Universal Resource Identifier Character Classes
- [A.8] Regular Expressions
# ECMAScript Language: Lexical Specification
- [A.1] Lexical Grammar
- 12 ECMAScript Language: Lexical Grammar
- 11 ECMAScript Language: Source Text
# ECMA TC39 at GitHub
- GitHub Organization Ecma TC39: Ecma International, Technical Committee 39 - ECMAScript
- This GitHub repository contains the source for the current draft of ECMA-262
# The Design of Programming Languages
See section The Design of Programming Languages