
Lex and Yacc Notes

My GitHub Examples

1. Lex

Intro to lex/flex

  • lex: lexical analyzer generator
  • flex: fast lexical analyzer generator
    • An open-source version of lex
  • flex > lex
  • The generated programs are also called scanners or lexers

Editor Plugin

Compile

# OS X 10.11.6
flex lexfile.l
gcc lex.yy.c -ll
./a.out < input.txt
# Linux
flex lexfile.l
gcc lex.yy.c -lfl
./a.out < input.txt

Usage

/*** Definition section ***/
%{
/* initial C or C++ code */
%}
%%
    /*** Rules section ***/
%%
/*** C Code section ***/

Example from Wikipedia

/*** Definition section ***/
%{
/* C code to be copied verbatim */
#include <stdio.h>
%}
/* This tells flex to read only one input file */
%option noyywrap
NUMBER [0-9]+
%%
    /*** Rules section (comments here must be indented) ***/
{NUMBER} {
            /* yytext is a string containing the matched text. */
            printf("Saw an integer: %s\n", yytext);
        }
.|\n    { /* Ignore all other characters. */ }
%%
/*** C Code section ***/
int main(void)
{
    /* Call the lexer, then quit. */
    yylex();
    return 0;
}

Solve yywrap() error

%{
....
%}
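/* Option 1: tell flex not to call yywrap() at end of input */
%option noyywrap
%%
......
%%
/* Option 2: define yywrap() yourself; a nonzero return means "no more input" */
/*int yywrap () {return 1;}*/
/* Option 3: link with the lex library, which supplies a default yywrap() */
/* OS X: gcc ..... -ll */
/* Linux: gcc ..... -lfl */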

Common Lex Variables

  • yyout
    • The FILE* used for output; it is stdout by default
  • yyin
    • The FILE* the scanner reads its input from; it is stdin by default
  • yytext
    • Holds the text of the matched token
  • yyleng
    • Holds the length of the matched token

Common Lex Macros

  • ECHO
    • Equivalent to fprintf(yyout, "%s", yytext)
  • BEGIN
    • Switches the start state
    • Useful for handling comments, for example

2. Yacc

Intro to yacc/bison

  • yacc: Yet Another Compiler Compiler
  • bison: GNU bison
  • bison > yacc
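  • Used together with lex, which supplies the tokens
  • The input is a grammar in Backus-Naur form (BNF)
  • Uses LALR(1) parsing
    • Bottom-up parser
    • When the right-hand side (RHS) of a rule is matched, it is reduced to the left-hand side (LHS)
  • The generated program is also called a parser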

Editor Plugin

Compile

# OS X
bison -d -o y.tab.c yacc.y
flex -o lex.yy.c lex.l
gcc lex.yy.c y.tab.c
./a.out < input.txt

Usage

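/* In the definitions section of the .l file (inside %{ ... %}) */
#include <stdlib.h>   /* atoi */
#include "y.tab.h"    /* token codes and yylval, generated by bison -d */
/* In a rule action: hand the token to the parser in the .y file */
yylval.ival = atoi(yytext);   /* attribute (value) of the token */
return(INUMBER);              /* token type */
/* For a single-character token, return the character itself */
return(yytext[0]);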
/* Layout of the .y file */
%{
/* initial C or C++ code */
%}
… Other Definitions …
%%
… Rules Section …
%%
… Subroutine Section …

Rules section

  • Context-free grammar

C Code section

  • yyerror
    • Reports errors, e.g. a syntax error
  • yyparse
    • Uses the productions in the rules section to analyze the syntax
    • Calls yylex to get the next token

Symbol Attributes

  • yylval:
    • A variable defined by the parser generated from the *.y file
    • The lexer stores a token's attribute in it before returning the token, so that yacc can use the value in its actions
    • By default, yylval is of type int

Non-Integer Symbol Attributes

  • The %union directive lists the data types that yylval can hold
  • Declare the data type of terminals and non-terminals (<type> is a member name from %union)
    • terminals: %token <type> token …
    • non-terminals: %type <type> non-terminal …
%union{
datatype variable-name;
}

Token Declarations

  • %token: generic token declarations
  • %left: left-associative binary operators with precedence
    • e.g. 9 - 3 - 2 is parsed as (9 - 3) - 2 = 4, not 9 - (3 - 2) = 8
  • %right: right-associative binary operators with precedence
    • e.g. 2 ^ 2 ^ 3 is parsed as 2 ^ (2 ^ 3) = 256, not (2 ^ 2) ^ 3 = 64
  • %nonassoc: non-associative tokens with precedence (chaining such an operator, as in a < b < c, is a syntax error)