Rift – Syntactic Analysis

In which we define a big grammar to parse a program.

The parser for the Rift programming language compiler is done. It was developed using GNU Bison, and the grammar for the parser is in this file.

The rules of the grammar use two kinds of symbols: terminals and non-terminals. Each non-terminal has its own rules within the grammar and, by convention, is written entirely in lower case. Terminal symbols are written in upper case and all start with T_. A rule from the grammar looks like the following:

var_decl: data_types T_ID T_SEMICOLON
        | data_types T_ID T_EQUAL expr T_SEMICOLON
        ;

The terminal symbols in the grammar are called tokens, and they come directly from the lexical analysis part of this compiler. That part, developed using Flex, defines a lot of rules that identify certain sequences of characters as a specific token. For example, the T_INT_B10_LIT token is used to express that a base 10 integer was found. Similarly, the T_EQUAL token is used when the = symbol is found.
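To make the idea of "sequences of characters become tokens" concrete, here is a minimal hand-written sketch in C++. It is not the Flex scanner from Rift, just a stand-in that behaves similarly for a tiny subset. The token names come from the grammar, but the numeric values, the T_UNKNOWN code and the scan function are assumptions made for illustration. Keywords such as data types are not handled, so they would come out as T_ID here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token codes. The names are the ones used in the grammar; the numeric
// values are arbitrary here (Bison would normally generate them), and
// T_UNKNOWN is a hypothetical catch-all added for this sketch.
enum Token { T_ID = 258, T_INT_B10_LIT, T_EQUAL, T_SEMICOLON, T_UNKNOWN };

// Returns true for characters that may appear inside an identifier.
static bool is_id_char(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Hand-written stand-in for the scanner: turns text into token codes.
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace separates tokens and is skipped
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            // A run of decimal digits is one base 10 integer literal.
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back(T_INT_B10_LIT);
        } else if (is_id_char(c)) {
            // A run of letters, digits and underscores is one identifier.
            while (i < src.size() && is_id_char(src[i])) ++i;
            out.push_back(T_ID);
        } else if (c == '=') {
            out.push_back(T_EQUAL);
            ++i;
        } else if (c == ';') {
            out.push_back(T_SEMICOLON);
            ++i;
        } else {
            out.push_back(T_UNKNOWN);
            ++i;
        }
    }
    return out;
}
```

Calling scan("x = 42;") on this sketch yields the sequence T_ID, T_EQUAL, T_INT_B10_LIT, T_SEMICOLON, which is the kind of stream the parser consumes.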

Each grammar symbol has a type, which is defined using C, or in this case C++, types. Those types can be an integer, a float, a structure, a class, etc. A special union is used in the grammar file to define the various types. As of this version it looks like the following:

%union {
    int token;
    int ph;
}

It’s still very poor. It only defines two types: the token type, which is an integer, and the ph type, also an integer. The token type will be the type of every token returned by the scanner. That will change later, but for now it stays like this. The ph type stands for placeholder, and it is the type of every non-terminal symbol. Later, each non-terminal will get a specific type defined by a class of the abstract syntax tree. For now the placeholder is used because the tree has not been developed yet.

Each rule should have an action associated with it that defines what the program should do when it recognizes that rule. For example, at some stage of parsing a Rift file there would be a variable assignment, which would be recognized by this rule:
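In plain C++ terms, the %union above corresponds roughly to an ordinary union whose members are the possible types of a symbol's value. The sketch below shows that, plus one possible shape for the later version. The names SemanticValue, FutureSemanticValue and Node are hypothetical and only used for illustration, since the tree classes don't exist yet.

```cpp
// Roughly what the current %union amounts to: every symbol's value is
// stored in one of these members.
union SemanticValue {
    int token;  // type of every terminal (token) for now
    int ph;     // placeholder type of every non-terminal for now
};

// Hypothetical base class for the abstract syntax tree (an assumption,
// the real tree has not been written yet).
struct Node {
    virtual ~Node() = default;
};

// One way the union could look once the tree exists: non-terminals
// carry a pointer to a tree node instead of a placeholder integer.
union FutureSemanticValue {
    int token;
    Node* node;
};
```

Because both members of the current union are plain integers, the whole value takes the space of a single int.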

var_assign: postfix_expr T_EQUAL expr T_SEMICOLON { $$ = 10; };

The action of this rule is defined inside the curly braces. Inside the scope of the action there are variables that start with the dollar ($) symbol. These variables refer to the values of the different symbols in the rule. In the example, $$ is the var_assign symbol, $1 is the postfix_expr symbol, $2 is the T_EQUAL symbol, and so on. The only thing the action does is assign the value 10 to var_assign. Why?

Well, that happens because I just needed a placeholder action. Many of the rules don’t have any actions yet, but some needed one to avoid a conflict. So I just left that in there. I’m sure there is a better way to do this temporary fix, but it works like this.
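To picture what $$ and $1 to $4 mean in the var_assign action, here is a simplified C++ model of the reduction. It assumes the parser keeps a stack of symbol values, which is a common way to describe a Bison-style parser but not a copy of the code Bison generates. The function name reduce_var_assign and the use of std::vector as the stack are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models the reduction of
//   var_assign: postfix_expr T_EQUAL expr T_SEMICOLON { $$ = 10; };
// on a stack of integer values (both token and ph are ints for now).
int reduce_var_assign(std::vector<int>& values) {
    const std::size_t rhs_len = 4;  // symbols on the right-hand side
    assert(values.size() >= rhs_len);

    // The top four slots hold the values of the right-hand side:
    //   values[base + 0] is $1 (postfix_expr)
    //   values[base + 1] is $2 (T_EQUAL)
    //   values[base + 2] is $3 (expr)
    //   values[base + 3] is $4 (T_SEMICOLON)
    const std::size_t base = values.size() - rhs_len;

    // The action body: $$ = 10; none of $1..$4 are used.
    const int result = 10;

    // Pop the right-hand side and push the value of var_assign ($$).
    values.resize(base);
    values.push_back(result);
    return result;
}
```

After the call, the four values of the right-hand side are gone from the stack and a single value, 10, stands in their place for var_assign.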
