Calculator

Note

This tutorial assumes that you have Rustemo CLI properly installed. Refer to the CLI section if you have trouble running rcomp command.

In this tutorial we'll create a simple calculator with 4 arithmetic operations: +, -, *, and /.

We would like to parse and calculate expressions like this:

8 + 4 / 2 - 3.2 * 2

Note

This tutorial is rather lengthy as it covers all aspect of usage workflow, from creating Rust project, Rustemo grammar file to implementing types and actions. Also, this tutorial tries to be a gentle introduction providing all intermediate steps in details. Thus, this should serve as a full introduction and a prerequisite for the following tutorials as it explains the usual workflow in working with Rustemo. Following tutorials will assume that the user is familiar with the usual workflow.

Create the project

We assume that we are building a small demo program which accepts an expression from the user and prints the result or informative error if the expression doesn't conform to the grammar.

So, lets first build a new project:

cargo new --bin calculator

In this project we'll create a calculator.rustemo

cd calculator/src
touch calculator.rustemo

In the rest of the tutorial we assume that grammar is written in the above calculator.rustemo file.

The grammar

To parse such expressions we start with a grammar that describe the syntax and lexical structure of our expressions.

We start by a simple idea that each operation is binary, consisting of two operand and an infix operator:

<left operand> <operator> <right operand>

So, let's write that down in Rustemo grammar notation. First, we may say that our base expression is a single operation:

Expression: Operand Operator Operand;

Our operands are numbers. Let's define a lexical syntax for operands. We do that in a grammar section that starts with the terminals keyword. This section should go after the main part (syntax part) of the grammar. For lexical definitions, we either use plain strings if the content is fixed, or regular expressions, if the content is different between lexemes. See the section on terminals for the explanation.

In this case we have numbers which differs but we can encode their structure using regular expression:

terminals
Operand: /\d+(\.\d+)?/;

So, our operand should have at least one digit, after which optionally follows a dot and more digits. We could make this more elaborate but let's make it simple for now.

And what is the operator? It can be any of the +, -, *, / so we can encode that in a regex also:

Operator: /\+|-|\*|\//;

Symbols + and * have a special interpretation in regular expressions so we must escape them. Symbol / is used to start/end the regex in Rustemo so we must escape that also.

Now, our full grammar is:

Expression: Operand Operator Operand;

terminals
Operand: /\d+(\.\d+)?/;
Operator: /\+|-|\*|\//;

The problem with our grammar is that the only thing we could ever parse with it is a single operation, like 3 + 4. If we add more operations our parser, built with this grammar, won't work anymore. We'll see how to extend the parser later but let's now move on.

Generating the parser

Let's run rcomp command to generate the parser code from the grammar:

rcomp calculator.rustemo

If you get no output there were no errors. If you made an error in the grammar you will get a report with the line and column where the error was and what is expected at that location.

For example, if we forget colon after the rule name we get:

Error at calculator.rustemo:1:11:
	...Expression -->Operand Operato...
	Expected one of Colon, OBrace.
Parser(s) not generated.

After the parser generator is run successfully you should have files calculator.rs and calculator_actions.rs generated.

File calculator.rs is the parser while calculator_actions.rs in the default configuration contains deduced types for the AST (Abstract Syntax Tree) together with functions/actions used by the builder during the parsing process to construct the AST.

Regenerating the parser

calculator.rs is regenerated whenever you run rcomp command.

Actions in file calculator_actions.rs, on the other hand, are not fully regenerated. Only missing actions/types will be regenerated. This enables you to provide manual modifications of the actions/types as long as you retain the same name. As we shall soon see, this is a nice feature which provides a quick way to have a working parser with default AST output which can be later tuned as needed.

Adding dependencies

Our generated parser code calls Rustemo code so we must add rustemo crate as a dependency:

cargo add rustemo

Since we are using regular expressions in our grammar we also need regex and once_cell:

cargo add regex --no-default-features --features std,unicode-perl
cargo add once_cell

Your Cargo.toml should look like this:

[package]
name = "calculator1"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
once_cell = "1"
regex = { version = "1.7.1", default-features = false, features = ["std", "unicode-perl"] }
colored = "2"
# A relative path to rustemo crate is used here for usage in the rustemo project tree.
# In your projects you should just specify the version.
rustemo = { version = "0.6.0", path = "../../../../../rustemo" }

Running the parser

Now, let's open the main.rs file and write this for the header:

#![allow(unused)]
fn main() {
use rustemo::Parser;
use std::io;
// Use the generated parser
use crate::calculator::CalculatorParser;

// Include generated modules
#[rustfmt::skip]
mod calculator;
#[allow(unused)]
#[rustfmt::skip]
mod calculator_actions;
}

and this as the main function which will serve as entry point for our program:

fn main() {
    let mut expression = String::new();

    // Read the line from the input
    println!("Expression:");
    io::stdin()
        .read_line(&mut expression)
        .expect("Failed to read line.");

    // Parse the line and get the result.
    let result = CalculatorParser::new().parse(&expression);

    // Print the result using Debug formatter.
    println!("{:#?}", result);
}

So, our initial program will accept the string from the console input, parse it using the parser generated in the calculator module and print the result.

Run the program:

cargo run

You should see a prompt which expect you to enter the expression. If we enter:

2 + 3

We get debug output from the parser and at the end you should see:

Action: Accept
Ok(
    Expression {
        operand_1: "2",
        operator: "+",
        operand_3: "3",
    },
)

But if you make a mistake, like:

2 + / 3

You get:

Err(
    Error {
        message: "...2 + -->/ 3\n...\nExpected Operand.",
        file: Some(
            "<str>",
        ),
        location: Some(
            Location {
                start: LineBased(
                    LineBased {
                        line: 1,
                        column: 4,
                    },
                ),
                end: None,
            },
        ),
    },
)

Congrats! You have made your first Rustemo parser!

Although, it still have many issues which we'll fix in the rest of the tutorial.

Extending the grammar

Ok, let's see can we parse the example expression from the beginning of the tutorial:

8 + 4 / 2 - 3.2 * 2

Run the program and give it the expression:

$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Err(
    Error {
        message: "...8 + 4 -->/ 2 - 3.2 * 2\n...\nExpected STOP.",
        file: Some(
            "<str>",
        ),
        location: Some(
            Location {
                start: LineBased(
                    LineBased {
                        line: 1,
                        column: 6,
                    },
                ),
                end: None,
            },
        ),
    },
)

As we already discussed above, the grammar we created so far can only parse an expression consisting of a single operation. We can see that the parser reported the error at the occurrence of the second operation.

Let's extend our grammar to allow for expressions of arbitrary length.

First, we shall start from the idea the each expression can be represented as an Abstract Syntax Tree (AST):

These kinds of trees resemble the gist of the underlying expressions. We can observe two things:

  1. AST shows the priority of operations. * and / bind stronger than + and -. We see that operations with higher priority will be lower in the tree.
  2. Each node in the tree is a sub-expression. Numbers are the most basic expressions, each operation is a sub-expression consisting of left sub-tree/sub-expression, operation and right sub-tree/sub-expression.

So, we could write grammar rules as:

Add: Expression '+' Expression;
Sub: Expression '-' Expression;
...

Where Add, Sub... are all expressions, thus:

Expression: Add | Sub | Mul | Div;

The definition is recursive, e.g. Add references Expression through its operands and Expression references Add.

Finally, we can write this in a more compact form as a single grammar rule (contracting Expression to E):

E: E '+' E
 | E '-' E
 | E '*' E
 | E '/' E
 | Number;

terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';

We can read this as:

Expression is:

  • Expression plus Expression or
  • Expression minus Expression or
  • ...
  • or a number

Note

We must add operations to the terminals section for the sake of giving names to symbols as the names are needed for the generated code. We can use those terminals directly in the syntax part e.g. like:

E: E Plus E

but I find it more readable to use strings in this case.

Of course, you can use either references or strings, or even combine the two approaches.

Let's run rcomp over our new grammar to generate the parser.

$ rcomp calculator.rustemo
In State 7:E
E: E Plus E .    {STOP, Plus, Minus, Mul, Div}
E: E . Plus E    {STOP, Plus, Minus, Mul, Div}
...
16 conflict(s). 16 Shift/Reduce and 0 Reduce/Reduce.
Error: Grammar is not deterministic. There are conflicts.
Parser(s) not generated.

Woops! What went wrong?

Rustemo is talking about some conflicts and that the grammar is not deterministic. This just means that LR parser cannot be produced with the grammar because LR is a deterministic style of parsing where parser must know exactly what operation to perform in each state based solely on the next token. With this grammar it is not possible.

The report also gives detailed information about problematic spots. The report talks about states, those are LR automaton states as the LR parsing is based on constructing a deterministic push-down automaton (PDA) which is used during parsing.

Tip

To learn more see the section on parsing and conflicts resolution.

In the report we see segments like this:

In State 10:E
E: E Div E .    {STOP, Plus, Minus, Mul, Div}
E: E . Plus E    {STOP, Plus, Minus, Mul, Div}
E: E . Minus E    {STOP, Plus, Minus, Mul, Div}
E: E . Mul E    {STOP, Plus, Minus, Mul, Div}
E: E . Div E    {STOP, Plus, Minus, Mul, Div}
When I saw E and see token Minus ahead I can't decide should I shift or reduce by production:
E: E Div E

Each parser state represent a location where parser can be in the parsing process. The location is depicted by the dot in the productions above. So in this state parser saw E or E Div E before and if there is Minus ahead it won't be able to decide if it should construct a Div sub-tree making Div operation to bind tighter or to consume the Minus symbol which follows in anticipation to construct Minus sub-tree first.

Obviously, we have an issue with our operation priorities. Indeed, this grammar is ambiguous as, if we forget about priorities, input expressions can yield many possible trees1. LR parsers must always produce only one tree as LR parsing is deterministic.

Note

GLR parsing is non-deterministic so it can be used with ambiguous grammars.

This could be resolved by transforming our grammar to encode priorities but the process is tedious, the resulting grammar becomes less readable and the resulting trees are far from intuitive to navigate and process.

Luckily, Rustemo has declarative disambiguation mechanisms which makes these issues easy to solve. These disambiguation information are specified inside of curly braces per production or per grammar rule.

The priority is specified by an integer number. The default priority is 10. Productions with higher priority will be the first to be reduced. Think of the reduction as producing a tree node where the node type is determined by the left-hand side of the production while the children are right-hand side.

Ok, knowing this let's extend our grammar:

E: E '+' E {1}
 | E '-' E {1}
 | E '*' E {2}
 | E '/' E {2}
 | Number;

terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';

Nice, we now have priorities defined. + and - operations are of priority 1 while * and / are of priority 2. So, in the above conflict, division will be reduced before Minus symbol (subtraction) is taken into consideration.

Let's run rcomp again:

$ rcomp calculator.rustemo
...
8 conflict(s). 8 Shift/Reduce and 0 Reduce/Reduce.
Error: Grammar is not deterministic. There are conflicts.
Parser(s) not generated.

Hm... we still have ambiguities but we've cut them in half. Let's see what are those remaining issues:

In State 10:E
E: E Div E .    {STOP, Plus, Minus, Mul, Div}
E: E . Plus E    {STOP, Plus, Minus, Mul, Div}
E: E . Minus E    {STOP, Plus, Minus, Mul, Div}
E: E . Mul E    {STOP, Plus, Minus, Mul, Div}
E: E . Div E    {STOP, Plus, Minus, Mul, Div}
When I saw E and see token Mul ahead I can't decide should I shift or reduce by production:
E: E Div E

Ok, now we don't have issues with priorities but we have issues with associativities. When the parser saw division and there is multiplication symbol ahead it doesn't know what to do. Should it favor division or multiplication? They are of the same priority. We know that for all 4 operation parser should favor the operation it has already seen, or to put it simply, it should go from left to right when the priorities are the same2.

Rustemo also have declarative associativity specification. Associativity is specified by keywords left (or reduce) and right (or shift).

Let's fix our grammar:

E: E '+' E {1, left}
 | E '-' E {1, left}
 | E '*' E {2, left}
 | E '/' E {2, left}
 | Number;

terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';

And run rcomp again.

$ rcomp calculator.rustemo
$

It seems that everything is fine. Let's check by running our project:

$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Action: Accept
Ok(
    C2(
        EC2 {
            e_1: C1(
                EC1 {
                    e_1: C5(
                        "8",
                    ),
                    e_3: C4(
                        EC4 {
                            e_1: C5(
                                "4",
                            ),
                            e_3: C5(
                                "2",
                            ),
                        },
                    ),
                },
            ),
            e_3: C3(
                EC3 {
                    e_1: C5(
                        "3.2",
                    ),
                    e_3: C5(
                        "2",
                    ),
                },
            ),
        },
    ),
)

That's it! We have a working grammar. It wasn't that hard after all, was it? :)

But, the work is not over yet. We got a parser but the AST is not that pretty and how can we evaluate our expressions? If you want to learn how to do that read on.

Improving AST

When we run rcomp for our grammar we got two files generated: calculator.rs and calculator_actions.rs. The first one is the parser that is regenerated from scratch whenever we run rcomp. The second contains AST nodes' types and actions used by the parser to construct AST nodes during parsing. This file can be manually modified and the modification will be retained when rcomp is run as we shall see in the next section.

At this point we shall focus on the form of AST types that Rustemo generated for us. Open calculator_actions.rs file and examine its content. We can see that for each grammar rule production we got struct generated with the name of the grammar rule followed by Cx sufix (C for Choice and x is an ordinal number).

#![allow(unused)]
fn main() {
/// This file is maintained by rustemo but can be modified manually.
/// All manual changes will be preserved except non-doc comments.
use ::rustemo::{Context, Token as BaseToken};
use super::calculator::{self, TokenKind};
pub type Input = str;
pub type Ctx<'i> = super::calculator::Context<'i, Input>;
#[allow(dead_code)]
pub type Token<'i> = BaseToken<'i, Input, TokenKind>;
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.into()
}
#[derive(Debug, Clone)]
pub struct EC1 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct EC2 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct EC3 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct EC4 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    C1(EC1),
    C2(EC2),
    C3(EC3),
    C4(EC4),
    Number(Number),
}
pub fn e_c1(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C1(EC1 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_c2(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C2(EC2 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_c3(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C3(EC3 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_c4(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C4(EC4 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
    E::Number(number)
}
}

Tip

You can expand and see the whole code in these snippets by using "Show hidden lines" option in the upper right corner of the code box.

And for each grammar rule we have an enum combining the variant structs:

#![allow(unused)]
fn main() {
/// This file is maintained by rustemo but can be modified manually.
/// All manual changes will be preserved except non-doc comments.
use ::rustemo::{Context, Token as BaseToken};
use super::calculator::{self, TokenKind};
pub type Input = str;
pub type Ctx<'i> = super::calculator::Context<'i, Input>;
#[allow(dead_code)]
pub type Token<'i> = BaseToken<'i, Input, TokenKind>;
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.into()
}
#[derive(Debug, Clone)]
pub struct EC1 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct EC2 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct EC3 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct EC4 {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    C1(EC1),
    C2(EC2),
    C3(EC3),
    C4(EC4),
    Number(Number),
}
pub fn e_c1(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C1(EC1 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_c2(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C2(EC2 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_c3(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C3(EC3 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_c4(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::C4(EC4 {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
    E::Number(number)
}
}

Notice also that recursive types are boxed (e.g. pub e_1: Box<E>;) as they should be.

Ok, that is nice but look at those struct names and fields. They don't give us clue about operations, operands etc. It would be pretty hard to use these types.

Let's improve it a bit. First we can specify production kinds which is just a nice name for each production which can be used in the code.

We could workaround the issue by making a separate grammar rule for each operation like suggested above:

Add: E '+' E;
Sub: E '-' E;
...
E: Add | Sub |...

But let's do it with production kinds to see that alternative.

#![allow(unused)]
fn main() {
E: E '+' E {Add, 1, left}
 | E '-' E {Sub, 1, left}
 | E '*' E {Mul, 2, left}
 | E '/' E {Div, 2, left}
 | Number;

terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';
}

Notice the additions of Add, Sub... in the curly braces.

Regenerate the parser, but first delete actions so they can be regenerated also.

$ rm calculator_actions.rs
$ rcomp calculator.rustemo

Now, if we open calculator_actions.rs we'll see that structs are named after productions kinds. Nice!

#![allow(unused)]
fn main() {
/// This file is maintained by rustemo but can be modified manually.
/// All manual changes will be preserved except non-doc comments.
use ::rustemo::{Context, Token as BaseToken};
use super::calculator::{self, TokenKind};
pub type Input = str;
pub type Ctx<'i> = super::calculator::Context<'i, Input>;
#[allow(dead_code)]
pub type Token<'i> = BaseToken<'i, Input, TokenKind>;
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.into()
}
#[derive(Debug, Clone)]
pub struct Add {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Sub {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Mul {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Div {
    pub e_1: Box<E>,
    pub e_3: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    Add(Add),
    Sub(Sub),
    Mul(Mul),
    Div(Div),
    Number(Number),
}
pub fn e_add(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::Add(Add {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_sub(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::Sub(Sub {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_mul(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::Mul(Mul {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_div(_ctx: &Ctx, e_1: E, e_3: E) -> E {
    E::Div(Div {
        e_1: Box::new(e_1),
        e_3: Box::new(e_3),
    })
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
    E::Number(number)
}
}

But, what about fields. It is certainly not nice to have those generic e_1, e_3 names. To fix these we can use assignments which is a mechanism to both define AST nodes' field names and specify to the parser what to retain in the AST during parsing. Some parts of the input are syntax noise and should not be kept in the AST.

To change field names add assignments for left and right operands in each operation:

#![allow(unused)]
fn main() {
E: left=E '+' right=E {Add, 1, left}
 | left=E '-' right=E {Sub, 1, left}
 | left=E '*' right=E {Mul, 2, left}
 | left=E '/' right=E {Div, 2, left}
 | Number;

terminals
Number: /\d+(\.\d+)?/;
Plus: '+';
Minus: '-';
Mul: '*';
Div: '/';
}

Delete actions and rerun rcomp. Now you'll see that the generated structs have nicely named fields.

#![allow(unused)]
fn main() {
/// This file is maintained by rustemo but can be modified manually.
/// All manual changes will be preserved except non-doc comments.
use ::rustemo::{Context, Token as BaseToken};
use super::calculator::{self, TokenKind};
pub type Input = str;
pub type Ctx<'i> = super::calculator::Context<'i, Input>;
#[allow(dead_code)]
pub type Token<'i> = BaseToken<'i, Input, TokenKind>;
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.into()
}
#[derive(Debug, Clone)]
pub struct Add {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Sub {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Mul {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Div {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    Add(Add),
    Sub(Sub),
    Mul(Mul),
    Div(Div),
    Number(Number),
}
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
    E::Add(Add {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_sub(_ctx: &Ctx, left: E, right: E) -> E {
    E::Sub(Sub {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E {
    E::Mul(Mul {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_div(_ctx: &Ctx, left: E, right: E) -> E {
    E::Div(Div {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
    E::Number(number)
}
}

Much easier to work with!

In general, it is a good practice to use assignments and production kinds from the start as they represent valuable information (a sort of docs). The AST types generator uses those information, as you have seen.

Let's run our calculator just to make sure it works:

$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Action: Accept
Ok(Sub(Sub { left: Add(Add { left: C5("8"),
right: Div(Div { left: C5("4"), right: C5("2") }) }),
right: Mul(Mul { left: C5("3.2"), right: C5("2") }) }))

Everything is fine. Let's move on.

Calculating expressions

The last piece of the puzzle is to make our calculator useful by doing the actual calculation.

When we are parsing input we are doing what we call "semantic analysis" which purpose is to transform the input to some other form which is of use in the domain we are working in.

Usually, the end result of the analysis is some form of AST or more often ASG (Abstract Semantic Graph). In our case, semantic analysis can directly produce the end result of the calculation so that the result of the full process of parsing is a single number which represents the result.

Semantic analysis is done by the actions in calculator_actions.rs. The actions are Rust functions which are called during reductions to reduce a production (right-hand side) to the sub-result (left-hand side of the grammar rule). In our case, each reduction should basically be a calculation of a sub-expression result.

As you have seen in the previous section, you need to delete actions if you want them regenerated on each Rustemo run. You can also remove some parts of it and those parts will get regenerated. The logic is that Rustemo will only add missing parts (identified by its name) of the file (missing AST types and action functions) but will not modify already existing. This enables manual modification of parts we want to tune to our likings.

We will now change types and actions to calculate sub-results instead of building AST nodes.

First, we'll start with the Number rule which is defined as a String:

#![allow(unused)]
fn main() {
pub type Number = String;
}

We know that our number should be float, so change it:

#![allow(unused)]
fn main() {
pub type Number = f32;
}

Just bellow the Number type we see the first action which is called to transform the parsed number to Number. token.value is a string slice containing the number.

#![allow(unused)]
fn main() {
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.into()
}
}

Previously our Number was String but now it is a float so we should change this action to parse the string and produce a float:

#![allow(unused)]
fn main() {
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.parse().unwrap()
}
}

Now, we see that E type which represents sub-results is enum.

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub enum E {
    Add(Add),
    Sub(Sub),
    Mul(Mul),
    Div(Div),
    Number(Number),
}
}

A sub-result should be float also. So replace the enum with:

#![allow(unused)]
fn main() {
pub type E = f32;
}

Also, structs for values of variants Add, Sum... are not needed anymore so remove them from the file.

Okay, we have fixed generated types, now let's see our expressions' actions. They are of the form:

#![allow(unused)]
fn main() {
/// This file is maintained by rustemo but can be modified manually.
/// All manual changes will be preserved except non-doc comments.
use ::rustemo::{Context, Token as BaseToken};
use super::calculator::{self, TokenKind};
pub type Input = str;
pub type Ctx<'i> = super::calculator::Context<'i, Input>;
#[allow(dead_code)]
pub type Token<'i> = BaseToken<'i, Input, TokenKind>;
pub type Number = String;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.into()
}
#[derive(Debug, Clone)]
pub struct Add {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Sub {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Mul {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub struct Div {
    pub left: Box<E>,
    pub right: Box<E>,
}
#[derive(Debug, Clone)]
pub enum E {
    Add(Add),
    Sub(Sub),
    Mul(Mul),
    Div(Div),
    Number(Number),
}
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
    E::Add(Add {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_sub(_ctx: &Ctx, left: E, right: E) -> E {
    E::Sub(Sub {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E {
    E::Mul(Mul {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_div(_ctx: &Ctx, left: E, right: E) -> E {
    E::Div(Div {
        left: Box::new(left),
        right: Box::new(right),
    })
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
    E::Number(number)
}
}

So they produce AST node while they should do the calculation. So let's change all of them:

#![allow(unused)]
fn main() {
/// This file is maintained by rustemo but can be modified manually.
/// All manual changes will be preserved except non-doc comments.
use ::rustemo::{Context, Token as BaseToken};
use super::calculator::{self, TokenKind};
pub type Input = str;
pub type Ctx<'i> = super::calculator::Context<'i, Input>;
#[allow(dead_code)]
pub type Token<'i> = BaseToken<'i, Input, TokenKind>;
pub type Number = f32;
pub fn number(_ctx: &Ctx, token: Token) -> Number {
    token.value.parse().unwrap()
}
pub type E = f32;
pub fn e_add(_ctx: &Ctx, left: E, right: E) -> E {
    left + right
}
pub fn e_sub(_ctx: &Ctx, left: E, right: E) -> E {
    left - right
}
pub fn e_mul(_ctx: &Ctx, left: E, right: E) -> E {
    left * right
}
pub fn e_div(_ctx: &Ctx, left: E, right: E) -> E {
    left / right
}
pub fn e_number(_ctx: &Ctx, number: Number) -> E {
    number
}
#[allow(dead_code)]
type Dummy = u32;
}

Tip

Just a reminder that you can see the whole code by using "Show hidden lines".

Very simple, each action is doing its calculation! Notice the last one which is called to reduce Number to E. It just returns the number as it is already a float produced by the number action.

That's it. We have a fully working calculator!

Let's run it just to verify:

$ cargo run
...
Expression:
8 + 4 / 2 - 3.2 * 2
...
Action: Accept
Ok(3.6)

If you have made it through here, well done!

Now, just to make sure that you have connected all the dots try to solve exercises bellow.

Exercises

  1. Extend the language to support power operator (^). What are priority and associativity of this operation?

  2. Support unary minus operation.

  3. Support trigonometric operations in a function call style (e.g. sin(34) + 2 * cos(12))

  4. Support variables before an expression. E.g:

    a = 5
    b = 2 * a
    a + 2 * sin(b)
    

    Extend main.rs to accept input line-by-line. Whenever the user enter assignment to the variable the program will update its internal state by evaluating RHS of the assignment and keeping variable values in some mapping data structure of your choice. Each defined variable can be used in subsequent expressions. When the user enters just an expression without assignment the parser should evaluate and print the result.

Footnotes

1

The number of trees (interpretations) of an arithmetic expression can be calculated by the Catalan number sequence. For our example with 4 operations from the beginning the number of trees will be 14.

2

This doesn't hold for some other operations. See Exercises and think about associativity of the power operation.