
Diesel, Part 3: Bootstrapping Diesel

20 May 2015
Reading time: 18 minutes

I’ve followed with interest my colleagues’ efforts to implement a lightweight DSL approach using an LL(1) parser in Ruby. In his blog post, Daniel Mölle has shown how easily a Melody DSL can be implemented using his Diesel approach. I asked myself whether this simple approach holds up for more realistic tasks. The Melody DSL is a sort of Hello World example that is, in my opinion, too simplistic. Once a project is complex enough to warrant a DSL, I expect to see a significant increase in the number of keywords and language constructs.

Looking for a realistic example, I hit on the following idea: why not implement a Diesel parser for any LL(1) language, generated by Diesel itself? A DSL for describing languages was already invented by John Backus and Peter Naur. Using Niklaus Wirth’s so-called Extended Backus-Naur Form (EBNF), I can express the syntax of programming languages formally.

From the Wikipedia article on EBNF and a matching discussion on Stack Overflow, the following LL(1) EBNF grammar can be assembled quickly:

gramar = { rule };
rule = ID, "=", rhs, ";";
rhs = factor, { ",", factor };
factor = primary, { "|", primary };
primary = ID | terminal | optional | repeated | grouped;
terminal = ("'", ID | SYMBOL, "'") | ('"', ID | SYMBOL, '"');
optional = "[", rhs, "]";
repeated = "{", rhs, "}";
grouped = "(", rhs, ")";

EBNF parser done with Diesel

A parser for this grammar is just as easy to write in Diesel as the Melody example. Copy and paste here, search and replace there… done. Amazing. The following parser parses the grammar definition given above and creates an Abstract Syntax Tree (AST) of the EBNF itself.

require_relative 'types'

module Diesel
    module Parser
        SYMBOLS = "[{[()]}\"'.:,;#*+~\/\\$%&=|?!-~"
        ID = /\w+/

        # Tokenizes and parses a complete input file.
        def parse_file(fn)
            # Tokenize and parse the input file.
            @history = Array.new
            tokens = tokenize_file(fn)
            begin
                root = expect_gramar_body(nil, tokens)
            rescue Exception => e
                fail_verbosely(e)
            end
            # Return the parse result.
            return root
        end

        private

        # Tokenizes an input file defined by its name.
        def tokenize_file(fn)
            tokens = Array.new
            File.readlines(fn).each do |line|
                linetokens = line.gsub(/([#{Regexp.quote(SYMBOLS)}])/, ' \1 ').split(/\s+/)
                linetokens.each do |linetoken|
                    strippedlinetoken = linetoken.strip
                    if !strippedlinetoken.empty? then
                        tokens.push(strippedlinetoken)
                    end
                end
            end
            return tokens
        end

        def append(field, value)
            return field unless value
            if field == nil
                return value
            elsif field.kind_of?(Array)
                return field.push(value)
            else
                return [ field, value ]
            end
        end

        # Parses all the tokens in the input stream.
        def expect_gramar_body(gramar, tokens)
            gramar = Gramar.new
            while (lambda {
                rule = accept_rule_body(gramar, tokens)
                gramar.rules = append(gramar.rules, rule)
                return rule
            }.call)
            end
            return gramar
        end

        # Parses the entire body of a "definition" statement.
        def accept_rule_body(gramar, tokens)
            lhs = accept_id(tokens)
            return nil unless lhs
            expect_keyword(tokens, "=")
            rhs = parse_rhs_body(gramar, tokens)
            expect_keyword(tokens, ";")
            return Rule.new(lhs, rhs)
        end

        # Parses the entire body of a "rhs" statement.
        def parse_rhs_body(gramar, tokens)
            rhs = RHS.new(Array.new)
            begin
                factor = parse_factor_body(gramar, tokens)
                raise "Unexpected '#{tokens[0]}' expecting factor while parsing rhs body" unless factor
                rhs.factors.push(factor)
            end while factor = accept_keyword(tokens, ",")
            return rhs
        end

        # Parses the entire body of a "factor" statement.
        def parse_factor_body(gramar, tokens)
            factor = Factor.new(Array.new)
            begin
                primary = parse_primary_body(gramar, tokens)
                raise "Unexpected '#{tokens[0]}' expecting primary while parsing factor body" unless primary
                factor.primaries.push(primary)
            end while primary = accept_keyword(tokens, "|")
            return factor
        end

        # Parses the entire body of a "primary" statement.
        def parse_primary_body(gramar, tokens)
            if (id = accept_id(tokens))
                return id
            elsif (terminal = parse_terminal_body(gramar, tokens))
                return terminal
            elsif (optional = parse_optional_body(gramar, tokens))
                return optional
            elsif (repeated = parse_repeated_body(gramar, tokens))
                return repeated
            elsif (grouped = parse_grouped_body(gramar, tokens))
                return grouped
            end
            return nil
        end

        # Parses the entire body of a "terminal" statement.
        def parse_terminal_body(gramar, tokens)
            if accept_keyword(tokens, "'")                
                id = accept_id(tokens)
                symbol = expect_symbol(tokens) unless id
                expect_keyword(tokens, "'")
                return Terminal.new(id, symbol)
            elsif accept_keyword(tokens, '"')                
                id = accept_id(tokens)
                symbol = expect_symbol(tokens) unless id
                expect_keyword(tokens, '"')
                return Terminal.new(id, symbol)
            end
            return nil
        end

        # Parses the entire body of an "optional" statement.
        def parse_optional_body(gramar, tokens)
            if accept_keyword(tokens, "[")
                rhs = parse_rhs_body(gramar, tokens)
                expect_keyword(tokens, "]")
                return Optional.new(rhs)
            end
            return nil
        end

        # Parses the entire body of a "repeated" statement.
        def parse_repeated_body(gramar, tokens)
            if accept_keyword(tokens, "{")
                rhs = parse_rhs_body(gramar, tokens)
                expect_keyword(tokens, "}")
                return Repeated.new(rhs)
            end
            return nil
        end

        # Parses the entire body of a "grouped" statement.
        def parse_grouped_body(gramar, tokens)
            if accept_keyword(tokens, "(")
                rhs = parse_rhs_body(gramar, tokens)
                expect_keyword(tokens, ")")
                return Grouped.new(rhs)
            end
            return nil
        end        

        # Extracts the next token from the token queue.
        def get_next(tokens)
            first = tokens.delete_at(0)
            @history.push({:token => first, :caller => caller})
            return first
        end

        def expect_id(tokens)
            id = accept_id(tokens)
            raise "Unexpected '#{tokens[0]}' (not ID)" unless id
            return id
        end

        def accept_id(tokens)
            return nil if tokens.empty?
            return get_next(tokens) if ID.match(tokens[0])
            return nil
        end

        def expect_symbol(tokens)
            symbol = accept_symbol(tokens)
            raise "Unexpected '#{tokens[0]}' (not SYMBOL)" unless symbol
            return symbol
        end

        def accept_symbol(tokens)
            return nil if tokens.empty?
            return get_next(tokens) if /([#{Regexp.quote(SYMBOLS)}])/.match(tokens[0])
            return nil
        end

        # Returns the given keyword if it happens to match the next token,
        # removing that token from the input queue in the process.
        # Otherwise, nil is returned, and the queue remains unchanged.
        def accept_keyword(tokens, keyword)
            return nil if tokens.empty?
            return get_next(tokens) if tokens[0] == keyword
            return nil
        end

        # Ensures that the given keyword happens to match the next token.
        # Returns the token on success.
        def expect_keyword(tokens, keyword)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Unexpected '#{token}' (not '#{keyword}')" unless token == keyword
            return token
        end

        # Ensures that the next token matches an entry in the given array.
        # On success, the respective token is returned.
        def expect_old(tokens, array)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Unknown identifier '#{token}'" unless array.include?(token)
            return token
        end

        # Ensures that the next token does NOT match an entry in the given array.
        # On success, the respective token is returned.
        def expect_new(tokens, array)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Identifier '#{token}' already in use" unless !array.include?(token)
            return token
        end

        # Parses the given string for a positive non-zero integer.
        def get_int(s)
            i = s.to_i
            raise "Failed to parse integer: '#{s}'" unless i > 0
            return i
        end

        # Parses the given string for a positive non-zero float.
        def get_float(s)
            f = s.to_f
            raise "Failed to parse float: '#{s}'" unless f > 0.0
            return f
        end

        # Parses the given string for a positive non-zero rational.
        def get_rational(s)
            r = s.to_r
            raise "Failed to parse rational: '#{s}'" unless r > 0
            return r
        end

        # Returns the plain array of names of an array of named items.
        def get_names(array)
            result = Array.new
            array.each do |item|
                result.push(item.name)
            end
            return result
        end

        # Augments an exception message with a token history.
        def fail_verbosely(msg)
            puts "Token history at point of error:"
            puts "================================"
            pp @history
            puts "================================"
            raise msg
        end
    end
end
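The core trick of `tokenize_file` above is to pad every symbol character with spaces before splitting on whitespace, so that punctuation becomes its own token. Here is a minimal standalone sketch of that line-level logic; `tokenize_line` is a hypothetical helper name introduced only for illustration, while the `SYMBOLS` constant and the regular expression are copied from the parser:

```ruby
# Symbol characters recognized by the tokenizer (as in the parser above).
SYMBOLS = "[{[()]}\"'.:,;#*+~\/\\$%&=|?!-~"

# Pad every symbol character with spaces, then split on whitespace.
def tokenize_line(line)
  line.gsub(/([#{Regexp.quote(SYMBOLS)}])/, ' \1 ')
      .split(/\s+/)
      .reject(&:empty?)
end

p tokenize_line('gramar = { rule };')
# => ["gramar", "=", "{", "rule", "}", ";"]
```

Because `Regexp.quote` escapes every character in `SYMBOLS`, the character class matches each of them literally, so even regex metacharacters like `[` and `|` come out as separate tokens.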

But the exciting part is yet to come: the generator. The first parser for the EBNF was written by hand and closely modeled on Daniel’s Melody parser; so far I had spent no thought on generic applicability. I have to admit that I will not be able to build a Daniel-like generator, so I decided to skip elegance and concentrate on function instead.

Free choice of generator technology

In contrast to the Melody example (where the generated content is emitted via puts), I took this opportunity to try a template-based approach using Embedded Ruby (ERB). The ability to choose the best-fitting generator pattern is a great strength of the Diesel approach. For “quick and dirty” applications the inline-emit (AKA puts) approach is adequate, while for more complex cases templates provide more flexibility in the generator.
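The difference between the two patterns can be sketched in a few lines. This is a hypothetical fragment for illustration, not code from the actual generator:

```ruby
require 'erb'

name = "Melody"

# Inline-emit style: generator logic and output text are interleaved.
puts "module #{name}"
puts "end"

# Template style: the target text stays readable as a whole,
# with variable portions embedded as Ruby expressions.
template = ERB.new("module <%= name %>\nend\n")
puts template.result(binding)
```

Both produce the same output here; the payoff of the template style only shows once the static portion grows large.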

After a few attempts I’ve assembled the following template:

require_relative 'types'

module <%= name %>
    module Parser
        SYMBOLS = "[{[()]}\"'.:,;#*+~\/\\$%&=|?!-~"
        ID = /\w+/
        
        # Tokenizes and parses a complete input file.
        def parse_file(fn)
            # Tokenize and parse the input file.
            @history = Array.new
            tokens = tokenize_file(fn)
            begin
                root = expect__body(nil, tokens)
            rescue Exception => e
                fail_verbosely(e)
            end
            # Return the parse result.
            return root
        end

        private

        # Tokenizes an input file defined by its name.
        def tokenize_file(fn)
            tokens = Array.new
            File.readlines(fn).each do |line|
                linetokens = line.gsub(/([#{Regexp.quote(SYMBOLS)}])/, ' \1 ').split(/\s+/)
                linetokens.each do |linetoken|
                    strippedlinetoken = linetoken.strip
                    if !strippedlinetoken.empty? then
                        tokens.push(strippedlinetoken)
                    end
                end
            end
            return tokens
        end
        
        def append(field, value)
            return field unless value
            if field == nil
                return value
            elsif field.kind_of?(Array)
                return field.push(value)
            else
                return [ field, value ]
            end
        end
        
            
            id = _id(tokens)
            .id = append(.id, id)
            return id 
            symbol = _symbol(tokens)
            .symbol = append(.symbol, symbol)
            return symbol 
            intNum = _int(tokens)
            .int = append(.int, intNum)
            return intNum 
            floatNum = _float(tokens)
            .float = append(.float, floatNum)
            return floatNum 
            token = __body(, tokens)
            . = append(., token)
            return token 
            return _keyword(tokens, ) 
            while (lambda { 
            }.call)
            end 
            return  1 %>
            p = nil 
            p = lambda { 
            }.call unless p
            raise "Unexpected '#{tokens[0]}' while parsing " unless p 
            return p 
            return lambda { 
            }.call
            f = 
            f = lambda { 
            }.call if f
            return f 
        def accept__body(, tokens)
             = .new
            rhs = lambda { 
            }.call
            return nil unless rhs
            return 
        end

        def expect__body(, tokens)
             = .new
            rhs = lambda { 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing " unless rhs
            return 
        end
        
           
        # Extracts the next token from the token queue.
        def get_next(tokens)
            first = tokens.delete_at(0)
            @history.push(first)
            return first
        end
        
        def expect_id(tokens)
            id = accept_id(tokens)
            raise "Unexpected '#{tokens[0]}' (not ID)" unless id
            return id
        end
        
        def accept_id(tokens)
            return nil if tokens.empty?
            return get_next(tokens) if ID.match(tokens[0])
            return nil
        end

        def expect_symbol(tokens)
            symbol = accept_symbol(tokens)
            raise "Unexpected '#{tokens[0]}' (not SYMBOL)" unless symbol
            return symbol
        end

        def accept_symbol(tokens)
            return nil if tokens.empty?
            return get_next(tokens) if /([#{Regexp.quote(SYMBOLS)}])/.match(tokens[0])
            return nil
        end

        def expect_int(tokens)
            return Integer(get_next(tokens))
        end

        def accept_int(tokens)
            begin
                Integer(tokens[0])
            rescue
                return nil
            end
            return expect_int(tokens)
        end

        def expect_float(tokens)
            return Float(get_next(tokens) + get_next(tokens) + get_next(tokens))
        end

        def accept_float(tokens)
            begin
                # Join the three tokens (e.g. "3", ".", "14") before conversion;
                # Float() on an Array would always raise and never accept.
                Float(tokens[0..2].join)
            rescue
                return nil
            end
            return expect_float(tokens)
        end
        
        # Returns the given keyword if it happens to match the next token,
        # removing that token from the input queue in the process.
        # Otherwise, nil is returned, and the queue remains unchanged.
        def accept_keyword(tokens, keyword)
            return nil if tokens.empty?
            return get_next(tokens) if tokens[0] == keyword
            return nil
        end

        # Ensures that the given keyword happens to match the next token.
        # Returns the token on success.
        def expect_keyword(tokens, keyword)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Unexpected '#{token}' (not '#{keyword}')" unless token == keyword
            return token
        end

        # Ensures that the next token matches an entry in the given array.
        # On success, the respective token is returned.
        def expect_old(tokens, array)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Unknown identifier '#{token}'" unless array.include?(token)
            return token
        end

        # Ensures that the next token does NOT match an entry in the given array.
        # On success, the respective token is returned.
        def expect_new(tokens, array)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Identifier '#{token}' already in use" unless !array.include?(token)
            return token
        end

        # Parses the given string for a positive non-zero integer.
        def get_int(s)
            i = s.to_i
            raise "Failed to parse integer: '#{s}'" unless i > 0
            return i
        end

        # Parses the given string for a positive non-zero float.
        def get_float(s)
            f = s.to_f
            raise "Failed to parse float: '#{s}'" unless f > 0.0
            return f
        end

        # Parses the given string for a positive non-zero rational.
        def get_rational(s)
            r = s.to_r
            raise "Failed to parse rational: '#{s}'" unless r > 0
            return r
        end

        # Returns the plain array of names of an array of named items.
        def get_names(array)
            result = Array.new
            array.each do |item|
                result.push(item.name)
            end
            return result
        end

        # Augments an exception message with a token history.
        def fail_verbosely(msg)
            puts "Token history at point of error:"
            puts "================================"
            pp @history
            puts "================================"
            raise msg
        end
    end
end

For the EBNF parser generator, the ERB templates prove advantageous over a puts generator: I can write down static parts in the target format directly and embed variable portions as Ruby expressions. I have bundled more complex tasks into embedded functions, which means the template is no longer processed strictly linearly and is therefore harder to read. I can already hear the Ruby gurus cry. The template certainly has potential for improvement.
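The idea behind these embedded functions can be illustrated with a stripped-down fragment (hypothetical names, not the actual template): the emission logic for one rule lives in a callable that the template invokes, so the template text itself stays mostly declarative.

```ruby
require 'erb'

# Emission logic pulled out of the template into a lambda;
# the template calls it instead of inlining the string-building.
emit_rule = lambda do |name|
  "def accept_#{name}_body(tokens)\nend"
end

template = ERB.new("<%= emit_rule.call('gramar') %>")
puts template.result(binding)
# prints:
# def accept_gramar_body(tokens)
# end
```

The cost is exactly the one described above: control flow jumps out of the template and back, so reading the template top to bottom no longer tells the whole story.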

Working with ERB templates is uncomfortable

Within the scope of this blog post, I am more interested in whether the resulting generator works at all. This requires extending Daniel’s generator.rb to execute an ERB template:

require 'erb'
require 'fileutils'
require 'ostruct'

require_relative 'types'

module Diesel
    module Generators
        class Values
            @name = nil
            @root = nil
        
            attr_accessor :name, :root
        
            def initialize(name, root)
                @name = name
                @root = root
            end
        
            def get_binding()
                return binding
            end
        end
        
        def generate_parser(name, gramar)
            values = Values.new(name.capitalize, gramar)
            indir = "./input"
            outdir = "./output"
            
            raise "Input directory #{indir} does not exist!" unless File.directory?(indir)
            FileUtils.mkdir_p(outdir) unless File.directory?(outdir)
            
            Dir.glob("#{indir}/*.erb") do |infile|
                outfile = "#{outdir}/#{File.basename(infile, '.erb')}"
                puts "Generating #{infile} => #{outfile} ..."
                erb = ERB.new(File.read("#{infile}"))     
                File.open("#{outfile}", "w") { |f|
                    f.puts erb.result(values.get_binding)
                }                
            end
        end
    end
end

Here I had to fiddle a bit until the ERB interpreter accepted the value bindings. The indirection via the Values class feels more like a necessary crutch, but I accept it for the moment.
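Since generator.rb already requires ostruct, one possible way to drop the Values crutch would be an OpenStruct-based binding. This is an untested sketch of the alternative, not the approach used above:

```ruby
require 'erb'
require 'ostruct'

# OpenStruct can supply the binding directly, without a dedicated class:
# bare names in the template resolve as method calls on the OpenStruct.
values = OpenStruct.new(name: "Diesel", root: nil)
erb = ERB.new("module <%= name %>")
puts erb.result(values.instance_eval { binding })
# prints: module Diesel
```

The `instance_eval { binding }` call plays the same role as `get_binding` in the Values class: it hands ERB a scope in which `name` and `root` are resolvable.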

Diesel-generated EBNF parser

For now, I am more excited about the first run and a look at the result: the Diesel-generated EBNF parser.

require_relative 'types'

module Diesel
    module Parser
        SYMBOLS = "[{[()]}\"'.:,;#*+~\/\\$%&=|?!-~"
        ID = /\w+/
        
        # Tokenizes and parses a complete input file.
        def parse_file(fn)
            # Tokenize and parse the input file.
            @history = Array.new
            tokens = tokenize_file(fn)
            begin
                root = expect_gramar_body(nil, tokens)
            rescue Exception => e
                fail_verbosely(e)
            end
            # Return the parse result.
            return root
        end

        private

        # Tokenizes an input file defined by its name.
        def tokenize_file(fn)
            tokens = Array.new
            File.readlines(fn).each do |line|
                linetokens = line.gsub(/([#{Regexp.quote(SYMBOLS)}])/, ' \1 ').split(/\s+/)
                linetokens.each do |linetoken|
                    strippedlinetoken = linetoken.strip
                    if !strippedlinetoken.empty? then
                        tokens.push(strippedlinetoken)
                    end
                end
            end
            return tokens
        end
        
        def append(field, value)
            return field unless value
            if field == nil
                return value
            elsif field.kind_of?(Array)
                return field.push(value)
            else
                return [ field, value ]
            end
        end
        
        
        def accept_gramar_body(gramar, tokens)
            gramar = Gramar.new
            rhs = lambda { 
                f = gramar
                f = lambda { 
                    return lambda { 
                        while (lambda { 
                            f = gramar
                            f = lambda { 
                                return lambda { 
                                    token = accept_rule_body(gramar, tokens)
                                    gramar.rule = append(gramar.rule, token)
                                    return token 
                                }.call
                            }.call if f
                            return f 
                        }.call)
                        end 
                        return gramar
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return gramar
        end

        def expect_gramar_body(gramar, tokens)
            gramar = Gramar.new
            rhs = lambda { 
                f = gramar
                f = lambda { 
                    return lambda { 
                        while (lambda { 
                            f = gramar
                            f = lambda { 
                                return lambda { 
                                    token = accept_rule_body(gramar, tokens)
                                    gramar.rule = append(gramar.rule, token)
                                    return token 
                                }.call
                            }.call if f
                            return f 
                        }.call)
                        end 
                        return gramar
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing gramar" unless rhs
            return gramar
        end
        
        def accept_rule_body(gramar, tokens)
            rule = Rule.new
            rhs = lambda { 
                f = rule
                f = lambda { 
                    return lambda { 
                        id = accept_id(tokens)
                        rule.id = append(rule.id, id)
                        return id 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "=") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        rule.rhs = append(rule.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, ";") 
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return rule
        end

        def expect_rule_body(gramar, tokens)
            rule = Rule.new
            rhs = lambda { 
                f = rule
                f = lambda { 
                    return lambda { 
                        id = expect_id(tokens)
                        rule.id = append(rule.id, id)
                        return id 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "=") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        rule.rhs = append(rule.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, ";") 
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing rule" unless rhs
            return rule
        end
        
        def accept_rhs_body(gramar, tokens)
            rhs = Rhs.new
            rhs = lambda { 
                f = rhs
                f = lambda { 
                    return lambda { 
                        token = accept_factor_body(gramar, tokens)
                        rhs.factor = append(rhs.factor, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        while (lambda { 
                            f = rhs
                            f = lambda { 
                                return lambda { 
                                    return accept_keyword(tokens, ",") 
                                }.call
                            }.call if f
                            f = lambda { 
                                return lambda { 
                                    token = expect_factor_body(gramar, tokens)
                                    rhs.factor = append(rhs.factor, token)
                                    return token 
                                }.call
                            }.call if f
                            return f 
                        }.call)
                        end 
                        return rhs
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return rhs
        end

        def expect_rhs_body(gramar, tokens)
            rhs = Rhs.new
            rhs = lambda { 
                f = rhs
                f = lambda { 
                    return lambda { 
                        token = expect_factor_body(gramar, tokens)
                        rhs.factor = append(rhs.factor, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        while (lambda { 
                            f = rhs
                            f = lambda { 
                                return lambda { 
                                    return accept_keyword(tokens, ",") 
                                }.call
                            }.call if f
                            f = lambda { 
                                return lambda { 
                                    token = expect_factor_body(gramar, tokens)
                                    rhs.factor = append(rhs.factor, token)
                                    return token 
                                }.call
                            }.call if f
                            return f 
                        }.call)
                        end 
                        return rhs
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing rhs" unless rhs
            return rhs
        end
        
        def accept_factor_body(gramar, tokens)
            factor = Factor.new
            rhs = lambda { 
                f = factor
                f = lambda { 
                    return lambda { 
                        token = accept_primary_body(gramar, tokens)
                        factor.primary = append(factor.primary, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        while (lambda { 
                            f = factor
                            f = lambda { 
                                return lambda { 
                                    return accept_keyword(tokens, "|") 
                                }.call
                            }.call if f
                            f = lambda { 
                                return lambda { 
                                    token = expect_primary_body(gramar, tokens)
                                    factor.primary = append(factor.primary, token)
                                    return token 
                                }.call
                            }.call if f
                            return f 
                        }.call)
                        end 
                        return factor
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return factor
        end

        def expect_factor_body(gramar, tokens)
            factor = Factor.new
            rhs = lambda { 
                f = factor
                f = lambda { 
                    return lambda { 
                        token = expect_primary_body(gramar, tokens)
                        factor.primary = append(factor.primary, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        while (lambda { 
                            f = factor
                            f = lambda { 
                                return lambda { 
                                    return accept_keyword(tokens, "|") 
                                }.call
                            }.call if f
                            f = lambda { 
                                return lambda { 
                                    token = expect_primary_body(gramar, tokens)
                                    factor.primary = append(factor.primary, token)
                                    return token 
                                }.call
                            }.call if f
                            return f 
                        }.call)
                        end 
                        return factor
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing factor" unless rhs
            return factor
        end
        
        def accept_primary_body(gramar, tokens)
            primary = Primary.new
            rhs = lambda { 
                f = primary
                f = lambda { 
                    p = nil 
                    p = lambda { 
                        id = accept_id(tokens)
                        primary.id = append(primary.id, id)
                        return id 
                    }.call unless p
                    p = lambda { 
                        token = accept_terminal_body(gramar, tokens)
                        primary.terminal = append(primary.terminal, token)
                        return token 
                    }.call unless p
                    p = lambda { 
                        token = accept_optional_body(gramar, tokens)
                        primary.optional = append(primary.optional, token)
                        return token 
                    }.call unless p
                    p = lambda { 
                        token = accept_repeated_body(gramar, tokens)
                        primary.repeated = append(primary.repeated, token)
                        return token 
                    }.call unless p
                    p = lambda { 
                        token = accept_grouped_body(gramar, tokens)
                        primary.grouped = append(primary.grouped, token)
                        return token 
                    }.call unless p
                    return p 
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return primary
        end

        def expect_primary_body(gramar, tokens)
            primary = Primary.new
            rhs = lambda { 
                f = primary
                f = lambda { 
                    p = nil 
                    p = lambda { 
                        id = accept_id(tokens)
                        primary.id = append(primary.id, id)
                        return id 
                    }.call unless p
                    p = lambda { 
                        token = accept_terminal_body(gramar, tokens)
                        primary.terminal = append(primary.terminal, token)
                        return token 
                    }.call unless p
                    p = lambda { 
                        token = accept_optional_body(gramar, tokens)
                        primary.optional = append(primary.optional, token)
                        return token 
                    }.call unless p
                    p = lambda { 
                        token = accept_repeated_body(gramar, tokens)
                        primary.repeated = append(primary.repeated, token)
                        return token 
                    }.call unless p
                    p = lambda { 
                        token = accept_grouped_body(gramar, tokens)
                        primary.grouped = append(primary.grouped, token)
                        return token 
                    }.call unless p
                    raise "Unexpected '#{tokens[0]}' while parsing primary" unless p 
                    return p 
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing primary" unless rhs
            return primary
        end
        
        def accept_terminal_body(gramar, tokens)
            terminal = Terminal.new
            rhs = lambda { 
                f = terminal
                f = lambda { 
                    p = nil 
                    p = lambda { 
                        f = terminal
                        f = lambda { 
                            return lambda { 
                                return accept_keyword(tokens, "'") 
                            }.call
                        }.call if f
                        f = lambda { 
                            p = nil 
                            p = lambda { 
                                id = accept_id(tokens)
                                terminal.id = append(terminal.id, id)
                                return id 
                            }.call unless p
                            p = lambda { 
                                symbol = accept_symbol(tokens)
                                terminal.symbol = append(terminal.symbol, symbol)
                                return symbol 
                            }.call unless p
                            raise "Unexpected '#{tokens[0]}' while parsing terminal" unless p 
                            return p 
                        }.call if f
                        f = lambda { 
                            return lambda { 
                                return expect_keyword(tokens, "'") 
                            }.call
                        }.call if f
                        return f 
                    }.call unless p
                    p = lambda { 
                        f = terminal
                        f = lambda { 
                            return lambda { 
                                return accept_keyword(tokens, "\"") 
                            }.call
                        }.call if f
                        f = lambda { 
                            p = nil 
                            p = lambda { 
                                id = accept_id(tokens)
                                terminal.id = append(terminal.id, id)
                                return id 
                            }.call unless p
                            p = lambda { 
                                symbol = accept_symbol(tokens)
                                terminal.symbol = append(terminal.symbol, symbol)
                                return symbol 
                            }.call unless p
                            raise "Unexpected '#{tokens[0]}' while parsing terminal" unless p 
                            return p 
                        }.call if f
                        f = lambda { 
                            return lambda { 
                                return expect_keyword(tokens, "\"") 
                            }.call
                        }.call if f
                        return f 
                    }.call unless p
                    return p 
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return terminal
        end

        def expect_terminal_body(gramar, tokens)
            terminal = Terminal.new
            rhs = lambda { 
                f = terminal
                f = lambda { 
                    p = nil 
                    p = lambda { 
                        f = terminal
                        f = lambda { 
                            return lambda { 
                                return accept_keyword(tokens, "'") 
                            }.call
                        }.call if f
                        f = lambda { 
                            p = nil 
                            p = lambda { 
                                id = accept_id(tokens)
                                terminal.id = append(terminal.id, id)
                                return id 
                            }.call unless p
                            p = lambda { 
                                symbol = accept_symbol(tokens)
                                terminal.symbol = append(terminal.symbol, symbol)
                                return symbol 
                            }.call unless p
                            raise "Unexpected '#{tokens[0]}' while parsing terminal" unless p 
                            return p 
                        }.call if f
                        f = lambda { 
                            return lambda { 
                                return expect_keyword(tokens, "'") 
                            }.call
                        }.call if f
                        return f 
                    }.call unless p
                    p = lambda { 
                        f = terminal
                        f = lambda { 
                            return lambda { 
                                return accept_keyword(tokens, "\"") 
                            }.call
                        }.call if f
                        f = lambda { 
                            p = nil 
                            p = lambda { 
                                id = accept_id(tokens)
                                terminal.id = append(terminal.id, id)
                                return id 
                            }.call unless p
                            p = lambda { 
                                symbol = accept_symbol(tokens)
                                terminal.symbol = append(terminal.symbol, symbol)
                                return symbol 
                            }.call unless p
                            raise "Unexpected '#{tokens[0]}' while parsing terminal" unless p 
                            return p 
                        }.call if f
                        f = lambda { 
                            return lambda { 
                                return expect_keyword(tokens, "\"") 
                            }.call
                        }.call if f
                        return f 
                    }.call unless p
                    raise "Unexpected '#{tokens[0]}' while parsing terminal" unless p 
                    return p 
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing terminal" unless rhs
            return terminal
        end
        
        def accept_optional_body(gramar, tokens)
            optional = Optional.new
            rhs = lambda { 
                f = optional
                f = lambda { 
                    return lambda { 
                        return accept_keyword(tokens, "[") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        optional.rhs = append(optional.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "]") 
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return optional
        end

        def expect_optional_body(gramar, tokens)
            optional = Optional.new
            rhs = lambda { 
                f = optional
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "[") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        optional.rhs = append(optional.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "]") 
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing optional" unless rhs
            return optional
        end
        
        def accept_repeated_body(gramar, tokens)
            repeated = Repeated.new
            rhs = lambda { 
                f = repeated
                f = lambda { 
                    return lambda { 
                        return accept_keyword(tokens, "{") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        repeated.rhs = append(repeated.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "}") 
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return repeated
        end

        def expect_repeated_body(gramar, tokens)
            repeated = Repeated.new
            rhs = lambda { 
                f = repeated
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "{") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        repeated.rhs = append(repeated.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "}") 
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing repeated" unless rhs
            return repeated
        end
        
        def accept_grouped_body(gramar, tokens)
            grouped = Grouped.new
            rhs = lambda { 
                f = grouped
                f = lambda { 
                    return lambda { 
                        return accept_keyword(tokens, "(") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        grouped.rhs = append(grouped.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, ")") 
                    }.call
                }.call if f
                return f 
            }.call
            return nil unless rhs
            return grouped
        end

        def expect_grouped_body(gramar, tokens)
            grouped = Grouped.new
            rhs = lambda { 
                f = grouped
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, "(") 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        token = expect_rhs_body(gramar, tokens)
                        grouped.rhs = append(grouped.rhs, token)
                        return token 
                    }.call
                }.call if f
                f = lambda { 
                    return lambda { 
                        return expect_keyword(tokens, ")") 
                    }.call
                }.call if f
                return f 
            }.call
            raise "Unexpected token '#{tokens[0]}' while parsing grouped" unless rhs
            return grouped
        end
        
           
        # Extracts the next token from the token queue.
        def get_next(tokens)
            first = tokens.delete_at(0)
            @history.push(first)
            return first
        end
        
        def expect_id(tokens)
            id = accept_id(tokens)
            raise "Unexpected '#{tokens[0]}' (not ID)" unless id
            return id
        end
        
        def accept_id(tokens)
            return nil if tokens.empty?
            return get_next(tokens) if ID.match(tokens[0])
            return nil
        end

        def expect_symbol(tokens)
            symbol = accept_symbol(tokens)
            raise "Unexpected '#{tokens[0]}' (not SYMBOL)" unless symbol
            return symbol
        end

        def accept_symbol(tokens)
            return nil if tokens.empty?
            return get_next(tokens) if /([#{Regexp.quote(SYMBOLS)}])/.match(tokens[0])
            return nil
        end

        def expect_int(tokens)
            return Integer(get_next(tokens))
        end

        def accept_int(tokens)
            begin
                Integer(tokens[0])
            rescue
                return nil
            end
            return expect_int(tokens)
        end

        # A float literal arrives as three tokens, e.g. "3", ".", "14".
        def expect_float(tokens)
            return Float(get_next(tokens) + get_next(tokens) + get_next(tokens))
        end

        def accept_float(tokens)
            begin
                Float(tokens[0..2].join)
            rescue
                return nil
            end
            return expect_float(tokens)
        end
        
        # Returns the given keyword if it happens to match the next token,
        # removing that token from the input queue in the process.
        # Otherwise, nil is returned, and the queue remains unchanged.
        def accept_keyword(tokens, keyword)
            return nil if tokens.empty?
            return get_next(tokens) if tokens[0] == keyword
            return nil
        end

        # Ensures that the given keyword happens to match the next token.
        # Returns the token on success.
        def expect_keyword(tokens, keyword)
            raise "Unexpected EOF" if tokens.empty?
            token = get_next(tokens)
            raise "Unexpected '#{token}' (not '#{keyword}')" unless token == keyword
            return token
        end

        # Ensures that the next token matches an entry in the given array.
        # On success, the respective token is returned.
        def expect_old(tokens, array)
            raise "Unexpected EOF" if tokens.empty?
            token = get_next(tokens)
            raise "Unknown identifier '#{token}'" unless array.include?(token)
            return token
        end

        # Ensures that the next token does NOT match an entry in the given array.
        # On success, the respective token is returned.
        def expect_new(tokens, array)
            raise "Unexpected EOF" if tokens.empty?
            token = get_next(tokens)
            raise "Identifier '#{token}' already in use" if array.include?(token)
            return token
        end

        # Parses the given string for a positive non-zero integer.
        def get_int(s)
            i = s.to_i
            raise "Failed to parse integer: '#{s}'" unless i > 0
            return i
        end

        # Parses the given string for a positive non-zero float.
        def get_float(s)
            f = s.to_f
            raise "Failed to parse float: '#{s}'" unless f > 0.0
            return f
        end

        # Parses the given string for a positive non-zero rational.
        def get_rational(s)
            r = s.to_r
            raise "Failed to parse rational: '#{s}'" unless r > 0
            return r
        end

        # Returns the plain array of names of an array of named items.
        def get_names(array)
            result = Array.new
            array.each do |item|
                result.push(item.name)
            end
            return result
        end

        # Augments an exception message with a token history.
        def fail_verbosely(msg)
            puts "Token history at point of error:"
            puts "================================"
            pp @history
            puts "================================"
            raise msg
        end
    end
end

It’s not really elegant, but let’s see whether it works. The real test follows with the second run: the Diesel^2-generated EBNF parser. I do not repeat the code here because, as expected, it looks identical to the first result. Goal achieved.
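The generated code is repetitive, but the accept/expect contract it leans on is simple: `accept_*` consumes a token only on a match and returns nil otherwise, while `expect_*` raises on a mismatch. A hand-written miniature (helper names are illustrative, not the generated code) for a single rule of the shape `rule = ID, "=", ID, ";"`:

```ruby
# Miniature of the accept/expect pattern used by the generated parser.
ID = /\A\w+\z/

# Consumes and returns the next token only if it equals the keyword.
def accept_keyword(tokens, keyword)
  return nil if tokens.empty?
  return tokens.shift if tokens[0] == keyword
  nil
end

# Like accept_keyword, but a mismatch is a hard parse error.
def expect_keyword(tokens, keyword)
  token = accept_keyword(tokens, keyword)
  raise "Unexpected '#{tokens[0]}' (not '#{keyword}')" unless token
  token
end

# Consumes and returns the next token, which must be an identifier.
def expect_id(tokens)
  raise "Unexpected '#{tokens[0]}' (not ID)" if tokens.empty? || !ID.match(tokens[0])
  tokens.shift
end

# rule = ID, "=", ID, ";"
def parse_rule(tokens)
  name = expect_id(tokens)
  expect_keyword(tokens, "=")
  body = expect_id(tokens)
  expect_keyword(tokens, ";")
  { name: name, body: body }
end

puts parse_rule(%w[greeting = hello ;]).inspect
```

Because an `accept_*` call leaves the queue untouched on a miss, LL(1) alternatives can simply be tried in sequence, which is exactly what the chained lambdas above do at scale.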

Back to the Melody

To close the circle I have created the following EBNF syntax for Daniel's Melody DSL:

gramar = { note }, { melody };
note = "note", ID, FLOAT;
melody = "melody", ID, time, "at", bpm, "{", { bar } ,"}";
time = INT, "/", INT;
bpm = INT, "bpm";
bar = "bar", "{", { appliedNote },  "}";
appliedNote = time, ID;

The previously generated Diesel parser generator produces a correct Melody parser. Applied to Daniel's original Melody example, along with two new ERB templates for the generated C code, it yields a wonderful WAV generator for Jingle Bells.

note C 261.63
note D 293.66
note E 329.63
note F 349.23
note G 392.00
note A 440.00
note B 493.88

melody JingleBells 4/4 at 125 bpm {
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 G   3/8 C   1/8 D }
    bar { 1/1 E }
    bar { 1/4 F   1/4 F   3/8 F   1/8 F }
    bar { 1/4 F   1/4 E   1/4 E   1/8 E   1/8 E }
    bar { 1/4 E   1/4 D   1/4 D   1/4 E }
    bar { 1/2 D           1/2 G }
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 G   3/8 C   1/8 D }
    bar { 1/1 E }
    bar { 1/4 F   1/4 F   1/4 F   1/4 F }
    bar { 1/4 F   1/4 E   1/4 E   1/8 E   1/8 E }
    bar { 1/4 G   1/4 G   1/4 F   1/4 D }
    bar { 1/1 C }
}
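Each bar entry pairs a note length with a pitch, and the generated WAV code has to turn an entry such as `1/4 E` into a duration in seconds plus a frequency. A small sketch of that arithmetic, mirroring the `TimeFreq_t` struct below (with the assumption, not stated explicitly in the article, that `bpm` counts the beat note named by the time signature's denominator):

```ruby
# Abbreviated note table from the Melody source above.
NOTES = {
  "C" => 261.63, "D" => 293.66, "E" => 329.63,
  "F" => 349.23, "G" => 392.00
}

# Converts a note length num/den at the given time signature and tempo
# into a { time_s, freq_hz } pair.
def time_freq(num, den, note, beat_den: 4, bpm: 125)
  beats = num.to_f / den * beat_den   # note length measured in beats
  { time_s: beats * 60.0 / bpm, freq_hz: NOTES.fetch(note) }
end

# "1/4 E": one beat at 125 bpm, i.e. 60/125 = 0.48 seconds.
quarter_e = time_freq(1, 4, "E")
# "1/1 E": a whole note fills the 4/4 bar, i.e. 4 * 0.48 = 1.92 seconds.
whole_e = time_freq(1, 1, "E")
```

With that, a bar is just the concatenation of its entries' durations, which is why the generated C table can be played back linearly.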

The ERB templates prove quite practical for code generation.

#ifndef MELODIES_H
#define MELODIES_H

typedef struct tag_TimeFreq_t {
    double time_s;
    double freq_hz;
} TimeFreq_t;

extern const TimeFreq_t [];

#endif /* MELODIES_H */
#include "melodies.h"

const TimeFreq_t [] = {

    /* Bar */ 
    { f, f }, /*  */ 
}; 
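A minimal ERB sketch shows the kind of template that could emit the C table above; the template text and variable names here are illustrative, not Daniel's original templates:

```ruby
require "erb"

# Illustrative ERB template rendering an array of (time_s, freq_hz)
# pairs into a C initializer for the TimeFreq_t table.
TEMPLATE = ERB.new(<<~C_SRC, trim_mode: "-")
  const TimeFreq_t <%= name %>[] = {
  <% entries.each do |time_s, freq_hz| -%>
      { <%= time_s %>, <%= freq_hz %> }, /* <%= "%.2f s" % time_s %> */
  <% end -%>
  };
C_SRC

name = "jingle_bells"
entries = [[0.48, 329.63], [0.48, 329.63], [0.96, 329.63]]
puts TEMPLATE.result(binding)
```

The template iterates over plain Ruby data, so the same AST that the generated parser produces can be fed straight into `ERB#result`.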

Diesel or Diesel^2

It is exciting to see that the Diesel approach is well suited to implementing a DSL in a fast and lightweight fashion. For more complex applications, however, the manual way of implementing the parser (a lot of copy and paste) is rather tedious. Fortunately, a Diesel parser generator (Diesel^2) is also straightforward to implement with the Diesel approach. That way I can document my language in EBNF notation and get a working parser generated from it. Changes and enhancements to the language can then be carried out just as quickly.

But of course I give up another piece of control by generating the parser. With the current Diesel^2 approach, the parser cannot incorporate semantic aspects of the DSL, and its error messages are considerably less specific, which can complicate troubleshooting. For now, I can exploit these freedoms only when I write the parser by hand.

Diesel and Diesel^2

I think it is worthwhile to invest some work in the disadvantages mentioned above. There are three places one could start:

  1. The rudimentary EBNF syntax could be extended to give more influence over the generator output.
  2. The parser generator could offer hooks for selectively adjusting the generated code where needed.
  3. The parser template could likewise offer hooks for complementing the generated parser with hand-written code.

This does not affect the other benefits already stated by Daniel:

  • Generators are easy to create with Ruby.
  • Integration with build systems is straightforward.
  • The necessary Ruby code remains quite manageable.

I look forward to the first project that uses Diesel, and to the everyday experience with it.
