en
de

Diesel, Part 4: From Reference Implementation to Framework

17 June 2015
| |
Reading time: 6 minutes

In this post we outline the steps taken to evolve the Diesel approach by extracting re-usable parts and establishing coding style conventions. Our aim is to leverage the development speed advantage Diesel provides and ensure long term maintainability of the Diesel code across different parser implementations.

Background

tl:dr; Diesel is a ligthweight approach to building a DSL-based code generator.

Diesel, Part 1: A Plea for a Lightweight Approach to DSLs outlines the rational behind the approach and the concept.

Diesel, Part 2: A Hands-On Example for a Lightweight Approach to DSLs presents a very simple DSL to describe melodies and proceeds to create the parser and generator for it.

Diesel, Part 3: Bootstrapping Diesel is what you get when you communicate ideas into a room full of smart people: they go meta on you!

Evolution

In the last year we @Zühlke have used the Diesel approach on several projects with very good results. After a few iterations the major drawback of Diesel seems to be it’s major advantage: DSL creation is so easy that we end up with several independently developed parsers all with their own individual style, all slightly different even though they follow the same principles.

And while speed is essential in the beginning, maintainability is critical in the long run. So when in a project we ended up with three different versions of a Diesel implementation it was time to break out the software engineering toolbox.

KISS & YAGNI meet DRY

Diesel exists on the principles of KISS and YAGNI. Our love of acronyms demands we also apply DRY in an attempt to solve the problem of many deviating parser/generator implementations.

Analyzing Diesel

It turns out that although a recursive descent parser is a very specialized piece of software tightly integrated with the DSL syntax, there are some common parts and patterns that lend themselves to re-use.

The largest of these is the tokenizer. In all previous examples tokenization is built into the parser. After several Diesel implementations (at the time of writing there were 7 different implementations, all in Ruby) it was easy to see that the tokenization logic never really changed.

In one particular case a feature was added that differentiates between comments in the DSL and comments to be used as inline documentation in the generated code. The relevant code is

# Comment Type 1: Completely ignore all text between '##' and EOL
input = line.gsub(/##.*$/, "").strip
# Comment Type 2: Interpret all text between '%%' and EOL as docs
parts = input.split("%%")
# Comment Type 2: Interpret all text between '%%' and EOL as docs
parts = input.split("%%")
# Encode comments of type 2 as a pair of special tokens.
for i in 1..(parts.length-1) do
  tokens.push("!DOC!") tokens.push(parts[i].strip)
end

The consolidated version of the tokenizer was extracted as a Ruby module and is now shared between parser implementations.
For brevity purposes we won’t post the complete tokenizer code. It can be found in this gist.

Using this tokenizer the melodies example parser becomes:

require_relative 'types'
require_relative 'tokenizer'

module Diesel
  module Parser
    include Diesel::Tokenizer

    def parse source, rootSpec
      case source
      when Array
        rootSpec=parse_content(source, rootSpec)
      when String
        rootSpec=parse_content(source.lines, rootSpec)
      else
        raise Diesel::Error, "The tokenizer can only handle strings or arrays of strings"
      end
      return rootSpec
    end

    private
    def parse_content source, rootSpec
      # If not yet initialized, create the root element that the parser result will be stored in it.
      rootSpec ||= Root.new(Array.new, Array.new)
      # Tokenize the input, then parse it.
      @history = Array.new
      tokens = tokenize(source)
      parse_tokens(rootSpec, tokens)
      # Return the parser result.
      return rootSpec
    end
    # Parses all the tokens in the input stream.
    def parse_tokens(root, tokens)
      while !tokens.empty?
        first = get_next(tokens)
        case first
        when "note"
          parse_note_body(root, tokens)
        when "melody"
          parse_melody_body(root, tokens)
        else
          raise Diesel::Error, "Unexpected token '#{first}'"
        end
      end
    end
    # Parses the entire body of a "note" statement.
    def parse_note_body(root, tokens)
      name = expect_new(get_names(root.notes), tokens)
      freq = get_float(get_next(tokens))
      root.notes.push(Note.new(name, freq))
    end
    # Parses the body of a "melody" statement.
    def parse_melody_body(root, tokens)
      name = expect_new(get_names(root.melodies), tokens)
      time = get_rational(get_next(tokens))
      expect_keyword("at", tokens)
      bpm = get_int(get_next(tokens))
      expect_keyword("bpm", tokens)
      expect_keyword("{", tokens)
      bars = Array.new
      while !accept_keyword("}", tokens)
        expect_keyword("bar")
        parse_bar_body(root, bars, tokens)
      end
      root.melodies.push(Melody.new(name, time, bpm, bars))
    end
    # Parses the body of a "bar" statement.
    def parse_bar_body(root, bars, tokens)
      expect_keyword("{",tokens)
      notes = Array.new
      while !accept_keyword("}",tokens)
        parse_appliednote(root, notes, tokens)
      end
      bars.push(Bar.new(notes))
    end
    # Parses an entire applied note (inside a bar).
    def parse_appliednote(root, tokens, notes)
      duration = get_rational(get_next(tokens))
      note = expect_old(get_names(root.notes),tokens)
      notes.push(AppliedNote.new(duration, note))
    end
    # Parses the given string for a positive non-zero integer.
    def get_int(s)
      i = s.to_i
      raise Diesel::Error, "Failed to parse integer: '#{s}'" unless i > 0
      return i
    end
    # Parses the given string for a positive non-zero float.
    def get_float(s)
      f = s.to_f
      raise Diesel::Error, "Failed to parse float: '#{s}'" unless f > 0.0
      return f
    end
    # Parses the given string for a positive non-zero rational.
    def get_rational(s)
      r = s.to_r
      raise Diesel::Error, "Failed to parse rational: '#{s}'" unless r > 0
      return r
    end
    # Returns the plain array of names of an array of named items.
    def get_names(array)
      result = Array.new
      array.each do |item|
        result.push(item.name)
      end
      return result
    end
  end
end

Code Generation

One drawback of code generation is that out of the box it recreates all files. Given that generally generated code sits at the bottom of the stack providing boilerplate and framework functionality this means that code generation has a negative effect on build times as it forces the build system to rebuild even if the actual content of the file has not changed.

It would be nice to have the generators build incrementally. Fortunately such functionality in Ruby is part of the standard library. To use it we require one addition to the Root structure:

# The root element of our DSL contains a set of notes and a set of melodies.
 # notes: An array of Note items
 # melodies: An array of Melody items
 # source: the source file
 Root = Struct.new(:notes, :melodies,:source)

We then use the ‘rake’ standard library to implement incremental generation:

require 'rake'
module Diesel
  module Generator
    #returns the name of the generated file
    def filename rootSpec
      "foo"
    end

    def generate rootSpec
      #Create a rake file task
      #that executes if the file is missing or is older than the list of dependencies
      file filename(rootSpec)  => [rootSpec.source] do |t|
        #assume we have a template and everything is fine for the sake of the example
        template_content=File.read("template_file")
        template=ERB.new(template_content)
        generated_content=template.result(binding)
        #write in the files here
        mkdir_p(File.dirname(t.name))
        File.write(t.name,"wb"){|f| f.write(generated_content)}
      end
      #invoke the file task
      Rake::Task[filename(rootSpec)].invoke
    end
  end
end
[/code]

This is also the standard structure we use for generator code: The filename method encapsulates the naming conventions for the generated files and the generate method is the entry point for all calling code, everything nicely contained in a module definition.

While nothing prevents us from generating multiple files in one generator (like in the original example the .h and .c files) the use of rake for incremental generation is made easier when we have one generator per file type.

Writing Ruby in Ruby

There are a couple of differences on the programming language level between the two example versions.

In the current example the parser does not operate on filenames, it expects a String or an Array and it is the responsibility of the calling code to provide the content. This makes testing significantly simpler and eliminates I/O dependencies that would increase testing run times.

Method arguments are consistently aligned from specific to generic, left to right, few to many (i.e. the melody instance, the collection of notes, the collection of tokens). This is just a style convention but helps identify argument errors faster.

Templates

The biggest difference between the original version and our current efforts is generation based on templates. Ruby comes with the ERB template engine in it’s standard library.
The header generation code using puts emission is:

File.open(File.join(dir, "melodies.h"), "w") { |f| f.puts("#ifndef MELODIES_H")

f.puts("#define MELODIES_H")

f.puts

f.puts("typedef struct tag_TimeFreq_t {")

f.puts(" double time_s;")

f.puts(" double freq_hz;")

f.puts("} TimeFreq_t;")

f.puts

root.melodies.each do |melody|

n = count_melody_elements(melody)

f.puts("extern const TimeFreq_t #{melody.name}[#{n}];")

f.puts end f.puts("#endif /* MELODIES_H */") }

Using the file task structure, the template to produce the same content as the puts code will be:

#ifndef MELODIES_H
#define MELODIES_H
typedef struct tag_TimeFreq_t {
  double time_s;
  double freq_hz;
} TimeFreq_t;
<%= root.melodies.each do |melody| %>
  extern const TimeFreq_t <%=melody.name%>[<%=count_melody_elements(melody)%>];
<%end%>
#endif /* MELODIES_H */

The template approach has several advantages. It keeps the code generator simple. Any boiler plate changes (og which there are a lot in C or C++) can be done in the template file without having to shift through Ruby code to get at the C bits. With templates we can keep evolving the generated content much faster without little need to update the Diesel code.

Conclusion

With a few style conventions and judicious use of the standard Ruby libraries the Diesel approach not only allows us to create code generators in a short amount of time but also gives us consistent and easily maintainable code that is easily integrated in build processes.

It’s magic!

Comments (0)

×

Sign up for our Updates

Sign up now for our updates.

This field is required
This field is required
This field is required

I'm interested in:

Select at least one category
You were signed up successfully.

Receive regular updates from our blog

Subscribe

Or would you like to discuss a potential project with us? Contact us »