en
de

Diesel, Part 2: A Hands-On Example for a Lightweight Approach to DSLs

19 December 2014
| |
Reading time: 11 minutes

This is the second and last part of our little series of blog posts on the design of lightweight DSL toolkits. Please make sure that you have read the first part, namely the Diesel rationale, before proceeding further. It explains all the underlying ideas and provides the context for the following hands-on.

Introduction

As outlined in the rationale, Diesel aims at being a straightforward approach to implementing lightweight DSL toolkits, and each instantiation of it requires the following steps:

  1. Define the core elements of the DSL using Ruby structs.
  2. Extend the default tokenizer by any additional features you need.
  3. Write down the parse functions that reflect the production rules of the input language.
  4. If desired, add validators.
  5. Implement one generator for each type of output artifact.

In this posting, we will exemplify this construction process using a simple DSL for the description of melodies.

The Diesel approach is not about writing down explicit grammars in clumsy notations. Of course we could do that in order to have production rules from which to derive our hierarchy of parse functions, but there simply is no need for that: The simple DSLs we are focusing on allow for a straightforward implementation of the parser. As you will see, the hierarchy of parse functions will be blatantly obvious by the time we start coding the parser. Hence, let’s dive into the process directly by writing down some example input:

[[ File diesel-blog-example/input/example.dsl ]]

note C 261.63
note D 293.66
note E 329.63
note F 349.23
note G 392.00
note A 440.00
note B 493.38

melody JingleBells 4/4 at 125 bpm {
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 G   3/8 C   1/8 D }
    bar { 1/1 E }
    bar { 1/4 F   1/4 F   3/8 F   1/8 F }
    bar { 1/4 F   1/4 E   1/4 E   1/8 E   1/8 E }
    bar { 1/4 E   1/4 D   1/4 D   1/4 E }
    bar { 1/2 D           1/2 G }
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 E   1/2 E }
    bar { 1/4 E   1/4 G   3/8 C   1/8 D }
    bar { 1/1 E }
    bar { 1/4 F   1/4 F   1/4 F   1/4 F }
    bar { 1/4 F   1/4 E   1/4 E   1/8 E   1/8 E }
    bar { 1/4 G   1/4 G   1/4 F   1/4 D }
    bar { 1/1 C }
}

I’ll assume that this speaks for itself: we have a textual specification for a melody (in four-four time), which is comprised of bars, where each bar contains some duration/note pairs (how long to play what tone?).

Step 1: Define the Core Elements

What do we want to achieve with our DSL? First we want to specify some musical notes, and then we want to specify melodies based on those notes. Therefore, the root element — the result of any successful run of the parser — is just a pair of sets. But what is a note? Well, as far as we are concerned for now, a note is just a name (such as “C” or “D”, you get the idea) that stands for a certain frequency. Proceeding in this fashion, we derive a tree of Ruby structs as follows:

[[ File diesel-blog-example/diesel/types.rb ]]

module Diesel
    # The root element of our DSL contains a set of notes and a set of melodies.
    # notes: An array of Note items
    # melodies: An array of Melody items
    Root = Struct.new(:notes, :melodies)

    # A note is comprised of a name and a frequency.
    # name: A string
    # freq: A float (frequency in Hz)
    Note = Struct.new(:name, :freq)

    # A melody has a name, a time (e.g., 3/4), a bpm value, and a set of bars.
    # name: A string
    # time: A fraction
    # bpm: An integer
    # bars: An array of Bar items
    Melody = Struct.new(:name, :time, :bpm, :bars)

    # A bar consists of a set of applied notes.
    # The notes of a bar must match the time defined for the melody.
    # notes: An array of AppliedNote items
    Bar = Struct.new(:notes)

    # An applied note is comprised of a duration and a note name.
    # duration: A fraction
    # note: A string that identifies a note (cf. Note.name)
    AppliedNote = Struct.new(:duration, :note)
end

Step 2: Tune the Tokenizer

Our input language uses whitespace for separating tokens (like many DSLs do). Therefore, we could simply split the incoming lines of text about any whitespace in order to extract the tokens. In Ruby, that would be very easy to achieve, for instance by using the split function of the String class. But there is one tiny convenience feature we want to include as well: We want curly brackets to be interpreted as individual tokens even if they are not surrounded by whitespace. A straightforward way of doing this in Ruby is via the gsub function of the String class, adding spaces around curly brackets before splitting the line (see the tokenize_file() function below for the actual code). And that is about all we need for tokenization.

Step 3: Write Down the Parse Functions

The entry point of our parser is a function that takes the name of the input file as an argument and returns the parser result, which of course is of the root element type discussed in step 1 above. Internally, it tokenizes the input and feeds the resulting token queue into the actual parser. As generally desired in Diesel, there is exactly one token-based parse function for each type defined in step 1:

Core Element Parse Function
Root parse_tokens()
Note parse_note_body()
Melody parse_melody_body()
Bar parse_bar_body()
AppliedNote parse_appliednote()

By convention, we add the suffix body when the parse function is designed to read only the body of a statement, that is, only the part following the keyword that introduced the statement.

The parser functions rely on the helper functions get_next(), which just returns and dequeues the next token from the input queue, as well as accept_keyword() and expect_*(), which are explained in what follows. The accept_keyword() function can be used to express conditional parse steps of the following form: “If the next token happens to match the given keyword, then dequeue the token and continue in a certain way; otherwise, leave the queue untouched and proceed in another way.” The expect_*() functions demand that the next token meet a certain condition (e.g., matches a keyword, matches a string from a given array, does not match a string from a given array). They are convenient whenever we need to check for new or old identifiers in dynamically growing namespaces.

The important thing to note is that this very limited set of generic helper functions suffice to implement all the basic aspects of a parser (at least for the LL(1) variety of DSLs we have in mind):

  • Sometimes, we need to peek at the next token in order to understand what the next statement is.
  • At other times, we know exactly what is supposed to come next (e.g., a keyword or a string-based name), and simply want to verify this.
  • Moreover, we almost always need to deal with identifiers. They allow us to give unambiguous names to elements defined in the language, and later on to refer to those elements using their names.

[[ File diesel-blog-example/diesel/parser.rb ]]

require_relative 'types'

module Diesel
    module Parser
        # Tokenizes and parses a complete input file.
        def parse_file(fn)
            # Initialize the root element (to store the parse result in).
            root = Root.new(Array.new, Array.new)
            # Tokenize and parse the input file.
            @history = Array.new
            tokens = tokenize_file(fn)
            parse_tokens(root, tokens)
            # Return the parse result.
            return root
        end

        private

        # Tokenizes an input file defined by its name.
        def tokenize_file(fn)
            tokens = Array.new
            File.readlines(fn).each do |line|
                linetokens = line.gsub(/([{}])/, ' \1 ').split(/\s+/)
                linetokens.each do |linetoken|
                    strippedlinetoken = linetoken.strip
                    if !strippedlinetoken.empty? then
                        tokens.push(strippedlinetoken)
                    end
                end
            end
            return tokens
        end

        # Parses all the tokens in the input stream.
        def parse_tokens(root, tokens)
            while !tokens.empty?
                first = get_next(tokens)
                case first
                when "note"
                    parse_note_body(root, tokens)
                when "melody"
                    parse_melody_body(root, tokens)
                else
                    raise "Unexpected token '#{first}'"
                end
            end
        end

        # Parses the entire body of a "note" statement.
        def parse_note_body(root, tokens)
            name = expect_new(tokens, get_names(root.notes))
            freq = get_float(get_next(tokens))
            root.notes.push(Note.new(name, freq))
        end

        # Parses the body of a "melody" statement.
        def parse_melody_body(root, tokens)
            name = expect_new(tokens, get_names(root.melodies))
            time = get_rational(get_next(tokens))
            expect_keyword(tokens, "at")
            bpm = get_int(get_next(tokens))
            expect_keyword(tokens, "bpm")
            expect_keyword(tokens, "{")
            bars = Array.new
            while !accept_keyword(tokens, "}")
                expect_keyword(tokens, "bar")
                parse_bar_body(root, tokens, bars)
            end
            root.melodies.push(Melody.new(name, time, bpm, bars))
        end

        # Parses the body of a "bar" statement.
        def parse_bar_body(root, tokens, bars)
            expect_keyword(tokens, "{")
            notes = Array.new
            while !accept_keyword(tokens, "}")
                parse_appliednote(root, tokens, notes)
            end
            bars.push(Bar.new(notes))
        end

        # Parses an entire applied note (inside a bar).
        def parse_appliednote(root, tokens, notes)
            duration = get_rational(get_next(tokens))
            note = expect_old(tokens, get_names(root.notes))
            notes.push(AppliedNote.new(duration, note))
        end

        # Extracts the next token from the token queue.
        def get_next(tokens)
            first = tokens.delete_at(0)
            @history.push(first)
            return first
        end

        # Returns the given keyword if it happens to match the next token,
        # removing that token from the input queue in the process.
        # Otherwise, nil is returned, and the queue remains unchanged.
        def accept_keyword(tokens, keyword)
            if tokens.empty? then return nil end
            if tokens[0] == keyword then return get_next(tokens) end
            return nil
        end

        # Ensures that the given keyword happens to match the next token.
        # Returns the token on success.
        def expect_keyword(tokens, keyword)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Unexpected '#{token}' (not '#{keyword}')" unless token == keyword
            return token
        end

        # Ensures that the next token matches an entry in the given array.
        # On success, the respective token is returned.
        def expect_old(tokens, array)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Unknown identifier '#{token}'" unless array.include?(token)
            return token
        end

        # Ensures that the next token does NOT match an entry in the given array.
        # On success, the respective token is returned.
        def expect_new(tokens, array)
            raise "Unexpected EOF" unless !tokens.empty?
            token = get_next(tokens)
            raise "Identifier '#{token}' already in use" unless !array.include?(token)
            return token
        end

        # Parses the given string for a positive non-zero integer.
        def get_int(s)
            i = s.to_i
            raise "Failed to parse integer: '#{s}'" unless i > 0
            return i
        end

        # Parses the given string for a positive non-zero float.
        def get_float(s)
            f = s.to_f
            raise "Failed to parse float: '#{s}'" unless f > 0.0
            return f
        end

        # Parses the given string for a positive non-zero rational.
        def get_rational(s)
            r = s.to_r
            raise "Failed to parse rational: '#{s}'" unless r > 0
            return r
        end

        # Returns the plain array of names of an array of named items.
        def get_names(array)
            result = Array.new
            array.each do |item|
                result.push(item.name)
            end
            return result
        end

        # Augments an exception message with a token history.
        def fail_verbosely(msg)
            puts "Token history at point of error:"
            puts "================================"
            puts @history
            puts "================================"
            raise msg
        end
    end
end

Step 4: Add Validators

Let’s implement a validator that verifies the correct timing in each bar of a melody. If a melody is defined to be in three-four time, for example, then we should make sure that the durations of the notes in each of its bars add up to three quarters. Without further ado:

[[ File diesel-blog-example/diesel/validation.rb ]]

require_relative 'types'

module Diesel
    module Validator
        # Validates a complete parser result.
        def validate(root)
            root.melodies.each do |melody|
                validate_melody(melody)
            end
        end

        private

        # Validates a single melody.
        def validate_melody(melody)
            melody.bars.each do |bar|
                validate_bar(melody.time, bar)
            end
        end

        # Validates a single bar.
        def validate_bar(time, bar)
            bar_time = 0
            bar.notes.each do |note|
                bar_time = bar_time + note.duration
            end
            raise "Bar #{bar} does not adhere to melody time" unless bar_time == time
        end
    end
end

Step 5: Implement the Generators

In the Diesel approach, implementing generators is as easy as writing validators: You have the full power of Ruby on your hands, and you can access the entire parse result throughout the entire transformation chain. There is no new language to learn and no additional infrastructure to master. If things get hard, this should be because the input DSL or the desired output is very complex, not because the DSL toolkit itself makes it so.

But enough of that generic talk. What about generating some C code from our melody DSL? For example, we might want to store the melody in a way that is easy to process by a C program:

[[ File diesel-blog-example/diesel/gen_c.rb ]]

require_relative 'types'
require 'fileutils'

module Diesel
    module Generators
        # Generates the C output for a given parser result.
        def generate_c(root)
            dir = "../output"
            FileUtils.mkdir_p(dir, :verbose=>false)
            generate_header(dir, root)
            generate_source(dir, root)
        end

        private

        # Generates a C header with common typedefs and melody declarations.
        def generate_header(dir, root)
            File.open(File.join(dir, "melodies.h"), "w") { |f|
                f.puts("#ifndef MELODIES_H")
                f.puts("#define MELODIES_H")
                f.puts
                f.puts("typedef struct tag_TimeFreq_t {")
                f.puts("    double time_s;")
                f.puts("    double freq_hz;")
                f.puts("} TimeFreq_t;")
                f.puts
                root.melodies.each do |melody|
                    n = count_melody_elements(melody)
                    f.puts("extern const TimeFreq_t #{melody.name}[#{n}];")
                    f.puts
                end
                f.puts("#endif /* MELODIES_H */")
            }
        end

        # Generates a C source that contains the actual melody data.
        def generate_source(dir, root)
            File.open(File.join(dir, "melodies.c"), "w") { |f|
                f.puts("#include \"melodies.h\"")
                f.puts
                root.melodies.each do |melody|
                    s_per_bar = 4.0 * (60.0 / melody.bpm)
                    n = count_melody_elements(melody)
                    f.puts("const TimeFreq_t #{melody.name}[#{n}] = {")
                    melody.bars.each do |bar|
                        f.puts("    /* Bar */")
                        bar.notes.each do |note|
                            duration = (note.duration * s_per_bar).to_f
                            freq = get_freq_by_note_name(root.notes, note.note)
                            f.puts("    { #{duration}, #{freq} }, /* #{note.note} */")
                        end
                    end
                    f.puts("};")
                    f.puts
                end
            }
        end

        # Counts the elements (number of applied notes) in a melody.
        def count_melody_elements(melody)
            n = 0
            melody.bars.each do |bar|
                bar.notes.each do |note|
                    n = n + 1
                end
            end
            return n
        end

        # Looks up the frequency of a note based on its name.
        def get_freq_by_note_name(notes, name)
            notes.each do |note|
                if note.name == name then return note.freq end
            end
            raise "Unknown note '#{name}'"
        end
    end
end

Putting Things Together

Now let us combine the parser, the validator and the generator into a meaningful program that reflects the complete transformation chain:

[[ File diesel-blog-example/diesel/main.rb ]]

require_relative "parser"
require_relative "validation"
require_relative "gen_c"

include Diesel::Parser
include Diesel::Validator
include Diesel::Generators

# This is the complete transformation chain.
# The tokenizer is called by parse_file() internally.
root = parse_file("../input/example.dsl")
validate(root)
generate_c(root)

The Example Output

This is the resulting C header file, melodies.h:

[[ File diesel-blog-example/output/melodies.h ]]

#ifndef MELODIES_H
#define MELODIES_H

typedef struct tag_TimeFreq_t {
    double time_s;
    double freq_hz;
} TimeFreq_t;

extern const TimeFreq_t JingleBells[51];

#endif /* MELODIES_H */

And this is the resulting C source file, melodies.c:

[[ File diesel-blog-example/output/melodies.c ]]

#include "melodies.h"

const TimeFreq_t JingleBells[51] = {
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 329.63 }, /* E */
    { 0.96, 329.63 }, /* E */
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 329.63 }, /* E */
    { 0.96, 329.63 }, /* E */
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 392.0 }, /* G */
    { 0.72, 261.63 }, /* C */
    { 0.24, 293.66 }, /* D */
    /* Bar */
    { 1.92, 329.63 }, /* E */
    /* Bar */
    { 0.48, 349.23 }, /* F */
    { 0.48, 349.23 }, /* F */
    { 0.72, 349.23 }, /* F */
    { 0.24, 349.23 }, /* F */
    /* Bar */
    { 0.48, 349.23 }, /* F */
    { 0.48, 329.63 }, /* E */
    { 0.48, 329.63 }, /* E */
    { 0.24, 329.63 }, /* E */
    { 0.24, 329.63 }, /* E */
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 293.66 }, /* D */
    { 0.48, 293.66 }, /* D */
    { 0.48, 329.63 }, /* E */
    /* Bar */
    { 0.96, 293.66 }, /* D */
    { 0.96, 392.0 }, /* G */
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 329.63 }, /* E */
    { 0.96, 329.63 }, /* E */
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 329.63 }, /* E */
    { 0.96, 329.63 }, /* E */
    /* Bar */
    { 0.48, 329.63 }, /* E */
    { 0.48, 392.0 }, /* G */
    { 0.72, 261.63 }, /* C */
    { 0.24, 293.66 }, /* D */
    /* Bar */
    { 1.92, 329.63 }, /* E */
    /* Bar */
    { 0.48, 349.23 }, /* F */
    { 0.48, 349.23 }, /* F */
    { 0.48, 349.23 }, /* F */
    { 0.48, 349.23 }, /* F */
    /* Bar */
    { 0.48, 349.23 }, /* F */
    { 0.48, 329.63 }, /* E */
    { 0.48, 329.63 }, /* E */
    { 0.24, 329.63 }, /* E */
    { 0.24, 329.63 }, /* E */
    /* Bar */
    { 0.48, 392.0 }, /* G */
    { 0.48, 392.0 }, /* G */
    { 0.48, 349.23 }, /* F */
    { 0.48, 293.66 }, /* D */
    /* Bar */
    { 1.92, 261.63 }, /* C */
};

Concluding Remarks

For the sake of demonstration, we include a tiny C program that shows how the code generated above can be used:

[[ File diesel-blog-example/cdemo/cdemo.c ]]

#include <math.h>
#include <stdint.h>
#include <stdio.h>

#include "../output/melodies.h"

static const double TwoPi = 2.0 * 3.141592654;

/* Default WAV header (8 Bit, mono, 11025 Hz) without size information.
 * The size fields need to be fixed when the file size is known:
 * The four bytes starting at offset 4 must equal the file size minus 8.
 * The four bytes starting at offset 40 must equal the file size minus 44. */
static const uint8_t EmptyHeader[] = {
    0x52u, 0x49u, 0x46u, 0x46u, /* Chunk ID:       "RIFF" */
    0x00u, 0x00u, 0x00u, 0x00u, /* Chunk Size:     0 for now */
    0x57u, 0x41u, 0x56u, 0x45u, /* Format:         "WAVE" */
    0x66u, 0x6Du, 0x74u, 0x20u, /* Subchunk1 ID:   "fmt " */
    0x10u, 0x00u, 0x00u, 0x00u, /* Subchunk1 Size: 16 for PCM */
    0x01u, 0x00u,               /* Audio Format:   1 for PCM */
    0x01u, 0x00u,               /* #Channels:      1 (mono) */
    0x11u, 0x2Bu, 0x00u, 0x00u, /* Sample Rate:    11025 */
    0x11u, 0x2Bu, 0x00u, 0x00u, /* Byte Rate:      11025*1*1 */
    0x01u, 0x00u,               /* Block Alignm.:  1 Byte/sample */
    0x08u, 0x00u,               /* Bits/Sample:    8 */
    0x64u, 0x61u, 0x74u, 0x61u, /* Subchunk2 ID:   "data" */
    0x00u, 0x00u, 0x00u, 0x00u  /* Subchunk2 size: 0 for now */
};

/* Append the above default WAV header to a file. */
static void add_wave_header(FILE *f)
{
    fwrite(EmptyHeader, 44, 1, f);
}

/* Append some WAV data to a file. */
static void add_wave_data(FILE *f, TimeFreq_t chunk)
{
    int i;
    const int nBytes = (int)(chunk.time_s * 11025);
    for (i = 0; i < nBytes; i++)
    {
        const double progress = ((double)i)/nBytes;
        const double amplitude = (progress < 0.9) ? 64.0 : ((1.0-progress)*640.0);
        const uint8_t b = (amplitude * sin((chunk.freq_hz*TwoPi*i)/11025.0)) + 128u;
        fwrite(&b, 1, 1, f);
    }
}

/* Helper function for writing uint32 values in little-endian format. */
static void write_uint32_le(FILE *f, uint32_t n)
{
    uint8_t repr[4];
    repr[0] = n & 0xFFu;
    repr[1] = (n>>8) & 0xFFu;
    repr[2] = (n>>16) & 0xFFu;
    repr[3] = (n>>24) & 0xFFu;
    fwrite(&repr, 4, 1, f);
}

/* Update the size fields in a header of a WAV file. */
static void fix_wave_header(FILE *f)
{
    uint32_t fsize = (uint32_t)ftell(f);
    fseek(f, 4, SEEK_SET);
    write_uint32_le(f, fsize-8);
    fseek(f, 40, SEEK_SET);
    write_uint32_le(f, fsize-44);
}

/* Entry point. */
int main(int argc, char *argv[])
{
    size_t i;
    FILE *f = fopen("JingleBells.wav", "wb");
    add_wave_header(f);
    for (i = 0u; i < sizeof(JingleBells)/sizeof(JingleBells[0]); i++)
    {
        add_wave_data(f, JingleBells[i]);
    }
    fix_wave_header(f);
    fclose(f);
    (void)argc; /* Not used */
    (void)argv; /* Not used */
    return 0;
}

Comments (0)

×

Sign up for our Updates

Sign up now for our updates.

This field is required
This field is required
This field is required

I'm interested in:

Select at least one category
You were signed up successfully.