metalanguage and language set

Introduction to ngrease

Ville Oikarinen

generated on 2009-02-24

The goal

Multi-stage transformation from specification to product

ngrease is modestly an effort to provide a tool to formalize and automate the whole software development process.

All the "phases" of the software development process are basically the same: each phase consumes specifications of the application features at a higher abstraction level and transforms them into a lower level of abstraction, producing new, more detailed specifications to be consumed by the next phase, until the specification is executable by a computer.

Transformation preserves the semantics of the specification, but makes it unambiguous for the current level of abstraction.

Usually there is an indefinite number of ways to transform a specification to a lower level. Choosing between the alternatives is called architecture, implementation, style, optimization etc., depending on the "phase".

Minimal model-to-model transformers

It is important to reduce abstraction as little as possible at each stage. This way each transformation phase gets the primary definitions and not something derived from them, which makes it easier to make more sophisticated transformations and to preserve the intended semantics of the specification. (The same principle also applies to traditional programming and specification. Resisting the urge to jump from the functional specifications directly to bit operations helps make the code correct and easy to reuse and maintain.)

Transformers that reduce abstraction as little as possible are maximally general: they have good cohesion. The higher the abstraction of the concepts they produce, the more usable they are as a source for further transformations. The ideal transformation chain has many "branch points" at which transformation results can be used as the source for many further transformations at the same time. For example, it is generally difficult to generate Java from C, but both are easier to generate from a more abstract "common denominator" imperative language.

This contradicts the typical way of applying code generation (see Generating Code with DSM by Steven Kelly for an example), which avoids model-to-model transformations and generates "code" directly. ngrease tries to remove the distinction between model and code: a model can be called "model" by the transformers that consume it and "code" by the transformers that produce it. The final product of code generation is just a model for a microprocessor.

Metaprogramming

ngrease provides a metalanguage that can be used to write any definition as naturally as possible, using concepts from the current problem domain(s), i.e. using domain specific languages.

So language tools (transformers) are also definitions that can be written in the ngrease metalanguage. In this kind of metaprogramming, creating new programming languages as part of a software project is very common. As Sergey Dmitriev writes in his Language Oriented Programming article, this is actually nothing new: where traditional programmers create new frameworks, metaprogrammers create languages.

Creating a language by using it

An ngrease programmer tries (saying what one really wants is surprisingly difficult!) to write definitions in a maximally expressive and natural language without worrying if the language is transformable yet. (Compare this to coding by intention or one layer thinking: in traditional programming it's also wise to just write a short function that delegates much of its work and only later worry about implementing the other functions it delegates to.)

When a definition is ready (enough), it's time to ensure its transformability to a lower abstraction level, i.e. to choose or create semantics for it. Virtually every application project contains unique problems so usually existing languages need to be extended and new ones created during a project.

What is a language in ngrease

ngrease completely ignores metamodel (like XML DTD and Schema) based definitions of language. (At least currently - later metamodels may be used if an easy way to define them is found, because they are useful for creating very powerful editor tools.)

Instead, in ngrease a language is simply a set of abstract syntax trees that the selected transformer agrees to transform. So transformers define both the abstract syntax and the semantics of a language: the syntax is whatever structural rule the transformer requires, and the semantics is whatever it produces as a result. (A transformer actually implicitly defines the metamodel, so the ideal solution would be either to derive the metamodel from the transformer or to derive both from a common definition, and thus avoid the redundancy between traditional transformers and their metamodels.)

Compare this to the definition of a formal language commonly used in computer science, which only defines the syntax of a language.

(Philosophical sidenote: this definition of language seems to work well with natural languages, too. The Finnish language as a whole is defined by transformations to other languages, especially to the internal language of each Finn. Any sentence transformable by these transformations to another language is valid Finnish, and its "meaning" is whatever the same sentence is in another language. The subjectiveness of this definition seems to be a natural characteristic of the concept of language.)

Examples

At this stage these examples show a glimpse of what ngrease is about in practice. To fully understand them you are advised to read the rest of the document.

The examples can be viewed by clicking the links to their latest versions in svn, or by downloading ngrease and executing the given shell commands in the ngrease directory. Windows users can either use Cygwin or the corresponding DOS commands.

Java extended with the property idiom

This example demonstrates extending an existing language. It adds support for the property idiom to the ngrease version of Java. The extended language is called ntity, because later it will become a more abstract entity language.

The previous example section as an example

Fowler's fixed width field mapping DSL

This example is a very straight-forward implementation of the language sketched by Martin Fowler in his language workbenches article.

String length checking editor for an entity

This is the first almost real-world example. An entity definition with maximum lengths for its string properties is transformed into a simple Swing editor that enforces valid string lengths when saving.

This example also contains a demo shar that can be used to run the example (requires Ant):

bin/ngrease -f examples/net/sf/ngrease/examples/entityeditpanel/person-as-editpanel-demo-shar.ngr > your_tmp_dir/demo.sh
cd your_tmp_dir
bash demo.sh
ant -f demo/build.xml

ngrease as a technology

An AST library

ngrease handles elements. An element consists of a symbol (a string), a list of child elements and a list of attribute elements.

So an element is a tree of strings. Or actually it is two hierarchies of elements: the child hierarchy and the attribute hierarchy, the latter of which is intended mainly for controlling the metaevaluation, not for user languages.

As a language tool ngrease can be thought of as a universal lexer that produces a tree of symbols instead of a stream of symbols.
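As a rough illustration of the element model, here is a minimal Java sketch. It is not the ngrease API: the record `Element`, its field names and the `leaf`, `of` and `toNgr` helpers are assumptions made for this example. Each element holds a string symbol plus two separate hierarchies, children and attributes; `toNgr` prints only the child hierarchy and does not quote symbols that contain spaces.

```java
import java.util.List;

// Hypothetical model of an ngrease element (not the real ngrease interface):
// a string symbol, a child hierarchy and a separate attribute hierarchy.
record Element(String symbol, List<Element> children, List<Element> attributes) {

    Element {
        // Immutable copies: evaluation never modifies an existing element.
        children = List.copyOf(children);
        attributes = List.copyOf(attributes);
    }

    // A childless, attribute-less element: just a string.
    static Element leaf(String symbol) {
        return new Element(symbol, List.of(), List.of());
    }

    // An element with children but no attributes.
    static Element of(String symbol, Element... children) {
        return new Element(symbol, List.of(children), List.of());
    }

    // ngr-like rendering of the child hierarchy only (no attributes, no quoting).
    String toNgr() {
        if (children.isEmpty()) return symbol;
        StringBuilder sb = new StringBuilder(symbol).append(" {");
        for (Element child : children) sb.append(' ').append(child.toNgr());
        return sb.append(" }").toString();
    }
}
```

Keeping attributes in their own list, rather than mixing them into the children, is what allows the attribute hierarchy to steer metaevaluation without disturbing the user-level tree.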

Metaparser and one concrete parser

In the future ngrease will support many concrete syntaxes to define elements. The metaparser defines the metasyntax that allows the user to define the parameters (like the concrete syntax to use) of the parsing phase in the header of the source text.

Currently ngrease contains one concrete syntax: ngr.

An example of the ngr syntax:

# ngrease 1
#
# The above line states that version 1 of the metasyntax is used.
# This first line is the only static part of the metasyntax.
# The current metaparser does nothing else than skip the rest
# of the header, i.e. these lines that start with #

root {
  child1
  "this is the second child"
  'third child'  (we are the attributes of the third child) {
    we are the children of the third child
  }
  # {'if the symbol is #, the parser ignores it'}
  #:'if there is exactly one child, it can be separated from its parent'
  #:'using colon instead of curly braces.'
  #:'this syntactic sugar enables key:value maps without explicit support for maps!'
  a-map {
    key0:value0
    key1:value1 { children of value1 }
  }
}

Although the ngr syntax is very AST-oriented like XML and lisp, it often looks surprisingly "concrete". See the source for the first section of this article for an example. Elements inside a paragraph (a p or paragraph element) can easily be interpreted as raw text, if they have no children. Otherwise they are interpreted as html level concepts like modifiers, lists, links etc. So while the syntax is designed for more structural element trees, it is quite usable for mostly-text documents, too.

Here is another invented example of a concrete-like syntax that has no semantics yet:

query {
  select * from my_table where id = $:id-to-query
}
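To show how little machinery the ngr syntax needs, here is a hedged Java sketch of a parser for a subset of it, written from the description above rather than from the ngrease sources. The class name `NgrParser` and its error handling are hypothetical. It handles bare and quoted symbols, attributes in parentheses, children in braces, the colon sugar for a single child, ignored `#` elements and the `#` header lines, but assumes there are no escape sequences inside quoted symbols.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal element model for this sketch: symbol, children, attributes.
record Element(String symbol, List<Element> children, List<Element> attributes) {}

// Hypothetical parser for a SUBSET of ngr (not the real ngrease parser).
final class NgrParser {
    private static final String DELIMITERS = "{}():\"'";
    private final String src;
    private int pos;

    private NgrParser(String src) { this.src = src; }

    static Element parse(String text) {
        // Metaparser header: leading lines starting with '#' are skipped.
        String[] lines = text.split("\n", -1);
        int first = 0;
        while (first < lines.length && lines[first].startsWith("#")) first++;
        NgrParser p = new NgrParser(String.join("\n", Arrays.copyOfRange(lines, first, lines.length)));
        Element root = p.element();
        p.skipWhitespace();
        if (p.pos != p.src.length()) throw p.error("unexpected trailing input");
        return root;
    }

    // symbol [ (attributes) ] [ {children} | :child ]
    private Element element() {
        String symbol = symbol();
        skipWhitespace();
        List<Element> attributes = List.of();
        if (peek() == '(') { pos++; attributes = list(')'); skipWhitespace(); }
        List<Element> children = List.of();
        if (peek() == '{') { pos++; children = list('}'); }
        else if (peek() == ':') { pos++; children = List.of(element()); } // a:b means a { b }
        return new Element(symbol, children, attributes);
    }

    private List<Element> list(char close) {
        List<Element> items = new ArrayList<>();
        while (true) {
            skipWhitespace();
            if (pos >= src.length()) throw error("missing '" + close + "'");
            if (peek() == close) { pos++; return items; }
            Element e = element();
            if (!e.symbol().equals("#")) items.add(e); // elements with symbol '#' are ignored
        }
    }

    private String symbol() {
        skipWhitespace();
        char c = peek();
        if (c == '"' || c == '\'') {
            int end = src.indexOf(c, pos + 1);
            if (end < 0) throw error("unterminated quoted symbol");
            String s = src.substring(pos + 1, end);
            pos = end + 1;
            return s;
        }
        int start = pos;
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && DELIMITERS.indexOf(src.charAt(pos)) < 0) pos++;
        if (start == pos) throw error("expected a symbol");
        return src.substring(start, pos);
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at offset " + pos);
    }
}
```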

A pure functional AST template language

ngrease is also a language. The language is mainly intended as a metalanguage i.e. a language for creating new languages. The ngrease language provides tools for defining transformers (see the definition of language above).

The next section describes the language in more detail.

The ngrease language

Naturally the ngrease metalanguage is based on the element AST. It interprets an element as an expression and evaluates it. The result of the evaluation is a new element.

Elements as expressions

ngrease is a template language in the sense that by default all elements are constants i.e. they evaluate to themselves.

In a way ngrease has only one reserved word, the metasymbol that is by default the dollar symbol ($). Later the metaparser will allow the user to define another symbol to be used as the metasymbol in case the dollar symbol is needed a lot as such.

ngrease evaluates the metasymbol according to its only child. For example

$:identity:a
evaluates to
a
so quite obviously $:identity is the identity function that evaluates to the only child of the "identity" element.
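The evaluation rule described so far (every element is a constant except the metasymbol, which is evaluated according to its only child) can be sketched in Java as follows. This is an illustration under assumptions, not ngrease's evaluator: the names `Evaluator` and `UserExpression` are hypothetical, attributes are ignored, and the text above does not say whether the real $:identity evaluates its child before returning it; this sketch does.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical evaluator core (attributes ignored): every element is a constant
// except a "$" element, which is evaluated according to its only child.
final class Evaluator {
    static final String METASYMBOL = "$";

    // A user expression receives the whole call element (e.g. identity:a).
    interface UserExpression { Element apply(Element call, Evaluator evaluator); }

    private final Map<String, UserExpression> context;

    Evaluator(Map<String, UserExpression> context) { this.context = Map.copyOf(context); }

    Element eval(Element e) {
        if (!e.symbol().equals(METASYMBOL)) {
            // Constant: same symbol, children evaluated (children-first order).
            return new Element(e.symbol(), e.children().stream().map(this::eval).toList());
        }
        if (e.children().size() != 1) throw new IllegalArgumentException("$ needs exactly one child");
        Element call = e.children().get(0);
        UserExpression expression = context.get(call.symbol());
        if (expression == null) throw new IllegalArgumentException("unknown expression: " + call.symbol());
        return expression.apply(call, this);
    }

    static Evaluator withIdentity() {
        Map<String, UserExpression> defaults = new HashMap<>();
        // identity:x evaluates to x; evaluating the child first is an assumption.
        defaults.put("identity", (call, ev) -> ev.eval(call.children().get(0)));
        return new Evaluator(defaults);
    }
}
```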

Defining user expressions

When ngrease starts evaluating an element, the default context is in effect. All the built-in expressions are defined in the default context. To learn them, see the default context and the acceptance tests of the built-in expressions: ls examples/net/sf/ngrease/acceptancetests

A new context can be defined as an element with the symbol "context". For example, to make $:hello-world evaluate to "Hello World!", the following context definition is needed:

context {
  user-expression-builders {
    hello-world:"Hello World!"
  }
}

Any user expression can be evaluated using a new context by giving the context as an attribute to the metaelement. The new context is effective for the whole subtree, unless overridden in a child. For example

$ (
    context {
      user-expression-builders {
        hello-world:"Hello World!"
      }
    }
  ):hello-world
evaluates to
"Hello World!"

In the previous example a new context was created from scratch, so none of the built-in expressions would have been available. The following example demonstrates the more ergonomic $:with expression, which extends the current context with the given expressions, much like the let form in lisp dialects:

$:with {
  hello-world:"Hello World!"
  $:hello-world
}
evaluates to
"Hello World!"

(Currently $:with is the only way to extend the current context. The default context is available for extension via the $:default-context expression.)
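Context extension behaves much like nested scopes. The following Java sketch is hypothetical (`Context`, `fromScratch` and `with` are invented names, and definitions are simplified to strings). It shows the difference between a context built from scratch, which inherits nothing, and a `with`-style extension, whose bindings shadow the current context without modifying it.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not the ngrease implementation): a context binds expression
// names to definitions and may extend a parent context, much like $:with.
final class Context {
    private final Context parent;                  // null: a context built from scratch
    private final Map<String, String> definitions; // simplified: definitions as strings

    private Context(Context parent, Map<String, String> definitions) {
        this.parent = parent;
        this.definitions = Map.copyOf(definitions);
    }

    // Like giving a brand new context as an attribute: nothing is inherited.
    static Context fromScratch(Map<String, String> definitions) {
        return new Context(null, definitions);
    }

    // Like $:with: new bindings shadow, but do not replace, the current ones.
    Context with(Map<String, String> definitions) {
        return new Context(this, definitions);
    }

    // Innermost binding wins; falls back to enclosing contexts.
    Optional<String> lookup(String name) {
        for (Context c = this; c != null; c = c.parent) {
            String definition = c.definitions.get(name);
            if (definition != null) return Optional.of(definition);
        }
        return Optional.empty();
    }
}
```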

The expression definition can also refer to the "expression call" element, if the definition defines the symbol for it. The expression will be evaluated in a context that binds the source symbol to the source element. For example

$:with {
  wrap-with-a(source):$:quote:
    a:$:child {index:0 of:$:source}
  $:wrap-with-a:c{d e}
}
effectively first evaluates to
$:with {
  source:wrap-with-a:c{d e}
  a:$:child {index:0 of:$:source}
}
which finally evaluates to
a:c{d e}
(The built-in $:child expression evaluates to a child of its "of" parameter, either by index or by symbol.)

Using traditional terms, ngrease expressions are either variables that need the expression element only for its symbol (the symbol determines which definition to evaluate) or functions that transform the function call into the return value.

The previous example demonstrates an important difference between ngrease and other functional languages: in ngrease each function gets only one parameter: the whole function call element, and the definition needs to parse the parameters from the call element. This way the choice between positional or named parameters (or a combination of both) is up to each function.
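Because each function receives only the call element, parameter parsing is ordinary element navigation. In this hypothetical Java sketch, `Params.positional` reads a child by index and `Params.named` reads the only child of a child with a given symbol (so `of:X` yields `X`). `wrapWithA` and `child` loosely mirror the wrap-with-a and $:child examples, assuming the arguments are already evaluated.

```java
import java.util.List;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical helpers: a "function" gets only its call element and parses
// its own parameters from it, positionally, by name, or both.
final class Params {
    static Element positional(Element call, int index) {
        return call.children().get(index);
    }

    static Element named(Element call, String name) {
        for (Element c : call.children()) {
            if (c.symbol().equals(name) && c.children().size() == 1) return c.children().get(0);
        }
        throw new IllegalArgumentException("missing parameter: " + name);
    }

    // Loosely mirrors wrap-with-a: a:<first child of the call>.
    static Element wrapWithA(Element call) {
        return Element.of("a", positional(call, 0));
    }

    // Loosely mirrors $:child {index:N of:X}; named parameters make the order irrelevant.
    static Element child(Element call) {
        int index = Integer.parseInt(named(call, "index").symbol());
        return named(call, "of").children().get(index);
    }
}
```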

Evaluation order

There are four types of expressions:

Proper constants evaluate to themselves.

Parent expressions evaluate to themselves, but with children evaluated. This is what makes ngrease a template language: an element can be mostly constant, containing dynamic content only deep within the hierarchy. (In practice parent expressions are called constants, taking for granted the natural children-first evaluation order.)

A metaexpression evaluates to the definition of its user expression, as described above. Note that the children-first order is not applied for metaexpressions, but all evaluation is delegated to the user expression. An exception to this rule is when metaexpressions are nested: if the user expression of a metaexpression is a metaexpression, the inner metaexpression is evaluated before proceeding. This is handy for generating expressions before evaluating them.

A user expression is responsible for its own evaluation order. It can evaluate parts of its expression element as it sees fit. For example the built-in $:if only evaluates one of its branches, the true branch or the false branch, depending on the condition, which it evaluates first. Here is another example written in the ngrease language:

$:with {
  #:"This expression evaluates to its second parameter with those children evaluated"
  #:"whose symbol is equal to the first parameter."
  #:"Other children are kept untouched."
  eval-children-by-symbol(source): $:quote:
    $:replace-children (c) {
      of:$:child {index:1 of:$:source}
      with:$:evaluate:$:c
      if:$:equals {$:child {index:0 of:$:source} $:symbol-of:$:c}
    }
  $:eval-children-by-symbol {
    a
    parent {
      a:$:identity:0
      b:$:identity:1
      a:$:identity:2
      a:$:identity:3
      c:$:error:"Child 4 got unexpectedly evaluated!"
    }
  }
}
evaluates to
parent {
  a:0
  b:$:identity:1
  a:2
  a:3
  c:$:error:"Child 4 got unexpectedly evaluated!"
}
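The difference between parent expressions and user expressions can be sketched with a tiny Java evaluator. It is hypothetical and only loosely modelled on the built-ins: the `if`, `equals` and `error` cases, and the use of the symbols `true` and `false` as booleans, are assumptions for illustration. The point is that a user expression receives its call unevaluated, so `if` can leave the unused branch untouched.

```java
import java.util.List;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical mini-evaluator illustrating evaluation order (not the ngrease built-ins):
// constants evaluate children first, but a user expression receives its call
// unevaluated and decides what, and whether, to evaluate.
final class OrderDemo {
    static Element eval(Element e) {
        if (!e.symbol().equals("$")) {
            return new Element(e.symbol(), e.children().stream().map(OrderDemo::eval).toList());
        }
        Element call = e.children().get(0);
        List<Element> args = call.children(); // deliberately left unevaluated here
        switch (call.symbol()) {
            case "if": {
                // Condition first, then exactly one branch; the other is never touched.
                boolean condition = eval(args.get(0)).symbol().equals("true");
                return eval(args.get(condition ? 1 : 2));
            }
            case "equals": {
                Element left = eval(args.get(0));
                Element right = eval(args.get(1));
                return Element.of(left.equals(right) ? "true" : "false");
            }
            case "error":
                throw new IllegalStateException(args.get(0).symbol());
            default:
                throw new IllegalArgumentException("unknown expression: " + call.symbol());
        }
    }
}
```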

Splicing

Although an expression always evaluates to a single element, a parent element can be made to substitute a metaexpression child with the children of its result, instead of the result itself, by adding the attribute @ to the child metaelement.

For example

a {
  $:identity:b {c d}
  $(@):identity:e {f g}
}
evaluates to
a {
  b {c d}
  f
  g
}

This feature (copied from lisp) is called splicing, and it enables very template-like transformers.

Note that splicing only works under parent expressions, so the following $:if expression fails instead of splicing the condition, then and else branches from the child expression:

$:if {
  $(@):identity: non-working-cond-then-else {
    $:equals {a a}
    "a is equal to a"
    "a is not equal to a"
  }
}
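The splicing rule described above can be sketched in Java as a step of parent-expression evaluation. This is a hypothetical illustration, not ngrease code: `Splicing.evalParent` takes the metaexpression evaluator as a parameter, and constant children are copied without further evaluation to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

record Element(String symbol, List<Element> children, List<Element> attributes) {
    static Element of(String symbol, Element... children) {
        return new Element(symbol, List.of(children), List.of());
    }

    Element withAttributes(Element... attributes) {
        return new Element(symbol, children, List.of(attributes));
    }
}

// Hypothetical sketch of splicing inside a parent expression: a "$" child carrying
// the attribute "@" contributes the children of its result instead of the result.
final class Splicing {
    static Element evalParent(Element parent, UnaryOperator<Element> evalMeta) {
        List<Element> out = new ArrayList<>();
        for (Element child : parent.children()) {
            if (!child.symbol().equals("$")) {
                out.add(child); // constants kept as-is (recursion omitted for brevity)
                continue;
            }
            Element result = evalMeta.apply(child);
            boolean splice = child.attributes().stream().anyMatch(a -> a.symbol().equals("@"));
            if (splice) {
                out.addAll(result.children());
            } else {
                out.add(result);
            }
        }
        return new Element(parent.symbol(), List.copyOf(out), parent.attributes());
    }
}
```

Since only a parent expression performs this substitution, the sketch also shows why a spliced child under a user expression such as $:if has nothing to splice into.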

Traditional tricks - the ngrease way

The data model and the evaluation model of ngrease are together a very simple and powerful combination: elements are interpreted as expressions, expressions produce elements, elements are interpreted as expressions, ...

So manipulating ngrease code isn't any different from manipulating any other data.

The data-centric philosophy of ngrease approaches many programming techniques as code manipulation tasks. As described above, function-like expressions are evaluated by transforming the expression element into the value element.

Element manipulation as uncurrying

Another programming technique, currying/uncurrying, can be emulated by building a partly formed expression and then filling in the missing parts before evaluating it. For example, the expression

$:with {
  curried-expr:+:1
  $:$:append-child {$:curried-expr 2}
}
evaluates to
3

Here is the same trick using named parameters. The expression

$:with {
  replacement-expression:$:quote:$:quote:replace-children (c) {
                           with:foo
                           if:$:equals {b $:symbol-of:$:c}
                         }
  $:$:append-child {$:replacement-expression of:a {b c b:bb d}}
}
first evaluates to
$:replace-children (c) {
  with:foo
  if:$:equals {b $:symbol-of:$:c}
  of:a {b c b:bb d}
}
which then evaluates to
a {foo c foo d}
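The same partial-application trick can be expressed in Java with an immutable element: appending a child produces a new, complete call element, and the partial one stays reusable. `appendChild` and the `Plus` class below are hypothetical stand-ins for $:append-child and the + user expression used above.

```java
import java.util.ArrayList;
import java.util.List;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }

    // Pure "append-child": returns a new element; the partial expression is reusable.
    Element appendChild(Element child) {
        List<Element> extended = new ArrayList<>(children);
        extended.add(child);
        return new Element(symbol, List.copyOf(extended));
    }
}

// Hypothetical "+" user expression: sums its children, read as integers.
final class Plus {
    static Element eval(Element call) {
        if (!call.symbol().equals("+")) throw new IllegalArgumentException("not a + call: " + call.symbol());
        long sum = 0;
        for (Element operand : call.children()) sum += Long.parseLong(operand.symbol());
        return Element.of(Long.toString(sum));
    }
}
```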

Transformers

Transformers are user expressions for a given target concept. In other words, user expressions can be thought of as default transformers, transformers that are applied when the target concept is not given.

In practice transformers are used in a different way, but maybe in the future user expressions and transformers will be unified so the above statement will be true in a more literal way.

Transformers are defined, like user expressions, in the context element:

context {
  transformers {
    source-concept:target-concept:expression
  }
}

Transformers are called using the $:transform user expression:

$:transform {
  to:target-concept
  from:source-concept {
         we are the children of the source to transform
       }
}
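A transformer lookup with exact concept matching, as described above, can be sketched as a map keyed by the (source concept, target concept) pair. The Java names `Transformers`, `define` and `transform` are hypothetical; `transform` takes the source concept from the symbol of the `from` element, in the spirit of the $:transform call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical registry mirroring context { transformers { source:target:expression } }:
// lookup requires an exact (source concept, target concept) match.
final class Transformers {
    private record Key(String source, String target) {}

    private final Map<Key, UnaryOperator<Element>> table = new HashMap<>();

    Transformers define(String source, String target, UnaryOperator<Element> expression) {
        table.put(new Key(source, target), expression);
        return this;
    }

    // Like $:transform { to:target from:source {...} }.
    Element transform(String to, Element from) {
        UnaryOperator<Element> transformer = table.get(new Key(from.symbol(), to));
        if (transformer == null) {
            throw new IllegalArgumentException("no transformer " + from.symbol() + " -> " + to);
        }
        return transformer.apply(from);
    }
}
```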

Ontology

Currently transformers require exact matches for the source and target concepts, but in the (near) future they will use an "is a kind of" check if needed. The $:is-a user expression already works, and the "is a kind of" relations between concepts can be defined in the context. See the test of the $:is-a expression for an example: less examples/net/sf/ngrease/acceptancetests/ontology/is-a.ngr

(Strictly speaking, in its current form the ontology only defines a taxonomy, but the more general term ontology is used nevertheless.)
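An "is a kind of" check over such a taxonomy is a reachability question. The hypothetical Java sketch below stores the relations as edges from a concept to its parents and follows them transitively, guarding against cycles. Treating every concept as a kind of itself is an assumption of the sketch, not something stated above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical taxonomy: "is a kind of" edges between concept names, with a
// transitive, cycle-safe check similar in spirit to the $:is-a expression.
final class Taxonomy {
    private final Map<String, Set<String>> parents = new HashMap<>();

    Taxonomy kindOf(String concept, String parent) {
        parents.computeIfAbsent(concept, k -> new HashSet<>()).add(parent);
        return this;
    }

    boolean isA(String concept, String ancestor) {
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.push(concept);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (current.equals(ancestor)) return true; // reflexive by assumption
            if (seen.add(current)) todo.addAll(parents.getOrDefault(current, Set.of()));
        }
        return false;
    }
}
```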

The Command Line Interface

As a pure functional language ngrease doesn't generate real files. Instead it outputs the evaluation result to stdout.

Currently ngrease supports string output. If the symbol of the evaluation result is 'string', ngrease concatenates the symbols of its children to stdout. Otherwise it prints an error message and the result element to stderr.
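The output rule can be sketched in Java as follows. It is a hypothetical illustration of the behaviour described above, not the ngrease CLI code; the class name `Output` and the wording of the error message are invented.

```java
import java.io.PrintStream;
import java.util.List;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical sketch of the CLI output rule: a "string" result has the symbols
// of its children concatenated to stdout; anything else is reported on stderr.
final class Output {
    // Returns true if the result was written to out, false if it was reported on err.
    static boolean write(Element result, PrintStream out, PrintStream err) {
        if (result.symbol().equals("string")) {
            StringBuilder sb = new StringBuilder();
            for (Element child : result.children()) sb.append(child.symbol());
            out.print(sb);
            out.flush();
            return true;
        }
        err.println("error: the evaluation result is not a string element");
        err.println(result);
        return false;
    }
}
```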

For binary output ngrease will probably later support the following syntax:

bytearray {
  byte:110 byte:103 byte:114 byte:101 byte:97 byte:115 byte:0x65
}
or something similar.
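Since the bytearray syntax is only a possible future feature, the following Java sketch of decoding it is purely hypothetical: the class name `ByteArrayOutput` is invented, and the sketch assumes each child is `byte:N` with a decimal or `0x`-prefixed hexadecimal value between 0 and 255.

```java
import java.util.List;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical decoder for the proposed bytearray { byte:N ... } element.
// Accepts decimal or 0x-prefixed hexadecimal values in the range 0..255.
final class ByteArrayOutput {
    static byte[] decode(Element bytearray) {
        if (!bytearray.symbol().equals("bytearray")) throw new IllegalArgumentException("not a bytearray");
        byte[] bytes = new byte[bytearray.children().size()];
        for (int i = 0; i < bytes.length; i++) {
            Element b = bytearray.children().get(i);
            if (!b.symbol().equals("byte") || b.children().size() != 1) {
                throw new IllegalArgumentException("expected byte:N at index " + i);
            }
            String text = b.children().get(0).symbol();
            int value = text.startsWith("0x") ? Integer.parseInt(text.substring(2), 16) : Integer.parseInt(text);
            if (value < 0 || value > 255) throw new IllegalArgumentException("byte out of range: " + text);
            bytes[i] = (byte) value;
        }
        return bytes;
    }
}
```

With the seven values from the example, the decoded bytes spell the ASCII string "ngrease".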

The syntax of the ngrease cli is

ngrease PATHTYPE SOURCEPATH
where PATHTYPE is one of the following:

By default the classpath (the path where java resources can be found) contains the languages and examples directories of the distribution, but it can be extended using the NGREASEPATH environment variable. The syntax of the variable unfortunately inherits the unportability of the java classpath syntax, but later ngrease will provide a better solution for extending the resource path.

Enable - ensure - enforce

ngrease is primarily meant to make good things possible, and only later - with more experience on what the good things are, and only if needed - is the time to think how to ensure or even enforce them.

So at least currently ngrease lacks for example language schemas, type inference, namespace hygiene and security. The author is fully aware that some of these may be very difficult to add afterwards, but on the other hand, he has never really missed them when programming bash, for example.

Comparison to other languages

XML and XSLT

The concrete syntax of XML is arguably quite verbose and clumsy. Of course XML is more than the (only official) concrete syntax, but in practice it is mostly tied to it.

The author knows of no way to use XSLT transformations "inside" an XML document. Instead external scripts are needed to dynamically define parts of a document. In ngrease any part of a tree can be "animated" with the metasymbol, which unleashes the full power of the language.

In the XML AST attributes are a string->string map.

Elements could be used for anything attributes are used for, so probably one of the reasons for the existence of attributes is that their concrete syntax is easier...

Attributes in ngrease are completely different: they are a parallel hierarchy which helps in separating "meta-level" content from the "user-level" content. But in ngrease full hierarchical trees are always available instead of restricting the structure to mere raw strings.

Lisp dialects

Some lisp dialect would probably have been quite enough for achieving the goals of ngrease, but the author felt more comfortable with a fresh language free from legacy assumptions.

Furthermore, element is an interface that can have many implementations. Different lazy implementations and adapters to other systems are easier to integrate into the language than they would be with an existing language.

S-expressions are fine, but as a base for domain specific languages they have one flaw: they are not a natural match for trees. An s-expression is either a childless singular or an anonymous list. A tree is generally represented as an anonymous list, whose first element is interpreted as the root and the rest are interpreted as the children. Not only is the mapping to the conceptual tree indirect, but when inspected closely, the actual language has poor semantic density: 50% of it consists of anonymous lists.
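The indirect mapping can be made concrete with a short Java sketch that encodes an element tree as an s-expression string: a childless element becomes an atom, and every element with children becomes an anonymous list whose first item is the root. The names `Element`, `SExpressions` and `toSExpression` are hypothetical.

```java
import java.util.List;

record Element(String symbol, List<Element> children) {
    static Element of(String symbol, Element... children) { return new Element(symbol, List.of(children)); }
}

// Hypothetical illustration: a tree encoded as s-expressions, where a node with
// children becomes an anonymous list (root child1 child2 ...).
final class SExpressions {
    static String toSExpression(Element e) {
        if (e.children().isEmpty()) return e.symbol(); // childless singular: an atom
        StringBuilder sb = new StringBuilder("(").append(e.symbol());
        for (Element child : e.children()) sb.append(' ').append(toSExpression(child));
        return sb.append(')').toString();
    }
}
```

For example, the element a {b c {d e}} becomes (a b (c d e)): each inner node turns into one more anonymous list.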

In natural speech things almost always have explicit, domain-specific names, and anonymous collections are seldom used. ngrease doesn't even support anonymous plurals. There can only be explicitly named singulars, which may or may not contain other singulars.

Finally, lisp has data types. An element can be a string, symbol, number etc. In ngrease elements are just trees of strings, and their interpretation always depends on the context. This makes ngrease maximally "meta": it leaves as much decision-making to the concrete languages as possible.

Bash

The lack of typing makes the Unix shells very powerful glue languages. They treat everything as raw data (strings) that is potentially usable as program code at any time. ngrease is an effort to combine this freedom with the "ngreased" ease of interpreting and modifying structural data.

Template languages

Unlike typical template languages, ngrease transformers are capable of reducing abstraction gradually instead of directly producing the final string result.

This also makes it very natural to use ngrease to produce whole directory structures. Template languages are usually capable of generating string content for a single file at a time, and controlling the file creation must happen "outside" the actual template language.

There is a progression from template languages through bash to ngrease:

The source code for a template language is a string which includes "meta strings", dynamic content. The string template syntax of groovy demonstrates this:

"""constant text ${string-generating-groovy-code} more constant text"""

Bash raises the abstraction a little. For a bash programmer a command line is not only a string, but a list of strings. In this list each string can include "meta strings":

command $(bash-code-generating-command) "constant with a $variable_reference in it" <(filename-generating-command)

As stated above, ngrease tries to make things even easier for the programmer by treating the source as a tree of strings with "meta strings".

Meta Programming System

Due to the very limited resources of the author, ngrease only supports the first two user interface types of the "stack of user interface types": the API and the CLI.

The graphical editors of MPS seem very promising, but on the other hand, as history has proven, textual interfaces tend to live long, because they are more easily "abused" in new simple ways, invented long after the original tool.

Stratego and ATerm

Stratego is a model-to-model transformer like ngrease. Its transformation model is more complex, splitting into rules and strategies. ngrease approaches similar goals from a more minimalistic direction.

The support for concrete syntaxes seems to be one of the strongest features of Stratego.

The AST library of Stratego, ATerm, is very similar to the ngrease AST. ngrease reinvents the wheel mainly to keep itself 100% Java and to make future implementations of the AST interface possible, like explained in the Lisp comparison section.

ngrease as a build system

ngrease is primarily a code generator (although code is a very general concept: this very article is "code" generated by ngrease).

ngrease can also be thought of as a mechanism that allows the content of an object to be different from its definition. So every object is its own build script.

In traditional file systems files are almost always defined by their content. The only exceptions are symbolic links, which provide a very restricted form of dynamism, and device drivers, which aren't practical to write as an everyday task.

This is why external build scripts need to be used for dynamically generated objects. But this solution is not sufficient, because there is always the risk of seeing an outdated version of a generated object, if the build script isn't run after modifying the definition.

The pure functionality of ngrease makes it well suited to solving this problem. Object definitions can safely refer to other objects: "This image is the black-and-white version of image X" or "This directory is the class directory of the Java source directory src".

All references in ngrease are URLs, to make parallel computing easier (this will probably become more and more important in the future).