Visual AST

Introduction

Building a concise Abstract Syntax Tree (AST) representing source code is a key step for conveniently producing any compiled program. The program presented here is written in C# using VS2010 and it uses .NET GDI+ library to draw an image for AST returned from parser after parsing input source code. The parse is generated with ANTLR and it comes with an editor ANTLRWorks which can also shows AST graphically and is written in Java. This program shows how to use C# to create a graphical AST which is easily integrated into C# project while building DSL (Domain-specific language) module. Also, I am going to formalize my Entity Mapping language grammar and used it as an example for building the parser and draw the AST image. The Entity Mapping language is first described in my previous article Entity mapping language implementation using bsn-goldparser with CodeDom.

Background

ANTLR Tree Grammar

Before talking about the drawing program, I would like to discuss on the tree grammar when defining parser with ANTLR.  ANTLR has several ways to let you build your parser with desired AST output format and using the tree grammar is one I found it very powerful and straight forward to twist the output format wanted.

Let us consider a simple tree which may be produced by a parser generated from ANTLR after parsing an expression x = 12 + 5.  Using BNF syntax in ANTLR style as below, 
assignment : IDENTIFIER '=' expression;
expression : INT '+' INT;
what we get from the parser output will be a flatten, one dimensional list of tokens as shown in below figure. Such flatten one dimensional AST will be very difficult to manipulate.  If you need a better tree structure output, you need to use other ANTLR techniques like including semantic actions to assist in building your AST structure.  Semantic actions are actually fragment of codes in target language (e.g. C#) spitted to the generated parser and lexer classes.  So you need to have good understand of the ANTLR auto-generate mechanism before comfortably adding code to help generating lexer and parser classes with additional features you wanted.

Standard Tree Ouput

Using ANTLR tree grammar,  to turn the output as a 2-dimensional tree, we annotate the token that is the root of a sub-tree by appending a ^ character to it when defining parser with BNF syntax.  For the previous example, we can express it as
assignment : IDENTIFIER '='^ expression;
expression : INT '+'^ INT;
Then what we get a is a nicer AST structure as shown below which can be much easier for further manipulation. But if you look carefully the diagram, you will find that the root node nil is a bit odd and often not we wanted.  Actually, we have only used shorthand syntax of ANTLR tree grammar, I shall discuss the full tree grammar when reviewing the sample Entity Mapping language grammar in later section. Full tree grammar syntax helps to add imaginary nodes such that our tree looks nicer for further processing and surely the nil node will be gone.

AST with tree grammar

Define Entity Mapping Language

My proposed Entity Mapping Language is a simple collection of assignment statements to let user specifying how they want one entity's properties mapped to another entity's properties. She needs to specify what source and target entity type are in first line of statements.  I call it mapping declaration statement and BNF syntax [Target Entity Type] <- [Source Entity Type] is used.  After the mapping declaration, there are series of assignment statements to set properties of target entity and the BNF syntax [Targte Property] = Expression is used.  Expression in each assignment statement can be computed from combinations of source and target properties and global function call(s).  Each statement is terminated by semicolon ; character so that it can span multiple lines.  Also, conditional IF expression is supported (similar to Excel IF expression) to allow user entering branching expressions with condition.  The following is an example program allowed to be entered by user to specify how an invoice entity be assigned its properties from an order entity.  For more information, please refer to my previous article which details the concept for defining an entity mapping language.  Comparing to previous article, I have omitted static method call and replaced with global function call as it looks more natural for user who just entering assignment statements like what she did in Excel while entering formula.

// Listing 1
Invoice <- Order ;
InvoiceDate = Order.OrderDate + 30 ;
InvoiceNo = GetNextInvoiceNo() ;
Freight = Order.TotalCBM * 1.5 + Order.TotalWeight * 2.2;
ShipVia = IF(Order.IsLocal, "Express", "Mail") ;

The following listing is the grammar for the lexer and parser defined in ANTLR grammar file for the Entity Mapping language.  I use what ANTLR called combined grammar to define my language that it combines both lexer and parer definitions in a single file.  Also, I have specified C# 3 as the target language to create lexer and parser classes that is indicated in options section with lanaguage = CSharp3 setting.  The setting output = AST means returning AST by the parser and format is in CommonTree type.  Also the lines starting with @lexer and @parser allow us to specify the generated classes in .NET namespace and I use EntMapping here. The @header section will insert whatever code written inside to generate class headers and I specify C# using clauses here.  Actually, you have full control on how to generate your lexer and parser classes like inserting fields, properties, methods and semantic actions (C# codes) in them.  As these topics are out of my scope here, please refer to ANTLR website for more details.