Parsing and Grammars - Part 4
Abstract Syntax Tree
In part 2 of this series you saw that a ParserFunc takes source code and returns a ParseResult consisting of a tree representation of the matched source code and the rest of the source code which could not be matched:
public class ParseResult
{
public IAst Ast;
public Source Rest;
}
Since in my last post I already presented the Source class, there is only one thing left to explain — the tree representation of the source code. Lets imagine we want to parse the simple expression:
2 + 3 * 5
One possible tree representation of this expression might be

This tree encodes the content and the structure of the source code, e.g. the precedence of * over +.
I will start out with a very simple implemention of abstract syntax trees which I will refine as I go along with my project. The two main ideas are
- use an interface to define the basic behavior of nodes (so we have more freedom in plugging in various node types later on and are not forced to include all AST nodes in a single class hierarchy)
- nodes are immutable (to play nice with a functional programming style)
Interface IAst
The following implementation reflects these design choices:
/// <summary>
/// Interface for nodes in an abstract syntax tree (AST).
/// </summary>
public interface IAst
{
/// <summary>
/// Gets source code corresponding to this node.
/// </summary>
Source Source { get; }
/// <summary>
/// Gets index-th child of this node.
/// </summary>
IAst this[int childIndex] { get; }
/// <summary>
/// Enumerates all children of this node.
/// </summary>
IEnumerable<IAst> Children { get; }
}
For a start I defined 4 concrete node types which should be sufficient for some first steps.
- AstNode — stores source code corresponding to this node, nothing else (no children)
- AstValue<T> — stores a value of type T and associated source code
- AstBinaryOp —stores the operator name and references ASTs representing the left and right operands
- AstGroup —stores an arbitrary number of child nodes and associated source code
The following sections present the source code of these classes.
AstNode
public class AstNode : IAst
{
private Source m_source;
public AstNode(Source source) { m_source = source; }
#region IAst Members
public Source Source
{
get { return m_source; }
}
public virtual IAst this[int index]
{
get { throw new IndexOutOfRangeException(); }
}
public virtual IEnumerable<IAst> Children
{
get { yield break; }
}
#endregion
}
AstValue<T>
public class AstValue<T> : AstNode
{
public readonly T Value;
public AstValue(T value, Source source)
: base(source)
{
Value = value;
}
public override string ToString()
{
return Value.ToString();
}
}
AstBinaryOp
public class AstBinaryOp : AstNode
{
public readonly IAst Left;
public readonly IAst Right;
public readonly string Op;
public AstBinaryOp(string operatorName, IAst left, IAst right, Source src)
: base(src)
{
Op = operatorName;
Left = left;
Right = right;
}
public override IAst this[int index]
{
get
{
if (index == 0) return Left;
else if (index == 1) return Right;
else throw new IndexOutOfRangeException();
}
}
public override IEnumerable<IAst> Children
{
get { yield return Left; yield return Right; }
}
public override string ToString()
{
return Op;
}
}
AstGroup
public class AstGroup : AstNode
{
private IAst[] m_items;
public AstGroup(IEnumerable<IAst> items)
: base(Source.Union(items.Select(x => x.Source)))
{
m_items = items.ToArray();
}
public AstGroup(params IAst[] items)
: this(items.AsEnumerable())
{ }
public override IAst this[int index]
{
get { return m_items[index]; }
}
public override IEnumerable<IAst> Children
{
get { return m_items; }
}
}
In my next post I plan to present the basic parser functions (Part 5), and then we can finally define a simple grammar and start parsing some source code (Part 6). I also plan to post the complete source code starting with Part 6.