Home > Uncategorized > Adventures in Roslyn: A new kind of managed lvalue pointer

Adventures in Roslyn: A new kind of managed lvalue pointer

It’s already the evening and I haven’t yet added anything to the C# compiler today, so here goes!

Properties have special support in C#, but they are not “first class”. You can’t get a reference to a property and pass it around as a value. Methods are much better served in this regard: delegates are a way to treat a method as a value. But they are just objects with an Invoke method.

So all we need is an interface with a Value property. Objects supporting that interface can represent a single property that can be passed around like any other value:

public interface IProperty<T>
{
    T Value { get; set; }
}

This is closely analogous to an old-fashioned pointer in C and C++, as I mused aloud all those years ago. Let’s turn that whole idea into a strangely alluring language feature, which I’ll call “property references”, and then occasionally forget that terminology and call them pointers instead.

Firstly, syntax. We could use actual pointer syntax, but I already used some of that in yesterday’s feature. Dagnabbit! Fortunately C++/CX has already paved the way: it has the concept of a reference to a fancy object that must be explicitly dereferenced. The syntax is like this:

MyClass ^r = %x;

(*r).Foo();

If this looks weird, try replacing ^ with * and % with &. It’s then exactly like C/C++. ^ is a postfix modifier on a type declaration that means “Will store a pointer to one of those”, and % is a unary prefix operator that means “Give me a pointer to whatever comes next”. And for the sake of uniformity in common code, C++/CX always uses * to dereference.

Before getting into the nitty-gritty of changing the compiler, let’s survey the glorious sunny uplands we wish to invade. We should be able to do this in C#:

var x = 5;
var px = %x; // Take a "pointer" to x

Console.WriteLine(*px); // Prints 5

*px = 6; // Can assign "through" a pointer
(*px)++; // Or increment

Console.WriteLine(x); // Prints 7 - we modified x via px

Unlike old-school pointers, we can – of course – quite safely return one of these things from a method, referring to a local variable:

static int^ GetCounter(int init)
{
    var counter = init;
    return %counter;
}

// elsewhere...
var c = GetCounter(100);
Console.WriteLine(*c); // Prints 100

(*c)++;
Console.WriteLine(*c); // Prints 101

The trick is that each bit of new syntax expands very simply into some standard C#. All the heavy lifting is done by the compiler’s existing support for lambdas:

var x = 5;
System.IProperty<int> px = System.Property.Bind(v => x = v, () => x); // int *p = %x;
var y = px.Value; // var y = *x;

But of course, those helper types are not part of the standard System namespace. We need to add them:

namespace System
{
    public interface IProperty<T>
    {
        T Value { get; set; }
    }

    public static class Property
    {
        private class PtrImpl<T> : IProperty<T>
        {
            Action<T> _set;
            Func<T> _get;

            public PtrImpl(Action<T> set, Func<T> get)
            {
                _set = set;
                _get = get;
            }

            public T Value
            {
                get { return _get(); }
                set { _set(value); }
            }
        }

        public static IProperty<T> Bind<T>(Action<T> set, Func<T> get)
        {
            return new PtrImpl<T>(set, get);
        }
    }
}

These don’t need to be in mscorlib.dll (some of the later C# features rely on types in System.Core.dll). So we can just create a new “System.Extras.dll” assembly and stick them in there.

So, one thing that makes this a whole ‘nother level of crazy compared to my first two forays in to compiler features is that here we are adding new syntax. Fortunately Roslyn makes this quite easy. There’s a file called Syntax.xml from which the project generates classes for all the nodes that can appear in a syntax tree. We can (as usual) follow the example of what we find in there.

All the unary prefix operators are in here, so they share a single class that can be distinguished by the Kind property:

<Node Name="PrefixUnaryExpressionSyntax" Base="ExpressionSyntax">
  <Kind Name="UnaryPlusExpression"/>
  <Kind Name="UnaryMinusExpression"/>
  <Kind Name="BitwiseNotExpression"/>
  <Kind Name="LogicalNotExpression"/>
  <Kind Name="PreIncrementExpression"/>
  <Kind Name="PreDecrementExpression"/>
  <Kind Name="AddressOfExpression"/>
  <Kind Name="PointerIndirectionExpression"/>
  <Kind Name="AwaitExpression"/>
  <Kind Name="PropertyReferenceExpression"/>    
  <Field Name="OperatorToken" Type="SyntaxToken">
    <Kind Name="PlusToken"/>
    <Kind Name="MinusToken"/>
    <Kind Name="TildeToken"/>
    <Kind Name="ExclamationToken"/>
    <Kind Name="PlusPlusToken"/>
    <Kind Name="MinusMinusToken"/>
    <Kind Name="AmpersandToken"/>
    <Kind Name="AsteriskToken"/>
    <Kind Name="AwaitKeyword"/>
    <Kind Name="PercentToken"/>
    <PropertyComment>
      <summary>SyntaxToken representing the kind of the operator of the prefix unary expression.</summary>
    </PropertyComment>
  </Field>
  <Field Name="Operand" Type="ExpressionSyntax">
    <PropertyComment>
      <summary>ExpressionSyntax representing the operand of the prefix unary expression.</summary>
    </PropertyComment>
  </Field>
  <TypeComment>
    <summary>Class which represents the syntax node for prefix unary expression.</summary>
  </TypeComment>
  <FactoryComment>
    <summary>Creates an PrefixUnaryExpressionSyntax node.</summary>
  </FactoryComment>
</Node>

I’ve added the PropertyReferenceExpression and the PercentToken. For the type modifier ^ I have to cook up a whole new node type:

<Node Name="PropertyReferenceTypeSyntax" Base="TypeSyntax">
  <Kind Name="PropertyReferenceType"/>
  <Field Name="ElementType" Type="TypeSyntax">
    <PropertyComment>
      <summary>TypeSyntax node that represents the element type of the property reference.</summary>
    </PropertyComment>
  </Field>
  <Field Name="CaretToken" Type="SyntaxToken">
    <Kind Name="CaretToken"/>
    <PropertyComment>
      <summary>SyntaxToken representing the caret symbol.</summary>
    </PropertyComment>
  </Field>
  <TypeComment>
    <summary>Class which represents the syntax node for property reference type.</summary>
  </TypeComment>
  <FactoryComment>
    <summary>Creates a PropertyReferenceTypeSyntax node.</summary>
  </FactoryComment>
</Node>

Now, if we try to build the compiler we’ll get errors about missing names in the enum SyntaxKind, so we need to add them:

public enum SyntaxKind : ushort
{
    ...

    PropertyReferenceType,
    PropertyReferenceExpression
}

In SyntaxKindFacts.cs there’s a workaday switch statement that we need to modify so it takes care of mapping % tokens to our new unary operator:

public static SyntaxKind GetPrefixUnaryExpression(SyntaxKind token)
{
    switch (token)
    {
        case SyntaxKind.PlusToken:
            return SyntaxKind.UnaryPlusExpression;
        case SyntaxKind.MinusToken:
            return SyntaxKind.UnaryMinusExpression;
        case SyntaxKind.TildeToken:
            return SyntaxKind.BitwiseNotExpression;
        case SyntaxKind.ExclamationToken:
            return SyntaxKind.LogicalNotExpression;
        case SyntaxKind.PlusPlusToken:
            return SyntaxKind.PreIncrementExpression;
        case SyntaxKind.MinusMinusToken:
            return SyntaxKind.PreDecrementExpression;
        case SyntaxKind.AmpersandToken:
            return SyntaxKind.AddressOfExpression;
        case SyntaxKind.AsteriskToken:
            return SyntaxKind.PointerIndirectionExpression;

        // The new part:
        case SyntaxKind.PercentToken:
            return SyntaxKind.PropertyReferenceExpression;

        default:
            return SyntaxKind.None;
    }
}

And there’s a another that defines the precedence of operators, which is how the compiler decides what to do when you don’t fully parenthesise your expressions. I figure that the new % operator should copy the existing & operator:

private static uint GetPrecedence(SyntaxKind op)
{
    switch (op)
    {
        ...

        case SyntaxKind.AddressOfExpression:
        case SyntaxKind.PropertyReferenceExpression: // the new part
            return 16;
        default:
            return 0;
    }
}

For the all-new ^ operator we have to throw some code in to deal with it. Like I said yesterday, Roslyn’s structure seems surprisingly procedural. It’s not using functional parser combinators or anything “cool” and/or “academic”. It’s just a bunch of methods that examine the current token, do switch statements, etc. On the plus side, it is very easy to learn how it works by stepping through it in the debugger. That’s the saving grace of procedural code: ease of hacking.

I hooked into the same place that handles pointer syntax, as (again) its closely analogous.

private TypeSyntax ParsePointerTypeMods(TypeSyntax type)
{
    // Check for pointer types
    while (this.CurrentToken.Kind == SyntaxKind.AsteriskToken)
    {
        type = syntaxFactory.PointerType(type, this.EatToken());
    }

    // Check for property reference types (new)
    while (this.CurrentToken.Kind == SyntaxKind.CaretToken)
    {
        type = syntaxFactory.PropertyReferenceType(type, this.EatToken());
    }

    return type;
}

Note: we don’t have to write that syntaxFactory.PropertyReferenceType method. It’s one of the pieces that are auto-generated from what we added to Syntax.xml.

Now, we have the syntax. All we need now is to sort out the binding phase, which figures out whether the syntax actually makes sense (that when it refers to a variable called x, there actually is one called x, and every expression has a type, etc.)

And it is here that I am overcome with one of those attacks of laziness that are the hallmark of the truly great programmer, hem-hem. Faced with a pattern like this:

System.Property.Bind(v => o = v, () => o)

We don’t want to have to write screeds of code that builds the BoundExpression that make up that pattern (believe me: I got about half-way through the first lambda before realising I would be retired before finishing the whole thing). In any case, the compiler can already do it – that’s its job. Ideally we could use the existing parser to get a kind of “syntax template”, in which we can replace certain identifiers with chunks of other syntax, and then ask the existing binder to bind it. Then we’d have to do almost no thinking at all! Bliss.

So for example:

private static readonly SyntaxTemplate _propertyReferenceTemplate
    = new SyntaxTemplate("System.Property.Bind(__v_pr__ => o = __v_pr__, () => o)");

private BoundExpression BindPropertyReferenceExpression(PrefixUnaryExpressionSyntax node, DiagnosticBag diagnostics)
{
    return RedirectDiagnostics(diagnostics, node, redirected =>
        BindExpression(_propertyReferenceTemplate.Replace("o", node.Operand).Syntax, redirected));
}

We’ll come back to that RedirectDiagnostics part later. The key point is that I’m creating an instance of my new class SyntaxTemplate as a static, so it is reused for the lifetime of the compiler. It’s immutable, hence thread-safe. Then every time I need to bind something like %foo, I can just replace the o in the template with foo (which is in node.Operand). Replace returns a new SyntaxTemplate rather than modifying the original (that’s what immutability is all about).

Again, binding is a recursive, procedural system. There’s a big switch statement that calls out to methods that bind various things, so we need to hook our new method BindPropertyReferenceExpression into that:

private BoundExpression BindExpressionInternal(ExpressionSyntax node, DiagnosticBag diagnostics, bool invoked, bool indexed)
{
    if (IsEarlyAttributeBinder && !EarlyWellKnownAttributeBinder.CanBeValidAttributeArgument(node, this))
    {
        return BadExpression(node, LookupResultKind.NotAValue);
    }

    Debug.Assert(node != null);
    switch (node.Kind)
    {
        ...

        // New part
        case SyntaxKind.PropertyReferenceExpression:
            return BindPropertyReferenceExpression((PrefixUnaryExpressionSyntax)node, diagnostics);

See how there’s a switch statement on an enum, then a cast – all the kinds of thing that beginners are told not to do when they learn C#, because supposedly virtual method dispatch on a single object solves all problems. (Oh wait, no it doesn’t.) But still, it wouldn’t make sense to have a built-in type switch in languages like C#, Java or C++ (except apparently in one situation).

Anyway, BindExpression calls BindExpressionInternal, which calls our new BindPropertyReferenceExpression method, which expands our template and passes it to BindExpression… we’re going in circles! But it’s okay. The reason this doesn’t asplode the stack is because our template doesn’t include further references to %.

Now, about that RedirectDiagnostics wrinkle. The binding process has a DiagnosticBag object that gets passed around. Any errors found are thrown in the bag. Each error has a Location object, identifying the place in the user’s source code where the error was spotted, so the locations were discovered at the parsing stage. The problem we have is that we parse our template code separately, so the locations bear no relation to the user’s source code. This means that the IDE’s text editor cannot put red squiggles in the right place.

To fix this, I literally fix the diagnostics:

private T RedirectDiagnostics<T>(DiagnosticBag diagnostics, 
             CSharpSyntaxNode nodeWithLocation, 
             Func<DiagnosticBag, T> generate)
{
    var captured = new DiagnosticBag();
    var result = generate(captured);

    foreach (var diag in captured.AsEnumerable().OfType<DiagnosticWithInfo>())
        diagnostics.Add(new CSDiagnostic(diag.Info, nodeWithLocation.Location));

    return result;
}

The generate function does the binding, but to a “fake” temporary DiagnosticBag, which we then copy into the real one but replacing all the Location objects with a single good location. This isn’t ideal. Recall that some of the syntax tree was inserted from the user’s source and so had perfectly good locations. I need to figure out a way of detecting whether a location is junk or not. But it sort of works.

So, we have binding for %. For * we have to enhance the existing code, branching based on whether operand is a pointer:

private static readonly SyntaxTemplate _pointerIndirectionTemplate = new SyntaxTemplate("p.Value");

// Based on ExpressionBinder::bindPtrIndirection.
private BoundExpression BindPointerIndirectionExpression(PrefixUnaryExpressionSyntax node, DiagnosticBag diagnostics)
{
    BoundExpression operand = BindValue(node.Operand, diagnostics, GetUnaryAssignmentKind(node.Kind));

    // Try using the template on anything that isn't a pointer
    if (!operand.Type.IsPointerType())
        return RedirectDiagnostics(diagnostics, node, redirected => 
            BindExpression(_pointerIndirectionTemplate.Replace("p", node.Operand).Syntax, redirected));

    TypeSymbol pointedAtType;
    bool hasErrors;
    BindPointerIndirectionExpressionInternal(node, operand, diagnostics, out pointedAtType, out hasErrors);

    return new BoundPointerIndirectionOperator(node, operand, pointedAtType ?? CreateErrorType(), hasErrors);
}

We can even use the template technique for type declarations – just replace T with whatever we get from the user:

private static readonly SyntaxTemplate _propertyReferenceTypeTemplate = new SyntaxTemplate("System.IProperty<T>");

internal Symbol BindNamespaceOrTypeOrAliasSymbol(ExpressionSyntax syntax, DiagnosticBag diagnostics, ConsList<Symbol> basesBeingResolved, bool suppressUseSiteDiagnostics)
{
    switch (syntax.Kind)
    {
        ...

        case SyntaxKind.PropertyReferenceType:
            {
                return RedirectDiagnostics(diagnostics, syntax, redirected => BindNamespaceOrTypeOrAliasSymbol(
                    _propertyReferenceTypeTemplate.Replace("T", ((PropertyReferenceTypeSyntax)syntax).ElementType).Syntax, 
                    redirected, basesBeingResolved, suppressUseSiteDiagnostics));
            }

That’s every change needed to support the feature. If you want to play with it (and even add features of your own using the SyntaxTemplate class), I’ve updated my fork with all these changes. You will need to define the System.IProperty and System.Property types – it will work if you just paste the code.

Advertisements
Categories: Uncategorized Tags:
  1. CarlJ
    May 7, 2014 at 1:34 am

    Should “5” be “v” in this line?:
    System.IProperty px = System.Property.Bind(v => x = 5, () => x); // int *p = %x;

    • earwicker
      May 17, 2014 at 7:25 pm

      It should! Have fixed it.

  1. April 29, 2014 at 5:31 am

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: