Archive

Archive for April, 2014

Adventures in Roslyn: A new kind of managed lvalue pointer

April 27, 2014 3 comments

It’s already the evening and I haven’t yet added anything to the C# compiler today, so here goes!

Properties have special support in C#, but they are not “first class”. You can’t get a reference to a property and pass it around as a value. Methods are much better served in this regard: delegates are a way to treat a method as a value. But they are just objects with an Invoke method.

So all we need is an interface with a Value property. Objects supporting that interface can represent a single property that can be passed around like any other value:

public interface IProperty<T>
{
    T Value { get; set; }
}

This is closely analogous to an old-fashioned pointer in C and C++, as I mused aloud all those years ago. Let’s turn that whole idea into a strangely alluring language feature, which I’ll call “property references”, and then occasionally forget that terminology and call them pointers instead.

Firstly, syntax. We could use actual pointer syntax, but I already used some of that in yesterday’s feature. Dagnabbit! Fortunately C++/CX has already paved the way: it has the concept of a reference to a fancy object that must be explicitly dereferenced. The syntax is like this:

MyClass ^r = %x;

(*r).Foo();

If this looks weird, try replacing ^ with * and % with &. It’s then exactly like C/C++. ^ is a postfix modifier on a type declaration that means “Will store a pointer to one of those”, and % is a unary prefix operator that means “Give me a pointer to whatever comes next”. And for the sake of uniformity in common code, C++/CX always uses * to dereference.

Before getting into the nitty-gritty of changing the compiler, let’s survey the glorious sunny uplands we wish to invade. We should be able to do this in C#:

var x = 5;
var px = %x; // Take a "pointer" to x

Console.WriteLine(*px); // Prints 5

*px = 6; // Can assign "through" a pointer
(*px)++; // Or increment

Console.WriteLine(x); // Prints 7 - we modified x via px

Unlike old-school pointers, we can – of course – quite safely return one of these things from a method, referring to a local variable:

static int^ GetCounter(int init)
{
    var counter = init;
    return %counter;
}

// elsewhere...
var c = GetCounter(100);
Console.WriteLine(*c); // Prints 100

(*c)++;
Console.WriteLine(*c); // Prints 101

The trick is that each bit of new syntax expands very simply into some standard C#. All the heavy lifting is done by the compiler’s existing support for lambdas:

var x = 5;
System.IProperty<int> px = System.Property.Bind(v => x = v, () => x); // int *p = %x;
var y = px.Value; // var y = *x;

But of course, those helper types are not part of the standard System namespace. We need to add them:

namespace System
{
    public interface IProperty<T>
    {
        T Value { get; set; }
    }

    public static class Property
    {
        private class PtrImpl<T> : IProperty<T>
        {
            Action<T> _set;
            Func<T> _get;

            public PtrImpl(Action<T> set, Func<T> get)
            {
                _set = set;
                _get = get;
            }

            public T Value
            {
                get { return _get(); }
                set { _set(value); }
            }
        }

        public static IProperty<T> Bind<T>(Action<T> set, Func<T> get)
        {
            return new PtrImpl<T>(set, get);
        }
    }
}

These don’t need to be in mscorlib.dll (some of the later C# features rely on types in System.Core.dll). So we can just create a new “System.Extras.dll” assembly and stick them in there.

So, one thing that makes this a whole ‘nother level of crazy compared to my first two forays in to compiler features is that here we are adding new syntax. Fortunately Roslyn makes this quite easy. There’s a file called Syntax.xml from which the project generates classes for all the nodes that can appear in a syntax tree. We can (as usual) follow the example of what we find in there.

All the unary prefix operators are in here, so they share a single class that can be distinguished by the Kind property:

<Node Name="PrefixUnaryExpressionSyntax" Base="ExpressionSyntax">
  <Kind Name="UnaryPlusExpression"/>
  <Kind Name="UnaryMinusExpression"/>
  <Kind Name="BitwiseNotExpression"/>
  <Kind Name="LogicalNotExpression"/>
  <Kind Name="PreIncrementExpression"/>
  <Kind Name="PreDecrementExpression"/>
  <Kind Name="AddressOfExpression"/>
  <Kind Name="PointerIndirectionExpression"/>
  <Kind Name="AwaitExpression"/>
  <Kind Name="PropertyReferenceExpression"/>    
  <Field Name="OperatorToken" Type="SyntaxToken">
    <Kind Name="PlusToken"/>
    <Kind Name="MinusToken"/>
    <Kind Name="TildeToken"/>
    <Kind Name="ExclamationToken"/>
    <Kind Name="PlusPlusToken"/>
    <Kind Name="MinusMinusToken"/>
    <Kind Name="AmpersandToken"/>
    <Kind Name="AsteriskToken"/>
    <Kind Name="AwaitKeyword"/>
    <Kind Name="PercentToken"/>
    <PropertyComment>
      <summary>SyntaxToken representing the kind of the operator of the prefix unary expression.</summary>
    </PropertyComment>
  </Field>
  <Field Name="Operand" Type="ExpressionSyntax">
    <PropertyComment>
      <summary>ExpressionSyntax representing the operand of the prefix unary expression.</summary>
    </PropertyComment>
  </Field>
  <TypeComment>
    <summary>Class which represents the syntax node for prefix unary expression.</summary>
  </TypeComment>
  <FactoryComment>
    <summary>Creates an PrefixUnaryExpressionSyntax node.</summary>
  </FactoryComment>
</Node>

I’ve added the PropertyReferenceExpression and the PercentToken. For the type modifier ^ I have to cook up a whole new node type:

<Node Name="PropertyReferenceTypeSyntax" Base="TypeSyntax">
  <Kind Name="PropertyReferenceType"/>
  <Field Name="ElementType" Type="TypeSyntax">
    <PropertyComment>
      <summary>TypeSyntax node that represents the element type of the property reference.</summary>
    </PropertyComment>
  </Field>
  <Field Name="CaretToken" Type="SyntaxToken">
    <Kind Name="CaretToken"/>
    <PropertyComment>
      <summary>SyntaxToken representing the caret symbol.</summary>
    </PropertyComment>
  </Field>
  <TypeComment>
    <summary>Class which represents the syntax node for property reference type.</summary>
  </TypeComment>
  <FactoryComment>
    <summary>Creates a PropertyReferenceTypeSyntax node.</summary>
  </FactoryComment>
</Node>

Now, if we try to build the compiler we’ll get errors about missing names in the enum SyntaxKind, so we need to add them:

public enum SyntaxKind : ushort
{
    ...

    PropertyReferenceType,
    PropertyReferenceExpression
}

In SyntaxKindFacts.cs there’s a workaday switch statement that we need to modify so it takes care of mapping % tokens to our new unary operator:

public static SyntaxKind GetPrefixUnaryExpression(SyntaxKind token)
{
    switch (token)
    {
        case SyntaxKind.PlusToken:
            return SyntaxKind.UnaryPlusExpression;
        case SyntaxKind.MinusToken:
            return SyntaxKind.UnaryMinusExpression;
        case SyntaxKind.TildeToken:
            return SyntaxKind.BitwiseNotExpression;
        case SyntaxKind.ExclamationToken:
            return SyntaxKind.LogicalNotExpression;
        case SyntaxKind.PlusPlusToken:
            return SyntaxKind.PreIncrementExpression;
        case SyntaxKind.MinusMinusToken:
            return SyntaxKind.PreDecrementExpression;
        case SyntaxKind.AmpersandToken:
            return SyntaxKind.AddressOfExpression;
        case SyntaxKind.AsteriskToken:
            return SyntaxKind.PointerIndirectionExpression;

        // The new part:
        case SyntaxKind.PercentToken:
            return SyntaxKind.PropertyReferenceExpression;

        default:
            return SyntaxKind.None;
    }
}

And there’s a another that defines the precedence of operators, which is how the compiler decides what to do when you don’t fully parenthesise your expressions. I figure that the new % operator should copy the existing & operator:

private static uint GetPrecedence(SyntaxKind op)
{
    switch (op)
    {
        ...

        case SyntaxKind.AddressOfExpression:
        case SyntaxKind.PropertyReferenceExpression: // the new part
            return 16;
        default:
            return 0;
    }
}

For the all-new ^ operator we have to throw some code in to deal with it. Like I said yesterday, Roslyn’s structure seems surprisingly procedural. It’s not using functional parser combinators or anything “cool” and/or “academic”. It’s just a bunch of methods that examine the current token, do switch statements, etc. On the plus side, it is very easy to learn how it works by stepping through it in the debugger. That’s the saving grace of procedural code: ease of hacking.

I hooked into the same place that handles pointer syntax, as (again) its closely analogous.

private TypeSyntax ParsePointerTypeMods(TypeSyntax type)
{
    // Check for pointer types
    while (this.CurrentToken.Kind == SyntaxKind.AsteriskToken)
    {
        type = syntaxFactory.PointerType(type, this.EatToken());
    }

    // Check for property reference types (new)
    while (this.CurrentToken.Kind == SyntaxKind.CaretToken)
    {
        type = syntaxFactory.PropertyReferenceType(type, this.EatToken());
    }

    return type;
}

Note: we don’t have to write that syntaxFactory.PropertyReferenceType method. It’s one of the pieces that are auto-generated from what we added to Syntax.xml.

Now, we have the syntax. All we need now is to sort out the binding phase, which figures out whether the syntax actually makes sense (that when it refers to a variable called x, there actually is one called x, and every expression has a type, etc.)

And it is here that I am overcome with one of those attacks of laziness that are the hallmark of the truly great programmer, hem-hem. Faced with a pattern like this:

System.Property.Bind(v => o = v, () => o)

We don’t want to have to write screeds of code that builds the BoundExpression that make up that pattern (believe me: I got about half-way through the first lambda before realising I would be retired before finishing the whole thing). In any case, the compiler can already do it – that’s its job. Ideally we could use the existing parser to get a kind of “syntax template”, in which we can replace certain identifiers with chunks of other syntax, and then ask the existing binder to bind it. Then we’d have to do almost no thinking at all! Bliss.

So for example:

private static readonly SyntaxTemplate _propertyReferenceTemplate
    = new SyntaxTemplate("System.Property.Bind(__v_pr__ => o = __v_pr__, () => o)");

private BoundExpression BindPropertyReferenceExpression(PrefixUnaryExpressionSyntax node, DiagnosticBag diagnostics)
{
    return RedirectDiagnostics(diagnostics, node, redirected =>
        BindExpression(_propertyReferenceTemplate.Replace("o", node.Operand).Syntax, redirected));
}

We’ll come back to that RedirectDiagnostics part later. The key point is that I’m creating an instance of my new class SyntaxTemplate as a static, so it is reused for the lifetime of the compiler. It’s immutable, hence thread-safe. Then every time I need to bind something like %foo, I can just replace the o in the template with foo (which is in node.Operand). Replace returns a new SyntaxTemplate rather than modifying the original (that’s what immutability is all about).

Again, binding is a recursive, procedural system. There’s a big switch statement that calls out to methods that bind various things, so we need to hook our new method BindPropertyReferenceExpression into that:

private BoundExpression BindExpressionInternal(ExpressionSyntax node, DiagnosticBag diagnostics, bool invoked, bool indexed)
{
    if (IsEarlyAttributeBinder && !EarlyWellKnownAttributeBinder.CanBeValidAttributeArgument(node, this))
    {
        return BadExpression(node, LookupResultKind.NotAValue);
    }

    Debug.Assert(node != null);
    switch (node.Kind)
    {
        ...

        // New part
        case SyntaxKind.PropertyReferenceExpression:
            return BindPropertyReferenceExpression((PrefixUnaryExpressionSyntax)node, diagnostics);

See how there’s a switch statement on an enum, then a cast – all the kinds of thing that beginners are told not to do when they learn C#, because supposedly virtual method dispatch on a single object solves all problems. (Oh wait, no it doesn’t.) But still, it wouldn’t make sense to have a built-in type switch in languages like C#, Java or C++ (except apparently in one situation).

Anyway, BindExpression calls BindExpressionInternal, which calls our new BindPropertyReferenceExpression method, which expands our template and passes it to BindExpression… we’re going in circles! But it’s okay. The reason this doesn’t asplode the stack is because our template doesn’t include further references to %.

Now, about that RedirectDiagnostics wrinkle. The binding process has a DiagnosticBag object that gets passed around. Any errors found are thrown in the bag. Each error has a Location object, identifying the place in the user’s source code where the error was spotted, so the locations were discovered at the parsing stage. The problem we have is that we parse our template code separately, so the locations bear no relation to the user’s source code. This means that the IDE’s text editor cannot put red squiggles in the right place.

To fix this, I literally fix the diagnostics:

private T RedirectDiagnostics<T>(DiagnosticBag diagnostics, 
             CSharpSyntaxNode nodeWithLocation, 
             Func<DiagnosticBag, T> generate)
{
    var captured = new DiagnosticBag();
    var result = generate(captured);

    foreach (var diag in captured.AsEnumerable().OfType<DiagnosticWithInfo>())
        diagnostics.Add(new CSDiagnostic(diag.Info, nodeWithLocation.Location));

    return result;
}

The generate function does the binding, but to a “fake” temporary DiagnosticBag, which we then copy into the real one but replacing all the Location objects with a single good location. This isn’t ideal. Recall that some of the syntax tree was inserted from the user’s source and so had perfectly good locations. I need to figure out a way of detecting whether a location is junk or not. But it sort of works.

So, we have binding for %. For * we have to enhance the existing code, branching based on whether operand is a pointer:

private static readonly SyntaxTemplate _pointerIndirectionTemplate = new SyntaxTemplate("p.Value");

// Based on ExpressionBinder::bindPtrIndirection.
private BoundExpression BindPointerIndirectionExpression(PrefixUnaryExpressionSyntax node, DiagnosticBag diagnostics)
{
    BoundExpression operand = BindValue(node.Operand, diagnostics, GetUnaryAssignmentKind(node.Kind));

    // Try using the template on anything that isn't a pointer
    if (!operand.Type.IsPointerType())
        return RedirectDiagnostics(diagnostics, node, redirected => 
            BindExpression(_pointerIndirectionTemplate.Replace("p", node.Operand).Syntax, redirected));

    TypeSymbol pointedAtType;
    bool hasErrors;
    BindPointerIndirectionExpressionInternal(node, operand, diagnostics, out pointedAtType, out hasErrors);

    return new BoundPointerIndirectionOperator(node, operand, pointedAtType ?? CreateErrorType(), hasErrors);
}

We can even use the template technique for type declarations – just replace T with whatever we get from the user:

private static readonly SyntaxTemplate _propertyReferenceTypeTemplate = new SyntaxTemplate("System.IProperty<T>");

internal Symbol BindNamespaceOrTypeOrAliasSymbol(ExpressionSyntax syntax, DiagnosticBag diagnostics, ConsList<Symbol> basesBeingResolved, bool suppressUseSiteDiagnostics)
{
    switch (syntax.Kind)
    {
        ...

        case SyntaxKind.PropertyReferenceType:
            {
                return RedirectDiagnostics(diagnostics, syntax, redirected => BindNamespaceOrTypeOrAliasSymbol(
                    _propertyReferenceTypeTemplate.Replace("T", ((PropertyReferenceTypeSyntax)syntax).ElementType).Syntax, 
                    redirected, basesBeingResolved, suppressUseSiteDiagnostics));
            }

That’s every change needed to support the feature. If you want to play with it (and even add features of your own using the SyntaxTemplate class), I’ve updated my fork with all these changes. You will need to define the System.IProperty and System.Property types – it will work if you just paste the code.

Advertisements
Categories: Uncategorized Tags:

Adventures in Roslyn: Using pointer syntax as a shorthand for IEnumerable

April 26, 2014 1 comment

Another quickie extension to C#. In the current language, a type declaration T! is shorthand for Nullable.

But equally important in modern C# programs are sequences of values, so a similar shorthand for IEnumerable would be ideal. The asterisk symbol is underused (you can suffix a type with asterisk to make a pointer, but only in unsafe contexts), and this was the choice made by the intriguing research language Cω that influenced LINQ, so let’s copy that:

    int* numbers = new[] { 1, 6, 55, 4, 11 };

    foreach (var n in numbers)
        Console.WriteLine(n);

In Binder_Symbols.cs we find:

    case SyntaxKind.PointerType:
        {
            var node = (PointerTypeSyntax)syntax;            
            ReportUnsafeIfNotAllowed(node, diagnostics);

            var elementType = BindType(node.ElementType, diagnostics, basesBeingResolved);
            if (elementType.IsManagedType)
            {
                // "Cannot take the address of, get the size of, or declare a pointer to a managed type ('{0}')"
                Error(diagnostics, ErrorCode.ERR_ManagedAddr, node, elementType);
            }

            return new PointerTypeSymbol(elementType);
        }

The call ReportUnsafeIfNotAllowed is what checks for the unsafe context. So I commented that out and added my own check for not-unsafe, before doing something very similar to what happens for T? -> Nullable elsewhere in the same file:

    case SyntaxKind.PointerType:
        {
            var node = (PointerTypeSyntax)syntax;

            //ReportUnsafeIfNotAllowed(node, diagnostics);
            if (this.IsIndirectlyInIterator || !this.InUnsafeRegion)
            {
                NamedTypeSymbol enumerableT = GetSpecialType(SpecialType.System_Collections_Generic_IEnumerable_T, diagnostics, syntax);
                TypeSymbol typeArgument = BindType(node.ElementType, diagnostics, basesBeingResolved);
                NamedTypeSymbol constructedType = enumerableT.Construct(typeArgument);
                if (ShouldCheckConstraints)
                {
                    constructedType.CheckConstraints(this.Compilation, this.Conversions, syntax.Location, diagnostics);
                }
                return constructedType;
            }

            var elementType = BindType(node.ElementType, diagnostics, basesBeingResolved);
            if (elementType.IsManagedType)
            {
                // "Cannot take the address of, get the size of, or declare a pointer to a managed type ('{0}')"
                Error(diagnostics, ErrorCode.ERR_ManagedAddr, node, elementType);
            }

            return new PointerTypeSymbol(elementType);
        }

And we’re done! We can now say things like:

static int *Squares(int *nums)
{
    return nums.Select(n => n * n);
}

Roslyn may not be very pretty – it’s a hand-crafted top-down system, so it looks nothing like the “text book” functional parsers. But it’s a real product as opposed to a clean example, so that’s hardly surprising. Regardless, it is proving easy (even fun) to hack.

Categories: Uncategorized Tags:

Adventures in Roslyn: Adding crazily powerful operator overloading to C# 6

April 24, 2014 7 comments

I’m going to show you how to enable a new kind of operator overloading by adding exactly four (4) lines of code to a single file in the C# 6 compiler preview. Yes, I was surprised too!

After seeing the video of Anders Hejlsberg showing how easy it is to hack the new open source C# compiler, I had to give it a try.

My aim was (I assumed) a lot more ambitious and crazy than his demo. I thought it would take ages to figure out. But it was still tempting to aim high and actually implement a substantial new feature, because there are a few I’ve been wondering about over the years.

Ever since LINQ query syntax was added to the language, I’ve wished that operator overloading worked the same way. The where keyword gets turned into a call to a Where method. And it doesn’t matter where or how that method is defined. It can be an extension method, an ordinary method, a virtual method. In fact there are few other language features that map to well known member names: a collection initializer is turned into several calls to a method called Add, and the await keyword expands into calls to several methods.

So my use case is very simple. Suppose I have a couple of sequences of values:

var nums1 = new[] { 1, 2, 3 };
var nums2 = new[] { 4, 5 };

var nums3 = nums1 + nums2;

That last line, which I’d like to concatenate the two sequences, simply isn’t going to work because operator + is not defined on IEnumerable. Nor is there a way to make it work for all sequences in standard C#. That would require the ability to implement an operator overload using extension methods! Such a thing does not exist. But it would be pretty useful.

Suppose if the compiler couldn’t find a standard meaning for +, it tried looking for a method available the left-hand-side value called Addition. (NB. Add is already taken by collection initializers as I previously noted).

public static class EnumerableOperatorExtensions
{
    public static IEnumerable<T> Addition<T>(this IEnumerable<T> left, IEnumerable<T> right)
    {
        return left.Concat(right);
    }
}

Of course, Concat is already there to do the real work for us: the above incantation just makes it available under a standardised name.

So let’s get to work. To play along at home, download the Roslyn source, read the instructions for building/debugging, get all the prerequisites (the instructions seem to be flawless as far as I can tell), and make sure you’re able to build and hit F5 to bring up Visual Studio. You’ll find you can set breakpoints and they will be hit (from multiple threads) as VS2013 compiles code on the fly as you edit it, to provide intellisense, etc.

The first thing I had to do was find those well-known member names, such as Where. Obviously it wouldn’t be that easy, but I tried a simple search for the quoted string "Where"… Oh, turns out it really is that easy!

This is the first hit:

void ReduceWhere(WhereClauseSyntax where, QueryTranslationState state, DiagnosticBag diagnostics)
{
    // A query expression with a where clause
    //     from x in e
    //     where f
    //     ...
    // is translated into
    //     from x in ( e ) . Where ( x => f )
    var lambda = MakeQueryUnboundLambda(state.RangeVariableMap(), state.rangeVariable, where.Condition);
    var invocation = MakeQueryInvocation(where, state.fromExpression, "Where", lambda, diagnostics);
    state.fromExpression = MakeQueryClause(where, invocation, queryInvocation: invocation);
}

That MakeQueryInvocation looked intriguing. It calls onto another helper called MakeInvocationExpression, which takes a receiver for the method call, a method name and an immutable array of arguments, and is commented as:

// Helper method to create a synthesized method invocation expression.

On searching for calls to it, as you’d expect, I found it being used for collection initializers and await in exactly the same way. All I needed was to find a spot in the binding of operators where we’re just about to give up and emit an error, and then try MakeInvocationExpression.

The next part I did with a mixture of searching for likely words in the source and then setting breakpoints to see if they got hit. Eventually I found a method Binder.BindSimpleBinaryOperator in the file Binder_Operator.cs. Actually there are two overloads of it: the four-argument overload does the real work. (The two-argument overload is just a wrapper that avoids too much recursion when dealing with chained operators by implementing its own stack.)

Anyway, it works by calling another helper, BinaryOperatorOverloadResolution, which implements the standard C# rules, and then it checks if it worked:

if (!best.HasValue)
{
    resultOperatorKind = kind;
    resultType = CreateErrorType();
    hasErrors = true;
}

That’s where it gives up! So that’s where we need MOAR CODE:

if (!best.HasValue)
{
    string methodName = Enum.GetName(typeof(BinaryOperatorKind), kind);                
    var methodCall = MakeInvocationExpression(node, left, methodName, ImmutableArray.Create(right), diagnostics);
    if (methodCall != null && !methodCall.HasAnyErrors)
        return methodCall;

    resultOperatorKind = kind;
    resultType = CreateErrorType();
    hasErrors = true;
}

Look how damn lazy I was. The enum BinaryOperatorKind defines Addition, Subtraction, etc., so I just get the string name of the value to use as the method name. If MakeInvocationExpression seems to have worked, I return the result.

But I was also quite careful. By ensuring the standard is followed first, and the new behaviour only kicks in for code that would otherwise be malformed, I don’t change the meaning of existing programs.

And that’s it. Here’s a look at what happens in Visual Studio when I run it and enter my test case, but first without defining an extension method:

Rosyln shows error message

Note the error message! It’s telling us we need to write an Addition method that takes one argument. In the intellisense! I didn’t have to do anything in particular to make that happen.

Then when we add the declaration of the extension method:

Rosyln shows no error message

The red squiggle has gone, and num3 has the right type. And when I hit F5, I see the expected concatenated output.

I am astonished.

Here’s a fork with this small change.

There is still more to investigate. For example:

interface IThing
{
    IThing Addition(IThing other);
    IThing Subtraction(IThing other);
}

static void Test<T>(T a, T b) where T : IThing
{
    var r1 = a + b;
    var r2 = a - b;
}

That works! Oh yes, you can do operators on generic parameter types. Couldn’t do that before.

However, what about ==? That doesn’t work – it seems the compiler handles equality comparison separately, and doesn’t get to my new code. Maybe that’s a good thing… But on the other hand, maybe not. Will take another look tomorrow.

Categories: Uncategorized Tags: