Stopping Garbage Collection in .NET Core 3.0 (part II)

Let’s see how it’s implemented. For why it is implemented, see Part I.

using System;
using System.Diagnostics.Tracing;
using System.Runtime;

The FxCop code analyzers get upset if I don’t declare this, which also impede me from using unsigned numeral types in interfaces.

[assembly: CLSCompliant(true)]

namespace LNativeMemory
{

The first piece of the puzzle is to implement an event listener. It is a not-obvious (for me) class. I don’t fully understand the lifetime semantics, but the code below seems to do the right thing.

The interesting piece is _started and the method Start(). The constructor for EventListener allocates plenty of stuff. I don’t want to do those allocations after calling TryStartNoGCRegion because they would use part of the GC Heap that I want for my program.

Instead, I create it before such call, but then I make it ‘switch on’ just after the Start() method is called.

    internal sealed class GcEventListener : EventListener
    {
        Action _action;
        EventSource _eventSource;
        bool _active = false;

        internal void Start() { _active = true; }
        internal void Stop() { _active = false; }


As described in part one, you pass a delegate at creation time, which is called when garbage collection is restarted.

        internal GcEventListener(Action action) => _action = action ?? throw new ArgumentNullException(nameof(action));


We register to all the events coming from .NET. We want to call the delegate at the exact point when garbage collection is turned on again.
We don’t have a clean way to do that (aka there is no runtime event we can hook up to, see here, so listening to every single GC event gives us the most chances of doing it right. Also it ties us the least to any pattern of events, which
might change in the future.

        // from https://docs.microsoft.com/en-us/dotnet/framework/performance/garbage-collection-etw-events
        private const int GC_KEYWORD = 0x0000001;
        private const int TYPE_KEYWORD = 0x0080000;
        private const int GCHEAPANDTYPENAMES_KEYWORD = 0x1000000;

        protected override void OnEventSourceCreated(EventSource eventSource)
        {
            if (eventSource.Name.Equals("Microsoft-Windows-DotNETRuntime", StringComparison.Ordinal))
            {
                _eventSource = eventSource;
                EnableEvents(eventSource, EventLevel.Verbose, (EventKeywords)(GC_KEYWORD | GCHEAPANDTYPENAMES_KEYWORD | TYPE_KEYWORD));
            }
        }


For each event, I check if the garbage collector has exited the NoGC region. If it has, then let’s invoke the delegate.

        protected override void OnEventWritten(EventWrittenEventArgs eventData)
        {
            var eventName = eventData.EventName;
            if(_active && GCSettings.LatencyMode != GCLatencyMode.NoGCRegion)
            {
                _action?.Invoke();
            }
        }
    }


Now that we have our event listener, we need to hook it up. The code below implements what I described earlier.
1. Do your allocations for the event listener
2. Start the NoGc region
3. Start monitoring the runtime for the start of the NoGC region

    public static class GC2
    {
        static private GcEventListener _evListener;

        public static bool TryStartNoGCRegion(long totalSize, Action actionWhenAllocatedMore)
        {

            _evListener = new GcEventListener(actionWhenAllocatedMore);
            var succeeded = GC.TryStartNoGCRegion(totalSize, disallowFullBlockingGC: false);
            _evListener.Start();

            return succeeded;
        }


As puzzling as this might be, I provisionally believe it to be correct. Apparently, even if the GC is not in a NoGC region, you still need to call
EndNoGCRegion if you have called TryStartNoGCRegion earlier, otherwise your next call to TryStartNoGCRegion will fail. EndNoGCRegion will throw an exception, but that’s OK. Your next call to TryStartNoGCRegion will now succeed.

Now read the above repeatedly until you got. Or just trust that it works somehow.

        public static void EndNoGCRegion()
        {
            _evListener.Stop();

            try
            {
                GC.EndNoGCRegion();
            } catch (Exception)
            {

            }
        }
    }


This is used as the default behavior for the delegate in the wrapper class below. I was made aware by the code analyzer that I shouldn’t be throwing an OOF exception here. At first, I dismissed it, but then it hit me. It is right.

We are not running out of memory here. We simply have allocated more memory than what we declared we would. There is likely plenty of memory left on the machine. Thinking more about it, I grew ashamed of my initial reaction. Think about a support engineer getting an OOM exception at that point and trying to figure out why. So, always listen to Lint …

    public class OutOfGCHeapMemoryException : OutOfMemoryException {
        public OutOfGCHeapMemoryException(string message) : base(message) { }
        public OutOfGCHeapMemoryException(string message, Exception innerException) : base(message, innerException) { }
        public OutOfGCHeapMemoryException() : base() { }

    }


This is an utility class that implements the IDisposable pattern for this scenario. The size of the default ephemeral segment comes fromhere.

    public sealed class NoGCRegion: IDisposable
    {
        static readonly Action defaultErrorF = () => throw new OutOfGCHeapMemoryException();
        const int safeEphemeralSegment = 16 * 1024 * 1024;

        public NoGCRegion(int totalSize, Action actionWhenAllocatedMore)
        {
            var succeeded = GC2.TryStartNoGCRegion(totalSize, actionWhenAllocatedMore);
            if (!succeeded)
                throw new InvalidOperationException("Cannot enter NoGCRegion");
        }

        public NoGCRegion(int totalSize) : this(totalSize, defaultErrorF) { }
        public NoGCRegion() : this(safeEphemeralSegment, defaultErrorF) { }

        public void Dispose() => GC2.EndNoGCRegion();
    }
}

Stopping Garbage Collection in .NET Core 3.0 (part I)

For how all of the below is implemented, see Part II.

Code at https://github.com/lucabol/LNativeMemory/tree/master/LNativeMemory

Scenario

You have an application or a particular code path of your application that cannot take the pauses that GC creates. Typical examples are real time systems, tick by tick financial apps, embedded systems, etc …

Disclaimer

For any normal kind of applications, YOU DON’T NEED TO DO THIS. You are likely to make your application run slower or blow up memory. If you have an hot path in your application (i.e. you are creating an editor with Intellisense), use the GC latency modes.

Use the code below just under extreme circumstance as it is untested, error prone and wacky. You are probably better off waiting for an official way of doing it (i.e. when this
is implemented)

The problem with TryStartNoGCRegion

There is a GC.TryStartNoGCRegion in .NET. You can use it to stop garbage collection passing a totalBytes parameter that represents the maximum amount of memory that you plan to allocate from the managed heap. Matt describes it here.

The problem is that when/if you allocate more than that, garbage collection resumes silently. Your application continues to work,but with different performance characteristics from what you expected.

The idea

The main idea is to use ETW events to detect when a GC occurs and to call an user provided delegate at that point. You can then do whatever you want in the delegate (i.e. shutdown the process, send email to support, start another NoGC region, etc…).

Also, I have wrapped the whole StartNoGCRegion/EndNoGCRegion in an IDisposable wrapper for easy of use.

The tests

Let’s start by looking at how you use it.

using Xunit;
using System.Threading;

namespace LNativeMemory.Tests
{

    // XUnit executes all tests in a class sequentially, so no problem with multi-threading calls to GC
    public class GC2Tests
    {


We need to use a timer to maximize the chances that a GC happens in some of the tests. Also we allocate an amount that should work in all GC configuration as per the article above. trigger is a static field so as to stay zero-allocation (otherwise the delegate will have to capture the a local trigger variable creating a heap allocated closure). Not that it matters any to be zero-allocation in this test, but I like to keep ClrHeapAllocationAnalyzer happy.

BTW: XUnit executes all tests in a class sequentially, so no problem with multi-threading calls to GC.

        const int sleepTime = 200;
        const int totalBytes = 16 * 1024 * 1024;
        static bool triggered = false;


First we test that any allocation that doesn’t exceed the limit doesn’t trigger the call to action.

        [Fact]
        public void NoAllocationBeforeLimit()
        {
            try
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);
                Thread.Sleep(sleepTime);
                Assert.False(triggered);

                var bytes = new byte[99];
                Thread.Sleep(sleepTime);
                Assert.False(triggered);
            }
            finally
            {
                GC2.EndNoGCRegion();
                triggered = false;
            }
        }


Then we test that allocating over the limit does trigger the action. To do so we need to trigger a garbage collection. Our best attempt is with the goofy for loop. If you got a better idea, shout.

        [Fact]
        public void AllocatingOverLimitTriggersTheAction()
        {
            try
            {
                triggered = false;
                var succeeded = GC2.TryStartNoGCRegion(totalBytes, () => triggered = true);
                Assert.True(succeeded);
                Assert.False(triggered);

                for (var i = 0; i < 3; i++) { var k = new byte[totalBytes]; }

                Thread.Sleep(sleepTime);
                Assert.True(triggered);
            }
            finally
            {
                GC2.EndNoGCRegion();
                triggered = false;
            }
        }

We also test that we can go back and forth between starting and stopping without messing things up.

        [Fact]
        public void CanCallMultipleTimes()
        {

            for (int i = 0; i  triggered = true))
            {
                for (var i = 0; i < 3; i++) { var k = new byte[totalBytes]; }
                Thread.Sleep(sleepTime);
                Assert.True(triggered);
                triggered = false;
            }
        }
    }
}

A Stack data structure implementation using Span

I am back in Microsoft and today we talk about the code below, which is on github here:

  1. public ref struct SpanStack<T>
  2. {
  3.     private Span memory;
  4.     private int index;
  5.     private int size;
  6.     public SpanStack(Span mem) { memory = mem; index = 0; size = mem.Length; }
  7.     public bool IsEmpty() => index < 0;
  8.     public bool IsFull() => index > size – 1;
  9.     public void Push(T item) => memory[index++] = item;
  10.     public T Pop() => memory[–index];
  11. }
  12. public static class SpanExtensions
  13. {
  14.     public static SpanStack AsStack<T>(this Span span) => new SpanStack(span);
  15. }

This Stack data structure can be used over memory that resides on the stack, heap or unmanaged heap. If you know about Span this should immediately make sense to you.

This has to be a ref struct because it contains a Span. It can’t be used on the heap (i.e. in lambdas, async, class field, …). You have to build it on top of Memory if you need that. Also, you can happily blow the stack with this guy …

Let’s micro-benchmark it with BenchmarkDotNet.  For example, a postfix calculator. Let’s first do it naively using inheritance and the generic Stack class in the framework.

This is the naive object model:

  1. abstract class Token {}
  2. sealed class Operand: Token
  3. {
  4.     public int Value { get; }
  5.     public Operand(int v) { Value = v; }
  6. }
  7. abstract class Operator: Token {
  8.     abstract public int Calc(int a, int b);
  9. }
  10. sealed class Add: Operator
  11. {
  12.     public override int Calc(int a, int b) => a + b;
  13. }
  14. sealed class Mult : Operator
  15. {
  16.     public override int Calc(int a, int b) => a * b;
  17. }
  18. sealed class Minus : Operator
  19. {
  20.     public override int Calc(int a, int b) => a – b;
  21. }

Let’s then do it trying to be a bit more performance aware using a stack friendly representation:

  1. public enum TokenType { Operand, Sum, Mult, Minus}
  2. readonly struct SToken
  3. {
  4.     public TokenType Type { get; }
  5.     public int Value { get; }
  6.     public SToken(TokenType t, int v) { Type = t; Value = v; }
  7.     public SToken(TokenType t) { Type = t; Value = 0; }
  8.     public int Calc(int a, int b) =>
  9.                Type == TokenType.Sum   ? a + b :
  10.                Type == TokenType.Minus ? a – b :
  11.                Type == TokenType.Minus ? a * b :
  12.                throw new Exception(“I don’t know that one”);
  13. }

Perhaps not overtly elegant, but not that terrible either. You got to love those expression bodied methods and throw-expression.

We then setup things (I know I could/should parse a string here):

  1. static Token[] tokens;
  2. static SToken[] stokens;
  3. [GlobalSetup]
  4. public void Setup()
  5. {
  6.     tokens = new Token[] { new Operand(2), new Operand(3), new Operand(4), new Add(),
  7.                            new Mult(), new Operand(5), new Minus() };
  8.     stokens = new SToken[] { new SToken(TokenType.Operand, 2),
  9.                              new SToken(TokenType.Operand, 3), new SToken(TokenType.Operand, 4),
  10.                              new SToken(TokenType.Sum),  new SToken(TokenType.Mult),
  11.                              new SToken(TokenType.Operand, 5), new SToken(TokenType.Minus)};
  12. }

And first test the naive object model with the standard Stack from System.Collections.Generic.

  1. [Benchmark]
  2. public int PostfixEvalStack()
  3. {
  4.     var stack = new Stack(100);
  5.     foreach (var token in tokens)
  6.     {
  7.         switch (token)
  8.         {
  9.             case Operand t:
  10.                 stack.Push(t);
  11.                 break;
  12.             case Operator o:
  13.                 var a = stack.Pop() as Operand;
  14.                 var b = stack.Pop() as Operand;
  15.                 var result = o.Calc(a.Value, b.Value);
  16.                 stack.Push(new Operand(result));
  17.                 break;
  18.         }
  19.     }
  20.     return (stack.Pop() as Operand).Value;
  21. }

Then let’s just swap out our own lean-and-mean stack:

  1. [Benchmark]
  2. public int PostfixEvalSpanStack()
  3. {
  4.     Span span = new Token[100];
  5.     var stack = span.AsStack();
  6.     foreach (var token in tokens)
  7.     {
  8.         switch (token)
  9.         {
  10.             case Operand t:
  11.                 stack.Push(t);
  12.                 break;
  13.             case Operator o:
  14.                 var a = stack.Pop() as Operand;
  15.                 var b = stack.Pop() as Operand;
  16.                 var result = o.Calc(a.Value, b.Value);
  17.                 stack.Push(new Operand(result));
  18.                 break;
  19.         }
  20.     }
  21.     return (stack.Pop() as Operand).Value;
  22. }

And finally let’s go the whole way, lean object model and lean data structure, everything on the stack:

  1. [Benchmark(Baseline = true)]
  2. public int PostfixEvalSpanStackStructTypes()
  3. {
  4.     Span span = stackalloc SToken[100];
  5.     var stack = span.AsStack();
  6.     foreach (var token in stokens)
  7.     {
  8.         if (token.Type == TokenType.Operand)
  9.         {
  10.             stack.Push(token);
  11.         } else {
  12.             var a = stack.Pop();
  13.             var b = stack.Pop();
  14.             var result = token.Calc(a.Value, b.Value);
  15.             stack.Push(new SToken(TokenType.Operand, result));
  16.             break;
  17.         }
  18.     }
  19.     return stack.Pop().Value;
  20. }

We also want to check that we didn’t code anything stupid and finally run the benchmark.

  1. static void Test()
  2. {
  3.     var p = new Program();
  4.     p.Setup();
  5.     Trace.Assert(p.PostfixEvalStack() == p.PostfixEvalSpanStack() &&
  6.                  p.PostfixEvalSpanStack() == p.PostfixEvalSpanStackStructTypes());
  7. }
  8. static void Main(string[] args)
  9. {
  10.     Test();
  11.     var summary = BenchmarkRunner.Run();
  12. }

On my machine I get these results:


BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.431 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-6600U CPU 2.60GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2742185 Hz, Resolution=364.6727 ns, Timer=TSC
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1 (CoreCLR 4.6.26426.02, CoreFX 4.6.26426.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1 (CoreCLR 4.6.26426.02, CoreFX 4.6.26426.04), 64bit RyuJIT
Method Mean Error StdDev Scaled ScaledSD
PostfixEvalSpanStackStructTypes 76.24 ns 1.563 ns 2.857 ns 1.00 0.00
PostfixEvalSpanStack 168.65 ns 5.280 ns 15.319 ns 2.22 0.22
PostfixEvalStack 334.56 ns 7.387 ns 20.593 ns 4.39 0.31

Your mileage might vary. I want to emphasize that I am just playing with things. I haven’t done any deep analysis of this benchmark. There can be flaws, etc… etc…

Still, I find the idea of data structures which are memory-location-independent rather fascinating.

Building a stock alert system with Google Script

 

This is obsolete as Yahoo stopped their quote service. I have a new spreadsheet here.

When I thought about it, I realized that my ideal system would be a spreadsheet where to add tickers and alert levels. Under the covers, the system would need to check the current price of a ticker, compare it with the alert level and send me email when triggered.

Also the whole thing shouldn’t be running from my machine at home, but from somewhere on the internet.

Google script fit the bill. Let’s see how it works.

Script is here. Sheet is here.

First a utility function to send errors via email, which will be used throughout the script.

function emailError(e) {
  MailApp.sendEmail("lucabolg@gmail.com", "Watchlist Error",
                    "\r\nMessage: " + e.message
                    + "\r\nFile: " + e.fileName
                    + "\r\nLine: " + e.lineNumber
                    + "\r\nLOg: " + Logger.getLog())
}

Then another one to check if the price downloaded from the internet is sensible.

function validPrice(price) {
 return price != 'undefined' && price > 0.1
}

We then need one to retrieve the current price of a ticker from the array of data returned from the internet:

// Find the current price of a ticker in an array of data where the ticker is the first column
function getQuote(data, ticker) {
  var ticker = ticker.trim().toLowerCase()

  for(var i = 0; i = 22 && hour <= 23) || value != "close"
}

With all of that in place, we can now look at the main function. First we load up the spreadsheet and get the values and headers we care about. This would be more robust if we looked up the sheet by name. Also the id of the sheet is burned in the code. You'll need to change it if you want to make it point to your own.

// Check spreadsheet with tickers and stop prices, send email when a stop is hit and mark the row as 'Executed'.
function checkQuotes() {
try {

// Get all data from spreadsheet in one web call.
var ss = SpreadsheetApp.openById("1WQf2AiBPQW5HLzCyGgsFlKN0f1HTOWAteJ5bJCXVnlc")
var range = ss.getSheets()[0].getDataRange()
var values = range.getValues()
var headers = values[0]
var rows = ObjApp.rangeToObjects(values)
var body = ""
var now = new Date()

Notice 'ObjApp' is part of the ObjService library to make the code a bit more maintainable, instead of scattering column numbers in the code.

Now we get all the tickers and download the prices from Yahoo (we try three times as it occasionally fails.

    // Fish out all tickers from col 0 where Status (col 4) is not executed
    var tickers = []
    for(var i = 1; i < rows.length; i++) {// dont' process the headers
      if((rows[i]).executed.toLowerCase() == 'active' && isRightTime(rows[i], now)) tickers.push((rows[i]).ticker.trim().toLowerCase())
    }
    Logger.log("Tickers:%s" ,tickers)

    if(tickers.length == 0) return // Nothing to process

    // Get ticker, real time bid, real time ask for all tickers in one web call
    var url = "http://finance.yahoo.com/d/quotes.csv?s=" + tickers.join("+") + "&f=sl1"//"&f=sb2b3"

    // Try 3 times before giving up
    for(var i = 0; i < 3; i++) {
      try {
        var response = UrlFetchApp.fetch(url)
        break;
      } catch(e) {
      }
    }

    Logger.log("Response:\n%s", response)
    var data = Utilities.parseCsv(response.getContentText())
    Logger.log("Data:\n%s", data)

Once that is done, we enter the main loop. The concept is simple, for each row we check the price and, if the price is above/below the alert we add it to the body string and mark the row in the sheet so that we don’t process it again next time. A the end, we email the body variable if not null.

First we check that we haven’t already executed this row:

    for(var i = 1; i < rows.length; i++) {// dont' process the headers
      var current = rows[i]
      if(current.executed.trim().toLowerCase() == 'executed') continue // no need to process it as it is 'Executed'

      var symbol = current.operator
      var stop = current.stop

If it's still active and if it is the right time, we check if the alert is triggered. If it is we add the text to the body variable.

      if(isRightTime(current, now)) {
        var price = getQuote(data, current.ticker)
        if( (symbol.trim() == ">" && price > stop) ||
           (symbol.trim() == "<" && price < stop)) {

          current.executed = "Executed"
          current.price = price

          body += [current.kind, current.ticker, current.price, current.operator, current.stop, "\r\n"].join(" ")
          Logger.log("Body in loop:\n%s", body)
        }
      }
    }

If body is not empty, that means that something was triggered, so we send the email.

    if(body != "") {
      Logger.log("Body final:%s", body)
      MailApp.sendEmail('lucabolg@gmail.com', 'Watchlist: stops triggered', body)
      var data = ObjApp.objectToArray(headers, rows)
      data.unshift(headers)
      range.setValues(data)
    }

If an error was generated, then we send the error email.

  } catch (e) {
    Logger.log(e.lineNumber + ":" + e.message)
    emailError(e)
  }
}

My experience overall was remarkable. The learning curve was very quick and the web editor works remarkably well (well, stepping through code is rather slow).

Overall, if Google has all your data (in Drive) and you can write code to manipulate it (in Google script), why do I need my home computer again? I can just have a small screen that connects to the internet and I’m done.

That’s probably true for me apart from two things that I haven’t found in web form: editing of images in the raw format and a sophisticated portfolio application. If I find these two, I’m ready to give up my life to Google …

Funky C for literate programming

1 Main ideas

This is a port of LLIte in C. The reason for it is to experiment with writing functional code in standard C and compare the experience with using a functional language like F#. It is in a way a continuation of this and this posts.

I will be using glib and an header of convenient macros/functions to help me (lutils.h). I don’t think that is cheating. Any modern C praticoner has its bag of tricks …

Don’t tell me this is not idiomatic C. I already know that.

#include <string.h>
#include <stdbool.h>

#include <glib.h>
#include <glib/gprintf.h>

#ifdef ARENA
#include "arena.h"
#endif

#include "lutils.h"

2 Lack of tuples

In the snippet below I overcomed such deficiency by declaring a struct. Using the new constructor syntax makes initializing a static table simple.

typedef struct LangSymbols { char language[40]; char start[10]; char end[10];} LangSymbols;

static
LangSymbols* s_lang_params_table[] = {
    &(LangSymbols) {.language = "fsharp",   .start = "(*" "*", .end = "*" "*)"},
    &(LangSymbols) {.language = "c",        .start = "/*" "*", .end = "*" "*/"},
    &(LangSymbols) {.language = "csharp",   .start = "/*" "*", .end = "*" "*/"},
    &(LangSymbols) {.language = "java",     .start = "/*" "*", .end = "*" "*/"},
    NULL
};

3 Folding over arrays

I need to gather all the languages, aka perform a fold over the array. You might have noticed the propensity to add a NULL terminator marker to arrays (as for strings). This allows me to avoid passing a size to functions and makes simpler writing utility macros (as foreach below) more simply.

In the rest of the program, every time I end a function with _z, it is because I consider it generally usable and I add a version of it without the _z to lutils.h.

#define array_foreach_z(p) for(; *symbols != NULL; ++symbols)

static
char* summary(LangSymbols** symbols) {

    GString* langs = g_string_sized_new(20);
    array_foreach(symbols) g_string_append_printf(langs, "%s ", (*symbols)->language);

    g_string_truncate(langs, strlen(langs->str) - 1);

    GString* usage = g_string_sized_new(100);

    g_string_printf(usage,
        "You should specify:nt. either -l or -o and -pn"
        "t. either -indent or -P and -Cn"
        "t. -l supports: %s"
        ,langs->str);

    return usage->str;
}

Find an item in an array based on some expression. Returns NULL if not found. Again, this is a common task, hence I’ll abstract it out with a macro (that ends up being a cute use of gcc statment expressions).

#define array_find_z(arr, ...)                          
    ({                                                  
        array_foreach(arr) if (__VA_ARGS__) break;      
        *arr;                                           
    })

static
LangSymbols* lang_find_symbols(LangSymbols** symbols, char* lang) {
    g_assert(symbols);
    g_assert(lang);

    return array_find(symbols, !strcmp((*symbols)->language, lang));
}

4 Deallocating stuff

You might wonder why I don’t seem overly worried about deallocating the memory that I allocate. I haven’t gone crazy(yet). You’ll see.

5 Discriminated unions

Here are the discriminated unions macros from a previous blog post of mine. I’ll need a couple of these and pre-declare two functions.

union_decl(CodeSymbols, Indented, Surrounded)
    union_type(Indented,    int indentation;)
    union_type(Surrounded,  char* start_code; char* end_code;)
union_end(CodeSymbols);

typedef struct Options {
    char*           start_narrative;
    char*           end_narrative;
    CodeSymbols*    code_symbols;
} Options;

static
gchar* translate(Options*, gchar*);

union_decl(Block, Code, Narrative)
    union_type(Code,        char* code)
    union_type(Narrative,   char* narrative)
union_end(Block);

6 Main data structure

We want to use higher level abstractions that standard C arrays, hence we’ll pick a convenient data structure to use in the rest of the code. A queue lets you to insert at the front and back, with just a one pointer overhead over a single linked list. Hence it is my data structure of choice for this program.

static
GQueue* blockize(Options*, char*);

There is already a function in glib to check if a string has a certain prefix (g_str_has_prefix). We need one that returns the remaining string after the prefix. We also define a g_slow_assert that is executed just if G_ENABLE_SLOW_ASSERT is defined

static
char* str_after_prefix(char* src, char* prefix) {
    g_assert(src);
    g_assert(prefix);
    g_slow_assert(g_str_has_prefix(src, prefix));

    while(*prefix != '0')
        if(*src == *prefix) ++src, ++prefix;
        else break;

    return src;
}

7 Tokenizer

The structure of the function is identical to the F# version. The big bread-winners are statement expressions and local functions …

It is interesting how you can replicate the shape of an F# function by substituting ternary operators for match statements.

It is nothing magic, just a way to have a case statment as an expression, but it is suggestive of its more functional counterpart.

#define NL "n"

union_decl(Token, OpenComment, CloseComment, Text)
    union_type(OpenComment, int line)
    union_type(CloseComment,int line)
    union_type(Text,        char* text)
union_end(Token);

GQueue* tokenize(Options* options, char* source) {
    g_assert(options);
    g_assert(source);

    struct tuple { int line; GString* acc; char* rem;};

    bool is_opening(char* src)      { return g_str_has_prefix(src, options->start_narrative);}
    bool is_closing(char* src)      { return g_str_has_prefix(src, options->end_narrative);}
    char* remaining_open (char* src){ return str_after_prefix(src, options->start_narrative);}
    char* remaining_close(char* src){ return str_after_prefix(src, options->end_narrative);}

    struct tuple text(char* src, GString* acc, int line) {
        inline struct tuple stop_parse_text()
            { return (struct tuple) {.line = line, .acc = acc, .rem = src};}

        return  str_empty (src)? stop_parse_text() :
                is_opening(src)? stop_parse_text() :
                is_closing(src)? stop_parse_text() :
                                ({
                                  int line2         = g_str_has_prefix(src, NL) ? line + 1
                                                                                : line;
                                  GString* newAcc   = g_string_append_c(acc, *src);
                                  char* rem         = src + 1;
                                  text(rem, newAcc, line2);
                                });
    }

    GQueue* tokenize_rec(char* src, GQueue* acc, int line) {
        return  str_empty(src)  ?   acc                     :
                is_opening(src) ?   tokenize_rec(remaining_open(src),
                                        g_queue_push_back(acc, union_new(
                                                    Token, OpenComment, .line = line)),
                                        line)        :
                is_closing(src) ?   tokenize_rec(remaining_close(src),
                                               g_queue_push_back(acc, union_new(
                                                    Token, CloseComment, .line = line)),
                                        line)        :
                                ({
                                    struct tuple t = text(src, g_string_sized_new(200), line);
                                    tokenize_rec(t.rem,
                                        g_queue_push_back(acc, union_new(
                                                    Token, Text, .text = t.acc->str)), t.line);
                                 });
    }

    return tokenize_rec(source, g_queue_new(), 1);
}

8 Parser

This again has a similar structure to the F# version, just longer. It is very long because it contains 3 (nested) functions which are on the verbose side in C.

The creation of a error macro is unfortunate. I just don’t know how to adapt g_assert_e so that it works for not pointer returning functions.

I also need a simple function report_error to exit gracefully giving a message to the user. I didn’t found such thing in glib (?)

#define report_error_z(...) G_STMT_START { g_print(__VA_ARGS__); exit(1); } G_STMT_END                                                            

union_decl(Chunk, NarrativeChunk, CodeChunk)
    union_type(NarrativeChunk,  GQueue* tokens)
    union_type(CodeChunk,       GQueue* tokens)
union_end(Chunk);

static
GQueue* parse(Options* options, GQueue* tokens) {
    g_assert(options);
    g_assert(tokens);

    struct tuple { GQueue* acc; GQueue* rem;};

    #define error(...) 
        ({ report_error(__VA_ARGS__); (struct tuple) {.acc = NULL, .rem = NULL}; })

    struct tuple parse_narrative(GQueue* acc, GQueue* rem) {

        bool isEmpty    = g_queue_is_empty(rem);
        Token* h        = g_queue_pop_head(rem);
        GQueue* t       = rem;

        return  isEmpty                 ?
                                    error("You haven't closed your last narrative comment") :
                h->kind == OpenComment  ?
                    error("Don't open narrative comments inside narrative comments at line %i",
                          h->OpenComment.line)                                              :
                h->kind == CloseComment ? (struct tuple) {.acc = acc, .rem = t}             :
                h->kind == Text         ? parse_narrative(g_queue_push_back(acc, h), t)     :
                                          error("Should never get here");
    };

    struct tuple parse_code(GQueue* acc, GQueue* rem) {

        bool isEmpty    = g_queue_is_empty(rem);
        Token* h    = g_queue_pop_head(rem);
        GQueue* t   = rem;

        return  isEmpty                 ? (struct tuple) {.acc = acc, .rem = t}         :
                h->kind == OpenComment  ?
                    (struct tuple) {.acc = acc, .rem = g_queue_push_front(rem, h)}      :
                h->kind == CloseComment ? parse_code(g_queue_push_back(acc, h), rem)    :
                h->kind == Text         ? parse_code(g_queue_push_back(acc, h), rem)    :
                                          error("Should never get here");
    };
    #undef error

    GQueue* parse_rec(GQueue* acc, GQueue* rem) {

        bool isEmpty    = g_queue_is_empty(rem);
        Token* h    = g_queue_pop_head(rem);
        GQueue* t   = rem;

        return  isEmpty                 ? acc                                           :
                h->kind == OpenComment  ? ({
                                           GQueue* emp = g_queue_new();
                                           struct tuple tu = parse_narrative(emp, t);
                                           Chunk* ch = union_new(
                                                Chunk, NarrativeChunk, .tokens = tu.acc );
                                           GQueue* newQ = g_queue_push_back(acc, ch);
                                           parse_rec(newQ, tu.rem);
                                           })                                            :
                h->kind == CloseComment ?
                    report_error_e(
                        "Don't insert a close narrative comment at the start of your"
                        " program at line %i",
                                            h->OpenComment.line)                         :
                h->kind == Text         ?
                                        ({
                                           GQueue* emp = g_queue_new();
                                           struct tuple tu =
                                                parse_code(g_queue_push_front(emp, h), t);
                                           parse_rec(g_queue_push_back
                                            (acc,
                                             union_new(Chunk, CodeChunk, .tokens = tu.acc)),
                                             tu.rem);
                                          })                                                               :
                                          g_assert_no_match;
    }

    return parse_rec(g_queue_new(), tokens);
}

9 Flattener

This follows the usual practice of representing fold as foreach statments (and maps to). Pheraps I shall build better abstractions for them at some point. I also introduce a little macro to simplify writing of GFunc lambdas, given how pervasive they are.

Again, note how heavy ternary operated this is …

#define g_func_z(type, name, ...) lambda(void,                                              
                                        (void* private_it, G_GNUC_UNUSED void* private_no){ 
                                       type name = private_it;                              
                                       __VA_ARGS__                                          
                                })

static
GQueue* flatten(Options* options, GQueue* chunks) {
    GString* token_to_string_narrative(Token* tok) {
        return  tok->kind == OpenComment ||
                tok->kind == CloseComment   ?
                    report_error_e("Cannot nest narrative comments at line %i",
                                   tok->OpenComment.line)                                   :
                tok->kind == Text           ? g_string_new(tok->Text.text)                  :
                                              g_assert_no_match;
    }
    GString* token_to_string_code(Token* tok) {
        return  tok->kind == OpenComment    ?
                report_error_e(
                    "Open narrative comment cannot be in code at line %i."
                    " Pheraps you have an open comment "
                    "in a code string before this comment tag?"
                    , tok->OpenComment.line)                                                :
                tok->kind == CloseComment   ? g_string_new(options->end_narrative)          :
                tok->kind == Text           ? g_string_new(tok->Text.text)                  :
                                              g_assert_no_match;
    }
    Block* flatten_chunk(Chunk* ch) {
        return  ch->kind == NarrativeChunk  ? ({
                               GQueue* tokens = ch->NarrativeChunk.tokens;
                               GString* res = g_string_sized_new(256);
                               g_queue_foreach(tokens, g_func(Token*, tok,
                                                g_string_append(
                                                    res,
                                                    token_to_string_narrative(tok)->str);
                                                ), NULL);
                               union_new(Block, Narrative, .narrative = res->str);
                                               })   :
                ch->kind == CodeChunk       ? ({
                               GQueue* tokens = ch->CodeChunk.tokens;
                               GString* res = g_string_sized_new(256);
                               g_queue_foreach(tokens, g_func(Token*, tok,
                                                        g_string_append(
                                                            res,
                                                            token_to_string_code(tok)->str);
                                                        ), NULL);
                               union_new(Block, Code, .code = res->str);
                                               })   :
                               g_assert_no_match;
    }

    GQueue* res = g_queue_new();
    g_queue_foreach(chunks, g_func(Chunk*, ch,
                                Block* b = flatten_chunk(ch);
                                g_queue_push_tail(res, b);
                                ) ,NULL);
    return res;
}

Now we can tie everything together to build blockize, which is our parse tree.

static
GQueue* blockize(Options* options, char* source) {
    GQueue* tokens  = tokenize(options, source);
    GQueue* blocks  = parse(options, tokens);
    return flatten(options, blocks);
}

10 Define the phases

In C you can easily forward declare function, so you don’t have to come up with some clever escabotage like we had to do in F#.

static
GQueue* remove_empty_blocks(Options*, GQueue*);
static
GQueue* merge_blocks(Options*, GQueue*);
static
GQueue* add_code_tags(Options*, GQueue*);

static
GQueue* process_phases(Options* options, GQueue* blocks) {

    blocks          = remove_empty_blocks(options, blocks);
    blocks          = merge_blocks(options, blocks);
    blocks          = add_code_tags(options, blocks);
    return blocks;
}

static
char* extract(Block* b) {
    return  b->kind == Code         ? b->Code.code          :
            b->kind == Narrative    ? b->Narrative.narrative:
                                      g_assert_no_match;
}

There must be a higher level way to write this utility function …

static
bool is_str_all_spaces(const char* str) {
    g_assert(str);
    while(*str != '0') {
        if(!g_ascii_isspace(*str))
            return false;
        str++;
    }
    return true;
}

static
GQueue* remove_empty_blocks(G_GNUC_UNUSED Options* options, GQueue* blocks) {

    g_queue_foreach(blocks, g_func(Block*, b,
        if(is_str_all_spaces(extract(b)))
            g_queue_remove(blocks, b);
                                   ), NULL);
    return blocks;
}

static
GQueue* merge_blocks(G_GNUC_UNUSED Options*options, GQueue* blocks) {
    return  g_queue_is_empty(blocks)            ? blocks            :
            g_queue_get_length(blocks) == 1     ? blocks            :
                ({
                 Block* h1 = g_queue_pop_head(blocks);
                 Block* h2 = g_queue_pop_head(blocks);
                 h1->kind == Code && h2->kind == Code ? ({
                     char* newCode =
                        g_strjoin("", h1->Code.code, NL, h2->Code.code, NULL);
                     Block* b = union_new(Block, Code, .code = newCode);
                     merge_blocks(options, g_queue_push_front(blocks, b));
                                                         })         :
                 h1->kind == Narrative && h2->kind == Narrative ? ({
                     char* newNarr =
                        g_strjoin(
                            "", h1->Narrative.narrative, NL, h2->Narrative.narrative, NULL);
                     Block* b = union_new(Block, Narrative, .narrative = newNarr);
                     merge_blocks(options, g_queue_push_front(blocks, b));
                                                         })         :
                                                         ({
                     GQueue* newBlocks =
                        merge_blocks(options, g_queue_push_front(blocks, h2));
                     g_queue_push_front(newBlocks, h1);
                                                         });
                 });
}

This really should be in glib …

inline static
gint g_asprintf_z(gchar** string, gchar const *format, ...) {
	va_list argp;
	va_start(argp, format);
	gint bytes = g_vasprintf(string, format, argp);
	va_end(argp);
    return bytes;
}

static
char* indent(int n, char* s) {
    g_assert(s);

    char* ind       = g_strnfill(n, ' ');
    char* tmp;
    g_asprintf(&tmp, "%s%s", ind, s);

    char* withNl;
    g_asprintf(&withNl, "n%s", ind);

    return g_strjoinv(withNl, g_strsplit(tmp, NL, -1));
}

And finally I ended up defining map. See if you like how the usage looks in the function below.

#define g_queue_map_z(q, type, name, ...) ({                                
        GQueue* private_res = g_queue_new();                                
        g_queue_foreach(q, g_func(type, name,                               
            name = __VA_ARGS__;                                             
            g_queue_push_tail(private_res, name);                           
            ), NULL);                                                       
        private_res;                                                        
                                      })

static
GQueue* add_code_tags(Options* options, GQueue* blocks) {

    GQueue* indent_blocks(GQueue* blocks) {
        return g_queue_map(blocks, Block*, b,
                b->kind == Narrative ? b                                                                                                    :
                b->kind == Code      ?
                    union_new(Block, Code, .code =
                        indent(options->code_symbols->Indented.indentation, b->Code.code))    :
                    g_assert_no_match;);
    }

    GQueue* surround_blocks(GQueue* blocks) {
        return g_queue_map(blocks, Block*, b,
                b->kind == Narrative ?
                    union_new(Block, Narrative, .narrative =
                        g_strjoin("", NL, g_strstrip(b->Narrative.narrative), NL, NULL))   :
                b->kind == Code      ?
                    union_new(Block, Code, .code = g_strjoin("",
                                                 NL,
                                                 options->code_symbols->Surrounded.start_code,
                                                 NL,
                                                 g_strstrip(b->Code.code),
                                                 NL,
                                                 options->code_symbols->Surrounded.end_code,
                                                 NL,
                                                 NULL))    :
                                       g_assert_no_match;);

    }

    return  options->code_symbols->kind == Indented     ?   indent_blocks(blocks)   :
            options->code_symbols->kind == Surrounded   ?   surround_blocks(blocks) :
                                                            g_assert_no_match;
}

char* stringify(GQueue* blocks) {
    GString* res = g_string_sized_new(2048);
    g_queue_foreach(blocks, g_func(Block*, b,
        g_string_append(res, extract(b));
    ), NULL);
    return g_strchug(res->str);
}

void deb(GQueue* q);

static
char* translate(Options* options, char* source) {
    g_assert(options);
    g_assert(source);

    GQueue* blocks  = blockize(options, source);
    blocks          = process_phases(options, blocks);
    return stringify(blocks);
}

11 Parsing the command line

In glib there is a command line parser that accept options in unix-like format and automatically produces professional --help messages and such. We shoudl really have something like this in .NET. Pheraps we do and I’m not aware of it?

typedef struct CmdOptions { char* input_file; char* output_file; Options* options;} CmdOptions;

static
CmdOptions* parse_command_line(int argc, char* argv[]);

static char *no = NULL, *nc = NULL, *l = NULL, *co = NULL, *cc = NULL, *ou = NULL;
static char** in_file;

static int ind = 0;
static bool tests = false;

// this is a bug in gcc, fixed in 2.7.0 not to moan about the final NULL
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"

static GOptionEntry entries[] =
{
  { "language"          , 'l', 0, G_OPTION_ARG_STRING, &l ,
                                "Language used", "L"  },
  { "output"            , 'o', 0, G_OPTION_ARG_FILENAME, &ou,
                                "Defaults to the input file name with mkd extension", "FILE" },
  { "narrative-open"    , 'p', 0, G_OPTION_ARG_STRING, &no,
                                "String opening a narrative comment",   "NO" },
  { "narrative-close"   , 'c', 0, G_OPTION_ARG_STRING, &nc,
                                "String closing a narrative comment",   "NC" },
  { "code-open"         , 'P', 0, G_OPTION_ARG_STRING, &co,
                                "String opening a code block",          "CO" },
  { "code-close"        , 'C', 0, G_OPTION_ARG_STRING, &cc,
                                "String closing a code block",          "CC" },
  { "indent"            , 'i', 0, G_OPTION_ARG_INT,    &ind,
                                "Indent the code by N whitespaces",    "N"  },
  { "run-tests"         , 't', G_OPTION_FLAG_HIDDEN, G_OPTION_ARG_NONE,   &tests,
                                "Run all the testcases", NULL },
  { G_OPTION_REMAINING  ,   0, 0, G_OPTION_ARG_FILENAME_ARRAY, &in_file,
                                "Input file to process",   "FILE" },
  { NULL }
};
#pragma GCC diagnostic pop

Brain damaged way to run tests with a -t hidden option. Not paying the code size price in release.

#ifndef NDEBUG
#include "tests.c"
#endif

Here is my big ass command parsing function. It could use a bit of refactoring …

void destroy_arena_allocator();

static
CmdOptions* parse_command_line(int argc, char* argv[]) {

    GError *error = NULL;
    GOptionContext *context;

    context =
        g_option_context_new ("- translate source code with comemnts to an annotated file");
    g_option_context_add_main_entries (context, entries, NULL);
    g_option_context_set_summary(context, summary(s_lang_params_table));

    if (!g_option_context_parse (context, &argc, &argv, &error))
        report_error("option parsing failed: %s", error->message);

    CmdOptions* opt = g_new(CmdOptions, 1);
    opt->options = g_new(Options, 1);

    #ifndef NDEBUG
    if(tests) {
        int i = run_tests(argc, argv);
        exit(i);
    }
    #endif

    if(!in_file) report_error("No input file");
    opt->input_file = *in_file;

    // Uses input file without extension, adding extension .mkd (assume markdown)
    opt->output_file = ou ? ou :  ({
                                  char* output      = g_strdup(*in_file);
                                  char* extension   = g_strrstr(output, ".");
                                  extension ? ({
                                               *extension = '0';
                                               g_strjoin("", output, ".mkd", NULL);
                                                }) :
                                               g_strjoin("", output, ".mkd", NULL);
                                  });

    if(l) { // user passed a language
        LangSymbols* lang = lang_find_symbols(s_lang_params_table, l);
        if(!lang) report_error("%s is not a supported language", l);

        opt->options->start_narrative  = lang->start;
        opt->options->end_narrative    = lang->end;

    } else {
        if(!no || !nc) report_error("You need to specify either -l, or both -p and -c");

        opt->options->start_narrative  = no;
        opt->options->end_narrative    = nc;
    }

    if(ind) { // user pass    g_option_context_free();
        opt->options->code_symbols = union_new(CodeSymbols, Indented, .indentation = ind);
    } else {
        if(!co || !cc) report_error("You need to specify either -indent, or both -P and -C");
        opt->options->code_symbols =
            union_new(CodeSymbols, Surrounded, .start_code = co, .end_code = cc);
    }

    return opt;
}

Some windows programs (i.e. notepad, VS, …) add a 3 bytes prelude to their utf-8 files, C doesn’t know anything about it, so you need to strip it. On this topic, I suspect the program works on UTF-8 files that contain non-ASCII chars, even if when I wrote it I didn’t know anything about localization.

It should work because I’m just splitting the file when I see a certain ASCII string and in UTF-8 ASCII chars cannot appear anywhere else than in their ASCII position.

char* skip_utf8_bom(char* str) {
    unsigned char* b = (unsigned char*) str;
    return  b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF    ? (char*) &b[3]  : // UTF-8
                                                              (char*) b;
}

12 Not freeing memory (again)

The reason I haven’t been freeing memory all along is because I was planning on using an arena allocator (a kind of linear allocator).

Memory management is fully hortogonal to the style of programming described in this post. You can do it whatever way you prefer, but there is a certain affinity between an arena allocator (or garbage collection) and functional programming because of the temporary objects created in expressions. You could create the temporary objects explicitely, but that would diminish the conciseness of the paradigm.

I have an arena allocator implementation here. In the code below I comment it out so that you don’t have a dependency from that code if you want to try this. The program runs so quickly and it does so little that you can probably let the operating system reclame memory at the end of the process life.

If you ended up integrating this with an editor (i.e. literate programming editing), you’d need to be more careful.

#ifdef ARENA

Arena_T the_arena;

inline static
gpointer arena_malloc(gsize n_bytes) {
    return Arena_alloc(the_arena, n_bytes, __FILE__, __LINE__);
}

inline static
gpointer arena_calloc(gsize n_blocks, gsize n_block_bytes) {
    return Arena_calloc(the_arena, n_blocks, n_block_bytes, __FILE__, __LINE__);
}

inline static
gpointer arena_realloc(gpointer mem, gsize n_bytes) {
    return Arena_realloc(the_arena, mem, n_bytes, __FILE__, __LINE__);
}

void arena_free(G_GNUC_UNUSED gpointer mem) {
    // NOP
}

void set_arena_allocator() {
    GMemVTable vt = (GMemVTable) { .malloc = arena_malloc,      .calloc = arena_calloc,
                                   .realloc = arena_realloc,    .free = arena_free,
                                   .try_malloc = arena_malloc,  .try_realloc = arena_realloc};
    g_mem_set_vtable(&vt);

    the_arena = Arena_new();
}

void destroy_arena_allocator() {
    Arena_dispose(&the_arena);
}

#endif

13 Summary

I have to say, it didn’t feel too cumbersome to structure C code in a functional way, assuming that you can use GLib and a couple of GCC extensions to the language. It certainly doesn’t have the problems that C++ has in terms of debugging STL failures.

There are a couple of things I don’t like about GLib and I’m working on an hobby project to overcome them. Eventually I’ll post it.

int main(int argc, char* argv[])
{
#ifdef ARENA
    set_arena_allocator();
#endif

    CmdOptions* opt = parse_command_line(argc, argv);

    char* source    = NULL;
    GError* error   = NULL;

    if(!g_file_get_contents(opt->input_file, &source, NULL, &error))
        report_error(error->message);

    source = skip_utf8_bom(source);

    char* text              = translate(opt->options, source);

    if(!g_file_set_contents(opt->output_file, text, -1, &error))
        report_error(error->message);

#ifdef ARENA
    destroy_arena_allocator();
#endif

    return 0;
}

Functional programming in C – Implementation

0.1 Cleanup

Let’s start simple with the cleanup function. First we need the usual barrage of includes. G_BEGIN_DECLS allows the header to be linked in C++.

#ifndef L_UTILS_INCLUDED
#define L_UTILS_INCLUDED

#include "glib.h"

G_BEGIN_DECLS

#include <stdlib.h>
#include <stdio.h>

This feature is GCC specific. It uses __attribute((cleanup(f))) where f is the cleanup function. In this case the cleanup function just frees the memory.

#ifdef __GNUC__

 static inline void __autofree(void *p) {
     void **_p = (void**)p;
     free(*_p);
 }

auto_clean is a building block that you can use to plug in your own cleanup function. In the common case of memory allocation, I created a wrapper macro auto_free to make it even easier.

#define auto_clean(f)   __attribute((cleanup(f)))
#define auto_free       auto_clean(__autofree)

0.2 Lambdas

I took this one from here.

If you think about it, a lambda is just an expression that returns a function. This macro creates a nested function, called fn, inside a statement expression and returns it. Unfortunately these features are gcc specific.

Remember that lambdas are not allocated on the heap, so you have to be careful on how you used them.

#define lambda(return_type, function_body)                                          
  ({                                                                                
    return_type __fn__ function_body                                                
    __fn__;                                                                         
  })

#endif

0.3 Unions

A union type is what you would expect: a struct that contains an unnamed union and a field to specify which type it is. We need the list of types in union_decl to create the kind enum. The usage of __VA_ARGS__ allows to use whatever syntax you want to go into the enum (i.e. specify int values).

Having to specify the the types here is unfortunate as you are going to need to specify it in the union_case macros as well.

I haven’t found another way to do it. If you do, let me know.

#define union_decl(alg, ...)                                                        
typedef struct alg {                                                                
    enum {  __VA_ARGS__ } kind;                                                     
    union {

You specify each type for the union with union_type. That looks pretty good to me.

#define union_type(type, ...)                                                       
    struct type { __VA_ARGS__ } type;

Ideally you shouldn’t need to specify alg here. Perhaps there is a way to avoid doing so.

#define union_end(alg)                                                              
    };} alg;

You can then set the fields on the union type by using the below macro. Notice the usage of the new struct constructor here to allow optional named parameters.

This is a statement, so it cannot go into an expression place. I think I could make it an expression that returns the existing (or a new) union. This is going to be a sceanrio if people are not using gcc statement expressions.

#define union_set(instance, type, ...)                                              
    G_STMT_START {                                                                  
        (instance)->kind     = (type);                                              
        (instance)->type   = (struct type) { __VA_ARGS__ };                         
    } G_STMT_END

This is an utility macro. It is a version of g_assert that you can use in an expression position.

#define g_assert_e(expr) (                                                          
    (G_LIKELY (!expr) ?                                                             
   (void)g_assertion_message_expr (G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC, 
                                             #expr)                             
    : (void) 1) )

And this allows to fill the default case in a match statement implemented as a ternary operator. It prints out a text representation of the expression and returns it.

#define union_fail(...) (g_assert_e(((void)(__VA_ARGS__) , false)), (__VA_ARGS__))

The rest of the code is commented out. It is a macro way to do pattern matching. For me, the ternary operator is simpler, but I left it there in case you want to play with it.

/*
#define union_case_only_s(instance, type, ...)                                      
        G_STMT_START {                                                              
        if((instance)->kind == (type)) {                                            
            G_GNUC_UNUSED struct type* it = &((instance)->type); __VA_ARGS__; }     
        else g_assert_not_reached();                                                
        } G_STMT_END

#define union_case_first_s(alg, instance, type, ...)                                
    G_STMT_START {                                                                  
        alg* private_tmp = (instance);                                              
        if(private_tmp->kind == type) {                                             
            G_GNUC_UNUSED struct type* it = &((private_tmp)->type); __VA_ARGS__; }

#define union_case_s(type, ...)                                                     
        else if(private_tmp->kind == type) {                                        
            G_GNUC_UNUSED struct type* it = &((private_tmp)->type); __VA_ARGS__; }

#define union_case_last_s(type, ...)                                                
        else if(private_tmp->kind == type) {                                        
            G_GNUC_UNUSED struct type* it = &((private_tmp)->type); __VA_ARGS__; }  
            else g_assert_not_reached(); } G_STMT_END

#define union_case_default_s(...)                                                   
        else __VA_ARGS__; } G_STMT_END

// Need to use assert here because g_assert* cannot be used in expressions as it expands to do .. while(0)
#define union_case_only(instance, type, ...)                                        
        ( (instance)->kind == (type) ? (__VA_ARGS__) : (assert(false), __VA_ARGS__) )

#define union_case_first(instance, type, ...)                                       
        ( (instance)->kind == (type) ? (__VA_ARGS__) :

#define union_case(instance, type, ...)                                             
        (instance)->kind == (type) ? (__VA_ARGS__) :

#define union_case_last(instance, type, ...)                                        
        (instance)->kind == (type) ? (__VA_ARGS__) : (assert(false), (__VA_ARGS__)) )
#define union_case_default(...)                                                     
        (__VA_ARGS__) )

*/

G_BEGIN_DECLS

#endif // L_UTILS_INCLUDED

Functional programming in C

This post/program (as I’m writing it in literate style) is a continuation of my previous posts about functional programming in C++. I promise I’m not going to post about doing it in assembly language (I think) ….

I came to like the simplicity of C very much and got interested in how you could write functional code in it.

There is one irritating thing about C as a viable programming language. Microsoft’s compiler support is not good. It just supports ANSI C, not C99 or C11. So, if you want to use more modern idyoms, you got to use gcc or clang. In this post I assume you use gcc. I will point out the gcc specific extensions.

Also, the C standard library is pretty limited, so I decided to use GLib to complement it. I also created some macros to simplify the syntax. I never understood why people like templates and think macros are evil. It takes me all of 5 minutes to do a -E on GCC to debug the result of a macro expansion. With templates, well, that’s different.

So, in summary, this post is about how you can write functional code in C, perhaps with some gcc extensions and certainly with some macro tricks. Let’s call it funkyC (thanks Ian ). I’m going to show how to use it first. Next post I’m going to show how it’s implemented.

0.1 Discriminated unions in C

With a bit of macro magic, you can get a decent looking discriminated union syntax. First we need to include the correct headers. lutils.h is where all the macros are defined.

#include <glib.h>
#include <stdio.h>
#include <assert.h>
#include <stdbool.h>
#include <signal.h>
#include <string.h>

#include "lutils.h"

Then you can declare a discriminated union with the syntax below. It suffers from two problems: repeting the list of possible types in union_decl and repeating the name of the discriminated union in union_end. Perhaps there is a way to avoid that, but I haven’t found it.

The syntax for the union_type call is the same as you would use inside a struct declaration. We’ll see how this works when we look at lutils.h.

union_decl  (Car, Volvo, Fiat, Ferrari)
union_type      (Volvo,     int x; double y;)
union_type      (Fiat,      char* brand, *model;)
union_type      (Ferrari,   char* brand, *model;)
union_end   (Car)

We can create a Car either on the stack, as below, or on the heap and we can set its value with union_set.

Notice the usage of the new struct construction syntax to simulate optional named parameters in C. I would prefer not to have a dot at the start of the name, but overall it is beautiful (if I can say that myself).

static void printCar(Car*);

static void testUnion() {
    Car c;

    union_set   (&c, Volvo, .x = 3, .y = 4);
    printCar    (&c);
    union_set   (&c, Ferrari, .brand = "Ferrari");
    printCar    (&c);
    union_set   (&c, Fiat, .brand = "Fiat", .model = "234");
    printCar    (&c);
}

You can then access values inside your discriminated union with normal if statements.

static void testCar(Car*, char const *);

static void printCar(Car* c) {

    if(c->kind == Volvo) {
        int x = c->Volvo.x;
        g_assert_cmpint(x, ==, 3);
    }

Or perhaps you want the equivalent of a match statement in F# (aka an expression that returns a value based on the type of the discriminated union). Notice that, as logical, all the expressions need to return the same type. That’s why union_fail takes a value of the expression type.

    char temp[40];

    char* value =   c->kind == Volvo    ?   itoa(c->Volvo.x, temp, 10)
                  : c->kind == Ferrari  ?   (void)c->Ferrari.model, c->Ferrari.brand
                  : c->kind == Fiat     ?   c->Fiat.model
                                        :   union_fail("Not a valid car type");

If you are willing to be gcc specific, then your expression can be comprised of multiple statements, of which the last one returns the value of the expression. This allows a much more flexible syntax for your match clauses.

#ifdef __GNUC__

    value       =   c->kind == Volvo    ? ({
                                            struct Volvo* it = &c->Volvo;
                                            itoa(it->x, temp, 10);
                                          })
                  : c->kind == Ferrari  ?   (void)c->Ferrari.model, c->Ferrari.brand
                  : c->kind == Fiat     ?   c->Fiat.model
                                        :   union_fail("Not a valid car type");

    testCar(c, value);

#endif // __GNUC__
}

We then use the super simple test framework in GLib to make sure that it all works as expected …

static void testCar(Car* c, char const * value) {
    if(c->kind == Volvo) g_assert_cmpstr(value, ==, "3");
    else if (c->kind == Fiat) g_assert_cmpstr(value, ==, "234");
    else if (c->kind == Ferrari) g_assert_cmpstr(value, ==, "Ferrari");
    else g_assert_not_reached();

}

0.2 Nested functions and lambda variables

GCC has many other cool extensions. A very simple one is nested functions. It allows you to nest functions :-) Look at the definition of doA and f2 in the function below. Putting together nested functions and block statement expressions allows you, with some macro magic, to define lambda functions in your code (from here ).

Remember that lambdas (aka nested functions) are allocated on the stack. They are very fast, but you cannot store their pointer into a gloal table (unless such table is used while the stack for this function is alive).

In such cases, you have to create a proper function. But for the other 90% of use cases, they work pretty well. They are lambdas in the spirit of C: very fast, but error prone …

#ifdef __GNUC__

static void testLambda() {

    typedef int (*aFunc) (int);

    aFunc doA(aFunc f){

        int k(int i) {
            return f(i) + 3;
        }
        return k;
    }

    int clos = 2;

    int f2 (int i) {return i;}
    aFunc b = doA(lambda (int, (int p) {return p + clos;}));

    g_assert_cmpint(b(3), ==, 8);
}

0.3 Automatic cleanup of local variables

This is not a functional topic per se, but something that always annoyed me tremendously about C. The fact that you cannot define the equivalent of the using statement in C#, or destructors in C++. Well, now you can. Or not?

Again, if you are willing to be GCC specific, you can use an attribute (more on this in the upcoming implementation post) to associate a cleanup function that gets called when your variable goes out of scope. In this case, I wrapped the free case in a nice looking macro.

But that doesn’t really work. You would certainly want such function to be called on any kind of exit from the enclosing scope (aka via exit(), abort() or longjmp()). Alas, that doesn’t happen.

This reduces the usefulness of this mechanism tremendously. Probably too much in that it lulls you into a false sense of security. You still need to free your resources in the error path of your application.

static void testAutomaticCleanup() {
    char* stack_alloc() {
        auto_free char* b = g_malloc(10000);
        memset(b, '#', 10000);
        return b;
    };

    char * c = stack_alloc();
    g_assert(*c != '#');
}

#endif

0.4 Data structures

GLib comes with a vast library of data structures to use, not too different from the .NET framework or Java. For example, below you have a single linked list …

static void testGLib() {
     GSList* list = NULL;

     list = g_slist_append(list, "Three");
     list = g_slist_prepend(list, "first");
     g_assert_cmpint(g_slist_length(list), ==, 2);

     list = g_slist_remove(list, "first");
     g_assert_cmpint(g_slist_length(list), ==, 1);

     g_slist_free(list);
}

0.5 Wrapping up

There you go, rising the level of abstraction of C, still keeping it very fast (if you are willing to be gcc bound).

There are other features in functional programming languages that are not in this post. Maybe I’ll get around to macro my way into them eventually, maybe not.

In the next post we’ll talk about how all of this is implemented. Below is the code for running the testcases.

int runTests(int argc, char* argv[]) {
    g_test_init(&argc, &argv, NULL);

    if(g_test_quick()) {
        g_test_add_func("/utils/automatic_cleanup", testAutomaticCleanup);
        g_test_add_func("/utils/lambda", testLambda);
        g_test_add_func("/utils/Union", testUnion);
        g_test_add_func("/utils/SList", testGLib);
    }

    return g_test_run();
}

int main(int argc, char *argv[]) {
    return runTests(argc, argv);
}

LLite : language friendly literate programming

1 Main ideas

The code for this post is here. The source used to generate it is here. I also attached a pdf file to give an idea of the final result.

My interest in literate programming comes from some realizations on my part:

  • When I go back to code that I have written some time ago, I don’t remember my reasoning
  • When I write a blog post, my code seems to be better. Perhaps explaining things to people encourages me to be more precise
  • I like to think top down, but the compiler forces me to write code bottom up, starting from details and going to higher level concepts

1.1 Unhappiness with existing tools

Many of the existing literate programming tools work similarly to the original CWeb.

  • They have a tangle program that goes over your file and extract something that the compiler can understand
  • They have a weave program that extracts from your file something that the document generator can understand

This scheme has the unfortunate limitation of breaking your code editor. Given that your file is not a valid code file anymore, the editor starts misbehaving (i.e. intellisense breaks). The debugger starts to get confused (albeit people tried to remediate that with cleaver use of #line. If your language has an interactive console, that would not work either.

1.2 A different interpretation

The main idea of this program is to add your narrative to the comment part of a code file by extending the comment tag (i.e. in C you could use /** ). This keeps editor, debugger and interactive console working.

The weave phase as been retained and what you are reading is the program that goes over your code file and extracts a nicely formatted (for this program in markdown format) file that can then be translated to HTML, PDF, latex, etc…

You got that? The document you are reading now is the program.

1.3 Multi-language, multi-document format

LLite works for any programming language, assuming it has open and close comment character sequences, and any documentation format, assuming it has open and close code character sequences (aka allows you to delimitate your code somehow), or it needs the code to be indented. This document uses markdown (with Pandoc extensions to generate table of contents and titles).

1.4 Usage

You invoke the program as documented below. The first set of parameters lets you choose the symbols that delimitate your language comments (or the default symbols below). The second set of parameters lets you choose how your target documentation language treats code. Either it delimits it with some symbols or it indents it.

module LLite

let langParamsTable     = [ "fsharp", ("(*" + "*", "*" + "*)") // The + is not to confuse the parser
                            "c", ("/**", "**/")
                            "csharp", ("/**", "**/")
                            "java", ("/**", "**/")] |> Map.ofList

let languages = langParamsTable |> Map.fold (fun state lang _ -> state + lang + " ") ""

let usage   = sprintf @"
Usage: llite inputFile parameters
where:
One of the following two sets of parameters is mandatory
    -no string : string opening a narrative comment
    -nc string : string closing a narrative comment
or
    -l language: where language is one of (%s)

One of the following two sets of parameters is mandatory
    -co string : string opening a code block
    -cc string : string closing a code block
or
    -indent N  : indent the code by N whitespaces

The following parameters are optional:
    -o outFile : defaults to the input file name with mkd extension" languages

let getLangNoNC lang    =
    match Map.tryFind lang langParamsTable with
    | Some(no, nc) -> no, nc
    | None -> failwith (lang + " is not a valid programming language")

1.5 Programming Languages limitations

One of the main tenets of literate programming is that the code should be written in the order that facilitates exposition to a human reader, not in the order that makes the compiler happy. This is very important.

If you have written a blog post or tried to explain a codebase to a new joiner, you must have noticed that you don’t start from the top of the file and go down, but jump here and there trying to better explain the main concepts. Literate programming says that you should write your code the same way. But in our version of it, the compiler needs to be kept happy because the literate file is the code file.

Some ingenuity is required to achieve such goal:

  • In C and C++ you can forward declare functions and classes, also class members can be in any order
  • In C#, Java, VB.NET, F# (the object oriented part) you can write class members in any order
  • In the functional part of F# you do have a problem (see later in this doc)

The F# trick below is used in the rest of the program. You’ll understand its usage naturally by just reading the code

let declare  = ref Unchecked.defaultof

2 Implementation

At the core, this program is a simple translator that takes some code text and return a valid markdown/whatever text. We need to know:

  • The strings that start and end a narrative comment (input symbols)
  • How to translate a code block into a document. We support these variations:
    • Indented: indent them by N spaces
    • Surrounded by startCode/endCode strings
type CodeSymbols =
    | Indent of int                 // indentation level in whitespaces
    | Surrounded of string * string // start code * end code

type Options = {
    startNarrative  : string
    endNarrative    : string
    codeSymbols     : CodeSymbols
}

let translate   = declare string -> string>

2.1 Going over the parse tree

We need a function that takes a string and returns a list with the various blocks. We can then go over each block, perform some operations and, in the end, transform it back to text

type Block =
| Code      of string
| Narrative of string

let blockize = declare string -> Block list>

I could have used regular expressions to parse the program, but it seemed ugly. I could also have used FsParsec, but that brings with it an additional dll. So I decided to roll my own parser. This has several problems:

  • It is probably very slow
  • It doesn’t allow narrative comments inside comments, in particular it doesn’t allow the opening comment
  • It doesn’t allow opening comments in the program code (not even inside a string)

The latter in particular is troublesome. You’ll need to use a trick in the code (i.e. concatenating strings) to foul this program in not seeing an opening comment, but it is inconvenient.

With all of that, it works.

TODO: consider switching to FsParsec

2.1.1 Lexer

The lexer is going to process list of characters. We need functions to check if a list of characters starts with certain chars and to return the remaining list after having removed such chars.

BTW: these functions are polymorphic and tail recursive

let rec startWith startItems listToCheck =
    match startItems, listToCheck with
    | [], _             -> true
    | _ , []            -> false
    | h1::t1, h2::t2  when h1 = h2  -> startWith t1 t2
    | _, _              -> false

let rec remove itemsToRemove listToModify =
    match itemsToRemove, listToModify with
    | [], l             -> l
    | _ , []            -> failwith "Remove not defined on an empty list"
    | h1::t1, h2::t2  when h1 = h2  -> remove t1 t2
    | _, _              -> failwith "itemsToRemove are not in the list"

let isOpening options       = startWith (List.ofSeq options.startNarrative) 
let isClosing options       = startWith (List.ofSeq options.endNarrative)
let remainingOpen options   = remove (List.ofSeq options.startNarrative)
let remainingClose options  = remove (List.ofSeq options.endNarrative)

This is a pretty basic tokenizer. It just analyzes the start of the text and returns what it finds. It also keeps track of the line number for the sake of reporting it in the error message.

let NL = System.Environment.NewLine

type Token =
| OpenComment   of int
| CloseComment  of int
| Text          of string

let tokenize options source =

    let startWithNL = startWith (Seq.toList NL)

    let rec text line acc = function
        | t when isOpening options t    -> line, acc, t 
        | t when isClosing options t    -> line, acc, t
        | c :: t as full                ->
            let line' = if startWithNL full then line + 1 else line
            text line' (acc + c.ToString()) t
        | []                            -> line, acc, [] 
    let rec tokenize' line acc = function
        | []                            -> List.rev acc
        | t when isOpening options t    -> tokenize' line
                                            (OpenComment(line)::acc)  (remainingOpen options t)
        | t when isClosing options t    -> tokenize' line
                                            (CloseComment(line)::acc) (remainingClose options t)
        | t                             ->
            let line, s, t'= text line "" t
            tokenize' line (Text(s) :: acc) t'

    tokenize' 1 [] (List.ofSeq source)

2.1.2 Parser

The parse tree is just a list of Chunks, where a chunk can be a piece of narrative or a piece of code.

type Chunk =
| NarrativeChunk    of Token list
| CodeChunk         of Token list

let parse options source =

    let rec parseNarrative acc = function
        | OpenComment(l)::t         ->
            failwith ("Don't open narrative comments inside narrative comments at line "
                                                                                    + l.ToString())
        | CloseComment(_)::t        -> acc, t
        | Text(s)::t                -> parseNarrative (Text(s)::acc) t
        | []                        -> failwith "You haven't closed your last narrative comment"

    let rec parseCode acc = function
        | OpenComment(_)::t as t'   -> acc, t'
        | CloseComment(l)::t        -> parseCode (CloseComment(l)::acc) t
        | Text(s)::t                -> parseCode (Text(s)::acc) t
        | []                        -> acc, []
    let rec parse' acc = function
        | OpenComment(_)::t         ->
            let narrative, t' = parseNarrative [] t
            parse' (NarrativeChunk(narrative)::acc) t' 
        | Text(s)::t                ->
            let code, t' = parseCode [Text(s)] t
            parse' (CodeChunk(code)::acc) t'
        | CloseComment(l)::t           ->
            failwith ("Don't insert a close narrative comment at the start of your program at line "
                                                                                    + l.ToString())
        | []                -> List.rev acc

    parse' [] (List.ofSeq source)

2.1.3 Flattener

The flattening part of the algorithm is a bit unusual. At this point we have a parse tree that contains tokens, but we want to reduce it to two simple node types containing all the text in string form.

TODO: consider managing nested comments and comments in strings (the latter has to happen in earlier phases)

let flatten options chunks =
    let tokenToStringNarrative = function
    | OpenComment(l) | CloseComment(l)  -> failwith ("Narrative comments cannot be nested at line "
                                                                                    + l.ToString())
    | Text(s)                           -> s

    let tokenToStringCode = function
    | OpenComment(l)                -> failwith ("Open narrative comment cannot be in code at line"
                                                                + l.ToString()) +
                                                 ". Perhaps you have an open comment in" +
                                                 " a code string before this comment tag?"
    | CloseComment(_)               -> string(options.endNarrative |> Seq.toArray)
    | Text(s)                       -> s

    let flattenChunk = function
    | NarrativeChunk(tokens)             ->
        Narrative(tokens |> List.fold (fun state token -> state + tokenToStringNarrative token) "")
    | CodeChunk(tokens)                  ->
        Code(tokens |> List.fold (fun state token -> state + tokenToStringCode token) "")

    chunks |> List.fold (fun state chunk -> flattenChunk chunk :: state) [] |> List.rev

We are getting there, now we have a list of blocks we can operate upon

blockize := fun options source -> source |> tokenize options |> parse options |> flatten options
 

2.2 Narrative comments phases

Each phase is a function that takes the options and a block list and returns a block list that has been processed in some way.

type Phase = Options -> Block List -> Block List

let removeEmptyBlocks   = declare
let mergeBlocks         = declare
let addCodeTags         = declare

let processPhases options blockList = 
    blockList
    |> !removeEmptyBlocks   options
    |> !mergeBlocks         options
    |> !addCodeTags         options

We want to manage how many newlines there are between different blocks, because we don’t trust the programmer to have a good view of how many newline to keep from comment blocks and code blocks. We’ll trim all newlines from the start and end of a block, and then add our own.

let newLines = [|'\n';'\r'|]

type System.String with
    member s.TrimNl () = s.Trim(newLines) 

2.2.1 Remove empty blocks

There might be empty blocks (i.e. between two consecutive comment blocks) in the file. For the sake of formatting the file beautifully, we want to remove them.

let extract = function
    | Code(text)        -> text
    | Narrative(text)   -> text

removeEmptyBlocks := fun options blocks ->
                        blocks |> List.filter (fun b -> (extract b).TrimNl().Trim()  "")

2.2.2 Merge blocks

Consecutive blocks of the same kind need to be merged, for the sake of formatting the overall text correctly.

TODO: make tail recursive

let rec mergeBlockList = function
    | []        -> []
    | [a]       -> [a]
    | h1::h2::t -> match h1, h2 with
                   | Code(t1), Code(t2)             -> mergeBlockList (Code(t1 + NL + t2)::t)
                   | Narrative(n1), Narrative(n2)   -> mergeBlockList(Narrative(n1 + NL + n2)::t)
                   | _, _                           -> h1::mergeBlockList(h2::t)

mergeBlocks := fun options blocks -> mergeBlockList blocks

2.2.3 Adding code tags

Each code block needs a tag at the start and one at the end or it needs to be indented by N chars.

let indent n (s:string) =
    let pad = String.replicate n " "
    pad + s.Replace(NL, NL + pad)

addCodeTags := fun options blocks ->
    match options.codeSymbols with
    | Indent(n)         ->
        blocks |> List.map (function Narrative(s) as nar -> nar | Code(s) -> Code(indent n s))
    | Surrounded(s, e)  -> 
        blocks |> List.map (function
                            | Narrative(text)   -> Narrative(NL + text.TrimNl() + NL)
                            | Code(text)        -> Code(NL + s + NL + text.TrimNl() + NL + e + NL))

2.2.4 Flatten again

Once we have the array of blocks, we need to flatten them (transform them in a single string), which is trivial, and then finally implement our original translate function.

let sumBlock s b2 = s + extract b2

let flattenB blocks = (blocks |> List.fold sumBlock "").TrimStart(newLines)

translate := fun options text -> text |> !blockize options |> processPhases options |> flattenB

2.3 Parsing command line arguments

Parsing command lines involves writing a function that goes from a sequence of strings to an input file name, output file name and Options record

let parseCommandLine = declare string * string * Options>

To implement it, we are going to use a command line parser taken from here. The parseArgs function takes a sequence of argument values and map them into a (name,value) tuple. It scans the tuple sequence and put command name into all subsequent tuples without name and discard the initial (,) tuple. It then groups tuples by name and converts the tuple sequence into a map of (name,value seq)

For now, I don’t need the value seq part as all my parameters take a single argument, but I left it in there in case I will need to pass multiple args later on.

open  System.Text.RegularExpressions

let (|Command|_|) (s:string) =
  let r = new Regex(@"^(?:-{1,2}|\/)(?\w+)[=:]*(?.*)$",RegexOptions.IgnoreCase)
  let m = r.Match(s)
  if m.Success
  then 
    Some(m.Groups.["command"].Value.ToLower(), m.Groups.["value"].Value)
  else
    None

let parseArgs (args:string seq) =
  args 
  |> Seq.map (fun i -> 
                    match i with
                    | Command (n,v) -> (n,v) // command
                    | _ -> ("",i)            // data
                  )
  |> Seq.scan (fun (sn,_) (n,v) -> if n.Length>0 then (n,v) else (sn,v)) ("","")
  |> Seq.skip 1
  |> Seq.groupBy (fun (n,_) -> n)
  |> Seq.map (fun (n,s) -> (n, s |> Seq.map (fun (_,v) -> v) |> Seq.filter (fun i -> i.Length>0)))
  |> Map.ofSeq

let paramRetrieve (m:Map) (p:string) = 
  if Map.containsKey p m
  then Some(m.[p])
  else None

This is the main logic of parameter passing. Note that we give precedence to the -l and -indent parameters, if present.

This is a function that goes from the map of command line parameters to the input file name, output file name and options. With that we can finally define the original parseCommandLine.

let safeHead errMsg s = if s |> Seq.isEmpty then failwith errMsg else s |> Seq.head 

let paramsToInputs paramsMap =
    let single p er     = match paramRetrieve paramsMap p with | Some(k) -> Some(k |> safeHead er)
                                                               | None -> None
    let get p s         = match paramRetrieve paramsMap p with |Some(k) -> k |> safeHead s
                                                               | None -> failwith s
    let defaultP p q er = match paramRetrieve paramsMap p with | Some(k) -> k |> safeHead er
                                                               | None -> q

    let inputFile       = get "" "You need to pass an input file"
    let outputFile      = defaultP  "o"
                                    (System.IO.Path.GetFileNameWithoutExtension(inputFile) + ".mkd")
                                    "You must pass a parameter to -o"

    let no, nc          = match single "l" "You must pass a language parameter to -l" with
                          | Some(l) -> getLangNoNC l
                          | None    ->
                                get "no" "The no (narrative open) parameter is mandatory, if no -l specified",
                                get "nc" "The nc (narrative close) parameter is mandatory, if no -l specified"

    let codeSymbs       = match single "indent" "You must pass a whitespace indentation number to -indent" with
                          | Some(n)     ->
                                let success, value = System.Int32.TryParse n
                                if success
                                    then Indent(value)
                                    else failwith "-i accepts just an integer value as parameter"                          
                          | None        ->
                                Surrounded(
                                    get "co" "The co (code open) parameter is mandatory, if no -indent specified",
                                    get "cc" "The cc (code close) parameter is mandatory")
    inputFile, outputFile, {
        startNarrative  = no
        endNarrative    = nc
        codeSymbols     = codeSymbs
        }

parseCommandLine := parseArgs >> paramsToInputs

2.4 Main method

We can then write main as the only side effect function in the program. Here is where the IO monad would live …

let banner  = "LLite : language friendly literate programming\n"

let myMain args =
    try
        printfn "%s" banner

        let inputFile, outputFile, options = !parseCommandLine args
        let input       = System.IO.File.ReadAllText inputFile
        let output      = !translate options input
        System.IO.File.WriteAllText (outputFile, output)
        0
    with
    | e ->
        printfn "%s" "Failure"
        printfn "%s" e.Message 
        printfn "%s" usage
#if DEBUG 
        printfn "\nDetailed Error Below:\n%A" e
#endif
        -1

3 An aside: forward declaring functions in F#

3.1 A simple solution

You can achieve something somehow similar to forward declaration by the ’declare ’dirty trick used in this program. Whenever you want to do a forward declaration of a function , or variable, you can type:

let testDeclare() =

    let add = declare float>

    let ``function where I want to use add without having defined it`` nums = nums |> Seq.map !add

This creates a ref to a function from float to float. It looks a bit like an Haskell type declaration. You can then use such function as if it were actually define and delay his definition to a later point in time when you are ready to explain it.

When you are ready to talk about it, you can then define it with:

    add := fun x -> x + 1.

The syntax is not too bad. You get that often-sought Haskell like explicit type declaration and you can regex the codebase to create an index at the end of the program (maybe).

But is it too slow? After all, there is one more indirection call for each function call.

Let’s test it: enable #time in F# interactive and execute timeNormalF and timeIndirectF varying sleepTime and howManyIter until you convince yourself that it is ok (or not).

    let sleepTime   = 50
    let howManyIter = 100
    let normalF x   = System.Threading.Thread.Sleep sleepTime
    let indirectF   = declare unit>
    indirectF      := fun x -> System.Threading.Thread.Sleep sleepTime

    let timeNormalF     = [1..howManyIter] |> List.iter normalF
    let timeIndirectF   = [1..howManyIter] |> List.iter !indirectF
    ()

3.2 A correct solution (but ugly)

Unfortunately, there is a big problem with all of the above: it doesn’t work with generic functions and curried function invocations. The code below works in all cases, but it is ugly for the user to use. In this program I’ve used the beautiful, but incorrect, syntax.

type Literate() =
    static member Declare  (ref : obj ref) (x : 'a) : 'b =
        unbox <| (unbox obj> !ref) x
    static member Define (func : 'a -> 'b) (ref : obj ref) (f : 'a -> 'b) =
        ref := box (unbox >> f >> box)

// Declaration    
let rec id (x : 'a) : 'a = Literate.Declare idImpl x
and idImpl = ref null

// Usage
let f () = id 100 + id 200

// Definition
Literate.Define id idImpl (fun x -> x)

3.3 The traditional way

The traditional way of doing something like this is by passing the function as a parameter in a manner similar to the below.

// Declaration and usage intermingled
let calculate' (add: int -> int -> int) x y = add x y * add x y

// Definition
let add x y = x + y

let calculate = calculate' add

To my way of seeing, this is the most cumbersome solution and conceptually dishonest because you are not parametrizing your function for technical reasons, but just for the sake of explaining things. In a way, you are changing the signature of your functions for the sake of writing a book. That can’t be right …

Exceptions vs. Return Values to represent errors (in F#) – IV – Implementation

The Critical monad is defined as follows. First there is the type that propagates through the monad:

type Result<'a, 'b> =
| Success of 'a
| Failure of 'b


Then we define the usual computation expression methods.

type Critical() =
       // a -> m a
        member o.Return x       = Success x
        // m a -> (a -> m b) -> m b
        member o.Bind (m, f)    = match m with
                                    | Failure e -> Failure e
                                    | Success x -> f x
        // m a -> m a
        member o.ReturnFrom m   = m

Explaining how computational expressions work in F# is a blog onto itself. And several chapters in many books. Sufficient to say that conceptually this propagates the success value and returns the failure value.

We then define an instance of this type, so that we can use the nice ‘critical { … }’ syntax.

let critical = Critical()


We then go and define the functions that the user needs to use to annotate their function calls. The simplest one is the one to propagate any exception coming from the function ‘f’.

let fault f = f


Then it comes the one to manage contingencies. This will trap any exception for which ‘stopF ex’ is ‘true’, call ‘errF ex’ to construct the error return value and wrap it in a ‘Failure’. Otherwise it will rethrow the exception.

let contingentGen stopF errF f =
    try
        Success(f ())
    with
        | ex when stopF ex -> Failure(errF ex)
        | _                -> reraise ()

Albeit very simple, the above is the core of the system. Everything else is just details. Let’s look at them.

First we want a function that takes as parameter a list of (Exception, ReturnValue) and gives back the correct stopF errF to plug into ‘contingentGen’.

let exceptionMapToFuncs exMap =
    let tryFind ex = exMap |> List.tryFind (fun (k, _) -> k.GetType() = ex.GetType())
    (fun ex ->
        let found = tryFind ex
        match found with Some(_) -> true | None -> false),
    (fun ex ->
        let found = tryFind ex
        match found with
        | Some(k, v)    -> v ex
        | None          -> raise ex)

Then ugliness comes. For the sake of getting a decent syntax (not great) on the calling site, we need to fake overloading of functions by the old trick of adding a number at the end. Thanks to Tobias to point out this (my api was even worse earlier).

I often wondered about the trade-off between currying and overloading for functions. I seem to always paint myself in a situation where I need overloading. In any case, here it goes:

let contingent1 exMap f x =
    let stopF, errF = exceptionMapToFuncs exMap
    contingentGen stopF errF (fun _ -> f x)

let contingent2 exMap f x y =
    let stopF, errF = exceptionMapToFuncs exMap
    contingentGen stopF errF (fun _ -> f x y)

let contingent3 exMap f x y z =
    let stopF, errF = exceptionMapToFuncs exMap
    contingentGen stopF errF (fun _ -> f x y z)


Sometimes you want to trap all exceptions from a function and return your own error value:

let neverThrow1 exc f x     = contingentGen (fun _ -> true) (fun ex -> exc ex) (fun _ -> f x)
let neverThrow2 exc f x y   = contingentGen (fun _ -> true) (fun ex -> exc ex) (fun _ -> f x y)
let neverThrow3 exc f x y z = contingentGen (fun _ -> true) (fun ex -> exc ex) (fun _ -> f x y z)


Other times you need to go from a function that returns return values to one that throws exceptions. You need translating from contingencies to faults:

let alwaysThrow exc f x =
    match f x with
    | Success(ret)              -> ret
    | Failure(e)                -> raise (exc e)      

And that’s it. Hopefully we have bridged the gap between exceptions and return values without making the code too ugly (just a little bit). Or perhaps not.

I need to add that I haven’t used this library myself (yet). I’m sure when I do I’ll discover many things to change.