Calling Foreign Functions from Mercury

Mercury is a fully declarative functional / logic programming language. Unlike many other functional languages, it supports non-deterministic programming and automatic backtracking (or other search strategies); unlike many other logic languages, it supports static typing and efficient compilation to C, Java, C# and Erlang.

Motivation

Like many other languages, it offers multi-precision integers, in the form of the integer.m part of its standard library. In another article I described some work on parallelization of the bignum multiplication. Recently, in a thread on the m-users mailing list, a user reported his experience using bignum arithmetic in Mercury, stating that performance was severely lacking behind other languages.

Investigating at bit, it became obvious that Mercury was the only language using a native implementation base on pure, immutable data-structures (lists of 14 bit digits) for bignum arithmetic. The other languages either use proprietary low-level implementations or the GMP library.

While there still is a potential to optimize bignum division in integer.m, it became obvious that in order to achieve comparable performance, one would need the binding to an external library. Being a fan of suckless software, I had read about libtommath, which is an efficient but simple library for multi-precision arithmetic, written in fully portable ISO-C with very clean and understandable code.

Data Structures

The data-structure for libtommath is defined as a simple C structure:

/* the infamous mp_int structure */
typedef struct  {
    int used, alloc, sign;
    mp_digit *dp;
} mp_int;

It contains the number of used digits, the number of allocated digits, the sign of the number and finally the array containing the digits. mp_digit is defined dependent on the system on which the library is compiled.

Using this type in Mercury is surprisingly easy. We can simply declare a new foreign type mp_int as being a pointer to a C mp_int structure:

:- pragma foreign_type("C", mp_int, "mp_int*")
    where equality is equal, comparison is mp_cmp.

In order to have correct comparison and unification for the new type, we have to declare which predicates to use for equality and comparison.

To communicate in the other direction, i.e., from C to Mercury, in particular to report the return status of operations on mp_int, one declares a Mercury discriminated union type for the different possibilities of return values from the library.

:- type mp_result_type --->
      mp_result_okay
    ; mp_result_out_of_mem
    ; mp_result_invalid_input.

The 3 possibilities of this type are then mapped explicitly on the values from the header declaration of the library:

:- pragma foreign_enum("C", mp_result_type/0,
                      [
                       mp_result_okay          - "MP_OKAY",
                       mp_result_out_of_mem    - "MP_MEM",
                       mp_result_invalid_input - "MP_VAL"
                      ]).

Creating mp_int Values and the Garbage Collector

Creating an mp_int is done by calling the Mercury predicate mp_init/2 which calls the mp_init/3 predicate, defined as a direct call into the library. If the library call does not succeed, i.e., the mp_int could not be created, an error is thrown.

:- pred mp_init(int::in, mp_int::out) is det.
mp_init(N, Res) :-
    mp_init(N, Result, Res0),
    ( Result = mp_result_okay ->
        Res = Res0
    ;
        error("could not initialize mp_int")
    ).

One problem that arises when connecting a garbage collected language like Mercury to a library written in C is how one treats memory allocation on the different heaps.

On the Mercury side, this is quite easy. It is sufficient to allocate memory on the Mercury heap using the defined macros. In this way, the garbage collector is aware of the allocated variables and will be able to release them if required.

:- pred mp_init(int::in, mp_result_type::out, mp_int::out) is det.
:- pragma foreign_proc("C",
                      mp_init(Value::in, Result::out, Mp_Int::out),
                      [will_not_call_mercury, promise_pure, thread_safe],
"
  int initResult, opResult;
  Mp_Int     = MR_GC_NEW_ATTRIB(mp_int, MR_ALLOC_ID);
  initResult = mp_init(Mp_Int);
  if (initResult == MP_OKAY)
    {
      opResult = mp_set_int(Mp_Int, Value);
      Result   = opResult;
    }
  else
    Result     = initResult;
").

Here, one allocates GC-collected memory for Mp_Int, calls the initialization function of the library and, if successful, calls the function to set the value of the mp_int. The problem is, that memory allocated on the C side is not yet under control of the garbage collector.

For libtommath, as pointed out in a message on m-users, all memory allocations and releases are declared as macros and can be redefined at compile time. It is therefore possible to replace any call to malloc and free via calls to the Boehm garbage collector, i.e., making all allocated memory subject to the GC, once it is no longer referenced.

Calling C-library Functions from Mercury

Using the same principle as above, any function exported from the library can be called via a Mercury predicate or function. For each declaration, one can specify whether there are side-effects (promise_pure), whether and Mercury predicate will be called from the C function (will_not_call_mercury) and whether the call is thread safe (for more details see the language reference). For libtommath, all calls are declared as pure, thread safe and without calls to Mercury.

An example for such a call is the divide_with_rem predicate that calculates the quotient and remainder of two mp_int values.

divide_with_rem(A, B, Quot, Rem) :-
    ( is_zero(B) ->
        throw(math.domain_error("mp_int.quot_with_rem: division by zero"))
    ;
        mp_quot_rem(A, B, Result, Quot0, Rem0),
        (
          Result = mp_result_okay,
          Quot = Quot0,
          Rem = Rem0
        ;
          Result = mp_result_out_of_mem,
          error("could not initialize mp_int")
        ;
          Result = mp_result_invalid_input,
          throw(math.domain_error(
            "mp_int.quot_with_rem: could not compute quotient and remainder"))
        )
    ).

The corresponding foreign predicate is defined as follows:

:- pred mp_quot_rem(mp_int::in, mp_int::in, mp_result_type::out, mp_int::out,
                   mp_int::out) is det.
:- pragma foreign_proc("C",
                      mp_quot_rem(A::in, B::in, Result::out,
                                  Quot::out, Rem::out),
                      [will_not_call_mercury, promise_pure, thread_safe],
"
  int initResult1, initResult2, opResult;
  Quot        = MR_GC_NEW_ATTRIB(mp_int, MR_ALLOC_ID);
  Rem         = MR_GC_NEW_ATTRIB(mp_int, MR_ALLOC_ID);
  initResult1 = mp_init(Quot);
  initResult2 = mp_init(Rem);
  if (initResult1 == MP_OKAY && initResult2 == MP_OKAY)
    {
      opResult = mp_div(A, B, Quot, Rem);
      Result   = opResult;
    }
  else
    Result     = initResult1 != MP_OKAY ? initResult1 : initResult2;
").

Of course, this could be simplified a bit, but as the Mercury compiler creates C code that is given to a C compiler, there is an additional optimization step and superfluous variables will be optimized away.

Simple Boolean foreign predicates are even easier to declare:

:- pred is_zero(mp_int::in) is semidet.
:- pragma foreign_proc("C",
                      is_zero(A::in),
                      [will_not_call_mercury, promise_pure, thread_safe],
"
  SUCCESS_INDICATOR = mp_iszero(A) ? MR_TRUE : MR_FALSE;
").

Where MR_TRUE / MR_FALSE indicate whether the foreign predicate succeeded or not.

Testing and Performance

A program that uses the mp_int type needs to be linked with a libtommath library that is compiled using the redefined MALLOC/FREE macros. The command line for mmc is as follows:

$ mmc --link-object libtommath.a $ADDITIONAL_PARAMETERS $PROGRAM

The OP of the original thread posted a program for the probabilistic Miller-Rabin primality test using integer types. For a number with ca. 2000 decimal digits, the integer version required 464s while the mp_int version required 0.81s, i.e., more than 2 orders of magnitude faster. For a more elaborate program for integer factorization based on Pollard's rho algorithm, the factor was around 14, for numbers around 60-80 decimal digits.

Conclusion

To summarize, creating a binding to an external C library to use from within Mercury (using C grades) is relatively easy. The most challenging part is probably the correct incorporation of the memory management, i.e., having both the Mercury heap as well as the external heap under control of the Boehm-Demers-Weiser garbage collector.

Further code reading on github.