Implementing Language Bindings

Home Download Documentation Development Community Support
nanomsg^™

nanomsg development is winding down in favor of nng. If you want to develop a new language binding, you might want to check out nng first!

Language bindings are pieces of infrastructure that allow nanomsg to be used from different programming languages. This document is meant to provide some hints that may be helpful for binding developers.

Forking ZeroMQ bindings

Given that nanomsg is a follow-up project to ZeroMQ and that API/ABI of the two projects are very similar, the easiest way to implement a new binding is to fork the existing ZeroMQ binding and convert it into nanomsg binding.

When doing so, you should be careful about the license. While nanomsg is licensed under MIT which allows it to be combined with any other possible license (you can, for example, close source and sell the binding), the license of the original ZeroMQ binding may be more restrictive. For example, if the ZeroMQ binding is licensed under LGPL, the derived nanomsg binding has to be licensed under LGPL as well.

Plug-ins vs. native code

There are two broad categories of language bindings. Some languages allow you to implement a binary module (in most cases written in C) that is then loaded and used by the language runtime. This is the case, for example, of Java JNI. Other languages provide a way to define the library’s ABI in language-specific way. This definition then allows the nanomsg functions to be accessed directly from the language. This is the case of, for example, Ruby FFI. Quite a few languages allow for both approaches. In such case it is important to understand that they are not necessarily equivalent in terms of performance and you should take that into account when deciding on the implementation strategy. For example, one experiment done in the past showed that Java binding based on JNI is much faster than one based on JNA.

Plug-ins

Plug-ins are simply shared libraries that provide an API to interface with the language runtime.

Given that almost all such plug-ins are written in C, the implementation of the plug-in has access to all the symbols defined in nanomsg C header files.

<p>One particular symbol that is of interest is NN_VERSION. It is a macro indicating the current version of the library. It allows the binding to be compiled with different versions of nanomsg. Imagine that new function nn_foo() is added to nanomsg in version 1.2.3. When support for this function is it added to the binding, it should be activated only for version equal to or higher than 1.2.3, this way:

    #if NN_VERSION >= NN_MAKE_VERSION(1,2,3)
    /*  Implement wrapper for nn_foo() here. */
    #endif

While some languages may require the constants (such as NN_PUB or NN_SNDBUF) to be available at the compile time, other languages may permit populating the set of constants in run-time. The latter is the preferred option as it allows binding to remain unchanged even if the library evolves and defines new constants. For details of implementing the runtime retrieval of constants see the dedicated section below.

Native code

Some languages allow to define and use the library ABI in a native way. Say, with Lua FFI, nn_socket() function is defined like this:

ffi.cdef([[
    int nn_socket (int domain, int protocol);
]])

While the fact that there’s no need for additional binary component is nice, the obvious problem with this approach is versioning. If there happens to be an old nanomsg library installed on a box, the newer binding may attempt to access non-existent symbols.

To take care of this, binding should use nn_symbol() function to retreive the version info (NN_VERSION) constant and check whether the library isn’t older than expected. Alternatively, binding may decide to support multiple versions of nanomsg and decently downgrade the functionality if an old version of library is encountered.

Run-time retrieval of constants

If the language allows to define new constants in the runtime, binding should prefer doing it that way to copy-pasting all the constants from nanomsg header files to the binding. That way, if set of constants in main nanomsg library changes (e.g. by adding a new socket option) the binding will still work without a need to be modified or re-built.

To retrieve the constants in run-time, use nn_symbol() function. Here is how it’s done by Ruby binding:

    index = 0
    while true
        value = FFI::MemoryPointer.new(:int)
        constant_string = LibNanomsg.nn_symbol(index, value)
        break if constant_string.nil?
        const_set(constant_string, value.read_int)
        index += 1
    end

Managing socket lifetime

While nanomsg API provides functions for creating and closing sockets (nn_socket and nn_close respectively), object-oriented language may prefer to implement socket as an object and call nn_close() function from the destructor.

If so, it’s important to consider that nn_close() is a blocking function — it may wait for period specified by NN_LINGER socket option while any pending outbound data are sent. If blocking in destructor would cause serious problems to the language runtime, such as entirely blocking the garbage collection process, you should consider exposing nn_close() function directly to the user.

Sending and receiving messages

Wrapping most nanomsg functions is pretty straightforward. Special care, however, should be taken of nn_send and nn_recv.

These functions deal with messages. Message, from the point of view of nanomsg, is an uninterpreted binary BLOB, in other words, just an arbitrarily long sequence of bytes. The binding should decide how to represent such an object in the native language. For example, in some languages, the "string" type may not be the right option as the strings can only be used to store text in UTF-8 format. In most cases the language does have a datatype suitable for representing a BLOB. If that’s not the case, the last resort solution is to represent messages as strings containing Base64-encoded binary data.

nn_sendmsg() and nn_recvmsg() functions provide an option to use C gather/scatter arrays (passing data in several non-continuous memory chunks). Many languages don’t have a native concept of gather/scatter or, for what it’s worth, of a memory buffer. Trying to expose this functionality in such languages doesn’t make sense. Instead, send/recv functions should deal with exactly one message at a time.

Finally, receiving a message into preallocated buffer — as done by nn_recv() function — doesn’t make sense in many languages. Instead, the binding implementation should use nn_recv in combination with NN_MSG flag to get a buffer allocated by nanomsg library, then copy the buffer into native object and finally deallocate the buffer returned by nn_recv(). Here’s an example of such implementation:

    object *recv (int s, int flags)
    {
        int rc;
        void *buf;
        object *msg;

        rc = nn_recv (s, &buf, NN_MSG, flags);
        msg = create_object ("BLOB");
        memcpy (getbuffer (object), buf, rc);
        nn_freemsg (buf);
        return msg;
    }

Zero-copy

Zero-copy is not implemented in nanomsg at this time.

Error codes

nanomsg uses POSIX error codes to indicate errors (EINVAL, EMFILE etc.)

While numeric values of these errors may be available in the native language, they are also provided by nn_symbol() function.

There are few subtle points to take into consideration when working with error codes:

There is couple of non-POSIX error codes defined by nanomsg library itself (ETERM, EFSM).<
Some POSIX error codes are not available on all platforms (Windows) and thus are defined by nanomsg library.
The numeric values of individual errors differ on various platforms. To make a binding portable, never use numeric values instead of symbolic names of the errors.

Handling EINTR

EINTR error is used to return the control to the language runtime when Ctrl+C is pressed during a blocking call to nanomsg library, such as nn_recv().

The right way to handle EINTR is dependent on the specifics of signal handling as implemented by the language runtime. The general idea is to let the error bubble up the stack up to the level that sets the signal handlers (in most cases it’s the language runtime). Following rule of thumb should work reasonably well:

In a plug-in binding, pass the EINTR error to the caller.
In a binding written in native language, handle the EINTR error by re-starting the interrupted function.

Polling

Polling mechanism, given that it is a glue between disparate types of subsystems (file systems, networking stack, etc.) must be defined by the platform rather than by any individual subsystem (such as nanomsg). Thus, binding developer should try to integrate nanomsg into the native polling mechanism. File descriptors retrieved from the socket using NN_SNDFD and NN_RCVFD socket options can be used for this purpose.

If the language doesn’t provide a native polling mechanism or if it doesn’t provide a way to plug arbitrary file descriptors into the native polling mechanism, then the binding should do the second best thing and implement a polling function/object itself.

Modularity

nanomsg implements multiple scalability protocols. To be super-consistent it should have been split into multiple separate libraries, however, that would be a management burden. So, instead of multiple libraries, nanomsg, provides the common API (basically just generic BSD sockets) in nn.h header and defines a specific header file for each protocol. These headers define any constants or functions specific to the individual protocols. For example "NN_PUB" socket type and "NN_SUB_SUBSCRIBE" socket option are specifc to pub/sub protocol and thus they are defined in nanomsg/pubsub.h header file.<

This separation prevents nanomsg developers to unintentionally drag in any inter-protocol dependencies and keeps every scalability protocol usable even in absence of its siblings.

Language binding authors should consider whether this kind of modularity is worth implementing in the binding.

The rule of the thumb is: If the binding is simple enough and does little more than forwarding the calls from the user to the nanomsg library, the separation is not necessary. However, if the binding provides non-trivial functionality of its own, keeping the protocols separate makes sense to prevent accidental dependencies between the protocols.

"nanomsg" is a trademark of Garrett D'Amore.