NRT Data Flow

What happens when a message is posted?

When a message is posted on a Poster port, the following happens:

  1. The message is stored on the local Blackboard, overwriting any previous message posted by the exact same port. The message is thus available for later use at any time via Checker ports (see below). No copy of the message is made; the Blackboard just keeps an additional smart pointer to the message data, so the CPU/memory cost of storing the message on the Blackboard is negligible.
  2. In parallel, all callbacks of all matching subscribers (i.e., subscribers with matching message type, return type, and topic) on the local machine are called (each in a separate parallel thread).
  3. In parallel to this, if there is at least one remote machine on the network with a matching subscriber, the outgoing message is serialized (converted from a C++ class to a plain array of bytes that can be transmitted over the network). The message is then sent only once to each remote machine that has one or more matching subscribers. On each of those remote machines, as soon as the message is fully received, it is de-serialized (i.e., unpacked from an array of bytes back into a C++ class) once, and then locally dispatched to all matching subscribers on that machine.
  4. As each matching subscriber's callback executes, it may optionally return a result message, or it may throw an exception if something goes wrong. All results (whether containing a message, void, or an exception) are sent back to the initial poster and stored in a list. That list is returned to the poster.
Note
For efficiency, NRT pre-computes the lists of matching local subscribers, of matching remote machines, and, within each remote machine, of matching subscribers, so that no CPU or network cost is incurred in finding matching subscribers at the time a message is posted.
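
Conceptually, the local part of this post sequence (steps 1, 2, and 4 above) can be pictured with plain C++11 threads and futures (futures are detailed in the next section). This is a simplified illustration of the behavior described above, not NRT's actual implementation; the Message and Subscriber types and the onMessage() signature are stand-ins:

    #include <future>
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // Stand-in types for illustration only; these are not the real NRT classes.
    struct Message { std::string data; };

    struct Subscriber
    {
      // A callback may return a result message, or throw an exception.
      std::string onMessage(std::shared_ptr<Message const> msg)
      { return "processed: " + msg->data; }
    };

    int main()
    {
      // Matching subscribers were resolved when connections were made (see
      // the Note above), so posting just walks a pre-computed list:
      std::vector<Subscriber> subscribers(3);

      // Step 1: the Blackboard keeps a smart pointer to the message; no copy.
      auto msg = std::make_shared<Message const>(Message{ "frame 42" });
      auto blackboardRef = msg; // one extra reference, negligible cost

      // Steps 2 and 4: run each matching callback in a parallel thread and
      // collect a placeholder (future) for each result.
      std::vector<std::future<std::string>> results;
      for (auto & s : subscribers)
        results.emplace_back(std::async(std::launch::async,
                                        [&s, msg]() { return s.onMessage(msg); }));

      // The poster can later harvest the results (values or exceptions):
      for (auto & r : results) std::cout << r.get() << std::endl;
      return 0;
    }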

Futures and Promises in NRT

To complicate things just a little bit more, but also to make the NRT core system even more powerful, NRT makes extensive use of the C++11 concepts of futures and promises.

In the above context of posting a message, this works as follows: the act of posting is nearly instantaneous and returns immediately. However, an object is returned to the initiator of the post, which contains a list of future results. This list contains placeholders for results which do not exist yet (because, for example, a message is still being transmitted to a remote blackboard, or that remote blackboard is still running a subscriber onMessage() callback for that message). As long as the original poster does not attempt to access the results, those placeholders are simply a reminder that some operation is in progress which may generate results in the future. Each time one of the results is actually generated (e.g., a subscriber callback completes), the corresponding placeholder is replaced by the result's value. If one of the subscribers throws an exception, the placeholder is replaced by that exception.
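
In isolation, these C++11 building blocks behave as follows; this snippet uses only the standard library and is independent of NRT:

    #include <future>
    #include <iostream>
    #include <stdexcept>

    int main()
    {
      // A future is a placeholder for a result that does not exist yet:
      std::future<int> ok = std::async(std::launch::async, [] { return 6 * 7; });

      // If the producing code throws, the exception is captured in the
      // placeholder and re-thrown when the result is accessed:
      std::future<int> bad = std::async(std::launch::async,
          []() -> int { throw std::runtime_error("subscriber failed"); });

      std::cout << ok.get() << std::endl; // blocks until ready, then prints 42

      try { bad.get(); }                  // re-throws the stored exception
      catch (std::exception const & e) { std::cout << e.what() << std::endl; }
      return 0;
    }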

Modules which post messages typically do one of three things with that list of future results:

  1. They may just ignore it altogether. In that case, as the unused list is destroyed, it first makes sure that all results (or exceptions) have been received. Essentially, this blocks the poster until all subscribers have completed. This is the behavior obtained if you just call SomePoster::post(msg) without catching the return value of the post() function.
  2. They may store it in a temporary variable, then do something else (which may include posting more messages), and finally ignore and destroy that temporary variable. Upon destruction, the poster will block until all subscribers have completed.
  3. They may get and use the results from the list. In that case, NRT organizes the list so that results become available in the order in which they are received. When the poster accesses the next result in the list, that access may block (if no new result has been received yet), return the next received result, or throw an exception if one was received in place of a result. The three patterns are sketched below.
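
In sketch form, the three patterns above might look as follows inside a module. SomePoster::post() is from the text above, but the names used for the results list and its accessors (get(), the range-for support, and the doOtherWork() and use() helpers) are illustrative assumptions, not the actual NRT API:

    // Pattern 1: ignore the returned list. It is destroyed immediately, and
    // its destructor blocks until every subscriber has completed.
    itsPoster.post(msg);

    // Pattern 2: hold the list while doing other work; destruction at the
    // end of the scope blocks until all results (or exceptions) are in.
    {
      auto futureResults = itsPoster.post(msg);
      doOtherWork(); // runs concurrently with the subscriber callbacks
    } // <-- blocks here until every subscriber has completed

    // Pattern 3: consume the results as they arrive. Accessing the next
    // result may block, yield a value, or re-throw a subscriber's exception.
    for (auto & r : itsPoster.post(msg))
    {
      try { use(r.get()); }
      catch (std::exception const & e) { /* a subscriber threw */ }
    }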

Recursive (depth-first) execution within each parallel (breadth-first) thread

In typical networks of NRT modules, such as the one shown below, when a subscriber receives a message, the callback function of that subscriber processes the message and posts the result. In most modules, the results of a post are ignored, which, as explained above, blocks the poster until all results have been received from all subscribers. This means that:

  1. When a message is posted, all matching subscribers are called in parallel, as explained above. This is a parallel, or breadth-first, flow.
  2. Each subscriber will not complete until the (possibly very long) chain of subscribe/post/subscribe/post/... that may be triggered within the subscriber is complete. Hence, within each parallel branch, execution proceeds in a depth-first manner.
[Image: designer_demo.png]

In the above example, consider the poster port of the Image Rescaler module. When a message is posted on that port, two parallel threads handle the two subscribers connected to this poster (note how the third connection is to a checker port, which does not involve any triggering of callbacks when a message is posted). In the first thread, the callback for the Super Pixel subscriber is called, and within this callback, a result is posted. Posting that result blocks the caller until it has been fully processed by the next subscriber, in this case the Convex Hull Approximator module. But this subscriber also posts! Hence it will not return until its post's results are received, in this case from the Polygon Thresholder. And so on, and so on.
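
A subscriber callback in such a chain might look like the following sketch; the module, port, type, and helper names here are hypothetical, not actual NRT module code:

    // Hypothetical onMessage() callback of the Super Pixel module.
    void SuperPixelModule::onMessage(ImageMessage::const_ptr msg)
    {
      auto result = computeSuperPixels(msg); // process the incoming image

      // Post the result and ignore the returned list of future results:
      // this call therefore blocks until the Convex Hull Approximator (and,
      // recursively, everything it in turn posts to) has completed.
      itsResultPoster.post(result);
    } // only now does this branch of the breadth-first fan-out return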

The importance of understanding data flow

Obviously, data flow in NRT can become complex: we aim to parallelize as much as possible (to exploit multi-core machines and networks of machines), yet processing cannot proceed along a chain until the necessary data has been computed by the previous step.

Note
Let's analyze the data flow of the above NRT network of modules. Remember from what we learned about posters that, by design, the first thing done with a posted message is to store it on the Blackboard, before any subscriber callbacks are triggered. In the above example, the Polygon Painter issues, from within the callback of its subscriber port, a check for the latest available data posted by the Image Rescaler. We are guaranteed that this check will return the latest image posted by the Image Rescaler: indeed, as the Image Rescaler posts, the image is first stored (and hence becomes available to checks), and only then are the callbacks of the Super Pixel, Convex Hull Approximator, Polygon Thresholder, and finally Polygon Painter modules triggered. So the Polygon Painter subscriber callback is guaranteed to be called after the Image Rescaler data has been updated on the Blackboard.
Now examine the Polygon Thresholder. There is a possible race condition between the two branches that connect to it (one via Super Pixel and Convex Hull Approximator, the other via Moving Object Detection). If you need to guarantee that the convex hull and the image come from the same video frame (assuming that your Image Source is streaming frames from a movie or from a camera), then additional synchronization logic is necessary to enforce that. A number of solutions are examined in FIXME.

Designing and optimizing data flow in NRT may require careful thought and possibly some debugging. As you create your own complex networks of NRT modules, making sure that the data flow makes sense is an important part of your design (e.g., avoiding deadlocks due to some types of loops in your connections, etc.).

For example, consider this NRT network:

[Image: poster-subscriber-deadlock.png]

Oops, the connection between the poster port of the Image Ranger and the subscriber port of the Morphology Filter creates an infinite loop (i.e., an infinitely deep recursion)! Callbacks will keep piling up until your computer runs out of resources (threads, memory, etc.) and refuses to recurse further over that infinite loop. If you are using the NRT designer, simply pressing the Stop button before your machine is exhausted will break that loop and free the associated resources (pending callbacks).

What happens when a check for a message is issued?

Remember that NRT implements, for efficiency reasons, partial coherency among a distributed Blackboard federation. This is reflected by a difference between subscribers and checkers, designed to minimize wasted network traffic.

  • When a message is posted, if a matching remote subscriber exists on a different machine, the message is always serialized and sent to that remote machine.
  • However, the existence of a matching remote checker does not trigger transfer of every matching posted message. Indeed, the assumption is that the checker does not want to receive every message (otherwise, it would be a subscriber), but only, once in a while, the latest posted message. Thus, transfer of messages, if any, occurs at the time a check is issued.

To illustrate this, consider a video camera posting 30 high-resolution video frames per second. Then consider a remote module that only needs to use the latest available frame each time new GPS data is received, about once per second. It would be a big waste of CPU, network bandwidth, etc., to send that module all 30 frames every second when it will only use one! Thus, the remote module should implement a subscriber port for GPS data and a checker port for image data. Each time GPS data is received, the module issues a check for image data.
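
In sketch form, such a module could pair the two ports like this; all of the type, port, and helper names below are made up for illustration and are not the actual NRT API:

    // Hypothetical module combining a GPS subscriber with an image checker.
    void GpsTaggerModule::onMessage(GpsMessage::const_ptr gps)
    {
      // Triggered about once per second by incoming GPS data. Only now do
      // we pull the latest frame, so at most one image per GPS fix crosses
      // the network instead of all 30 frames per second.
      auto frames = itsImageChecker.check(); // fetch the latest posted image(s)
      processFix(gps, frames);
    }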

Because check requests require that messages be transmitted in response, they may take a while (if the network is slow or congested), so you should in general consider remote checks expensive. This is not the case for local checks (since smart pointers are used for those). It is thus usually a good idea to run several modules connected by check relationships on the same machine.

In a bit more detail, when a check request is issued, the following happens:

  1. A list of future results is created. In general, the length of the list is not yet known, because we do not yet know whether the modules on remote blackboards that could post relevant messages have actually posted any.
  2. Placeholder slots in the list are filled right away with any matching messages posted on the local blackboard. Those are immediately available for use by the issuer of the check command.
  3. Remote check requests are dispatched over the network, only to those remote blackboards which are known to have matching posters. As each remote blackboard receives the request, it checks whether any matching messages have indeed been posted by those matching posters. If so, the messages are serialized and returned to the issuer of the initial check command. If not, the original checker is notified that no matching message has been posted yet (this is why the length of the list of future results is not known until all remote blackboards have replied).

The list of future results is organized in such a way that as soon as a result is received, it becomes the next available result. Trying to get the next result when all readily available ones have been consumed, but some are still pending, will block the caller until those pending results are received.
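
In sketch form, consuming check results might look like this; the checker API names (hasMore(), getNext()) and the use() helper are assumptions for illustration, not the actual NRT interface:

    // Hypothetical consumption of check results. The number of results is
    // not known up front: remote blackboards may still be replying.
    auto results = itsImageChecker.check();
    while (results.hasMore())       // may block until a pending reply arrives
    {
      auto img = results.getNext(); // next result, in order of arrival
      use(img);
    }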