hmbdc is an ultra-low-latency / high-throughput middleware. What kind of latency performance should you expect from hmbdc on a typical hardware configuration nowadays?

You should expect ~200 nanoseconds for inter-thread and IPC message latency. Network message latency on a 10G NIC can be as low as 10 microseconds.
Please refer here for details.

What about throughput?

Take 8-byte messages as an example: well over 100M messages per second on a local machine (inter-thread or IPC), and more than 6M messages per second over a Gigabit network.
Please refer here for details on the even better 10G NIC results.

Can I trust the above numbers?

Test programs are provided in both binary and source form, and anyone can try them out with no license required.
Even the command-line parameters are listed here.

How does hmbdc's reliable network messaging latency compare to other products, for example DDS?

It is only meaningful to compare results when the tests are run on the same hardware and OS, and we did just that: we compared hmbdc rmcast and hmbdc rnetmap against two well-respected DDS products that, like hmbdc, provide publicly accessible performance measurement tools of their own. We call them dds_a and dds_b.

Here are the two hosts (both running CentOS 7) and the two pairs of directly linked NICs used in the tests. All tests were executed within CPU shields on the hosts to reduce jitter.
- Intel(R) Core(TM) i5-3350P CPU @ 3.10GHz 4 core 8G RAM (cpu 1-3 shielded)
- Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz 8 core 32G RAM (cpu 4-7 shielded)
- 1G direct link (intel 82574L and 82579LM)
- 1G direct link (intel 82574L and Realtek RTL8411) (this NIC is used only in rnetmap's test)
The parameter settings for the DDS tests come from each product's performance test documentation; no fine-tuning was done for any test, and the same goes for rmcast and rnetmap. The only parameter varied is the message size.

  • hmbdc vs dds_a latency comparison

Round-trip delays for a continuous run of 5000+ messages are measured at 200 msg/sec and divided by 2. The min/max and mean (or median) latency values are shown in the diagram.

Both hmbdc rmcast and rnetmap are about an order of magnitude faster latency-wise than dds_a in our tests.
Also notice that the kernel-bypassing rnetmap is about twice as fast as rmcast (for instance, 12 us vs 21 us).

  • hmbdc vs dds_b latency and throughput comparison
Round-trip delays for a continuous run of 24000+ messages are measured at 200 msg/sec and divided by 2.
Both hmbdc's rnetmap and rmcast are significantly faster than dds_b in our tests (26 us and 49 us vs 82 us median for 50-byte messages).
One takeaway from the test results: dds_b's minimum latency equals rmcast's median and rnetmap's maximum!

Here are the throughput comparison results. We have seen throughput results published by other companies based on network utilization. In our opinion, that is not what our users care about most, particularly when messages are small, because it does not take the message header or key overhead out of the equation; they all contribute to the utilization percentage.

In hmbdc, we simply measure user bits per second. If the message rate is 1M msg/sec and the message size (excluding all headers and keys, which are bound to be present in any useful message) is 8 bytes, then the user throughput is 1M x 8 = 8 MB/sec, i.e. 8M x 8 = 64 Mbps, period!

The above chart shows that all three products' bps results increase as the message size increases, a consequence of the per-message header overhead shrinking relative to the payload.
rnetmap and rmcast perform almost identically and beat dds_b in all of our tests.
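The user-bits-per-second arithmetic above is simple enough to pin down in code. A minimal sketch (the function name `userBps` is ours, not an hmbdc API):

```cpp
#include <cstdint>

// Hypothetical sketch: "user bits per second" as defined above --
// payload bytes only, with all headers and keys excluded.
std::uint64_t userBps(std::uint64_t msgsPerSec, std::uint64_t payloadBytes) {
    return msgsPerSec * payloadBytes * 8;   // bytes/sec -> bits/sec
}

// Example from the text: 1M msg/sec at 8 user bytes each
// is 8 MB/sec, i.e. 64 Mbps of user data.
```

Counting only payload bytes makes results comparable across transports with different header sizes, which a raw utilization percentage does not.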

Large Network Messages

Generally speaking, for all hmbdc network transports (except tcpcast) the message size is subject to the MTU and is limited to around 1013 bytes. For large-message needs (for example, propagating a 10GB file to a group of hosts using reliable multicast), hmbdc supports zero-copy message transfer in the tcpcast, rmcast and rnetmap transports through the "memory attachment" technology, and there is NO LIMIT on the size of the attachment.
Additionally, the memory attachment mechanism enjoys the same scalability characteristics as the individual transport itself, and the user can customize how the memory is allocated and released on both the sender and receiver sides; for example, a memory-mapped file can be used.
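To illustrate the memory-mapped-file idea, here is a conceptual POSIX sketch (not hmbdc's actual attachment API; the names `allocAttachment` and `freeAttachment` are ours): backing the attachment buffer with a file means a received multi-gigabyte attachment lands on disk rather than in anonymous RAM.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Conceptual sketch (not hmbdc's actual API): allocate an attachment
// buffer backed by a memory-mapped file.
void* allocAttachment(const char* path, std::size_t bytes) {
    int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return nullptr;
    if (::ftruncate(fd, static_cast<off_t>(bytes)) != 0) {   // size the backing file
        ::close(fd);
        return nullptr;
    }
    void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);                          // the mapping keeps the file alive
    return p == MAP_FAILED ? nullptr : p;
}

void freeAttachment(void* p, std::size_t bytes) {
    ::munmap(p, bytes);                   // flush and release the mapping
}
```

A pair of callbacks like these could be plugged into whichever allocation hooks the transport exposes on the sender and receiver sides.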
Here is the pure user-data bit throughput for large messages on the same hardware used above.
On the 1G link, tcpcast plateaued at 0.94 Gbps, and the more scalable rmcast and rnetmap plateaued at 0.92 Gbps.

