[NKP-0009] NKN Transport layer

The NKN message protocol isn’t reliable in the networking sense, which is fine in some cases (it offers better performance and lower latency), but some use cases might need a reliable protocol implemented by the client SDKs.

Current message protobuf encoding (simplified) is as follows:

Message {
  Recipient;
  EncryptedPayload {
    UseEncryption YES/NO;
    enc( Payload { Type; Data } )
  }
}

where Type can distinguish between BIN/TEXT/ACK packets.

I suggest a change where Payload would contain not just the Data itself but be split into Header(s) and Body, where the Header(s) can carry service information about the protocol state and whether a TCP-like transport layer is used (as well as other useful information).

Users should be able to choose whether they want the SDK to handle (initiate) a connection that resends dropped packets, manages packet order, or both, possibly with user-selectable parameters for optimizing either transfer speed or latency/session initiation time. On an incoming connection, the SDK would simply continue using whatever mode the initiator chose.

We will have to name things carefully to avoid confusion with the session mode (when released), which will create a direct tunnel and keep the session with one node only, as opposed to communicating with another client via messages.

Possible Header types:

  • MessageType (as currently implemented)
  • ControlFlow { Control = NOTHING | DROPS | ORDER_AND_DROPS; plus whatever control bits this transfer requires (TCP can serve as inspiration) }
    When no ControlFlow header is provided, one with Control = NOTHING is assumed. Control = NOTHING requires no additional parameters. (A rough sketch of the resulting layout follows below.)
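
To make the discussion concrete, here is a rough sketch of the proposed Payload layout as Go structs; all names and field types are just illustrative assumptions, not a final wire format:

package proposal

// Control selects the reliability mode for a transfer.
type Control int

const (
    NOTHING Control = iota // no reliability handling (default)
    DROPS                  // retransmit dropped packets only
    ORDER_AND_DROPS        // retransmission plus in-order delivery
)

// ControlFlow carries the service bits needed for reliable transfer.
type ControlFlow struct {
    Control Control
    Seq     uint32 // sequence number, used when Control != NOTHING
    Ack     uint32 // cumulative ack, TCP-style, when applicable
}

// Header holds per-message service information; more header types
// can be added later.
type Header struct {
    MessageType int          // as currently implemented (BIN/TEXT/ACK)
    ControlFlow *ControlFlow // nil is treated as Control = NOTHING
}

// Payload is split into Header(s) and Body as proposed above.
type Payload struct {
    Headers []Header
    Body    []byte
}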

What are your thoughts/requirements for this protocol?


I suggest going one level lower and creating this new protocol directly on top of OutboundMessage/payload to avoid additional overhead. That way, we’ll also be able to stack the current nkn-client protocol either on top of raw OutboundMessage or on top of this new protocol, to seamlessly and effortlessly gain reliability (a switch in the nkn-client SDK).

hello,

thanks for your proposal, it looks good!

For me, as one of the people trying to develop with NKN, I think the most important thing would be for NKN to work using acknowledgement + resend: send a packet, wait for confirmation from the receiver, and resend if it was not acknowledged within timeout x. And provide reordering if packets arrive out of order.

For me it’s not so important to have lots of configurable options or parameters.

The new proposal I made today covers similar requirements: [NKN-0010] NVPN - Virtual private network functionality in the NKN core

hint: maybe instead of reinventing the wheel you can copy the retransmission, reordering, etc. logic from existing networking stack code, or from OpenVPN; they spent lots of time designing and testing it well.

What about using this ControlFlow to replace the no_ack field in the Payload message? It covers what no_ack is intended for, but is more general.

hi,

I am afraid I don’t have a suggestion about this; I am not familiar with no_ack, so someone else should answer.

A relevant (though not identical) thing we just built: nkn-multiclient-js

The multiclient uses multiple concurrent (and, importantly, independent) nkn clients to greatly enhance reliability and reduce latency.

hi,

@yilun thanks, but I am using the Java SDK, so I need an option to have retransmission & packet reordering either in the Java SDK or, better, in the NKN core directly.

so that, e.g., I can be sure the packets coming in on the websocket are complete and in the right order; especially when working with binary data, lost or wrongly ordered packets are useless.

I was thinking of using KCP as the ARQ protocol: https://github.com/skywind3000/kcp/blob/master/README.en.md. It has implementations in most of our client languages (Go, Java, Node.js), and it does not make any assumptions about how messages are sent and received.

If we choose to use it, we only need to implement a special message type to carry the session number conv, and then use the nkn client to send and receive messages.
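
Roughly, the glue could look like this in Go, assuming the low-level kcp.NewKCP API from xtaci/kcp-go (the nknClient type and its Send method are placeholders for the real SDK):

package glue

import (
    "time"

    "github.com/xtaci/kcp-go"
)

// nknClient is a placeholder for the real NKN client SDK.
type nknClient struct{}

func (c *nknClient) Send(dest string, data []byte) { /* ... */ }

// newSession wires a raw kcp object to nkn message passing: every
// segment kcp produces is wrapped in an NKN message and sent out.
func newSession(c *nknClient, remote string, conv uint32) *kcp.KCP {
    k := kcp.NewKCP(conv, func(buf []byte, size int) {
        c.Send(remote, buf[:size]) // would use the new KCP message type
    })
    // Drive kcp's internal timers (retransmissions, window updates).
    go func() {
        for range time.Tick(10 * time.Millisecond) {
            k.Update()
        }
    }()
    return k
}

// On receiving an NKN message of the KCP type with this conv, feed it
// to the session via k.Input(data, true, false); the application then
// reads the reassembled stream with k.Recv(buf).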

Please let me know your thoughts or any other ideas!

Do we even need the conv number?
I mean, are we expecting multiple sessions from the same client to the other one? I think the client identifiers (name and public key) of both sides are good enough. The multiclient uses a different name anyway (and would therefore generate a different conv id). This only creates a local requirement of matching incoming messages with open KCP sessions, which is not a big deal.

On the other hand, however, we would have to implement some sort of conv negotiation scheme, even before starting the KCP algorithm, to make sure the number is unique on both sides at the same time.

I would suggest adding new message types (currently there are BINARY and TEXT): KCP_BINARY and KCP_TEXT, saying that the KCP protocol should be used and what type of data to expect in the stream (so we don’t have to repeat this information inside the stream itself, saving some bandwidth).

(In theory, there might be two KCP sessions open per pair of clients, one in TEXT mode and the other in BINARY mode. It is still trivial to match packets to the correct KCP session.)

Do we even need the conv number?

conv is needed by the KCP protocol and needs to be the same on both sides. If we hard-code a conv, we will lose the possibility of having multiple streams, and streams after a client reconnect might get mixed up.

I mean, are we expecting multiple sessions from the same client to the other one?

I think that would be very useful. It’s essentially multiplexing. For example, in a file transfer application, it allows multiple files to be transferred concurrently.

On the other hand, however, we would have to implement some sort of conv negotiation scheme, even before starting the KCP algorithm, to make sure the number is unique on both sides at the same time.

I would suggest adding new message types (currently there are BINARY and TEXT): KCP_BINARY and KCP_TEXT, saying that the KCP protocol should be used and what type of data to expect in the stream (so we don’t have to repeat this information inside the stream itself, saving some bandwidth)

The API I was thinking of is similar to TCP and kcp-go:

// initiator:
session = Dial(nknAddr)

// the other end:
session = Accept()

If we support multiple concurrent sessions, each packet needs to include the conv of its session. Then we have two options:

  1. Do not add a new message type. Instead, just add a conv field to the BINARY message. When a client receives a message with a new conv, treat it as a request to establish a session. This is how kcp-go does it, and it has 1 RTT lower latency, which is quite significant for the nkn client (roughly 500ms).
  2. Add at least one message type like REQUEST or DIAL to indicate the intention to initiate a session; the other end then sends back an accept msg (or just reuses the request msg type for simplicity) as the reply. This is more like the TCP style. The dialer is aware of the presence of the other side and will not attempt to send any data if the other side is not online. (See the sketch below.)
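
For illustration, option 2 could look roughly like this; the message types and the client type are hypothetical, not part of any existing SDK:

package dial

import (
    "errors"
    "time"
)

// Hypothetical message types for option 2; not part of any SDK.
const (
    msgDial   = 0x10 // request to open a session, carries the conv
    msgAccept = 0x11 // reply confirming the session
)

// client is a placeholder for the real nkn client.
type client struct{}

func (c *client) sendControl(remote string, typ int, conv uint32) {}
func (c *client) acceptCh(conv uint32) <-chan struct{} {
    return make(chan struct{})
}

// dial sends DIAL and blocks until the peer's ACCEPT arrives. This
// costs one extra RTT (~500ms on nkn) compared to option 1, but the
// dialer learns whether the other side is online before sending data.
func dial(c *client, remote string, conv uint32) error {
    c.sendControl(remote, msgDial, conv)
    select {
    case <-c.acceptCh(conv):
        return nil
    case <-time.After(5 * time.Second):
        return errors.New("dial timeout: peer may be offline")
    }
}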

(In theory, there might be two KCP sessions open per pair of clients, one in TEXT mode and the other in BINARY mode. It is still trivial to match packets to the correct KCP session.)

I feel that supporting only BINARY mode in streaming is enough, because:

  1. I haven’t heard of anything that uses text + streaming.
  2. Text streaming might not be well defined, because a Unicode character takes more than one byte, and a character might be split across two underlying packets by KCP or whatever protocol we use (see the snippet below).
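
For example, a tiny Go illustration of the splitting problem (the euro sign is three bytes in UTF-8):

package main

import "fmt"

func main() {
    b := []byte("€") // 3 bytes in UTF-8: 0xE2 0x82 0xAC
    // Splitting mid-character leaves fragments that are not valid text.
    fmt.Println(string(b[:2]), string(b[2:])) // neither half is valid UTF-8
}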

On another note, I’ve been trying out KCP with the nkn client. It works most of the time, but sometimes it gets stuck, especially when I try to send large streams of many MB. I’m not sure if it’s a misconfiguration (there are some parameters to tune). I will keep trying and let you know.

You said that conv has to be the same on both sides and unique, which means there has to be some negotiation (more than 1 RTT), because some (different) convs might already be in use by either side. But according to:

// initiator:
session = Dial(nknAddr)
// the other end:
session = Accept()

it doesn’t seem there is any (or rather, the conv itself is handled fully by the KCP protocol).
How does that work? NKN RTT is terrible; we want to minimize round trips as much as we can (e.g. by matching messages ourselves, based on addresses and such).

Do you suggest that we remove the plain BINARY message and use KCP for everything? I don’t think that is a good idea; plain BINARY messages will remain useful for low-latency applications, like games, rather than full KCP, which is designed for high-volume data transfer.

The message still won’t contain actual data, will it? That is handled by the KCP protocol, which wouldn’t be “active” yet and therefore couldn’t have generated any packets, right?
Therefore I don’t see a lot of difference between options 1 and 2, as the conv number would have to be assigned/negotiated with the other side as well, and so some reply like “OK, tunnel is active. Let’s call it conv=5” will be required anyway, before the actual data stream.
Do I understand it correctly?

Does KCP natively support very large files and their verification? Or would we still need to implement some sort of chunking system?

I mean, imagine I want to send a 5GB movie or something. Do I just give KCP the entire 5GB binary blob on one side, wait two hours on the other side, and then get a 5GB binary blob that has already been integrity-verified (some SHA checksum or the like)?

Or do we need to split it into 1MB chunks anyway, send each one separately, and reconstruct it on the other side manually?

And the same question goes for protobuf. Can I just create something like “filename”, “size in bytes” and a “5GB binary blob” as one message? Or is that too much for it to handle?

Oh, I didn’t mean to remove the plain BINARY message at all. I’m just saying the following simple protocol should work (sketched in code after the list):

  • Add a field (e.g. kcp_conv) to the current Payload structure
  • If kcp_conv is not specified, it’s the current BINARY message
  • When A wants to create a new session to B, just send the session data (kcp will give you the bytes to send) with kcp_conv=conv
  • When B receives a Payload, check kcp_conv first. If it’s not set, it’s a regular message. If it has been seen before, it belongs to a known session, so pass the data to that session (each session has its own kcp object). If it has not been seen, create a new kcp object and pass the data to it
  • This way, the first packet of a session contains actual data, and no additional (data-less) packet is needed to negotiate the conv
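
A minimal sketch of that receive-side dispatch; the Payload struct and helpers are illustrative placeholders for whatever the SDK ends up using:

package dispatch

// Payload mirrors the proposal above; 0 means kcp_conv is unset.
type Payload struct {
    KcpConv uint32
    Data    []byte
}

// Session owns one kcp object for its conv.
type Session struct{}

func (s *Session) Input(data []byte) { /* feed bytes to the kcp object */ }

func newSession(conv uint32) *Session { return &Session{} }

func handleRegularMessage(data []byte) { /* current BINARY behavior */ }

var sessions = map[uint32]*Session{}

func onPayload(p *Payload) {
    if p.KcpConv == 0 {
        handleRegularMessage(p.Data) // regular message, unchanged path
        return
    }
    s, ok := sessions[p.KcpConv]
    if !ok {
        // First packet with an unseen conv implicitly opens a session,
        // so no separate negotiation round trip is needed.
        s = newSession(p.KcpConv)
        sessions[p.KcpConv] = s
    }
    s.Input(p.Data)
}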

KCP is very low level. It cannot take overly large chunks, and it will push whatever is passed to it into the send queue. In order not to ruin performance, the upper level (the application) needs to check the send queue length (how many packets have not yet been sent & acked) and only send when the queue is not too long (e.g. a few hundred KB, or a few hundred × MTU to be more precise). So typically there is another layer (e.g. a Session) that manages this: it blocks the session.send call while the kcp send queue is too long, and splits large input into small chunks before feeding it to kcp. But this session layer is not part of kcp and needs our own implementation (see the sketch below).
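
A sketch of what that session layer’s send path could look like, assuming kcp-go’s WaitSnd/Send methods; the thresholds are made-up numbers, not tuned values:

package session

import (
    "time"

    "github.com/xtaci/kcp-go"
)

const (
    chunkSize  = 1024 // roughly one MTU worth of payload
    maxWaitSnd = 256  // max unacked segments before we block the caller
)

// Session wraps one raw kcp object, as in the earlier sketches.
type Session struct {
    k *kcp.KCP
}

// Send splits large input into small chunks and blocks while kcp's
// send queue is too long, so callers cannot flood the queue.
func (s *Session) Send(data []byte) {
    for len(data) > 0 {
        // WaitSnd reports how many segments are queued but not yet
        // acked; wait until the queue drains below the threshold.
        for s.k.WaitSnd() > maxWaitSnd {
            time.Sleep(10 * time.Millisecond)
        }
        n := len(data)
        if n > chunkSize {
            n = chunkSize
        }
        s.k.Send(data[:n]) // feed one small chunk to kcp
        data = data[n:]
    }
}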

KCP does not have a checksum itself, but it does provide reliability + packet ordering. That should be enough, because NKN packets use AEAD, which provides not only encryption but also authentication. So in short, each NKN packet is verified, and the upper level is reliable + ordered, so the whole stream is verified.

As for protobuf, I don’t think you want to put too much content in a single message, because people typically decode the whole protobuf object into memory, which becomes the bottleneck if you put too much content inside.

I did a bunch more tests using the nkn client + KCP with many different KCP configs. Now I’m not sure KCP is a good choice for us, given the results. Specifically, the problems I found are:

  • For basically whatever parameters I choose, KCP uses far more bandwidth than the actual data to be sent, from 2x to more than 10x. (This is caused by retransmission; KCP’s own header overhead was negligible in my tests.)
  • KCP has very limited throughput in our case, probably due to that bandwidth overhead, but it might also be related to the fact that KCP is not optimized for throughput by design. I can easily create one or a few pairs of nkn clients, send data between them, and get around 1MB/s throughput; the numbers are similar for nkn-file-transfer. But if I use KCP on top, I typically get only 2%~20% of that throughput.

I’m now trying to learn more about KCP’s underlying mechanisms and think more about whether it’s really suitable for us. Will post more results here.

I did some more research and tests, and now I think we don’t need KCP or any other existing ARQ algorithm. Here is the reasoning: each hop from sender to receiver in NKN uses a reliable transport protocol (TCP by default), with all the usual features like flow control and congestion control. The throughput bottleneck is usually one of:

  1. From sender client to node
  2. From one node to another node
  3. From node to receiver client

If 1 is the bottleneck, no ARQ algorithm/flow control/congestion control will help. If 2 or 3 is the bottleneck, then packets accumulate at one node until its buffer channel is full, after which packets are dropped massively. We don’t want to wait for congestion to happen and only then start adjusting the window, because by then packets have already been dropped massively. Instead, we can simply use a client window size (e.g. 256) that is much smaller than the node’s packet buffer size (2333 by default), which makes sure the throughput bottleneck will not cause congestion.

Also, based on the tests, end-to-end latency typically does not change dramatically as long as congestion does not happen (though it does show quite significant variance); but once congestion occurs (due to a too-large window plus a bottleneck at a node, as above), packets are massively dropped, which makes dynamic timeouts work poorly. When I tested KCP, where the ack timeout is adjusted dynamically, I often saw only two states: either the ack rtt is normal (<1s), or it grows crazily (to whatever the max value is) due to congestion, which happened in almost every test. What I found works best so far is a fixed ack timeout (e.g. 3s or 5s) to distinguish the normal and abnormal states.

So given the characteristics of the nkn network, simply using a fixed, properly sized window (e.g. min(sender window size, receiver window size, max allowed value)) with a fixed, properly chosen ack timeout works pretty well. In some preliminary tests I could fairly easily get 500+KB/s with one client, or around 2MB/s with multiple concurrent client pairs transmitting tens of MB of data, more than 10x higher than what I typically get with KCP (using the best parameters I found). A minimal sketch of such a sender is below.
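
To make that concrete, a hedged Go sketch of a fixed-window, fixed-timeout sender; all names and numbers are illustrative, not the actual protocol draft:

package arq

import (
    "sync"
    "time"
)

// Fixed parameters, per the discussion above; values are illustrative.
const (
    windowSize = 256             // well below the node buffer (2333)
    ackTimeout = 3 * time.Second // fixed, not dynamically adjusted
)

type packet struct {
    seq    uint32
    data   []byte
    sentAt time.Time
}

// Sender keeps a fixed-size window of unacked packets.
type Sender struct {
    mu       sync.Mutex
    inFlight map[uint32]*packet // sent but not yet acked
    send     func(*packet)      // hands one packet to the nkn client
}

// OnAck removes an acknowledged packet, freeing a window slot.
// (New packets are admitted only while len(inFlight) < windowSize.)
func (s *Sender) OnAck(seq uint32) {
    s.mu.Lock()
    delete(s.inFlight, seq)
    s.mu.Unlock()
}

// resendLoop retransmits any in-flight packet whose fixed ack
// timeout has expired.
func (s *Sender) resendLoop() {
    for range time.Tick(100 * time.Millisecond) {
        now := time.Now()
        s.mu.Lock()
        for _, p := range s.inFlight {
            if now.Sub(p.sentAt) > ackTimeout {
                p.sentAt = now
                s.send(p)
            }
        }
        s.mu.Unlock()
    }
}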

I will try to write a protocol draft and post it here.

Even though each hop/node uses a reliable protocol, the path as a whole isn’t reliable: a node itself might drop the message (e.g. if it becomes temporarily unavailable) and never forward it. Also, since each packet can take a different route, the result is not ordered (which I don’t believe is a big deal for most applications; even file transfer works when packets are unordered, as long as each packet carries an id).

The features I think we need from the new protocol are retransmission on failure (including detecting the packet drop and requesting retransmission) with as little overhead as possible and as few RTTs as possible.

And maybe that session mode, as you mentioned before, which would allow sending large files or continuous streams to the other side (and, using the above-mentioned protocol, would guarantee that everything arrives and that the content can be reconstructed on the other side).

Even though each hop/node uses a reliable protocol, the path as a whole isn’t reliable: a node itself might drop the message (e.g. if it becomes temporarily unavailable) and never forward it. Also, since each packet can take a different route, the result is not ordered (which I don’t believe is a big deal for most applications; even file transfer works when packets are unordered, as long as each packet carries an id).

Given the same topology and node buffers that are not full, it is almost reliable, because dropped packets are retransmitted hop by hop and the route is static. I actually tried sending tens of thousands of packets as fast as possible, and most of the time they all arrived in the correct order. But if we use multiple concurrent sender/receiver client pairs to take advantage of multi-path transmission (which we definitely need), packets will not be ordered.

The features I think we need from the new protocol are retransmission on failure (including detecting the packet drop and requesting retransmission) with as little overhead as possible and as few RTTs as possible.

And maybe that session mode, as you mentioned before, which would allow sending large files or continuous streams to the other side (and, using the above-mentioned protocol, would guarantee that everything arrives and that the content can be reconstructed on the other side).

Yeah, the session protocol I suggested above is basically the simplest ARQ protocol that retransmits lost/timed-out packets using a fixed window/timeout, with almost no overhead compared to the current nkn packet. What I’m suggesting, after tests and some analysis, is that a simple ARQ with a fixed window/timeout is probably good enough, unless we find an algorithm that can adjust those parameters dynamically without dramatically sacrificing performance.

Yes, but sometimes a node goes offline for whatever reason. Or a new one connects and the path between nodes changes to include it. These are the message drops I am talking about. Or even malicious nodes, which would just drop some messages. We need an end-to-end retransmission mechanism (a simple ARQ).

Yes, but sometimes a node goes offline for whatever reason. Or a new one connects and the path between nodes changes to include it. These are the message drops I am talking about. Or even malicious nodes, which would just drop some messages. We need an end-to-end retransmission mechanism (a simple ARQ).

Yeah, that’s true. I’ve been working on it recently and will post updates here :slight_smile: