Introducing Decentralized Pub/Sub Based on NKN

zbruceli · October 21, 2020, 7:50am

by Yilun Zhang, CTO of NKN.org

What is Pub/Sub

One basic function of NKN client (e.g. https://github.com/nknorg/nkn-client-js) is to provide a decentralized message system , which includes unicast, multicast and anycast. That’s pretty much enough if the message sender knows who are the receivers. However, in lots of common scenarios, the receiver should be logically decoupled from the sender. For example, when I send a message to a chatroom, I don’t want to know exactly who is in the chatroom, I just want whoever in the chatroom to receive my message. That’s where pub/sub comes into place.

Basically, pub/sub (short for publish/subscribe) is a model that decouples message sender (publisher) and receiver (subscribers) . Publishers publish message to topic (we only consider topic based pub/sub here) without having to know who subscribe to the topic and will receive the message. Subscribers subscribe to topics and will receive messages sent to subscribed topics. Pub/sub is a basic building block for modern applications , and has been widely used from infrastructure level (e.g. load balancing) to application level (e.g. chat room/messenger).

The following diagram from Google Cloud ( https://cloud.google.com/pubsub/docs/overview) shows the relationship between publisher, subscriber and topic.

Challenges of Decentralized Pub/Sub

Cloud providers like Google Cloud, AWS provide cloud based pub/sub, but their centralized nature makes them hard (if not impossible) to be used in decentralized applications.

On the other hand, building decentralized pub/sub is also challenging, as most existing decentralized systems (e.g. Ethereum) are not well suited for real time messages — if sending a single message would cost over a cent in USD and almost a minute to deliver, not to mention the scalability issue, how could you expect it to be practically useful?

To be more general, there are a few challenges in building decentralized pub/sub based on existing decentralized system (most likely blockchain based):

Message needs to be delivered in real time
Message delivery needs to be affordable
Message throughput needs to be horizontal scalable

If you come from a non-blockchain background, you are probably laughing at how trivial the above “challenges” are. Well the reality is, if we rely on on-chain transaction to transmit information, it’s still very hard to solve the above problems.

One solution to these problems is to use off-chain message delivery . That’s why we believe NKN is perfect for being the infrastructure of a decentralized pub/sub system : message delivery in NKN is instant (millisecond level end to end latency), free , and horizontal scalable (more nodes → higher throughput) because it’s purely off-chain.

Building Decentralized Pub/Sub

To build a pub/sub system, we need to solve two basic problems: how to store and retrieve topic-subscriber info , and how to deliver messages . While NKN naturally solves the second one, we still need to decide where the subscriber information is stored.

After quite a lot of discussion, we decide to store the topic-subscriber information on-chain . As the result, subscribe needs to be done in a transaction, which will be reliable but not horizontal scalable. Luckily, subscribe should be a much less frequent action compared to publish (which will be off-chain and horizontal scalable), so it shouldn’t be the bottleneck.

After some work and tests, we are now happy to say that our pub/sub is working quite nicely . Because publishing is mainly sending off-chain messages, it is integrated into NKN client (e.g. https://github.com/nknorg/nkn-client-js). Subscribing, on the other hand, is integrated into NKN wallet (e.g. https://github.com/nknorg/nkn-wallet-js) as it needs to sign and send transactions. Both are integrated into NKN SDK (e.g. https://github.com/nknorg/nkn-sdk-go) which has both NKN client and NKN wallet.

Using Pub/Sub

Details of how to use pub/sub can be found in the documentation of various NKN client/wallet/SDK implementations. The API is quite straightforward in general. For example, in the JavaScript implementation, subscribing to a topic is as simple as

wallet.subscribe(topic, bucket, duration)

The reason that we have bucket concept is to avoid (accidental) message flooding in the case of massive subscribers, and can be hidden by higher level APIs like SubscribeToFirstAvailableBucket . Similarly, publishing to a topic is as simple as

client.publish(topic, bucket, message)

and bucket count of a topic can be get by APIs like GetTopicBucketsCount . Clients subscribed to the topic can listen for messages with

client.on(‘message’, (src, payload, payloadType) =&gt; {
});

Use Cases and Summary

Pub/sub has been widely used in many systems and applications. In additional to the existing use cases, I’d like to dive into a new class that is more suitable and unique to decentralized applications .

Most centralized applications are not open sourced, and protocol is often bound with the application only. Decentralized applications, on the other hand, are most likely open sourced, and protocol is decoupled from the implementation to allow cross implementation communication. This greatly reduces the friction to design and implement cross application protocols. If multiple applications want to share the same information flow, a decentralized, application neutral and language neutral pub/sub platform would be essential.

Here are some examples:

Different service providers want to share the same service discovery mechanism
Multiple applications want to share the same rating system
Applications would like to pass data to downstreaming applications sharing the protocol

Our decentralized pub/sub could be used to achieve these goals. To be more general, this represents one of the most interesting properties (in my opinion) of decentralized applications — the separation of application, protocol, and data . NKN’s technology is uniquely positioned and purposely built to capitalize upon this opportunity. We will soon release something built on our decentralized pub/sub framework, so please stay tuned!

zbruceli · April 9, 2019, 4:17pm

What is Pub/sub

https://cloud.google.com/pubsub/docs/overview

Pub/Sub brings the flexibility and reliability of enterprise message-oriented middleware to the cloud. At the same time, Cloud Pub/Sub is a scalable, durable event ingestion and delivery system that serves as a foundation for modern stream analytics pipelines. By providing many-to-many, asynchronous messaging that decouples senders and receivers, it allows for secure and highly available communication among independently written applications. Cloud Pub/Sub delivers low-latency, durable messaging that helps developers quickly integrate systems hosted on the Google Cloud Platform and externally.

Core concepts

Topic: A named resource to which messages are sent by publishers.
Subscription: A named resource representing the stream of messages from a single, specific topic, to be delivered to the subscribing application. For more details about subscriptions and message delivery semantics, see the Subscriber Guide.
Message: The combination of data and (optional) attributes that a publisher sends to a topic and is eventually delivered to subscribers.
Message attribute: A key-value pair that a publisher can define for a message. For example, keyiana.org/language_tag and value en could be added to messages to mark them as readable by an English-speaking subscriber.

Advantages

https://en.wikipedia.org/wiki/Publish–subscribe_pattern#Advantages

Loose coupling

Publishers are loosely coupled to subscribers, and need not even know of their existence. With the topic being the focus, publishers and subscribers are allowed to remain ignorant of system topology. Each can continue to operate as per normal independently of the other. In the traditional tightly coupled client–server paradigm, the client cannot post messages to the server while the server process is not running, nor can the server receive messages unless the client is running. Many pub/sub systems decouple not only the locations of the publishers and subscribers but also decouple them temporally. A common strategy used by middleware analysts with such pub/sub systems is to take down a publisher to allow the subscriber to work through the backlog (a form of bandwidth throttling).

Scalability

Pub/sub provides the opportunity for better scalability than traditional client-server, through parallel operation, message caching, tree-based or network-based routing, etc. However, in certain types of tightly coupled, high-volume enterprise environments, as systems scale up to become data centers with thousands of servers sharing the pub/sub infrastructure, current vendor systems often lose this benefit; scalability for pub/sub products under high load in these contexts is a research challenge.

Outside of the enterprise environment, on the other hand, the pub/sub paradigm has proven its scalability to volumes far beyond those of a single data center, providing Internet-wide distributed messaging through web syndication protocols such as RSS and Atom. These syndication protocols accept higher latency and lack of delivery guarantees in exchange for the ability for even a low-end web server to syndicate messages to (potentially) millions of separate subscriber nodes.

markg85 · September 21, 2019, 8:59pm

Hi,

This is a nice neat system you’ve got here! Specifically the speed and scalability is really amazing!

Now i think i have a usecase that would be very interesting for pub/sub where i’d like your opinion on.
In the IPFS world, there was a live streaming test a while ago. You can hear all about it here: https://ipfs.infura.io/ipfs/QmWsKvBvXUKaHcHzrUS91XV4k3YjQFdywQ7bY9BZVX4ghk/

For live streaming, they used HLS and a player capable of playing the resulting m3u8 file. When using HLS it means your video will be send to you in chunks (each of those is in the ever updating m3u8 file). The problem those folks had is updating the file which could take a few seconds for it to reach the user. That few seconds is too much when the chunk size is a few seconds. They eventually kept the m3u8 file on “the regular centralized internet” but each chunk was in IPFS. They tried pub/sub but it wasn’t as stable as they had hoped it would be.

Now enter NKN with it’s millisecond pub/sub. If you keep all the other conditions the same but rather use NKN with pub/sub as a sort of digital m3u8. To maintain compatibility, the client side would construct the m3u8 file with input from the messages flowing in from the publisher. Now, as a video or live event is cut into - say - 5 second pieces, each piece being a message, that quickly expands to A LOT of messages.

To do a little math. Lets say a popular live event is being broadcasted with say 250.000 viewers for a live event of say 2 hours. If you cut that 2 hours up in 5-second-pieces you already have 1440 messages (2 hours = 7200 seconds / 5 = 1440). Over the event that’s 360 million messages (1440 * 250000). On average, that’s 50000 messages per second. Now, you will probably have those 250.000 messages send within the first 2 seconds of the 5 as you want them out asap to not cause stuttering. Now this is all still hypothetical, i don’t have an actual case for this. But if your ambition it to be the next internet then this scenario is easy to get. Look at this for example and i quote: “…was streamed simultaneously by 458,000 viewers on YouTube worldwide…”. So yeah, this scenario is still mild compared to that event.

Mind you, the above is a simple scenario. If you add in multiple resolutions the amount of messages goes up! Quite far up.

I’m just curious what your thoughts are at these scales

zbruceli · September 21, 2019, 11:58pm

Hi Mark

Thanks for the very interesting suggestion. It indeed fits both our Pub/Sub service, as well as our upcoming nCDN (new kind of Content Delivery Network) for video streaming.

I will let the tech team know and they can respond properly. @yilun

Again thanks for the suggestion!

yilun · September 22, 2019, 10:33am

That sounds like a good use case. The scale you described is nothing for NKN network, but the sender needs to do it in the right way. Let’s also do some math:

Let’s use 100k msg/s as example. Because client address are random and evenly distributed, those 100k msg/s will be load balanced automatically by all nodes in the network. Let’s assume there are 20k nodes (the current network size is 17k), then each node will need to relay 100e3 * log(20e3)/2 / 20e3 ~ 10 msg/s (log(20e3)/2 is the average path length in Chord DHT). Well, that’s just a tiny number To give you some idea, each relay is literally receive a msg, compute a sha256 hash, and send it to the next node. Normal node can easily do thousands of or even tens of thousands of them per second.

But the tricky thing is the sender (video source) part:

The sender needs to have enough upload bandwidth. The space overhead to send a msg brought by nkn client is around 32 bytes per dest per msg, so that’s around 3MB/s overhead for 100k msg/s.
In addition, the sender also needs to have enough computational power because nkn-client needs to compute an Ed25519 signature per message per dest. As a reference, my MBP can sign 16k+ msg/s/core, so probably 8+ cores are needed.
He cannot just create one NKN client and rely on that single node to send all those messages. The better way is to create enough clients, and each sender client just sends to part of all those receipts.
NKN’s subscribe info is stored on-chain, so subscribe is an on-chain operation and has some latency. For service at such a scale, I guess viewers info is also stored somewhere else, so it might be better to just use that viewers info with NKN’s p2p messaging.

markg85 · September 22, 2019, 1:26pm

Is that really going to work? I mean, on a technical level it probably will and for just watching a video that will be fine too. But for a live video that load balancing will introduce delays and if those delays add up to, say, 1 full chunk then you’ll probably get either stuttering or just downright playback failures. (just guessing here, i’m no HLS expert). Don’t you want something of a “max hop latency”, with that i mean that the message is still load balanced in the network but only till a set limit (say 1 second max). After that, it should not get load balanced anymore. That also means that the 20k nodes is a nice number, but that’s not the number that would do the load balancing that’s quite likely to be much lower and (hopefully) mostly involves the nodes closest to the people that view the live stream.

I don’t quite get this. Does the sender need to send all the messages for all the viewers? I was under the impression that the sender would need to send, say, 1000x the messages and that the load balancing within the network would be enough to re-transmit messages to the target audience?

But i could be mixing concepts here. For instance, if i compare this to IPFS then if i watch a video, i become a sending node for other people. The original source does not have to send 1 extra message if another user pops up and starts watching but happens to be connecting to me as i’m closer. That user would get the messages from me.

This kinda confirms what i say above. It would be so awesome if there would be an option for a node to not just be a relay, but also to serve as source. So like IPFS in a way

Here’s another interesting point! Say you really have 250k viewers. They all would have to subscribe, otherwise they don’t get m3u8 updates. And they would subscribe mostly at the start of a live stream. Say 200k subscribe requests right from the start. That is, as you said, an on-chain action. Where NKN would need to be able to handle that amount in, say, 1 minute. Sure, it’s a “peak load”, but hey, this isn’t unthinkable Can NKN handle that? This would be a on-chain throughput of ~3333 messages per second! Also, how about fees in this case?

yilun · September 22, 2019, 2:05pm

Is that really going to work? I mean, on a technical level it probably will and for just watching a video that will be fine too. But for a live video that load balancing will introduce delays and if those delays add up to, say, 1 full chunk then you’ll probably get either stuttering or just downright playback failures. (just guessing here, i’m no HLS expert). Don’t you want something of a “max hop latency”, with that i mean that the message is still load balanced in the network but only till a set limit (say 1 second max). After that, it should not get load balanced anymore. That also means that the 20k nodes is a nice number, but that’s not the number that would do the load balancing that’s quite likely to be much lower and (hopefully) mostly involves the nodes closest to the people that view the live stream.

It’s not the load balance you are thinking about. Maybe I shouldn’t use the phrase “load balance”, but what I meant is that, Chord DHT (or any other DHT) has the “load balance” nature such that a msg will only pass log2(N)/2 nodes on average, and log2(N) nodes at most. As long as sender and receiver are distributed randomly, traffic will be automatically “load balanced”.

I don’t quite get this. Does the sender need to send all the messages for all the viewers? I was under the impression that the sender would need to send, say, 1000x the messages and that the load balancing within the network would be enough to re-transmit messages to the target audience?

Sender does not needs to send one msg per dest, but he do needs to attach one signature per dest with the msg. The signature is used to construct signature chain, which is what relayer nodes used to win mining rewards.

But i could be mixing concepts here. For instance, if i compare this to IPFS then if i watch a video, i become a sending node for other people. The original source does not have to send 1 extra message if another user pops up and starts watching but happens to be connecting to me as i’m closer. That user would get the messages from me.

I’m not sure if I fully understand the problem but NKN has a fundamentally different architecture than IPFS: in NKN we have nodes and clients, where nodes are relayers, trying to get mining rewards by relaying data for clients, while client connects to one node, send and receive data by that node. This enables every device, regardless of public IP address, NAT condition, etc, to be a very light-weighted client that only maintains an outbound websocket connection. In this case I assume user will be client but not node, so they won’t be directly connected to each other, and they communicate through the p2p messaging.

This kinda confirms what i say above. It would be so awesome if there would be an option for a node to not just be a relay, but also to serve as source. So like IPFS in a way

It should be easy to solve I guess: just run a node and a client Fundamentally we try to separate node and client because unlike IPFS, we also have the blockchain part. NKN node is also running the consensus protocol and it might not be a good idea for, e.g. mobile web app or native app, to do all these stuff for a lot of reasons such as performance, security, battery, etc.

Here’s another interesting point! Say you really have 250k viewers. They all would have to subscribe, otherwise they don’t get m3u8 updates. And they would subscribe mostly at the start of a live stream. Say 200k subscribe requests right from the start. That is, as you said, an on-chain action. Where NKN would need to be able to handle that amount in, say, 1 minute. Sure, it’s a “peak load”, but hey, this isn’t unthinkable Can NKN handle that? This would be a on-chain throughput of ~3333 messages per second! Also, how about fees in this case?

It’s not a problem in our (not so good but standardized) machine based on our testing, but in the public mainnet it depends on most nodes’ spec, and it’s hard for us to predict given that we only run ~1% nodes in the network and we have no way to detect other nodes’ spec Also most nodes set up max block size to some pretty small value (much smaller than what they could) as we never had even close load in the network. But I guess it’s not a problem in the long term.

The fee mechanism is similar to Ethereum: nodes will pack txn with higher fee first and then lower fee.