[NKP-0021] Prefer stable nodes in routing to incentivize nodes to be more stable

That seems like the opposite of what you said in the initial post: “As we have seen recently, massive amount of nodes getting online and offline, although having no impact on off-chain data transmission (relay), can potentially cause issues to block propagations.”

If this NKP is implemented it should have a cap. For instance, if the cap were 24 hours, then instead of uptime/latency the score could be min(uptime, 24 hours)/latency. If there is no cap, it becomes very difficult for new miners to compete. The lower the cap, the closer it approximates the current system (which doesn’t factor in uptime), so I think it’s safer to start with low values. I think 24 hours is too much: if a node averages just one disconnection per 24 hours, it would cut their profit roughly in half. That’s huge and would price many people out of mining. This is an oversimplification, though, since competing nodes disconnect sometimes too, and it’s more complicated than a straight halving because the distribution of latencies across the network matters, etc.
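As a rough sketch, the capped score could look like the following Go snippet (function names and the 24-hour value are just illustrative and are not taken from the actual node software):

```go
package main

import (
	"fmt"
	"time"
)

// cappedWeight scores a neighbor as min(uptime, uptimeCap) / latency, so any
// uptime beyond the cap gives no extra advantage.
func cappedWeight(uptime, uptimeCap, latency time.Duration) float64 {
	if uptime > uptimeCap {
		uptime = uptimeCap
	}
	return float64(uptime) / float64(latency)
}

func main() {
	uptimeCap := 24 * time.Hour
	// A node up for a week scores the same as one up for exactly 24 hours.
	fmt.Println(cappedWeight(7*24*time.Hour, uptimeCap, 50*time.Millisecond))
	fmt.Println(cappedWeight(24*time.Hour, uptimeCap, 50*time.Millisecond))
}
```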

If this choice from X next hop candidates is not enforced, and there is also no economic incentive for miners to follow the new rule, then this shouldn’t be implemented. Mining software must be made to maximize mining profits for the node operator, not the greater good of the network. Otherwise someone will fork the mining software and change the code to make it run more profitably in order to increase their own profits.

That seems like the opposite of what you said in the initial post: “As we have seen recently, massive amount of nodes getting online and offline, although having no impact on off-chain data transmission (relay), can potentially cause issues to block propagations.”

You are right (and so is your quoted initial post)! I almost forgot: this NKP was initially proposed to make block propagation more stable. But the reason we encouraged the community to look at this NKP again recently is relay and service stability, as we recently discovered. There are actually more benefits we didn’t mention in the post: each node will use less bandwidth on node churn, and the message cache will likely persist longer. After all, a more stable network is always good, especially if we want to be one of the network infrastructures for web3.

If this NKP is implemented it should have a cap.

Definitely!

I think 24 hours is too much: if a node averages just one disconnection per 24 hours, it would cut their profit roughly in half. That’s huge and would price many people out of mining. This is an oversimplification, though, since competing nodes disconnect sometimes too, and it’s more complicated than a straight halving because the distribution of latencies across the network matters, etc.

The network stats say otherwise, though. According to crawler results, most nodes have well over a week of uptime, which means they don’t restart or disconnect for weeks. Also, it’s not just about the uptime cap; it’s also about how uptime enters the equation. If, for example, uptime has a very low weight compared to latency, then a longer uptime cap would be fine.

But I understand your concern, so I’m also thinking of a mechanism that can tolerate occasional short disconnects. For example, a neighbor’s “score” is reset only if the neighbor changes its IP, or disconnects for a long time or too often. Or maybe a disconnect doesn’t reset the score at all, just reduces it. It’s only a rough idea now, and there is a lot more to consider carefully and thoroughly, but it still sounds promising.
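To make the “reduce instead of reset” idea a bit more concrete, here is a minimal Go sketch; the halving penalty, the one-hour threshold, and all names are placeholders, not part of any agreed design:

```go
package main

import (
	"fmt"
	"time"
)

// neighborScore is a rough sketch of a stability score that is reduced, not
// reset, on short disconnects.
type neighborScore struct {
	score float64
}

const longDisconnect = time.Hour // placeholder threshold for a "long" outage

// onDisconnect is called when the neighbor drops for the given duration.
func (n *neighborScore) onDisconnect(downFor time.Duration) {
	if downFor >= longDisconnect {
		n.score = 0 // long outage: full reset
		return
	}
	n.score *= 0.5 // short outage: just reduce the score
}

// onIPChange is called when the neighbor comes back with a different IP.
func (n *neighborScore) onIPChange() {
	n.score = 0 // IP change always resets
}

func main() {
	n := &neighborScore{score: 10}
	n.onDisconnect(5 * time.Minute) // short blip: score becomes 5, not 0
	fmt.Println(n.score)
}
```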

If this choice from X next hop candidates is not enforced, and there is also no economic incentive for miners to follow the new rule, then this shouldn’t be implemented. Mining software must be made to maximize mining profits for the node operator, not the greater good of the network. Otherwise someone will fork the mining software and change the code to make it run more profitably in order to increase their own profits.

Choosing a next hop with lower latency and better stability always has (indirect) economic benefits, because a node with lower latency and better stability has a higher chance to receive, process, and deliver the packet to its destination, which increases its chance of earning the reward.

I don’t think it’s a good idea. From what I’ve observed, a large portion of NKN nodes don’t run for a long time. It is almost impossible to find a hosting service that provides more than 1 TB of transfer per month for less than $3, which is about what the token rewards would earn. This change will hurt the miners providing cheap data transfer, leading to a decrease in the number of nodes. Furthermore, if the longest-living nodes are preferred, it will be easier for those nodes to control the transfer chain, thus reducing security. As for this change, I suggest following other blockchain projects and raising a proposal that requires staking some NKN to vote on; that would be much fairer, letting working miners participate and govern rather than having the official developer team do the fix all by themselves.

From what I’ve observed, a large portion of NKN nodes don’t run for a long time. It is almost impossible to find a hosting service that provides more than 1 TB of transfer per month for less than $3, which is about what the token rewards would earn.

That’s exactly what needs to be solved. You’ve made a good point: on paper, the cost should be higher than the reward. So why do people keep running nodes at a high churn rate if the cost exceeds the reward? We have actually received official reports from major cloud providers (such as AWS and DO) that some nodes running on their platforms never pay their bill, and that’s why they keep churning. This is an issue we need to solve, or it will hurt the ecosystem in the long term.

Furthermore, if the longest-living nodes are preferred, it will be easier for those nodes to control the transfer chain, thus reducing security.

Not really. The uptime preference we are proposing has an upper limit (a few days, for example), and nodes running longer than that limit get no additional advantage. Most normal nodes can hit this limit easily, so network security won’t be affected. In addition, this mechanism only affects rewards, not consensus.

As for this change, I suggest following other blockchain projects and raising a proposal that requires staking some NKN to vote on; that would be much fairer, letting working miners participate and govern rather than having the official developer team do the fix all by themselves.

We already have the generate ID fee mechanism. That mechanism, and similar mechanisms such as staking, only work on public keys, not on physical nodes (IP addresses). This proposal is trying to solve the physical node churning issue, not public key churning.

Actually, this problem is easy to solve: just follow the “stable node” approach.

Miners who use AWS but never pay their bills can’t keep their nodes online for a long time, so they won’t do well if you change the rule to “the longer the online time, the more neighbors, and the more stable the node, the higher the probability of mining NKN”.

Right now a large number of nodes go online and then go offline, which greatly hurts the overall NKN network. As mentioned earlier: when a node goes offline, network transmission is interrupted and the user sees a network connection error, but the application does not automatically refresh or reconnect, so the user has to manually refresh the application to get the correct result.

I think this “value” should be set to 30 days

I know this may sound offensive, but I have to point out that you cannot count on users’ Raspberry Pis or routers to serve as the main infrastructure of the NKN network. By implementing this, you will easily wipe out the large providers that contribute a lot of high-quality machines and network capacity to the NKN network. No one would even want to host their nodes on AWS, DO, or other large providers, which have really expensive data transfer costs. I think mining NKN should be a way to maximize the residual value of unused machines rather than a search for whichever provider costs less.

More importantly, if this is implemented, people who use the service without paying would not stop; they would just find more aggressive ways to do it, until it becomes too hard for them to profit, and by then the NKN network will no longer have the same scale and prosperity as now. The urgent thing to do now is to reduce the required data transfer in order to lower cost, or to reduce the performance requirements for each node. Nodes that don’t relay any commercial data still need 1+ TB of data transfer each month and more than 100 IOPS of disk activity, which is really hard to maintain.

I am acquainted with many of the major miners who participate. You are welcome to join us so we can get to know each other better: https://t.me/paofufu

By implementing this, you will easily wipe out the large providers that contribute a lot of high-quality machines and network capacity to the NKN network.

This is definitely not true. Nodes on cloud providers have the best uptime as long as their accounts are in good standing. Personally I have nodes running on all those platforms, and they almost never restart or get interrupted. The only nodes affected are those whose operators don’t pay their bills (e.g. by providing a false identity or payment method, which technically speaking is illegal) and thus have their accounts closed by the provider within a few days of creation. We definitely don’t want to encourage that behavior.

On the contrary, since those miners are already on cloud providers’ radar, providers might simply detect NKN nodes and refuse to let anyone run them on their platforms. That would really wipe out NKN on those platforms.

More importantly, if this is implemented, people who use the service without paying would not stop; they would just find more aggressive ways to do it, until it becomes too hard for them to profit

If that happens, we will find new solutions for whatever new methods they come up with.

by then the NKN network will no longer have the same scale and prosperity as now

The size of the network is ultimately determined by total mining reward / cost per node. If the network size is below that, someone will always start mining because it’s profitable.

The urgent thing to do now is to reduce the required data transfer in order to lower cost, or to reduce the performance requirements for each node. Nodes that don’t relay any commercial data still need 1+ TB of data transfer each month and more than 100 IOPS of disk activity, which is really hard to maintain.

Traffic is mostly determined by network utilization. In the long term, as the network becomes more popular, we will definitely see more traffic, not less, and the reward per unit of traffic relayed will decrease. This is inevitable in any PoW network, just like the reward per unit of hash power in Bitcoin has been decreasing ever since it was created.

In a traditional PoW system like Bitcoin, a miner (or pool) has to stay online for about 10 minutes on average, because the correct block hash may be found at any time within the 10-minute window.

For hash power migration issues, Bitcoin has a difficulty adjustment algorithm.

If we apply this concept, an NKN miner (or pool) should also stay online for a time window, but this time window should be longer (maybe 24 hours)?

For difficulty adjustment, if the NKN miner is stable, it should face a fair difficulty; if the miner is unstable, it should face a higher difficulty when mining a block?

If we apply this concept, an NKN miner (or pool) should also stay online for a time window, but this time window should be longer (maybe 24 hours)?

For difficulty adjustment, if the NKN miner is stable, it should face a fair difficulty; if the miner is unstable, it should face a higher difficulty when mining a block?

That’s exactly what this proposed mechanism does 🙂

Perfect! Just voted Yes for NKP-0021

It seems that every time a node prunes, its uptime is reset, which makes sense if it can’t transact during pruning. If full nodes never prune, then full nodes would have a huge advantage over light nodes if the proposal were implemented. Then light nodes would be pointless, and everyone would have to use full nodes in order to participate in the network.

Uptime is reset when a node restarts, regardless of whether it prunes or not. This gives no advantage or disadvantage to either full or light nodes.

Personally I have nodes running on all those platforms, and they almost never restart or get interrupted. The only nodes affected are those whose operators don’t pay their bills

I pay my bills, but my nodes on DigitalOcean regularly report as offline for about 2 minutes. I believe it happens during pruning? This ruins their reported uptime in the current system.

Are you saying that in the new system, uptime will not be interrupted by pruning?

A 2-minute outage is definitely not caused by pruning, because pruning typically takes at least tens of minutes or even hours. I would say it’s probably a crawler issue, which will not affect the node’s uptime.

Are you saying that in the new system, uptime will not be interrupted by pruning?

Pruning only happens when a node restarts, and whether pruning is enabled does not affect when a node restarts. When a node with pruning enabled restarts, it prunes and then enters the mining state; when a node without pruning enabled restarts, it enters the mining state directly. So in short, whether pruning is enabled does not affect the uptime of a node at all.

I’m surprised that this issue still hasn’t been resolved one way or the other after more than a year. There’s also the empirical fact that the node count has been stubbornly stable for months. That’s not the sign of a growing ecosystem, which underscores the importance of this proposal, since it might address the cause of that stagnation.

I’ve seen the frequent complaints about waiting too long to get rewards, which discourages new miners who might otherwise offer faster and more reliable nodes. I also understand Yilun’s argument that stability must be rewarded if NKN is ever to be a mainstream technology for CDN services. We certainly can’t reward opportunistic bot armies that go on and off every few seconds in order to capture rewards but leave web users stranded with errors.

Therefore, instead of something like (uptime/latency), why not implement (log(uptime)/latency)? After all, uptime is only so important; there is no humanly meaningful difference between a node that goes offline once a month and another that goes offline once a year, especially if there are other nodes available to relay the traffic in such situations. On the other hand, there’s a huge difference between getting knocked offline every minute as opposed to every hour.

At most, I might be in favor of something like ((uptime^0.5)/latency). But even that seems overly generous. Perceived value distributions typically manifest as power laws, though, so we might indeed be looking for a reasonable value of X in ((uptime^X)/latency). But if that’s the case, then certainly (0<X<1).
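For comparison, here is a small Go sketch of the two candidate shapes, log(uptime)/latency and uptime^X/latency with 0 < X < 1; these are only the formulas floated in this thread, not the proposal’s actual weight function:

```go
package main

import (
	"fmt"
	"math"
)

// logWeight grows very slowly with uptime: log(uptime)/latency.
func logWeight(uptimeHours, latencyMs float64) float64 {
	return math.Log(uptimeHours) / latencyMs
}

// powerWeight grows sublinearly for 0 < x < 1: uptime^x / latency.
func powerWeight(uptimeHours, x, latencyMs float64) float64 {
	return math.Pow(uptimeHours, x) / latencyMs
}

func main() {
	// A month of uptime vs. one day, both at 50 ms latency.
	fmt.Println(logWeight(720, 50), logWeight(24, 50))               // ratio ~2.1x
	fmt.Println(powerWeight(720, 0.5, 50), powerWeight(24, 0.5, 50)) // ratio ~5.5x
}
```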

I’m not in favor of hard cutoffs like “10 days for a maximum reward”. Arbitrary thresholds create unexpected arbitrage opportunities that might become a disservice to the community; better to just use a continuous function.

Thanks for your feedback! Here are my thoughts.

Therefore, instead of something like (uptime/latency), why not implement (log(uptime)/latency)?

uptime/latency is just an example; we will probably not end up using it.

I’m not in favor of hard cutoffs like “10 days for a maximum reward”. Arbitrary thresholds create unexpected arbitrage opportunities that might become a disservice to the community; better to just use a continuous function.

A continuous function without a cutoff could be problematic in the context of packet routing. Routing should ideally be deterministic (i.e. the neighbor with the higher weight is always selected) to minimize jitter and maintain packet order as much as possible. Imagine the weight function has no cutoff and there are two neighbors with similar latency: the one with just a bit more uptime will always get the reward, simply because it happened to join the network a bit earlier, maybe just a few hours.

Actually, many continuous functions also have an effective timescale (often characterized as the time for the output to reach half its maximum), which can be adjusted by changing the parameters of the curve. Tweaking that effective timescale often yields the same result as tweaking a cutoff.
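For example, a saturating curve like uptime/(uptime + tau) reaches half its maximum at uptime = tau, so tau plays roughly the same role as a cap. A small Go sketch, purely illustrative and not the chosen function:

```go
package main

import (
	"fmt"
	"time"
)

// saturatingWeight is one example of a continuous weight with an effective
// timescale tau: it reaches half its maximum at uptime == tau and saturates
// beyond that, so tuning tau behaves much like tuning a hard uptime cap.
func saturatingWeight(uptime, tau, latency time.Duration) float64 {
	u := uptime.Seconds()
	return (u / (u + tau.Seconds())) / latency.Seconds()
}

func main() {
	tau := 24 * time.Hour
	lat := 50 * time.Millisecond
	fmt.Println(saturatingWeight(24*time.Hour, tau, lat))    // half of the maximum
	fmt.Println(saturatingWeight(30*24*time.Hour, tau, lat)) // close to the maximum
}
```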

If I understand your point, you’re saying that if 2 neighbors have equivalent latency but there is no cutoff for uptime, then the node which has been online slightly longer will always win. This is bad because it means that the new guy who just set up a node will always lose unless he can somehow lower latency, which is extremely difficult and/or expensive to do in practice, or wait until his nearest competitor goes down.

You’re not going to like my answer, but I think that the community would be better off if the value function were used as a probability weight instead of a guarantee of winning. Why is it wrong to have 80% of packets flowing through node X while 20% flow through node Y? They will just be reassembled at the receiving end. In theory, given enough compute power, you could allow every node in the world a nonzero chance of delivering the traffic. But if the weight is some rapidly decaying function of latency, then in effect, you restrict yourself to only nearby nodes.
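A minimal Go sketch of what weight-proportional (rather than winner-take-all) next-hop selection could look like; the Neighbor type and the weights are hypothetical, not from the node software:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Neighbor is a hypothetical next-hop candidate with a precomputed weight,
// e.g. some function of uptime and latency.
type Neighbor struct {
	Addr   string
	Weight float64
}

// pickNextHop chooses a neighbor with probability proportional to its weight.
func pickNextHop(neighbors []Neighbor) Neighbor {
	total := 0.0
	for _, n := range neighbors {
		total += n.Weight
	}
	r := rand.Float64() * total
	for _, n := range neighbors {
		r -= n.Weight
		if r <= 0 {
			return n
		}
	}
	return neighbors[len(neighbors)-1] // fallback for floating-point rounding
}

func main() {
	ns := []Neighbor{{"nodeX", 0.8}, {"nodeY", 0.2}}
	// Roughly 80% of packets go through nodeX, 20% through nodeY.
	fmt.Println(pickNextHop(ns).Addr)
}
```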

The risk with cutoffs is that various exploits might occur around the cutoff boundary. At best, this is a threat to economic efficiency. At worst, there might be some security threats involving arrival time uncertainty.

But in any case, deterministic routing behavior is great for performance. It’s also fragile. Networks need to be antifragile first and performant second. Besides, if you take a weighted approach, the node count will grow faster because new node operators won’t be stymied by a lack of rewards for such a long period of time. (This is the #1 problem of NKN because it’s directly tied to marketcap growth.)

What jitter are you worried about? Jitter really just means uncertainty in clock pulse (or packet, in this case) arrival time. That’s harmless in a sufficiently buffered context. The same applies to out-of-order arrival. We’re not talking about 10 ms arrival differences. More like 1 ms in the limit of geographically ubiquitous nodes. Yes, with only 100K nodes, there might be some 10 ms jitter events, but if you optimize for ecosystem growth first and performance second, then in fact performance will improve because lots of new node operators will join and remain active.

Also, why not send packets down duplicate routes? Packets cost like picodollars to deliver. Who cares if the cost of doing that more reliably is double the nonredundant cost? This would be antifragile and performant.

Performance is important but ecosystem growth is king. Billions have been made by companies producing junk software far worse than anything NKN has to offer, simply because of adoption and standardization.

At least, if you don’t take my advice, then I hope you will understand the ramifications of my warnings.