MainNet Postmortem (2020-08-26)

MainNet block production stopped for about a day yesterday. On-chain services (mainly subscribe) were blocked. Off-chain data transmission (relay, tuna, etc.) was NOT affected.

Based on our investigation, the outage was triggered by a bug in which the consensus timer could fire without a proposal having been received, causing nodes to fall out of sync. This bug is fixed in the latest version, v2.0.4, released today.

Beyond that, we are also planning a consensus upgrade in September that uses a random topology instead of the DHT topology for consensus voting, as discussed in our white paper. This upgrade will make consensus converge faster and become more robust, at the cost of increasing the number of connections each node needs to maintain (though it should barely affect communication cost or actual bandwidth usage).