Nodes keep restarting even after the latest update to 1.0.6

I've been keeping a close eye on my nodes since I realized they were restarting quite often. After checking syslog, I found the following:

[22991] 2019/08/06 11:11:11.511375 [WARN ] GID 77, Error handling msg: reveive vote at 139298 for 0000000000000000000000000000000000000000000000000000000000000000 error: Election has already stopped
[22991] 2019/08/06 11:11:11.562320 [INFO ] GID 229, Change expected block height to 139300
[22991] fatal error: runtime: out of memory
[22991] runtime stack:
[22991] runtime.throw(0xca0a6b, 0x16)
[22991] 	/usr/local/go/src/runtime/panic.go:617 +0x72
[22991] runtime.sysMap(0xc030000000, 0x4000000, 0x1422b58)
[22991] 	/usr/local/go/src/runtime/mem_linux.go:170 +0xc7

I need more information to debug this problem. Could you suggest what to check next?

I'm seeing these two errors frequently:

  1. Out of memory
  2. Concurrent map read and map write

Aug 6 13:31:55 – nknd[27501]: 2019/08/06 13:31:55.587982 [INFO ] GID 236, Receive block info 5ba4cefb331487fd3a231facaeaf334232b9a33528272778aba98583c75f5278, 145 txn found in pool, 111 txn to request
Aug 6 13:31:55 – nknd[27501]: 2019/08/06 13:31:55.639029 [INFO ] GID 236, Receive block proposal 5ba4cefb331487fd3a231facaeaf334232b9a33528272778aba98583c75f5278 (256 txn, 390199 bytes) by a5dbe02a766b59b2e6710c85d5f17675d17353c15c46b257132374851e9b6fa1
Aug 6 13:31:56 – nknd[27501]: fatal error: concurrent map read and map write
Aug 6 13:31:56 – nknd[27501]: goroutine 74 [running]:
Aug 6 13:31:56 – nknd[27501]: runtime.throw(0xcaa721, 0x21)
Aug 6 13:31:56 – nknd[27501]: 	/usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc00f4418d8 sp=0xc00f4418a8 pc=0x42da52
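The second error comes from the Go runtime itself: it aborts the whole process when it detects (on a best-effort basis) that a map is being read and written by different goroutines at the same time. That points to a bug in the program rather than in the host, so it can't be fixed from the node's configuration. For illustration only, here is a minimal sketch of the usual remedy, guarding the map with a `sync.RWMutex`; the `safeMap` type is hypothetical and not taken from the nknd source:

```go
// A minimal illustration of guarding a map shared between goroutines.
// safeMap is a hypothetical type, not nknd code.
package main

import (
	"fmt"
	"sync"
)

// safeMap wraps a plain map with a sync.RWMutex: many readers may hold
// the read lock at once, while a writer gets exclusive access.
type safeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func newSafeMap() *safeMap { return &safeMap{m: make(map[string]int)} }

// Get reads under the shared (read) lock.
func (s *safeMap) Get(k string) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

// Set writes under the exclusive lock.
func (s *safeMap) Set(k string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	sm := newSafeMap()
	var wg sync.WaitGroup
	// Readers and writers run concurrently. With a bare map instead of
	// safeMap, the runtime may abort with
	// "fatal error: concurrent map read and map write".
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); sm.Set(fmt.Sprint(i%10), i) }(i)
		go func(i int) { defer wg.Done(); sm.Get(fmt.Sprint(i % 10)) }(i)
	}
	wg.Wait()
	fmt.Println("entries:", len(sm.m)) // prints: entries: 10
}
```

`sync.Map` from the standard library is an alternative for some access patterns, and building or testing with `-race` can flag this kind of race during development.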

Once the vast majority of nodes have upgraded to v1.0.6, the system will return to normal. Currently there are still nodes on the older v1.0.5.

Can you also post your node spec (e.g. RAM)?

I'm using the standard droplet settings from DigitalOcean:
1 GB Memory / 25 GB Disk / BLR1 - Debian 9.7 x64
Let me know if you need more information and how I can collect it.

I can only check syslog after noticing that a node has restarted. If there's a way to diagnose the problem from nknc debug information, that would probably be helpful to you too.

Thanks.

Update

I haven't observed the issue in the last 5-6 hours. I think it was as @zbruceli said: once the majority of nodes switched to 1.0.6, the network seems to have stabilized compared to before.

It looks like the problem has come back. Half of my nodes restarted… Can we check some logs and debug this?

Yes, our devs are testing a fix, which will be released as v1.0.7.

Thank you.
I just updated all my nodes to 1.0.7.
I'll keep an eye on their performance and post an update here in case I see the restart problem again.

Out-of-memory issue in 1.0.8:
Aug 15 14:25:16 nknd[26234]: fatal error: runtime: out of memory
Aug 15 14:25:16 nknd[26234]: runtime stack:
Aug 15 14:25:16 nknd[26234]: runtime.throw(0xca2cad, 0x16)
Aug 15 14:25:21 systemd[1]: nkn.service: Service hold-off time over, scheduling restart.
Aug 15 14:25:21 systemd[1]: Stopped nkn.
Aug 15 14:25:21 systemd[1]: Started nkn.

Hi, the developer team is aware of the issue and looking into it.

You can try setting TxPoolMaxMemorySize to a lower value (e.g. 4) in config.json and see whether that resolves the problem.