Unexpected Server Restarts

Our WSJT plugin maintains a one-hour buffer and reports statistics on its contents when asked by a Datalog. We see the buffer inexplicably empties about once a day.

Errors don't happen every day. Recent errors are around 23:00 utc each day. We dig into the server logs when this happens.

journalctl -u wiki | \ egrep '^Aug 26 23:' >aug-26-23-xx.txt

22: JavaScript heap out of memory 23: JavaScript heap out of memory 24: JavaScript heap out of memory 25: Cannot read property 'replace' of undefined 26: server unexpectedly exited, 27972 (SIGKILL)

pages/unexpected-server-restarts

We notice Datalog timer based requests time out before or as the server runs out of memory. github

sample timeout after 3100 msec

Unexpected restarts cease. enlarge

We correlate the improvement with suspension of second radio reporting decode datagrams. enlarge

Second source stops 9:54 pm, Monday, August 26, 2019.

Notice that restarts coincide with midday peak decodes per slot (blue line) of over 20. With one radio reporting (gray line) this peaks at around 10.

We're recording pings on the same server. We look to see if these are delayed at all when datagram traffic gets high midday. This doesn't seem to be the case. expand

Digital Ocean 2 GB droplet memory usage shows daily restarts at peak consumption followed by days of stability under lighter traffic load. enlarge

All other charts are nominal.

It's possible our datagram decoding logic is wasteful of memory. For each binary message we construct an object that contains five multi-byte decoding functions closed over the payload position counter. github

let dec = decoder(message) let magic = dec.four() let version = dec.four() let type = dec.four() switch (type) { case 2: let id = dec.str() let isnew = dec.one() let time = dec.four() let snr = dec.four() let dtime = dec.eight() let freq = dec.four() let mode = dec.str() let copy = dec.str() let conf = dec.one() let rept = `${remote.address} ${format(time)} ${freq} ${copy}` while(log.length >= 10000) log.shift() log.push(rept) break; }

See Big Trouble Pages that trigger errors

We've seen the log grow well beyond the 240 slots expected when the winnow() is called in a timely way. This appears to be a bug in node fixed in 10.9.0. github