Our WSJT plugin maintains a one-hour buffer and reports statistics on its contents when asked by a Datalog. We see the buffer inexplicably empties about once a day.
See Heap Datalog
Errors don't happen every day. Recent errors are around 23:00 utc each day. We dig into the server logs when this happens.
journalctl -u wiki | \ egrep '^Aug 26 23:' >aug-26-23-xx.txt
22: JavaScript heap out of memory 23: JavaScript heap out of memory 24: JavaScript heap out of memory 25: Cannot read property 'replace' of undefined 26: server unexpectedly exited, 27972 (SIGKILL)
pages/unexpected-server-restarts
We notice Datalog timer based requests time out before or as the server runs out of memory. github
sample timeout after 3100 msec
Unexpected restarts cease. enlarge
We correlate the improvement with suspension of second radio reporting decode datagrams. enlarge
Second source stops 9:54 pm, Monday, August 26, 2019.
Notice that restarts coincide with midday peak decodes per slot (blue line) of over 20. With one radio reporting (gray line) this peaks at around 10.
We're recording pings on the same server. We look to see if these are delayed at all when datagram traffic gets high midday. This doesn't seem to be the case. expand
Digital Ocean 2 GB droplet memory usage shows daily restarts at peak consumption followed by days of stability under lighter traffic load. enlarge
All other charts are nominal.
It's possible our datagram decoding logic is wasteful of memory. For each binary message we construct an object that contains five multi-byte decoding functions closed over the payload position counter. github
let dec = decoder(message) let magic = dec.four() let version = dec.four() let type = dec.four() switch (type) { case 2: let id = dec.str() let isnew = dec.one() let time = dec.four() let snr = dec.four() let dtime = dec.eight() let freq = dec.four() let mode = dec.str() let copy = dec.str() let conf = dec.one() let rept = `${remote.address} ${format(time)} ${freq} ${copy}` while(log.length >= 10000) log.shift() log.push(rept) break; }
See Big Trouble Pages that trigger errors
We've seen the log grow well beyond the 240 slots expected when the winnow() is called in a timely way. This appears to be a bug in node fixed in 10.9.0. github