Description
I've been chasing strange Turnilo behaviour that occurs with Druid timeouts. It is clear that time-monitor is broken and can even cause Druid overload.
The doChecks method https://github.com/allegro/turnilo/blob/master/src/common/utils/time-monitor/time-monitor.ts#L71 has a simple boolean doingChecks guard. The problem is that the guard is effectively always false because of how Promise.all behaves: Promise.all expects an iterable of promises.
Unfortunately, checkTasks is a Map, which iterates as [key, value] tuples, and Promise.all treats those non-promise values as already resolved :(. As a result, new requests are pushed every second, whenever doChecks is called, even while earlier ones are still pending.
The simple fix is to add .values() at the end of the line:
const checkTasks = timeTags.filter(this.isStale).map(this.doCheck).values();
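To illustrate the difference, here is a minimal standalone sketch. It uses a native Map rather than Turnilo's actual collection type, and slowTask and demo are hypothetical names standing in for doCheck and doChecks. It only shows that Promise.all settles before the tasks finish when it is given the Map itself, and waits for them when it is given .values().

```typescript
// Hypothetical stand-in for doCheck: a task that finishes after a short delay
// and records its name in a log when done.
function slowTask(name: string, log: string[]): Promise<void> {
  return new Promise<void>(resolve => {
    setTimeout(() => {
      log.push(name);
      resolve();
    }, 20);
  });
}

async function demo(): Promise<{ finishedWithEntries: number; finishedWithValues: number }> {
  // Case 1: pass the Map itself. Iterating a Map yields [key, promise] tuples.
  // A tuple is not a promise, so Promise.all treats it as an already resolved value.
  const log1: string[] = [];
  const tasks1 = new Map<string, Promise<void>>([
    ["a", slowTask("a", log1)],
    ["b", slowTask("b", log1)]
  ]);
  await Promise.all(tasks1);
  const finishedWithEntries = log1.length; // tasks have not finished yet

  // Case 2: pass only the values, which are the real promises.
  const log2: string[] = [];
  const tasks2 = new Map<string, Promise<void>>([
    ["a", slowTask("a", log2)],
    ["b", slowTask("b", log2)]
  ]);
  await Promise.all(tasks2.values());
  const finishedWithValues = log2.length; // both tasks have finished

  return { finishedWithEntries, finishedWithValues };
}
```

In the first case a guard reset after Promise.all would be cleared while the requests are still in flight, which matches the behaviour described above.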
It would be nice to implement some kind of back-off in case of errors. Currently, if a max time request fails, it is repeated the next time doChecks is called, because the corresponding TimeTag is not updated. Maybe we should update it with the previous value on error?
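One possible shape for such a back-off, as a rough sketch only: the Tag interface, its fields, and the helper names below are all hypothetical and do not mirror Turnilo's TimeTag. The idea is to keep the previous time value on failure, record when the check was attempted, and wait exponentially longer after each consecutive failure.

```typescript
// Hypothetical tag shape; not Turnilo's TimeTag.
interface Tag {
  name: string;
  time: Date | null;   // last known max time
  lastChecked: number; // timestamp (ms) of the last attempt
  failures: number;    // consecutive failed attempts
}

// Exponential back-off: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelay(failures: number, baseMs: number = 1000, maxMs: number = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, failures));
}

// A tag is due for a new check once its back-off delay has elapsed.
function isDue(tag: Tag, now: number): boolean {
  return now - tag.lastChecked >= backoffDelay(tag.failures);
}

// On success: store the new time and reset the failure counter.
function onSuccess(tag: Tag, time: Date, now: number): Tag {
  return { name: tag.name, time, lastChecked: now, failures: 0 };
}

// On failure: keep the previous time, but record the attempt so the
// next check is delayed instead of being retried on the next tick.
function onFailure(tag: Tag, now: number): Tag {
  return { name: tag.name, time: tag.time, lastChecked: now, failures: tag.failures + 1 };
}
```

Keeping the previous value means consumers still see the last known max time while the failing source is retried less aggressively.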