refactor of channel persistence to use UUIDs #2028
Conversation
	return nil, errInsufficientPrivs, false
}
// enforce confusables
if !registered && (cm.chansSkeletons.Has(skeleton) || cm.registeredSkeletons.Has(skeleton)) {
It looks like this removes the exception for registered channels which are confusable with another channel. Is that intentional?
The idea is that registered channels are now always loaded, and therefore always present in chans and chansSkeletons (even when they are purged), so there's no need to treat them differently from other channels.
return nil
// TODO we need a better story about error handling for later
if err = cm.server.dstore.Set(datastore.TableChannelPurges, record.UUID, purgeBytes, time.Time{}); err != nil {
	cm.server.logger.Error("datastore", "couldn't store purge record", chname, err.Error())
Shouldn't this at least return an error so that the oper knows something is wrong?
The current story about this is not ideal. I haven't fully decided about what to do here, but I think in general, datastore failures will not necessarily cause the underlying operation to fail in full. For example, in the case of CS PURGE ADD, the purge actually gets added to the in-memory datastructure no matter what and will be enforced as long as the ircd is running.
I think long-term (once I actually introduce a datastore where writes can fail for non-catastrophic reasons), the strategy will be:
- Have a (bounded) queue for asynchronously retrying sets and deletes
- Have an option for alerting the operator to failed datastore operations (a snomask?)
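To make the first idea concrete, here is a rough sketch of what a bounded retry queue could look like. Everything in it is hypothetical (none of these names exist in this branch), and a real version would need to hook into the logger and whatever operator-alerting mechanism ends up being chosen:

```go
package retryqueue

import "time"

// Op captures a failed datastore write or delete for later retry, e.g. a
// closure over dstore.Set or dstore.Delete with the original arguments.
type Op func() error

// RetryQueue retries failed datastore operations in the background. The
// buffered channel bounds memory use: when the queue is full, the
// operation is dropped and reported via onDrop (which could eventually
// feed an operator alert such as a snomask).
type RetryQueue struct {
	ops    chan Op
	onDrop func(error)
}

func New(size int, interval time.Duration, onDrop func(error)) *RetryQueue {
	q := &RetryQueue{ops: make(chan Op, size), onDrop: onDrop}
	go q.run(interval)
	return q
}

// Enqueue schedules op for retry without ever blocking the caller.
func (q *RetryQueue) Enqueue(op Op, cause error) {
	select {
	case q.ops <- op:
	default:
		q.onDrop(cause) // queue full: give up and alert
	}
}

func (q *RetryQueue) run(interval time.Duration) {
	for op := range q.ops {
		if err := op(); err != nil {
			time.Sleep(interval) // back off, then requeue (or drop if full again)
			q.Enqueue(op, err)
		}
	}
}
```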
	return errNoSuchChannel
}
if err := cm.server.dstore.Delete(datastore.TableChannelPurges, record.UUID); err != nil {
	cm.server.logger.Error("datastore", "couldn't delete purge record", chname, err.Error())
Same here.
for cfname, entry := range cm.chans {
	if entry.channel.Founder() == account {
		channels = append(channels, cfname)
Any performance concern about this function being O(n)? It's called from user-facing functions like checkChanLimit.
I felt a bit conflicted about this. It's no worse than LIST is currently. On the other hand, LIST is commonly special-cased by fakelag systems for being more expensive than other commands (although possibly for bandwidth reasons rather than CPU utilization?).
I think it's probably OK to leave this unoptimized for now. We could put a rate limit on the relevant chanserv operations (and LIST?) if it becomes a problem.
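For reference, a rate limit of the kind mentioned above could be as simple as a per-client token bucket. This is purely an illustrative sketch, not code from this branch or from ergo's existing throttling utilities:

```go
package limiter

import (
	"sync"
	"time"
)

// TokenBucket is a minimal per-client limiter for expensive commands
// (e.g. the chanserv founder lookup, or LIST): each invocation consumes a
// token, and tokens refill at a fixed rate up to a cap.
type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	refill   float64 // tokens added per second
	last     time.Time
}

func NewTokenBucket(capacity, refillPerSecond float64) *TokenBucket {
	return &TokenBucket{
		tokens:   capacity,
		capacity: capacity,
		refill:   refillPerSecond,
		last:     time.Now(),
	}
}

// Allow reports whether another expensive operation may run now.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.refill
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```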
}
return nil
// TODO we need a better story about error handling for later
if err = cm.server.dstore.Set(datastore.TableChannelPurges, record.UUID, purgeBytes, time.Time{}); err != nil {
In general, it seems this is following the model of updating the in-memory state first, then the database. Couldn't this lead to race conditions that result in the in-memory state being different from the database?
The idea here is to follow the same pattern established with MarkDirty:
Line 191 in 1e6dee1:
// MarkDirty marks part (or all) of a channel's data as needing to be written back
Each update should be idempotent (it should always persist the latest data corresponding to its key, in full). So the only possible race condition is if two writes to the datastore are reordered relative to each other. This is prevented through the use of a semaphore, e.g. (*Channel).writebackLock, which ensures a linear sequence of operations [copy state, write to datastore, copy state, write to datastore, ...].
One thing I'm still not totally sure about is whether there would still be a race condition under relaxed consistency modes in Cassandra (e.g. QUORUM or LOCAL_QUORUM). Under those conditions, could an earlier write "win" the eventual consistency despite having happened-before a later write in Go?
In any case, purges don't need a semaphore of their own because they can't be updated, only deleted.
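To illustrate the [copy state, write to datastore] pattern being discussed, here is a simplified, hypothetical sketch; the real MarkDirty/writeback machinery in the branch is more involved, and the names and serialization below are stand-ins:

```go
package channel

import (
	"encoding/json"
	"sync"
)

// Channel is a stripped-down stand-in for the real type.
type Channel struct {
	stateMutex    sync.RWMutex // guards the in-memory state
	writebackLock sync.Mutex   // serializes datastore writes for this channel
	uuid          [16]byte
	name, topic   string
}

// persist performs one [copy state, write to datastore] step. The snapshot
// is taken after acquiring writebackLock, so writes for the same key form a
// linear sequence and each write is a full, idempotent copy of the latest
// state; a stale snapshot can never overwrite a newer one.
func (ch *Channel) persist(set func(uuid [16]byte, value []byte) error) error {
	ch.writebackLock.Lock()
	defer ch.writebackLock.Unlock()

	// copy state
	ch.stateMutex.RLock()
	snapshot := map[string]string{"name": ch.name, "topic": ch.topic}
	ch.stateMutex.RUnlock()

	// write to datastore
	value, err := json.Marshal(snapshot)
	if err != nil {
		return err
	}
	return set(ch.uuid, value)
}
```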
It sounds like in Cassandra, this is guaranteed under normal conditions by the linearity of the local commitlog:
https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html
but under hardware failure or extended partition, there may still be data loss:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsRepairNodesTOC.html
This is pretty complicated and touches a lot of things, so I'm interested in reviews, maybe from @ajaspers or @progval?
The core idea here is to refactor persistence to eventually support datastores other than buntdb. There are two problems:
- markDirty stuff, but there are some cases where we would still block, particularly for accounts where every login currently incurs a read from the datastore)

The new approach is best illustrated by the new, weak datastore API:
https://github.com/slingamn/ergo/blob/7ce06362764ee35629521eacc1fdee5405370efd/irc/datastore/datastore.go
which exposes key-value pairs. Each key has a UUID and is associated with a "table". There are four operations:
This branch refactors channels and channel purge records to use the new API.
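For context, the shape of that API is roughly as sketched below; this is approximate, and the authoritative names and signatures are in the linked datastore.go, not here:

```go
package datastore

import "time"

// Table identifies a logical grouping of records, e.g. channel
// registrations or channel purge records.
type Table uint16

// UUID is the per-record key: a 16-byte random identifier.
type UUID [16]byte

// KV is a single stored record.
type KV struct {
	UUID  UUID
	Value []byte
}

// Datastore is a deliberately weak key-value interface: keys are UUIDs
// scoped to a table, and values are opaque byte slices.
type Datastore interface {
	// GetAll lists every record in a table (e.g. to load all channels at startup).
	GetAll(table Table) ([]KV, error)
	// Get fetches a single record by UUID.
	Get(table Table, key UUID) ([]byte, error)
	// Set writes (or overwrites) a record, optionally with an expiration time.
	Set(table Table, key UUID, value []byte, expiration time.Time) error
	// Delete removes a record.
	Delete(table Table, key UUID) error
}
```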