Conversation


@slingamn slingamn commented Jan 4, 2023

This is pretty complicated and touches a lot of things, so I'm interested in reviews, maybe from @ajaspers or @progval ?

The core idea here is to refactor persistence to eventually support datastores other than buntdb. There are two problems:

  1. We don't want typical chat operations to block on the datastore (this is mostly implemented already via the asynchronous persistence / markDirty machinery, but there are still some cases where we would block; in particular, for accounts, every login currently incurs a read from the datastore)
  2. We don't want to require nontrivial consistency guarantees from the datastore

The new approach is best illustrated by the new, weak datastore API:

https://github.com/slingamn/ergo/blob/7ce06362764ee35629521eacc1fdee5405370efd/irc/datastore/datastore.go

which exposes key-value pairs. Each key has a UUID and is associated with a "table". There are four operations (a rough interface sketch follows the list):

  1. Read everything from a table. This is used at ircd startup to read all persisted data. The source of truth then becomes the in-memory datastructures, with asynchronous persistence back to the datastore
  2. Set a key, with an optional TTL that will be respected by the datastore
  3. Delete a key
  4. Read a single key (this is used for some edge cases, like schema changes)
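
A rough sketch of the shape of this interface; the type names and signatures here are illustrative only, not the actual contents of the linked datastore.go:

```go
// Illustrative sketch only; see the linked datastore.go for the real thing.
package datastore

import "time"

// Table identifies a logical grouping of records, e.g. channels or purge records.
type Table uint

const (
	TableChannels Table = iota
	TableChannelPurges
)

// UUID stands in for whatever unique key type the real code uses.
type UUID [16]byte

// KV is a single persisted record.
type KV struct {
	UUID  UUID
	Value []byte
}

// Datastore is deliberately weak: no transactions and no nontrivial
// consistency guarantees are required of the backend.
type Datastore interface {
	// GetAll reads every record in a table; used once at ircd startup, after
	// which the in-memory data structures are the source of truth.
	GetAll(table Table) ([]KV, error)
	// Get reads a single record (edge cases such as schema changes).
	Get(table Table, uuid UUID) ([]byte, error)
	// Set writes a record; a zero expiration time means "no TTL".
	Set(table Table, uuid UUID, value []byte, expiration time.Time) error
	// Delete removes a record.
	Delete(table Table, uuid UUID) error
}
```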

This branch refactors channels and channel purge records to use the new API.

@slingamn slingamn added this to the v2.12.0 milestone Jan 4, 2023

```go
return nil, errInsufficientPrivs, false
}
// enforce confusables
if !registered && (cm.chansSkeletons.Has(skeleton) || cm.registeredSkeletons.Has(skeleton)) {
```
Contributor

It looks like this removes the exception for registered channels which are confusable with another channel. Is that intentional?

Member Author

The idea is that registered channels are now always loaded, and therefore always present in chans and chansSkeletons (even when they are purged), so there's no need to treat them differently from other channels.

```go
return nil
// TODO we need a better story about error handling for later
if err = cm.server.dstore.Set(datastore.TableChannelPurges, record.UUID, purgeBytes, time.Time{}); err != nil {
cm.server.logger.Error("datastore", "couldn't store purge record", chname, err.Error())
```
Contributor

Shouldn't this at least return an error so that the oper knows something is wrong?

Member Author

The current story about this is not ideal. I haven't fully decided what to do here, but I think in general, datastore failures will not necessarily cause the underlying operation to fail in full. For example, in the case of CS PURGE ADD, the purge gets added to the in-memory data structure no matter what, and will be enforced as long as the ircd is running.

I think long-term (once I actually introduce a datastore where writes can fail for non-catastrophic reasons), the strategy will be:

  1. Have a (bounded) queue for asynchronously retrying sets and deletes (a rough sketch follows this list)
  2. Have an option for alerting the operator to failed datastore operations (a snomask?)
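
For concreteness, here's a rough sketch of what (1) might look like. Nothing like this exists in the branch yet; all names are hypothetical and the datastore signatures are simplified:

```go
// Rough sketch of the bounded retry queue idea; all names are hypothetical
// and the datastore signatures are simplified.
package sketch

import "time"

// store is the subset of the datastore API the queue needs (simplified).
type store interface {
	Set(table uint, uuid [16]byte, value []byte, expiration time.Time) error
	Delete(table uint, uuid [16]byte) error
}

// pendingWrite is one outstanding set (value != nil) or delete (value == nil).
type pendingWrite struct {
	table uint
	uuid  [16]byte
	value []byte
}

type retryQueue struct {
	backend store
	pending chan pendingWrite // bounded: enqueue fails instead of blocking chat operations
}

func newRetryQueue(backend store, size int) *retryQueue {
	q := &retryQueue{backend: backend, pending: make(chan pendingWrite, size)}
	go q.run()
	return q
}

// enqueue returns false when the queue is full; that's the point at which
// we'd alert the operator (a snomask?) rather than block the caller.
func (q *retryQueue) enqueue(w pendingWrite) bool {
	select {
	case q.pending <- w:
		return true
	default:
		return false
	}
}

func (q *retryQueue) run() {
	for w := range q.pending {
		var err error
		if w.value == nil {
			err = q.backend.Delete(w.table, w.uuid)
		} else {
			err = q.backend.Set(w.table, w.uuid, w.value, time.Time{})
		}
		if err != nil {
			time.Sleep(time.Second) // a real implementation would want backoff and a retry cap
			q.enqueue(w)            // if still full, the write is dropped; a real implementation would log
		}
	}
}
```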

```go
return errNoSuchChannel
}
if err := cm.server.dstore.Delete(datastore.TableChannelPurges, record.UUID); err != nil {
cm.server.logger.Error("datastore", "couldn't delete purge record", chname, err.Error())
```
Contributor

Same here.


```go
for cfname, entry := range cm.chans {
	if entry.channel.Founder() == account {
		channels = append(channels, cfname)
```
Contributor

Any performance concern about this function being O(n)? It's called from user-facing functions like checkChanLimit.

Member Author

I felt a bit conflicted about this. It's no worse than LIST is currently; on the other hand, LIST is commonly special-cased by fakelag systems for being more expensive than other commands (although possibly for bandwidth reasons rather than CPU utilization).

I think it's probably OK to leave this unoptimized for now. We could put a rate limit on the relevant chanserv operations (and LIST?) if it becomes a problem.
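
If we do go that route, the rate limit could be as simple as a per-account minimum interval between expensive scans. A hypothetical sketch, not part of this branch:

```go
// Hypothetical sketch of a per-account limiter for O(n) scans; not in this branch.
package sketch

import (
	"sync"
	"time"
)

type scanLimiter struct {
	sync.Mutex
	interval time.Duration
	lastScan map[string]time.Time // account name -> time of last expensive scan
}

func newScanLimiter(interval time.Duration) *scanLimiter {
	return &scanLimiter{interval: interval, lastScan: make(map[string]time.Time)}
}

// allow reports whether the account may run another O(n) scan right now.
func (s *scanLimiter) allow(account string) bool {
	s.Lock()
	defer s.Unlock()
	now := time.Now()
	if last, ok := s.lastScan[account]; ok && now.Sub(last) < s.interval {
		return false
	}
	s.lastScan[account] = now
	return true
}
```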

```go
}
return nil
// TODO we need a better story about error handling for later
if err = cm.server.dstore.Set(datastore.TableChannelPurges, record.UUID, purgeBytes, time.Time{}); err != nil {
```
Contributor

In general, it seems this is following the model of updating the in-memory state first, then the database. Couldn't this lead to race conditions that result in the in-memory state being different from the database?

Member Author

The idea here is to follow the same pattern established with MarkDirty:

```go
// MarkDirty marks part (or all) of a channel's data as needing to be written back
```

Each update should be idempotent (it should always persist the latest data corresponding to its key, in full). So the only possible race condition is if two writes to the datastore are reordered relative to each other. This is prevented through the use of a semaphore, e.g. (*Channel).writebackLock, which ensures a linear sequence of operations [copy state, write to datastore, copy state, write to datastore ...]
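
Roughly, the pattern is the following (a simplified sketch; apart from MarkDirty and writebackLock, the names are illustrative, and the real MarkDirty tracks which parts of the data are dirty, per its doc comment above):

```go
// Simplified sketch of the writeback pattern; illustrative names.
package sketch

import "sync"

type channelSnapshot struct {
	// full serializable representation of the channel's persistent state
}

type Channel struct {
	stateMutex    sync.RWMutex  // guards the in-memory state
	writebackLock chan struct{} // capacity-1 semaphore, i.e. make(chan struct{}, 1)
	dirty         bool
	// ... channel state ...
}

// MarkDirty flags the channel and kicks off an asynchronous write.
func (ch *Channel) MarkDirty() {
	ch.stateMutex.Lock()
	ch.dirty = true
	ch.stateMutex.Unlock()
	go ch.performWrite()
}

func (ch *Channel) performWrite() {
	ch.writebackLock <- struct{}{} // acquire: only one writer at a time
	defer func() { <-ch.writebackLock }()

	// 1. Copy the latest state in full, so that each write is idempotent.
	ch.stateMutex.Lock()
	dirty := ch.dirty
	ch.dirty = false
	snapshot := ch.snapshot()
	ch.stateMutex.Unlock()

	if !dirty {
		return // an earlier writer already persisted the latest state
	}

	// 2. Persist outside the state mutex; because the semaphore serializes
	//    writers, snapshots reach the datastore in the order they were taken.
	persist(snapshot)
}

func (ch *Channel) snapshot() channelSnapshot { return channelSnapshot{} }

func persist(s channelSnapshot) {
	// e.g. marshal the snapshot and call dstore.Set(TableChannels, uuid, data, time.Time{})
}
```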

One thing I'm still not totally sure about is whether there would still be a race condition under relaxed consistency modes in Cassandra (e.g. QUORUM or LOCAL_QUORUM). Under those conditions, could an earlier write "win" the eventual consistency despite having happened-before a later write in Go?

In any case, purges don't need a semaphore of their own because they can't be updated, only deleted.

Member Author

It sounds like in Cassandra, this is guaranteed under normal conditions by the linearity of the local commitlog:

https://cassandra.apache.org/doc/latest/cassandra/architecture/storage_engine.html

but under hardware failure or extended partition, there may still be data loss:

https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsRepairNodesTOC.html

@slingamn slingamn merged commit 4317016 into ergochat:master Jan 15, 2023
@slingamn slingamn deleted the channels_taketwo.1 branch January 28, 2025 07:27
@slingamn slingamn mentioned this pull request Jun 11, 2025