Multi-database support #342
billschereriii
left a comment
Looks like a good start on the problem of catching duplicate database identifiers. The names you have used for variables and methods are making the code a bit hard to follow though.
In broad strokes, there are some areas that haven't been touched on yet in the PR. I assume you're intending to address these:
- There are several placeholders where you are clearly intending to make changes but haven't yet.
- I don't see the changes to the launcher code to invoke the new database ID names.
- The changelog isn't updated.
ashao
left a comment
Good start to this PR!
General comments:
- Try to avoid adding properties to objects unless they absolutely need them. This restricts the flow of information and can help avoid writing too much spaghetti code down the line.
- Most functions actually do something, so a good naming convention is to start with a verb followed by a short description of what the function does.
- Python uses the soft convention of indicating that a function or property is private with a leading underscore, e.g. _foo.
- While it is generally good to keep functions short, functions that are one-liners are generally discouraged.
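A minimal sketch of the naming conventions described above. The `Launcher` class, its methods, and the `EXAMPLE_DB_ADDRESS` variable name are hypothetical and are not taken from the SmartSim code base.

```python
import typing as t


class Launcher:  # hypothetical stand-in, not the SmartSim class
    def __init__(self) -> None:
        # leading underscore: internal state, not part of the public API
        self._env: t.Dict[str, str] = {}

    def set_db_address(self, db_identifier: str, address: str) -> None:
        """Verb-first public name that says what the call does."""
        key = self._format_env_key(db_identifier)
        self._env[key] = address

    def get_env(self) -> t.Dict[str, str]:
        """Return a copy so callers cannot mutate the internal state."""
        return dict(self._env)

    def _format_env_key(self, db_identifier: str) -> str:
        """Private helper, marked as such by the leading underscore."""
        if not db_identifier:
            return "EXAMPLE_DB_ADDRESS"  # assumed name, for illustration only
        return f"EXAMPLE_DB_ADDRESS_{db_identifier}"
```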
| :rtype: Orchestrator or derived class | ||
| """ | ||
|
|
||
| self.append_to_db_identifier_list(db_identifier) |
The ID list feels like it duplicates information found elsewhere (e.g. the list of orchestrators) and gives us two sources of truth that can fall out of sync. Consider not adding this additional collection.
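One way to avoid the second collection is to derive the identifiers from the orchestrators whenever a check is needed. This is a rough sketch only: the `Orchestrator` dataclass below is a stand-in with a single assumed attribute, and `find_duplicate_db_identifiers` is a hypothetical helper, not a function from the PR.

```python
import typing as t
from collections import Counter
from dataclasses import dataclass


@dataclass
class Orchestrator:  # stand-in for illustration, not the SmartSim class
    db_identifier: str


def find_duplicate_db_identifiers(
    orchestrators: t.Iterable[Orchestrator],
) -> t.List[str]:
    """Return identifiers used by more than one orchestrator, sorted."""
    counts = Counter(orc.db_identifier for orc in orchestrators)
    return sorted(name for name, count in counts.items() if count > 1)
```

Because the result is computed from the orchestrator list itself, there is no separate ID list that can fall out of sync.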
Codecov Report
@@ Coverage Diff @@
## develop #342 +/- ##
===========================================
+ Coverage 89.52% 89.77% +0.24%
===========================================
Files 58 59 +1
Lines 3571 3609 +38
===========================================
+ Hits 3197 3240 +43
+ Misses 374 369 -5
al-rigazzi
left a comment
Great stuff! Some cleanup may be needed - I'll leave it up to your judgment, but otherwise looks good.
.github/workflows/run_tests.yml
Outdated
| run: | | ||
| python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis | ||
| python -m pip install git+https://github.com/billschereriii/smartredis.git@multidb |
I guess this has to be reverted to CrayLabs before we merge
Good call, thanks! This might be the last thing I do, to make sure the CI passes.
smartsim/_core/_install/buildenv.py
Outdated
| # Versions | ||
| SMARTSIM = Version_(get_env("SMARTSIM_VERSION", "0.5.1")) | ||
| SMARTREDIS = Version_(get_env("SMARTREDIS_VERSION", "0.4.2")) | ||
| SMARTREDIS = Version_(get_env("SMARTREDIS_VERSION", "0.4.1")) |
I think this will have to be modified once we can use the CrayLabs branch
To do before the final merge.
smartsim/_core/control/controller.py
Outdated
|
|
||
| # Retrieve num_shards to append to client env | ||
| client_env[f"SR_DB_TYPE{db_name}"] = ( | ||
| "Clustered" if len(addresses) > 1 else "Standalone" |
As I think we will use "Clustered" and "Standalone" in other classes, I suggest defining them as enumerators (like we do for Status in other classes). That should reduce the chances of misspelling them, and make it easier to read them as constants.
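A sketch of what that suggestion could look like, assuming a hypothetical `ServerType` enum and `select_server_type` helper (neither name comes from the PR). The `SR_DB_TYPE` prefix and the two string values are taken from the diff above; the `db_name` suffix and the address are made up for the example.

```python
import typing as t
from enum import Enum


class ServerType(str, Enum):  # hypothetical name
    CLUSTERED = "Clustered"
    STANDALONE = "Standalone"


def select_server_type(addresses: t.Sequence[str]) -> ServerType:
    """More than one address implies a clustered database."""
    if len(addresses) > 1:
        return ServerType.CLUSTERED
    return ServerType.STANDALONE


client_env: t.Dict[str, str] = {}
db_name = "_orc_1"  # example suffix, assumed for illustration
addresses = ["10.0.0.1:6379"]  # example address
# .value yields the plain string expected in the environment
client_env[f"SR_DB_TYPE{db_name}"] = select_server_type(addresses).value
```

A misspelled member such as `ServerType.STANDALNE` fails immediately with an `AttributeError`, whereas a misspelled string literal would pass silently.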
smartsim/error/errors.py
Outdated
| class SSReservedKeywordError(SmartSimError): | ||
| """Raised when a Reserved Keyword is used incorrectly""" | ||
|
|
||
| class DBIDConflictError(SmartSimError): |
Could we prefix this with SS? Just to make it clear it is a SmartSim-generated error.
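For illustration, the renamed class might look like the sketch below. `SmartSimError` is redefined here only so the snippet is self-contained, and the docstrings and message text are assumptions rather than the wording used in the PR.

```python
class SmartSimError(Exception):
    """Stand-in for the SmartSim base error, defined here for the example."""


class SSDBIDConflictError(SmartSimError):
    """Assumed meaning: raised when two databases share a db_identifier."""


try:
    raise SSDBIDConflictError("Database identifier 'orc_1' is already in use")
except SmartSimError as err:
    # the SS prefix makes the origin obvious in tracebacks and logs
    caught_name = type(err).__name__
```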
smartsim/_core/control/controller.py
Outdated
| steps.append((batch_step, elist)) | ||
| else: | ||
| # if ensemble is to be run as separate job steps, aka not in a batch | ||
| # if ensemble is to be run as separate job steps, |
This comment is slightly "redundant", as it simply explains what .batch means. Maybe it is left over from debugging?
smartsim/entity/model.py
Outdated
| db_cpus: int = 1, | ||
| custom_pinning: t.Optional[t.Iterable[t.Union[int, t.Iterable[int]]]] = None, | ||
| debug: bool = False, | ||
| db_identifier: t.Optional[str] = "", |
I think, but @MattToast can correct me, that if we define the default value as "", the variable type is str and not Optional[str]. But I may be wrong.
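A small sketch of the distinction being discussed, using two hypothetical functions. The default value does not change the declared annotation: `t.Optional[str]` still means `str` or `None`. If `None` is never a meaningful argument, the narrower `str` annotation describes the parameter more precisely.

```python
import typing as t


def colocate_with_optional(db_identifier: t.Optional[str] = "") -> str:
    """Annotation allows None, so the body has to handle it."""
    return db_identifier or ""


def colocate_with_str(db_identifier: str = "") -> str:
    """Annotation matches the default; None is not an accepted value."""
    return db_identifier


optional_hint = t.get_type_hints(colocate_with_optional)["db_identifier"]
str_hint = t.get_type_hints(colocate_with_str)["db_identifier"]
```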
smartsim/entity/model.py
Outdated
| db_cpus: int = 1, | ||
| custom_pinning: t.Optional[t.Iterable[t.Union[int, t.Iterable[int]]]] = None, | ||
| debug: bool = False, | ||
| db_identifier: t.Optional[str] = "", |
Same comment as above.
smartsim/experiment.py
Outdated
| if db_identifier in self.db_identifiers: | ||
| logger.warning( | ||
| f"A database with the identifier {db_identifier} has already been made" | ||
| "An error will be raised if multiple databases are started" |
Missing white space at the end of strings, when concatenated the result will not be the desired one (i.e. "... madeAn ... startedwith").
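The effect is easy to reproduce: Python joins adjacent string literals with nothing in between. The sketch below uses only the two message lines visible in the diff, with a made-up identifier value; the added period and space in `fixed` are one possible correction, not necessarily the one adopted in the PR.

```python
db_identifier = "orc_1"  # example value, assumed for illustration

# Adjacent literals are concatenated with no separator.
broken = (
    f"A database with the identifier {db_identifier} has already been made"
    "An error will be raised if multiple databases are started"
)

# Ending the first literal with ". " keeps the sentences apart.
fixed = (
    f"A database with the identifier {db_identifier} has already been made. "
    "An error will be raised if multiple databases are started"
)
```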
ashao
left a comment
Some very minor changes requested. Thanks for doing such a careful job
smartsim/_core/control/controller.py
Outdated
| " name for db_identifier" | ||
| ) | ||
|
|
||
| db_name_colo = unpack_colo_db_identfifier(db_name_colo) |
Misspelling: identfifier -> identifier
smartsim/_core/control/controller.py
Outdated
| raise SSInternalError( | ||
| "Colocated database was not configured for either TCP or UDS" | ||
| ) | ||
| client_env[f"SR_DB_TYPE{db_name_colo}"] = "Standalone" |
Use the STANDALONE enum from servertypes
smartsim/_core/control/jobmanager.py
Outdated
|
|
||
| self.kill_on_interrupt = True # flag for killing jobs on SIGINT | ||
|
|
||
| self.active_db_identifiers: t.Set[str] = set() |
Remove this, no longer used
smartsim/_core/control/jobmanager.py
Outdated
| :param job: job instance we are transitioning | ||
| :type job: Job | ||
| """ | ||
| # remove db id from active entity list |
Remove this comment
|
|
||
| def _gen_orc_dir(self, orchestrator: t.Optional[Orchestrator]) -> None: | ||
| def _gen_orc_dir(self, orchestrator_list: t.List[Orchestrator]) -> None: | ||
| # orchestrator: t.Optional[Orchestrator] |
Remove comment
smartsim/experiment.py
Outdated
| self._control = Controller(launcher=launcher) | ||
| self._launcher = launcher.lower() | ||
| self.db_identifiers: t.Set[str] = set() | ||
| self.db_dict: t.Dict[str, t.Any] = {} |
db_dict is no longer used, remove
I don't think this relies on any of the ML backends. Move this file one directory up. If any of the tests do use an ML backend keep them here.
tests/backends/test_multidb.py
Outdated
| assert all([stat == status.STATUS_CANCELLED for stat in statuses]) | ||
|
|
||
|
|
||
| # JPNOTE |
Remove comments
smartsim/_core/utils/helpers.py
Outdated
| def unpack_db_identifier(db_id: str, token: str) -> t.Tuple[str, str]: | ||
| """Unpack the unformatted database identifier using the token, | ||
| and format for env variable suffix | ||
| :db_id: the unformatted database identifier eg. charizard_0 |
I'll play the role of crotchety, no-fun old dev along with @al-rigazzi here...but we should probably rename this :(
al-rigazzi
left a comment
LGTM, pending conflicts to be resolved. Thanks for this big PR!
Looks great! Thanks for sticking with this one, it turned out to be a real beast!
db_identifier uniqueness testing for multi database support