diff --git a/doc/terminology.md b/doc/terminology.md
index 6528271..fe2483b 100644
--- a/doc/terminology.md
+++ b/doc/terminology.md
@@ -1,7 +1,8 @@
 # Terminology overview
 
-This describes the terms involved in using `git-toprepo`, the tool,
-to emulate a monorepo for a toprepo and its submodules.
+This describes the terms involved in using the `git-toprepo` tool
+to _expand_ the _submodules_ of a _toprepo_ into a _git-toprepo emulated monorepo_.
+This _combines_ the history of all the _repositories_.
 
 ## Terms
 
@@ -9,215 +10,243 @@ to emulate a monorepo for a toprepo and its submodules.
 a _repository_. May be local or on a remote server.
 
 **git submodule**: A core `git` concept,
-a _submodule_ is a _repository_ with a child-parent relation ship to another.
+a _submodule_ is a _repository_ with a child-parent relationship to another _repository_.
 
 **regular submodule**: A core `git` concept,
 a regular _submodule_ that is entirely managed through `git-submodule` etc.
 
-**filtered submodule**: A `git-toprepo` concept,
-a _submodule_ that has been assimilated into one combined history in the filtered _monorepo_.
+**expanded submodule**: A `git-toprepo` concept,
+a _submodule_ that has been _expanded_ into the _combined_ history of the _git-toprepo emulated monorepo_.
 
 **superrepo**: Emergent from core git concepts,
 the parent _repository_ to a _submodule_.
 It may be a _submodule_ to another _superrepo_.
 
+**toprepo**: A regular _repository_ with special configuration and purpose.
+It is meant to be used together with `git-toprepo` to _expand_ its _submodules_
+into a _git-toprepo emulated monorepo_.
+This is generally configured by the organization,
+but users may add their own configuration for personal preferences.
+
+A _toprepo_ can also be checked out with _regular submodules_,
+using `git submodule update --init --recursive`,
+but that is not the preferred development workflow.
+
+There is generally only one such _repository_,
+so it is often referred to in the definite form: "the _toprepo_".
+
 **git-toprepo**: The tool itself.
-`git-toprepo` filters a _toprepo_
-and some of its _submodules_
-into a _monorepo_ (emulated).
-Takes care to push filtered _submodules_ to their remote server.
-
-**toprepo**: A _repository_ with _submodules_.
-This is the main development _repository_ for a developer.
-the _toprepo_ is the root level _superrepo_
-in a potential hierarchy of multiple levels of _submodules_.
-
-It may either be checked out with **regular** `git-submodule init --recursive`
-or with `git-toprepo` to create a _monorepo_.
-If it is checked out with `git-toprepo`
-some _soubmodules_ may not be filtered into the _monorepo_,
-then those must be manipulated with `git-submodule` as in the first case.
+`git-toprepo` _expands_ (a selection of) the _submodules_ of a _toprepo_
+into a _git-toprepo emulated monorepo_,
+_combining_ their git histories.
+It takes care of pushing _expanded submodules_ to their respective remote servers.
 
 **monorepo**: A _repository_ with all the code,
 it does not typically have _submodules_.
 This makes it easy to make changes across different components with a regular `git` workflow,
-generally without _submodule_ bumps and binary deliveries/integration
+generally without _submodule bumps_ and binary deliveries/integration
 of first party code.
 Gives unparalleled reproducibility and understanding of the full product.
 
-Throughout `git-toprepo`'s code and documentation
-_monorepo_ is often used to refer to an _emulated monorepo_, for conciseness.
-
 **pure monorepo**: A commonly sought concept,
 such a _repository_ does not have _submodules_ at all.
 There is just one _repository_ on the remote `git` server.
 This realizes the full value of a _monorepo_,
 but has no clear _access control_.
 
-**emulated monorepo**: A client side construct
-that emulates a _monorepo_ for developer
-but still tracks code as _submodules_ with their own remote git _repositories_.
-This is created by `git-toprepo`.
+**git-toprepo emulated monorepo**: A client-side construct
+that _emulates_ a _monorepo_ for a _toprepo_.
+The developer sees a joint history of the _toprepo_ and its _expanded submodules_ and can create _combined commits_
+that span multiple _submodules_ and push/fetch them with `git-toprepo`.
+The tool keeps track of the _expanded submodules_ and their own remote git _repositories_.
-CI systems like `Gerrit` allows an organization to merge code to multiple _repositories_ -atomically if all tests passes. -This allows us to emaulate a _monorepo_ and have a shared gate. -`Gerrit` uses [superproject subscription] for this +**submodule bump**: A core `git` concept, +a change in the _super repository_ +of which _commit id_ is wanted for a specific _submodule_ path. +**shared gating**: A CI system concept. +CI systems like [`Zuul CI`] allows an organization to merge code to multiple _repositories_ +if all tests passes, atomically if the git server supports it. +By bumping the submodules accordingly, +e.g. by using [superproject subscription] in `Gerrit`, +the history of the constituent _repositories_ +can be _recombined_ to the same _combined history_ graph that was pushed. + +[`Zuul CI`]: https://zuulci.org/ [superproject subscription]: https://gerrit-review.googlesource.com/Documentation/user-submodules.html ### Verbs -**filter**: `git-toprepo` filters the history of one _toprepo_ and its _regular submodules_ -into an _emulated monorepo_ with a combined history for all the _toprepo_ itself and its _filtered submodules_. +**combine**: `git-toprepo` _combines_ the history of one _toprepo_ and (a choice of) its _submodules_ +into an _emulated monorepo_ with a _combined_ history for code in the _toprepo_ itself and its _expanded submodules_. -**combined**: `git-toprepo` has _combined_ the history into an _emulated monorepo_ with combined history. +**expand**: The content of the _submodules_ is expanded into an _emulated monorepo_. -**manage**: `git-toprepo` manages a git _toprepo_ and has _expanded_ the history into an _emulated monorepo_. +**integrate**: A _submodule_ is integrated into the _git-toprepo emulated monorepo_ +when the history is _combined_ and the content is (optionally) _expanded_. ### Technical details For power users and _repository_ maintainers there are a few overlapping concepts. 
-**toprepo**: The `git-config` namespace for select `git-toprepo` settings that are configured through `git`.
+**git-config**: The `toprepo` namespace is used for the `git-toprepo` settings
+that are configured through `git`.
+
+**git toprepo**: For subcommands that are not built in, `git` looks for an external executable named `git-` followed by the subcommand name.
+Calling `git toprepo` makes `git` execute `git-toprepo`.
 
-**toprepo**: The `git` subcommand that runs `git-toprepo`.
-`git` runs external subcommands like `git-` as `git `
-to make it easy to create custom tools for `git`.
+### Technical terms in the code
+
+**top commit**: A _commit_ in the _toprepo_,
+i.e. in the remote _repository_ that has been cloned.
+These are fetched using `git-toprepo fetch` (or `git fetch`) and
+created when pushing new work with `git-toprepo push`,
+if changes were made to the underlying _toprepo_.
+
+**mono repo**: In the code, "_monorepo_" is used as shorthand for
+"_git-toprepo emulated monorepo_" or "_combined repo_". As the code never deals with
+a "_pure monorepo_", brevity is preferred over precision
+for this term within the code.
+
+**mono commit**: In the code, "_mono commit_" is used as shorthand
+for "_git-toprepo emulated monorepo commit_" or "_combined commit_", for
+symmetry reasons. As the code never deals with a "_pure monorepo_",
+brevity is preferred over precision for this term within the code.
 
 ## Examples
 
-### Initialization: The toprepo may be a monorepo
+### Initialization: Expand the toprepo into an emulated monorepo
 
-The configuration of a _monorepo_ is often managed in the _toprepo_ and is already checked in.
+The _toprepo_ can be initialized as a _git-toprepo emulated monorepo_
+with `git-toprepo`.
+The configuration for `git-toprepo`
+is often managed in the _toprepo_ itself and is already checked in.
 
-Short-form initialization of a _monorepo_.
+Short-form initialization of a _git-toprepo emulated monorepo_:
 ```
-$ monorepo $ git toprepo clone ssh://gerrit.example/toprepo.git monorepo
-$ cd monorepo
-monorepo $ # This is a monorepo.
+$ git toprepo clone ssh://gerrit.example/toprepo.git emulated-monorepo
+$ cd emulated-monorepo
+emulated-monorepo $ # This is a git-toprepo emulated monorepo.
 ```
 
-
-
-
-
-
-
-
-
-
-However, the code can also be checked out with regular git _submodules_.
+However, the code can also be checked out with regular git _submodules_
+to create the same directory structure.
 
 ```
 $ git clone ssh://gerrit.example/toprepo.git
 $ cd toprepo
-toprepo $ git submodule init --recursive
-toprepo $ # This is not a monorepo
+toprepo $ git submodule update --init --recursive
+toprepo $ # This is not a git-toprepo emulated monorepo.
 ```
 
-### Initialization: Some submodules are not filtered in
+### Initialization: Some submodules are not expanded
 
-Now imagine that the _toprepo_ has one _submodule_ with a long and weird history,
+Imagine that the _toprepo_ has one _submodule_ with a long and weird history;
 it may be binary data that takes a lot of space and is not relevant to the developer.
-Then it is often **not filtered** into the _emulated monorepo_.
+Then it may be preferable not to _expand_ it into the _emulated monorepo_.
 
-_monorepo_:
+_git-toprepo emulated monorepo_:
 
 ```
-$ monorepo $ git toprepo clone ssh://gerrit.example/toprepo.git monorepo
-$ cd monorepo
-monorepo $ # This is a monorepo.
-monorepo $ git submodule status
+$ git toprepo clone ssh://gerrit.example/toprepo.git emulated-monorepo
+$ cd emulated-monorepo
+emulated-monorepo $ # This is an emulated monorepo.
+emulated-monorepo $ git submodule status
 -4e04771fcf658500987d0be5a9a63f8e77d5e386 binary_data_module
 ```
 
-regular _toprepo_:
+_toprepo_ with _regular submodules_:
 
 ```
 $ git clone ssh://gerrit.example/toprepo.git
 $ cd toprepo
+toprepo $ git submodule init
+toprepo $ # This is not an emulated monorepo.
 toprepo $ git submodule status
 -4e04771fcf658500987d0be5a9a63f8e77d5e386 binary_data_module
 -661c1b2d568693e3b6b631ae66f6872b194674f1 source_code_module
 ```
 
-### Pushing: git-toprepo pushes filtered submodules to their servers
+### Pushing: git-toprepo pushes expanded submodules to their respective servers
 
 `git-toprepo` shines when a developer wants to make one change across two _submodules_
-in one _supercommit_.
+in one _combined commit_.
 
 ```
-monorepo $ # modify one/file and two/file
-monorepo $ git add one/file two/file; git commit
-monorepo $ git-toprepo push HEAD:refs/for/main
+emulated-monorepo $ # modify one/file and two/file
+emulated-monorepo $ git add one/file two/file
+emulated-monorepo $ git commit
+emulated-monorepo $ git-toprepo push HEAD:refs/heads/main
 ```
 
-This pushes the two paths inside the _monorepo_ to their constituent
-_repositories_ on the git server (gerrit.example/one.git and gerrit.example/two.git).
+This pushes the two paths inside the _emulated monorepo_ to their constituent
+_repositories_ on the git server (`gerrit.example/one.git` and `gerrit.example/two.git`).
 
-The regular workflow with submodules, however, is more involved
+The regular workflow with _submodules_, however, is more involved:
 
 ```
 toprepo $ # modify one/file and two/file
-toprepo $ git -C one add file; git commit
-toprepo $ git -C two add file; git commit
-toprepo $ git -C one push HEAD:refs/for/main
-toprepo $ git -C two push HEAD:refs/for/main
-# As you use Gerrit's superproject subscription, you would not need a toprepo commit:
-# toprepo $ git add one two; git commit
-# toprepo $ git push HEAD:refs/for/main
+toprepo $ git -C one add file
+toprepo $ git -C one commit
+toprepo $ git -C one push origin HEAD:refs/heads/main
+toprepo $ git -C two add file
+toprepo $ git -C two commit
+toprepo $ git -C two push origin HEAD:refs/heads/main
 ```
 
-First the two _submodules_ are handled separately
-then the _toprepo_ must also bump its _submodule_ pointers to the new commits within them.
+In both cases, the _submodule_ pointers in the branch `main` in the _toprepo_
+need to be updated to point at the latest commits in the _submodules_.
+This can be done automatically, e.g. with Gerrit's superproject subscription, or manually:
+
+```
+toprepo $ git add one two
+toprepo $ git commit
+toprepo $ git push origin HEAD:refs/heads/main
+```
 
 > [!NOTE]
-> Though committing inside _regular submodules_ in a _monorepo_ is rare.
-> If a _submodule_'s history is not relevant to _filter_ into the combined history
-> it is unlikely that developers need to modify the code and make changes.
+> Committing inside _regular submodules_ in a _git-toprepo emulated monorepo_ is rare:
+> if a _submodule_'s history is not relevant to the _combined_ history,
+> developers are unlikely to need to modify its code.
 
-### Rebasing: git-toprepo gives a shared history that is easy to work with
+### Rebasing: git-toprepo gives a combined history that is easy to work with
 
-With `git-toprepo`, rebasing _commits_ in any of the _filtered submodules_
+With `git-toprepo`, rebasing _commits_ in any of the _expanded submodules_
 is as easy as working in a single _repository_.
 
 ```
-monorepo $ git-toprepo fetch origin
-monorepo $ git rebase -i origin/main
+emulated-monorepo $ git-toprepo fetch origin
+emulated-monorepo $ git rebase -i origin/main
 ```
 
-However when using _regular submodules_ in an _unmanaged_ _toprepo_
+However, when using _regular submodules_ in a plain _toprepo_ checkout,
 one needs to automate the workflow within individual _submodules_.
 
 ```
 toprepo $ git fetch origin
 toprepo $ git rebase -i origin/main
-toprepo $ submod_commit_hash=$(git ls-files --stage -- one | cut -d' ' -f2)
-toprepo $ git -C one rebase -i "$submod_commit_hash"
-toprepo $ submod_commit_hash=$(git ls-files --stage -- two | cut -d' ' -f2)
-toprepo $ git -C two rebase -i "$submod_commit_hash"
+toprepo $ git submodule foreach 'git fetch origin && git rebase -i origin/main'
+toprepo $ # On conflicts, resolve them and run 'git rebase --continue' in that submodule,
+toprepo $ # followed by the same git-submodule-foreach command again.
 ```
 
-In the example, two _submodules_ does not look too bad at the face of it,
+In the example, with only two _submodules_, this does not look too bad on the face of it,
@@ -225,13 +254,49 @@ but note that the rebasing is not synchronized between the _submodules_.
 Therefore, building and testing the code after resolving a merge conflict,
 which may have only occurred in one _submodule_, is not trivial.
 
-### Pushing: Push all submodules of an emulated monorepo
+### Pushing: Push all submodules of a toprepo
 
-As an _emulated monorepo_ may not have _expanded_ all _submodules_ into the combined history
+As a _git-toprepo emulated monorepo_ may not have _expanded_ all _submodules_ into the _combined_ history,
 some _submodules_ are left as _regular submodules_.
-So to always push changes to all _submodules_ the following invocation is needed:
+To always push changes to all _submodules_, the following invocation is needed:
 
 ```
-monorepo $ git-toprepo push HEAD:refs/for/main
-monorepo $ git submodule for each push HEAD:refs/for/main
+emulated-monorepo $ git-toprepo push HEAD:refs/heads/main
+emulated-monorepo $ git submodule foreach git push origin HEAD:refs/heads/main
 ```
+
+> [!NOTE]
+> Recall that committing inside _regular submodules_ in a _git-toprepo emulated monorepo_ is rare.
+
+## History combination algorithm
+
+This section briefly outlines the algorithm
+that creates the _combined history_ of the _git-toprepo emulated monorepo_,
+to put the terms above and their relationships into context.
+
+### Fetch a toprepo commit and create a mono commit
+
+`git-toprepo fetch` first fetches the _regular commits_ for the _toprepo_ itself
+using (approximately) `git fetch origin +refs/heads/*:refs/namespaces/top/refs/remotes/origin/*`.
+
+The next phase is the load phase, which proceeds as follows:
+
+1. All _top commits_ reachable from `refs/namespaces/top/refs/remotes/*`
+are loaded to find the _submodules_ and the _commit ids_ they reference.
+1. All _regular commits_ reachable from `refs/namespaces//*` are loaded.
+1. If any of the _commit ids_ requested by the _superrepo_ were not found,
+they are fetched using `git fetch +refs/heads/*:refs/namespaces//refs/remotes/origin/*`.
+1. All _regular commits_ reachable from `refs/namespaces//*` are
+checked for inner _submodules_ and the _commit ids_ they reference.
+1. The process then continues recursively from step 2 for the inner _submodules_.
+
+When all reachable commits have been loaded, the _submodules_ within the _toprepo_
+are _expanded_ and the history is _combined_:
+
+1. Iterate through all _top commits_ reachable from `refs/namespaces/top/refs/remotes/*`,
+starting from the initial orphan _commits_.
+1. For each _regular commit_, look for _submodule bumps_ or changes in `.gitmodules`.
+1. _Expand_ each _submodule bump_ by replacing the _submodule_ git-link,
+which points at the _commit id_, with the corresponding tree content.
+1. Transfer the parents of each _submodule commit_ to the _combined commit_
+by looking up which _combined commits_ the parents were _expanded_ into.
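The _Expand_ step of the algorithm can be illustrated with git plumbing. The following is a minimal, hypothetical sketch, not how `git-toprepo` is implemented: it assumes that the _top commits_ and the _submodule commits_ share one object database (as fetching everything into namespaced refs of one repository suggests), it only expands git-links at the top level of the tree, and it leaves out the parent transfer, so the resulting commit has no parents. The file names, contents and commit messages are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of expanding a submodule bump with git plumbing.
# Simplifications: one repository holds all objects, only top-level
# git-links are expanded, and parents are not transferred.
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# A submodule commit with one file, built without any checkout.
blob=$(echo "hello" | git hash-object -w --stdin)
sub_tree=$(printf '100644 blob %s\tfile.txt\n' "$blob" | git mktree)
sub_commit=$(git commit-tree "$sub_tree" -m "submodule commit")

# A top commit with a README and a git-link "one" bumped to sub_commit.
readme=$(echo "top" | git hash-object -w --stdin)
top_tree=$(printf '100644 blob %s\tREADME\n160000 commit %s\tone\n' \
    "$readme" "$sub_commit" | git mktree)
top_commit=$(git commit-tree "$top_tree" -m "Bump one")

# Expand: replace each git-link entry (type "commit") with the tree of
# the submodule commit it points at; keep all other entries as they are.
tab=$(printf '\t')
mono_tree=$(git ls-tree "$top_commit" | while IFS="$tab" read -r meta path; do
    set -- $meta  # intentionally unquoted: $1 = mode, $2 = type, $3 = object id
    if [ "$2" = commit ]; then
        printf '040000 tree %s\t%s\n' "$(git rev-parse "$3^{tree}")" "$path"
    else
        printf '%s\t%s\n' "$meta" "$path"
    fi
done | git mktree)
mono_commit=$(git commit-tree "$mono_tree" -m "Bump one")

# Lists README and one/file.txt: the submodule content is now regular files.
git ls-tree -r --name-only "$mono_commit"
```

A complete version would also have to recurse into subdirectories and inner _submodules_, and pass the mapped parents to `git commit-tree -p`, as described in the parent transfer step.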