feat: Admin service support for managed github repo APIs #7213
base: main
admin/database/database.go
Outdated
@@ -308,6 +308,13 @@ type DB interface {
	InsertProvisionerResource(ctx context.Context, opts *InsertProvisionerResourceOptions) (*ProvisionerResource, error)
	UpdateProvisionerResource(ctx context.Context, id string, opts *UpdateProvisionerResourceOptions) (*ProvisionerResource, error)
	DeleteProvisionerResource(ctx context.Context, id string) error

	FindManagedGithubRepoMeta(ctx context.Context, htmlURL string) (*ManagedGithubRepoMeta, error)
This comment applies throughout the PR: we should call it "managed Git", not "managed Github".
GitHub is the provider, which we should be able to swap for another provider in the future by implementing a new integration, without renaming APIs, database tables, etc.
I mentioned it in another reply, but the problem is that a lot of code is already strongly coupled with GitHub, and we don't know what common ground exists between GitHub, GitLab, and other providers, so decoupling it completely is probably not as straightforward.
Hence, as an extension, it just felt intuitive to mark it `Github` and not `Git`, but I am fine with removing `Github` from `managed` and have made the changes in the PR.
admin/database/database.go
Outdated
	OrgID     string  `db:"org_id"`
	ProjectID *string `db:"project_id"`
Storing `org_id` and `project_id` feels off, since one can be looked up from the other. Should the relationship to `project_id` be stored the reverse way? I.e. a foreign key from the `projects` table, not the other way around?
I kept `org_id` to easily fetch the repos for an org (normalization vs. simplicity). But I can remove it as well; it just needs an extra join.
Yeah, I think storing the relationship the other way is also good. That way we no longer need to maintain a bool indicating that a project uses a managed Git repo. Though we can continue to duplicate `remote` in both places to keep the rest of the code consistent.
admin/database/database.go
Outdated
// ManagedGithubRepoMeta represents metadata about a Rill managed GitHub repository for projects deployed on Rill Cloud.
type ManagedGithubRepoMeta struct {
Since this is the `database` package, where many things are metadata, I think "meta" might be a redundant word. For example, we don't say `DeploymentMeta` or `AssetMeta`, just `Deployment` or `Asset`.
`ManagedGitRepoMeta` indicates that it is just metadata about the managed Git repo, and that the actual data is stored somewhere else, not in this table/database. That is why I added the extra `Meta` suffix. `ManagedGitRepo` sounds like the managed Git repository itself is stored here.
However, I am fine with dropping the suffix to keep it in line with the other tables.
@@ -0,0 +1,12 @@
CREATE TABLE IF NOT EXISTS managed_github_repo_meta (
nit: remove `IF NOT EXISTS`. It's very unexpected if it already exists!
I just added it to make it fail-proof if we switch between versions. I can remove it as well.
	// If managedOrgFetchError is non-nil, the installation ID is not valid and is refetched when needed.
	// mu controls access to managedOrgInstallationID and managedOrgFetchError.
	managedOrgInstallationID int64
	managedOrgFetchError     error
In what cases does this get set? Seems like a pretty serious error.
I understand not returning it early, since that can cause a crash loop. But it seems like it might be worth logging an error for?
> In what cases does this get set? Seems like a pretty serious error.

`managedOrgInstallationID` is fetched at server startup. If that fails, `managedOrgFetchError` is set. While `managedOrgFetchError` is non-nil, the ID is refetched on the next request. Once `managedOrgInstallationID` is set, `managedOrgFetchError` is set to nil and the ID is never refetched again.
I can add a log on failure.
admin/database/database.go
Outdated
type InsertManagedGithubRepoMetaOptions struct {
	OrgID           string  `validate:"required"`
	CreatedByUserID *string `validate:"required"`
	HTMLURL         string  `validate:"required"`
I know we've previously used HTML URLs, but I regret it a bit and it feels fragile. For this use case, should we perhaps store HTTP remotes instead? E.g. `https://github.com/account/repo.git`. It would make configuring remotes simpler, and would be more compatible with Git providers other than GitHub (e.g. GitLab allows extra levels of nesting for repository paths).
In the follow-up to this PR, we also need to set this in the project URL, and it may be inconvenient to see both `http_url` and `clone_url` in the same column.
I can make it `remote`, set `clone_url`, and then do the handling wherever it is required.
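One upside of storing the clone remote is that the owner/repository path can still be derived generically when needed. The sketch below assumes a hypothetical helper `RepoPath` (not code from this PR) that keeps every path segment, so a nested GitLab path survives intact; it only accepts HTTPS remotes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// RepoPath splits an HTTPS Git remote into its host and repository path.
// Illustrative only: it keeps every path segment, so a GitLab remote with
// nested groups is not squeezed into a two-part "owner/repo".
func RepoPath(remote string) (host, path string, err error) {
	u, err := url.Parse(remote)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "https" || u.Host == "" {
		return "", "", fmt.Errorf("not an HTTPS remote: %q", remote)
	}
	// "/group/subgroup/repo.git" -> "group/subgroup/repo"
	path = strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	if !strings.Contains(path, "/") {
		return "", "", fmt.Errorf("remote %q has no owner/repository path", remote)
	}
	return u.Host, path, nil
}
```

For example, `RepoPath("https://gitlab.com/group/subgroup/repo.git")` yields host `gitlab.com` and path `group/subgroup/repo`, whereas splitting on a fixed "owner/repo" shape would lose a level.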
cli/cmd/admin/start.go
Outdated
@@ -230,7 +231,7 @@ func StartCmd(ch *cmdutil.Helper) *cobra.Command {
	emailClient := email.New(sender)

	// Init github client
	gh, err := admin.NewGithub(conf.GithubAppID, conf.GithubAppPrivateKey)
	gh, err := admin.NewGithub(cmd.Context(), conf.GithubAppID, conf.GithubAppPrivateKey, conf.RillManagedGithubOrg)
Not necessarily something to do now, but from an abstraction perspective, it feels weird to use the same client for user repositories and managed repositories. Would it make sense to initialize a separate client for managed Git repositories? E.g. `admin.Github` and `admin.ManagedGit` (incidentally backed by GitHub, but could be swapped for e.g. Google Cloud Source Repositories)?
Ideally we'd have an `admin/git/provider.go` file with a common interface for Git providers. That'd also make supporting other providers easier.
I think we are in a complicated situation here.
GitHub and other providers do not conform to a common implementation, apart from the fact that they serve Git repositories. So the serving part (which is mostly in the runtime) is a common implementation, but a lot of code in admin is already coupled to the provider.
Things like the installation ID, the installation token, and several similar constructs may not be applicable to other providers, and they are kept in the project table directly.
So we have two choices here:
a) Embrace the fact that GitHub is going to be an important part of our workflows, and couple it strongly within our admin code.
b) Try to find some middle ground for all these providers and keep it completely decoupled. But then we also need to make several changes in the code.
Maybe let's evaluate option b) in a separate task.
	updated_on TIMESTAMP DEFAULT NOW()
);

ALTER TABLE PROJECTS ADD column managed_git_repo_id UUID REFERENCES managed_git_repo (id) ON DELETE RESTRICT;
Not sure, but for caution/leniency maybe `SET NULL` instead of `RESTRICT`?
Sure. But why do you think leniency is required here? We should prevent deleting a `managed_git_repo` if it is still being used by a project.
Leniency might make some handling easier when switching from managed to external Git, or since deletes happen in GitHub before they happen in Postgres. Honestly, I'm not sure about it; I just wanted to ask.
	Remote:   *repo.CloneURL,
	Username: "x-access-token",
	Password: token,
I think this is duplicated in a few places now. Would it make sense to have a single func in `s.admin.Github` format the remote, username, password, etc. for use everywhere?
There are some minor variations here and there. Let us keep it hardcoded for now.
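For reference, the single helper suggested above might look like this sketch. `GitCredentials` and `githubAppCredentials` are hypothetical names; the `x-access-token` username with an installation token as the password is the convention the snippet above already uses for HTTPS access to GitHub.

```go
package main

import (
	"errors"
	"strings"
)

// GitCredentials bundles what callers currently assemble by hand.
// Hypothetical helper illustrating the reviewer's suggestion; not part of the PR.
type GitCredentials struct {
	Remote   string
	Username string
	Password string
}

// githubAppCredentials formats credentials for a GitHub App installation token:
// the literal username "x-access-token" with the token as the password.
func githubAppCredentials(cloneURL, token string) (GitCredentials, error) {
	if !strings.HasPrefix(cloneURL, "https://") {
		return GitCredentials{}, errors.New("installation tokens require an HTTPS remote")
	}
	if token == "" {
		return GitCredentials{}, errors.New("empty installation token")
	}
	return GitCredentials{Remote: cloneURL, Username: "x-access-token", Password: token}, nil
}
```

The minor per-call-site variations mentioned in the reply could be handled by callers adjusting the returned struct, which keeps the provider-specific username convention in one place.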
if strings.HasSuffix(*proj.GithubURL, ".git") {
	cloneURL = *proj.GithubURL
} else {
	cloneURL = *proj.GithubURL + ".git"
}
Would it make sense to run a migration that does this?
We can do it in the future, but I think we still need to commit this change and run the migration later? Otherwise it will break in the window between the migration running and the code being deployed.
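The snippet above can be read as a standalone function, sketched below under the hypothetical name `cloneURLFromGithubURL`. Because it is idempotent, leaving an already-suffixed URL unchanged, the code keeps working both before and after a migration rewrites the stored URLs, which is what makes deferring the migration safe.

```go
package main

import "strings"

// cloneURLFromGithubURL mirrors the snippet above as a standalone function.
// It is idempotent: a URL that already ends in ".git" is returned unchanged,
// so it behaves the same on pre-migration and post-migration data.
func cloneURLFromGithubURL(u string) string {
	if strings.HasSuffix(u, ".git") {
		return u
	}
	return u + ".git"
}
```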
Co-authored-by: Benjamin Egelund-Müller <[email protected]>
INSERT INTO projects (org_id, name, description, public, created_by_user_id, provisioner, prod_olap_driver, prod_olap_dsn, prod_slots, subpath, prod_branch, archive_asset_id, github_url, github_installation_id, prod_ttl_seconds, prod_version)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16) RETURNING *`,
opts.OrganizationID, opts.Name, opts.Description, opts.Public, opts.CreatedByUserID, opts.Provisioner, opts.ProdOLAPDriver, opts.ProdOLAPDSN, opts.ProdSlots, opts.Subpath, opts.ProdBranch, opts.ArchiveAssetID, opts.GithubURL, opts.GithubInstallationID, opts.ProdTTLSeconds, opts.ProdVersion,
INSERT INTO projects (org_id, name, description, public, created_by_user_id, provisioner, prod_olap_driver, prod_olap_dsn, prod_slots, subpath, prod_branch, archive_asset_id, github_url, github_installation_id, github_repo_id, prod_ttl_seconds, prod_version)
The `managed_git_repo_id` appears not to be set/used here or in the `UpdateProject` statements? If it's always empty, do we need it at all?
This will be used in subsequent PRs when we use the managed git repos.
SELECT * FROM managed_git_repos
WHERE project_id IS NULL
The `managed_git_repos` table doesn't have a `project_id` column atm?
CREATE TABLE managed_git_repos (
	id UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
	remote TEXT NOT NULL UNIQUE,
	created_by_user_id UUID REFERENCES users (id) ON DELETE SET NULL,
	created_on TIMESTAMP DEFAULT NOW(),
	updated_on TIMESTAMP DEFAULT NOW()
);
Shouldn't it have either a `project_id` or an `org_id` column?
Maybe `org_id` would make sense? Then it would be similar to `assets`, i.e. the deploy process would consist of:
- Creating a managed Git repo in the org and pushing to it
- Creating a project that uses the managed Git repo
But open to better ideas if you have any?
I implemented the review comment suggested here, and that seems to have broken the flow:
a) Removed the `org_id`
b) Made the relationship reverse, i.e. projects now hold a reference to `managed_git_repo`
We need `org_id` here to check the count of repos.
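The reason `org_id` matters for the count follows from the two-step deploy flow: a repo created and pushed to in step 1 is not referenced by any project until step 2. The in-memory sketch below uses hypothetical `repoRow`/`projectRow` types and an assumed per-org limit (the PR states none) to show that counting through projects misses such repos, while counting by `org_id` does not.

```go
package main

import "fmt"

// Hypothetical, simplified rows for illustration only.
type repoRow struct {
	ID    string
	OrgID string // the column the author wants to restore
}

type projectRow struct {
	OrgID            string
	ManagedGitRepoID *string
}

// countViaProjects counts distinct repos reachable from an org's projects. It
// misses repos created in step 1 of the deploy flow that no project references yet.
func countViaProjects(orgID string, projects []projectRow) int {
	seen := map[string]bool{}
	for _, p := range projects {
		if p.OrgID == orgID && p.ManagedGitRepoID != nil {
			seen[*p.ManagedGitRepoID] = true
		}
	}
	return len(seen)
}

// countViaOrgID counts every repo owned by the org, attached or not.
func countViaOrgID(orgID string, repos []repoRow) int {
	n := 0
	for _, r := range repos {
		if r.OrgID == orgID {
			n++
		}
	}
	return n
}

// checkRepoQuota rejects creating another repo once the org reaches limit.
// The limit value is an assumption; the PR does not state one.
func checkRepoQuota(orgID string, repos []repoRow, limit int) error {
	if n := countViaOrgID(orgID, repos); n >= limit {
		return fmt.Errorf("org %s already has %d managed Git repos (limit %d)", orgID, n, limit)
	}
	return nil
}
```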
if githubManagedAcct == "" {
	g.managedOrgFetchError = fmt.Errorf("managed Git repositories are not configured for this environment")
}
i, _, err := appClient.Apps.FindOrganizationInstallation(ctx, githubManagedAcct)
if err != nil {
	logger.Error("failed to get managed org installation ID", zap.Error(err), observability.ZapCtx(ctx))
	g.managedOrgFetchError = err
	return g, nil
}
Don't you need an `else` here? Otherwise `g.managedOrgFetchError` would always get overridden with the GitHub error for the empty repo name?
Subtask for: https://github.com/rilldata/rill-private-issues/issues/1569
Checklist: