feat: Admin service support for managed github repo APIs #7213

Open
wants to merge 22 commits into
base: main

k-anshul
Member

@k-anshul k-anshul commented Apr 24, 2025

Subtask for: https://github.com/rilldata/rill-private-issues/issues/1569

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@k-anshul k-anshul self-assigned this Apr 24, 2025
@k-anshul k-anshul requested a review from begelundmuller April 25, 2025 10:24
@@ -308,6 +308,13 @@ type DB interface {
InsertProvisionerResource(ctx context.Context, opts *InsertProvisionerResourceOptions) (*ProvisionerResource, error)
UpdateProvisionerResource(ctx context.Context, id string, opts *UpdateProvisionerResourceOptions) (*ProvisionerResource, error)
DeleteProvisionerResource(ctx context.Context, id string) error

FindManagedGithubRepoMeta(ctx context.Context, htmlURL string) (*ManagedGithubRepoMeta, error)
Contributor

This comment will apply throughout the PR – we should call it "managed Git", not "managed Github".

Github is the provider, which we should be able to swap with another provider in the future by implementing a new integration, but without renaming APIs, database tables, etc.

Member Author

I mentioned it in another reply, but the problem is that a lot of code is already strongly coupled with Github, and we don't know what common ground exists between Github, Gitlab, and other providers, so decoupling it completely is probably not that straightforward.

Hence, as an extension, it just felt intuitive to call it Github and not Git, but I am fine with removing Github from "managed" and have made the changes in the PR.

Comment on lines 1275 to 1276
OrgID string `db:"org_id"`
ProjectID *string `db:"project_id"`
Contributor

Storing org_id and project_id feels off, since one can be looked up from the other. Should the relationship to project_id be stored the reverse way? I.e. foreign key from the projects table, not the other way around?

Member Author

I kept org_id to easily fetch the repos for an org (normalization vs. simplicity). But I can remove it as well; it just needs an extra join.

Yeah, I think storing the relationship the other way is also good. That way we no longer need to maintain a bool indicating that this project uses a managed git repo. Though we can continue to duplicate remote in both places to keep the rest of the code consistent.

Comment on lines 1272 to 1273
// ManagedGithubRepoMeta represents metadata about a Rill managed GitHub repository for projects deployed on Rill Cloud.
type ManagedGithubRepoMeta struct {
Contributor

Since this is the database package where many things are metadata, I think "meta" might be a redundant word. For example, we don't say DeploymentMeta or AssetMeta, just Deployment or Asset.

Member Author

ManagedGitRepoMeta indicates that it is just metadata about the managed git repo and that the actual data is stored somewhere else, not in this table/database. That is why I added the extra meta suffix.

ManagedGitRepo sounds like the managed git repository itself is stored here.
However, I am fine with dropping the suffix to bring it in line with the other tables.

@@ -0,0 +1,12 @@
CREATE TABLE IF NOT EXISTS managed_github_repo_meta (
Contributor

nit: remove IF NOT EXISTS. It's very unexpected if it already exists!

Member Author

I just added it to make the migration fail-proof if we switch between versions. I can remove it as well.

// If managedOrgFetchError is non-nil, the installation ID is not valid and is refetched when needed.
// mu controls access to managedOrgInstallationID and managedOrgFetchError.
managedOrgInstallationID int64
managedOrgFetchError error
Contributor

In what cases does this get set? Seems like a pretty serious error.

I understand not returning it early since it can cause a crash loop. But it seems like it might be worth emitting an error log for?

Member Author

> In what cases does this get set? Seems like a pretty serious error.

managedOrgInstallationID is fetched at server startup. If the fetch fails, managedOrgFetchError is set. If managedOrgFetchError is non-nil, the ID is refetched on the next request. Once managedOrgInstallationID is set, managedOrgFetchError is set to nil and the ID is never refetched.

I can add a log on failure.
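
The fetch-once, retry-on-failure behaviour described in this reply could be sketched roughly as below. This is only an illustration, not the PR's code: `installationCache`, `ID`, and `flakyFetcher` are hypothetical names, and the `fetch` callback stands in for the real Github lookup.

```go
package main

import (
	"errors"
	"sync"
)

// installationCache is a hypothetical holder for a lazily fetched installation ID.
// mu guards id, loaded and lastErr.
type installationCache struct {
	mu      sync.Mutex
	id      int64
	loaded  bool
	lastErr error
	fetch   func() (int64, error) // stand-in for the real provider lookup
}

// ID returns the cached ID. If no earlier attempt succeeded, it fetches again.
// After the first success the value is never refetched.
func (c *installationCache) ID() (int64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loaded {
		return c.id, nil
	}
	id, err := c.fetch()
	if err != nil {
		// Record the failure so the next call retries; a real
		// implementation would probably also log it here.
		c.lastErr = err
		return 0, err
	}
	c.id, c.loaded, c.lastErr = id, true, nil
	return id, nil
}

// flakyFetcher returns a fetch function that fails the first n calls
// and then returns id. It exists only to exercise the retry path.
func flakyFetcher(n int, id int64) func() (int64, error) {
	calls := 0
	return func() (int64, error) {
		calls++
		if calls <= n {
			return 0, errors.New("transient fetch failure")
		}
		return id, nil
	}
}
```

Holding the mutex across the fetch keeps the sketch simple; whether that is acceptable depends on how slow the real lookup can be.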

type InsertManagedGithubRepoMetaOptions struct {
OrgID string `validate:"required"`
CreatedByUserID *string `validate:"required"`
HTMLURL string `validate:"required"`
Contributor

I know we've previously used HTML URLs, but I regret it a bit and it feels fragile. For this use case, should we perhaps store HTTP remotes instead? E.g. https://github.com/account/repo.git. It would make configuring remotes simpler, and would be more compatible with Git providers other than Github (e.g. Gitlab allows extra levels of nesting for repository paths).

Member Author

In the follow-up to this PR, we also need to set this in the project URL, and it may be inconvenient to see both http_url and clone_url in the same column.
I can name it remote, set it to the clone_url, and then add handling wherever it is required.
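
As a rough illustration of the HTML URL versus remote distinction discussed here, the conversion could look something like the following. `remoteFromHTMLURL` is a hypothetical helper, not a function from the PR, and it assumes HTTPS URLs whose path has at least an account and a repository segment.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// remoteFromHTMLURL converts a repository web URL into an HTTPS clone remote.
// It appends ".git" only when missing, so it is safe to call on a value that
// is already a remote. Nested paths (as on Gitlab) are passed through as-is.
func remoteFromHTMLURL(htmlURL string) (string, error) {
	u, err := url.Parse(htmlURL)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" || u.Host == "" {
		return "", fmt.Errorf("expected an https URL, got %q", htmlURL)
	}
	path := strings.Trim(u.Path, "/")
	if strings.Count(path, "/") < 1 {
		// A repository path needs at least "account/repo".
		return "", fmt.Errorf("expected at least account/repo in path, got %q", u.Path)
	}
	path = strings.TrimSuffix(path, ".git")
	return "https://" + u.Host + "/" + path + ".git", nil
}
```

Because the function does not assume exactly two path segments, a URL such as https://gitlab.com/group/subgroup/repo would also map to a remote ending in repo.git.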

@@ -230,7 +231,7 @@ func StartCmd(ch *cmdutil.Helper) *cobra.Command {
emailClient := email.New(sender)

// Init github client
- gh, err := admin.NewGithub(conf.GithubAppID, conf.GithubAppPrivateKey)
+ gh, err := admin.NewGithub(cmd.Context(), conf.GithubAppID, conf.GithubAppPrivateKey, conf.RillManagedGithubOrg)
Contributor

Not necessarily something to do now – but from an abstraction perspective, it feels weird to use the same client for user repositories and managed repositories. Would it make sense to initialize a separate client for managed Git repositories? E.g. admin.Github and admin.ManagedGit (incidentally backed by Github, but could be swapped for e.g. Google Cloud Source Repositories)?

Ideally we'd have an admin/git/provider.go interface with a common interface for Git providers. That'd also make supporting other providers easier.

Member Author

@k-anshul k-anshul May 3, 2025

I think we are in a complicated situation here.
Github and other providers do not conform to a common implementation apart from the fact that they serve git repositories. So the serving part (which is mostly in runtime) is a common implementation, but a lot of code in admin is already coupled to the provider.
Things like installationID, installation token, and several such constructs may not be applicable to other providers and are kept in the project table directly.

So we have two choices here:
a) Either embrace the fact that github is going to be an important part of our workflows and make it strongly coupled within our Admin code.
b) Try to find some middle ground for all these providers and keep it completely decoupled. But then we also need to make several changes in the code.

Member Author

Maybe let's evaluate option b) in a separate task.

updated_on TIMESTAMP DEFAULT NOW()
);

ALTER TABLE projects ADD COLUMN managed_git_repo_id UUID REFERENCES managed_git_repo (id) ON DELETE RESTRICT;
Contributor

Not sure, but for caution/leniency maybe SET NULL instead of RESTRICT?

Member Author

Sure. But why do you think leniency is required here? We should prevent deleting a managed_git_repo if it is still being used by a project.

Contributor

Leniency might make some handling easier when switching from managed to external Git, or because deletes happen in Github before they happen in Postgres. Honestly not sure about it, just wanted to ask.

Comment on lines +421 to +423
Remote: *repo.CloneURL,
Username: "x-access-token",
Password: token,
Contributor

I think this is duplicated in a few places now. Would it make sense to have a single func in s.admin.Github format the remote, username, password, etc. for use everywhere?

Member Author

There are some minor variations here and there. Let us keep it hardcoded for now.
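
For reference, the kind of shared helper the reviewer has in mind might look like the sketch below. `GitAuth` and `installationAuth` are hypothetical names; only the "x-access-token" username and the three fields are taken from the snippet under review. The minor variations mentioned in the reply would need extra parameters or separate helpers.

```go
package main

// GitAuth bundles the values needed to authenticate a Git operation over HTTPS.
type GitAuth struct {
	Remote   string
	Username string
	Password string
}

// installationAuth builds a GitAuth for a Github App installation token.
// The username is the fixed value used in the code under review.
func installationAuth(cloneURL, token string) GitAuth {
	return GitAuth{
		Remote:   cloneURL,
		Username: "x-access-token",
		Password: token,
	}
}
```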

Comment on lines +1254 to +1258
if strings.HasSuffix(*proj.GithubURL, ".git") {
cloneURL = *proj.GithubURL
} else {
cloneURL = *proj.GithubURL + ".git"
}
Contributor

Would it make sense to run a migration that does this?

Member Author

We can do it in the future, but I think we still need to commit this change and run the migration later. Otherwise, it will break in the window between the migration running and the code being deployed.

@k-anshul k-anshul requested a review from begelundmuller May 6, 2025 06:02
- INSERT INTO projects (org_id, name, description, public, created_by_user_id, provisioner, prod_olap_driver, prod_olap_dsn, prod_slots, subpath, prod_branch, archive_asset_id, github_url, github_installation_id, prod_ttl_seconds, prod_version)
- VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16) RETURNING *`,
- opts.OrganizationID, opts.Name, opts.Description, opts.Public, opts.CreatedByUserID, opts.Provisioner, opts.ProdOLAPDriver, opts.ProdOLAPDSN, opts.ProdSlots, opts.Subpath, opts.ProdBranch, opts.ArchiveAssetID, opts.GithubURL, opts.GithubInstallationID, opts.ProdTTLSeconds, opts.ProdVersion,
+ INSERT INTO projects (org_id, name, description, public, created_by_user_id, provisioner, prod_olap_driver, prod_olap_dsn, prod_slots, subpath, prod_branch, archive_asset_id, github_url, github_installation_id, github_repo_id, prod_ttl_seconds, prod_version)
Contributor

The managed_git_repo_id appears not to be set/used here or in the UpdateProject statements? If it's always empty, do we need it at all?

Member Author

This will be used in subsequent PRs when we use the managed git repos.

Comment on lines +2557 to +2558
SELECT * FROM managed_git_repos
WHERE project_id IS NULL
Contributor

The managed_git_repos doesn't have a project_id column atm?

Comment on lines +1 to +7
CREATE TABLE managed_git_repos (
id UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
remote TEXT NOT NULL UNIQUE,
created_by_user_id UUID REFERENCES users (id) ON DELETE SET NULL,
created_on TIMESTAMP DEFAULT NOW(),
updated_on TIMESTAMP DEFAULT NOW()
);
Contributor

Shouldn't it have either a project_id or org_id column?

Maybe org_id would make sense? Then it would be similar to assets, i.e. the deploy process would consist of:

  1. Creating a managed Git repo in the org and pushing to it
  2. Creating a project that uses the managed Git repo

But open to better ideas if you have any?

Member Author

I implemented the review comment suggested here and that seems to have broken the flow.
a) Removed the org_id
b) Made the relationship reverse i.e. projects now hold a reference to managed_git_repo

We need org_id here to check the count of repos.

Comment on lines +82 to +90
if githubManagedAcct == "" {
g.managedOrgFetchError = fmt.Errorf("managed Git repositories are not configured for this environment")
}
i, _, err := appClient.Apps.FindOrganizationInstallation(ctx, githubManagedAcct)
if err != nil {
logger.Error("failed to get managed org installation ID", zap.Error(err), observability.ZapCtx(ctx))
g.managedOrgFetchError = err
return g, nil
}
Contributor

Don't you need an else here? Otherwise g.managedOrgFetchError would always get overridden with the Github error for empty repo name?
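
The control flow being asked for could be sketched as follows. This is a simplified stand-in rather than the PR's code: `resolveInstallation` and `errNotConfigured` are hypothetical, and the `lookup` callback replaces the real Github API call. An early return has the same effect as the else branch: the lookup is skipped when no account is configured, so the "not configured" error is not overwritten.

```go
package main

import "errors"

// errNotConfigured mirrors the message used in the snippet under review.
var errNotConfigured = errors.New("managed Git repositories are not configured for this environment")

// resolveInstallation returns the installation ID for account.
// lookup is a stand-in for the provider API call.
func resolveInstallation(account string, lookup func(string) (int64, error)) (int64, error) {
	if account == "" {
		// Return early so the lookup below never runs with an empty
		// account and never replaces this error with an API error.
		return 0, errNotConfigured
	}
	id, err := lookup(account)
	if err != nil {
		return 0, err
	}
	return id, nil
}
```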

2 participants