Conversation

tmnd1991
Contributor

This is a rough, incomplete draft, but I would love for a maintainer to take a look and point out any "big" mistakes. To me, the most controversial part is that java.nio.file.Path is not ergonomic at all for S3 buckets and keys.

def resolveOne[A](propName: String,
envName: String,
builder: String => A): ZIO[system.System, RuntimeException, A] = {
zio.system.properties.flatMap { map =>
Contributor

@jeremyrsmith jeremyrsmith Dec 28, 2020

You can use something like

zio.system.property(propName).some
  .orElse(zio.system.env(envName).some)
  .orElseFail(new RuntimeException(s"Cannot find system property $propName or environment variable $envName"))
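
For readers less familiar with the ZIO combinators, the same "property first, then environment variable, else fail" order can be sketched in plain Scala. This is only an illustrative analogue, not the PR's code: the `ResolveOneSketch` object, the `Either` return type, and the injectable `props`/`env` lookup functions are assumptions made so the fallback is easy to test.

```scala
object ResolveOneSketch {
  // Hypothetical, ZIO-free analogue of resolveOne. The lookups are passed in
  // as functions so the fallback order is easy to see and to test.
  def resolveOne[A](
      propName: String,
      envName: String,
      builder: String => A,
      props: String => Option[String] = name => sys.props.get(name),
      env: String => Option[String] = name => sys.env.get(name)
  ): Either[RuntimeException, A] =
    props(propName)
      .orElse(env(envName)) // only consulted when the property is missing
      .map(builder)
      .toRight(new RuntimeException(
        s"Cannot find system property $propName or environment variable $envName"))
}
```

With the default lookups, a call such as `ResolveOneSketch.resolveOne("some.prop", "SOME_ENV", identity[String])` (both names hypothetical) reads from the real JVM properties and process environment.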

build.sbt Outdated
"com.vladsch.flexmark" % "flexmark-ext-yaml-front-matter" % "0.34.32",
"org.slf4j" % "slf4j-simple" % "1.7.25"
"org.slf4j" % "slf4j-simple" % "1.7.25",
"net.java.dev.jets3t" % "jets3t" % "0.7.1" // same as Spark one
Contributor

Thanks for using the same thing as Spark, but I wonder if this will cause conflicts across different versions of Spark.

@jeremyrsmith
Contributor

jeremyrsmith commented Dec 28, 2020

This is a tricky one... I think in the Spark case (which I'd estimate is the majority) we'd want to use Hadoop APIs to access S3 in order to avoid dependency conflicts. But:

  1. We haven't implemented a plugin hook for notebook storage providers yet (though this would probably be the time to do that! Let me know if you're interested in adding that!), so it wouldn't be possible to define a NotebookFilesystem in polynote-spark.
  2. It would limit S3 notebook support to Spark (or at least Hadoop) users. I think this would probably be OK?

Edit: Actually, I think we don't have to worry about conflicts, with some of the packaging/deployment changes I'm working on. But still, I think I'd rather see S3 support be a plug-in, which means we have to at least figure out the plugin hook for filesystems and get the repository mechanism to support URIs.


override def init(path: Path): RIO[BaseEnv, Unit] = ZIO.unit

def getBucket(p: Path): RIO[BaseEnv, S3Bucket] = {
Contributor

@jeremyrsmith jeremyrsmith Dec 28, 2020

Should the bucket (and optionally a base path) be part of the notebook repository instead? Then, paths would just be paths. You'd parameterize the filesystem with the bucket and base path, and create a new type of NotebookRepository which is configured with the base S3 URI and passes the bucket and base path to the filesystem (or instead, adapt FileBasedRepository to accept a base URI instead of a base path... some of the plumbing for this exists, but is unused)
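
One way to picture this suggestion is a small value that holds the bucket and base path parsed from the repository's base URI, while notebook paths stay plain relative paths. The sketch below is an assumption-laden illustration: `S3Location`, `fromUri`, and `keyFor` are hypothetical names, not part of Polynote or of the PR.

```scala
import java.net.URI

object S3LocationSketch {
  // Hypothetical type: the bucket and optional base path taken from the
  // repository's base URI, so notebook paths can stay plain relative paths.
  final case class S3Location(bucket: String, basePath: String) {
    // Object key for a notebook path that is relative to the repository root.
    def keyFor(relativePath: String): String = {
      val rel = relativePath.stripPrefix("/")
      if (basePath.isEmpty) rel else s"$basePath/$rel"
    }
  }

  // Parses URIs of the form s3://bucket/optional/base/path.
  // Note: URI.getHost is null when the authority is not a valid host name,
  // so such URIs are rejected here.
  def fromUri(uri: URI): Either[String, S3Location] =
    if (uri.getScheme != "s3") Left(s"Not an s3 URI: $uri")
    else Option(uri.getHost).filter(_.nonEmpty) match {
      case None => Left(s"Missing bucket in: $uri")
      case Some(bucket) =>
        val path = Option(uri.getPath).getOrElse("")
        Right(S3Location(bucket, path.stripPrefix("/").stripSuffix("/")))
    }
}
```

A filesystem parameterized this way would only ever translate relative paths into keys through `keyFor`, so the bucket never has to be recovered from a `Path`.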

@tmnd1991
Contributor Author

If I have your "blessing", I would rework the NotebookFilesystem API to use URIs instead of Paths, which would make working with S3 much simpler. I'm then OK with moving the S3 implementation into the Spark module and using whatever library ships with Spark (maybe Hadoop), but I'd love some hints on how to implement a plugin system for NotebookFilesystem.

@tmnd1991
Contributor Author

tmnd1991 commented Jan 6, 2021

Hi @jeremyrsmith, I pushed some changes. Are they going in the right direction?
I set up a ServiceLoader to back the "forScheme" method of the fs ZLayer (which has never been used so far). I added a props: Map[String, String] argument so that filesystems can be configured in some way (e.g. credentials, the bucket name in the case of S3, etc.). Some other changes may be needed to make this usable. I think an FS (at least in the case of S3) should be a ZManaged (so it is managed and eventually closed), and/or there should be some "cache" of filesystems (i.e. not instantiating a new one every time we call forScheme would be nice). Let me know what you think :) Thanks in advance 😄
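
The caching idea mentioned above can be sketched without ZIO as a registry that looks up a provider by scheme and reuses the filesystem it created for a given scheme and props. Everything here is a hypothetical stand-in (`Filesystem`, `FilesystemProvider`, `Registry`, `discovered`); the only real API used is the standard `java.util.ServiceLoader`, and resource cleanup (the ZManaged part) is deliberately left out.

```scala
import scala.collection.concurrent.TrieMap

object FilesystemRegistrySketch {
  // Hypothetical stand-ins; the PR's real NotebookFilesystem API differs.
  trait Filesystem {
    def scheme: String
    def props: Map[String, String]
  }

  // What a plugin would implement and register under META-INF/services.
  trait FilesystemProvider {
    def scheme: String
    def create(props: Map[String, String]): Filesystem
  }

  // Providers found on the classpath via the standard java.util.ServiceLoader.
  def discovered(): List[FilesystemProvider] = {
    val it = java.util.ServiceLoader.load(classOf[FilesystemProvider]).iterator()
    var found = List.empty[FilesystemProvider]
    while (it.hasNext) found = it.next() :: found
    found.reverse
  }

  final class Registry(providers: List[FilesystemProvider]) {
    private val cache = TrieMap.empty[(String, Map[String, String]), Filesystem]

    // Reuses the instance created for the same scheme and props instead of
    // building a new filesystem on every call. Returns None for unknown schemes.
    def forScheme(scheme: String, props: Map[String, String]): Option[Filesystem] =
      providers.find(_.scheme == scheme).map { provider =>
        cache.getOrElseUpdate((scheme, props), provider.create(props))
      }
  }
}
```

In a deployment, `new Registry(discovered())` would pick up whatever providers the plugins declare; passing the list explicitly keeps the registry easy to test.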

@jeremyrsmith
Contributor

@tmnd1991 absolutely, I think a lot of things should be moved to ZManaged... I had a branch that moved in that direction, and I will slowly try to pick things from it over time.

I think configuration of plugin-based filesystems would ideally work similarly to how configuration of plugin-based authentication providers works. To be honest we're discussing this internally right now, so that might be something to wait for before putting a lot more effort here 😞

@tmnd1991
Contributor Author

tmnd1991 commented Jan 7, 2021

I'm in no rush :) If there's some way to "watch" the discussion, I'd be really interested. Anyway, I'll hold off until you've made a decision about those changes 😄

3 participants