Draft: Use AWS S3 as Storage Backend #1010

tmnd1991 · 2020-12-26T22:09:39Z

This is a very rough and incomplete draft, but I would love for a maintainer to take a look and point out any "big" mistakes. To me, the most controversial part is that java.nio.file.Path is not ergonomic at all for S3 buckets and keys.

jeremyrsmith · 2020-12-28T19:21:13Z

polynote-server/src/main/scala/polynote/server/repository/fs/S3Filesystem.scala

+  def resolveOne[A](propName: String,
+                    envName: String,
+                    builder: String => A): ZIO[system.System, RuntimeException, A] = {
+    zio.system.properties.flatMap { map =>


You can use something like

zio.system.property(propName).some
  .orElse(zio.system.env(envName).some)
  .orElseFail(new RuntimeException(s"Cannot find system property $propName or environment variable $envName"))
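
For readers less familiar with ZIO, the lookup order in this suggestion (system property first, then environment variable, then a descriptive failure) can be sketched with only the Scala standard library. This is not the ZIO code proposed in the review: `Either` stands in for the ZIO error channel, and the `props` and `env` parameters are hypothetical additions so that the lookup can be exercised without touching the real process environment.

```scala
object ResolveSketch {
  // Plain-Scala stand-in for the PR's resolveOne; Either replaces the ZIO error channel.
  // `props` and `env` are hypothetical parameters (not in the PR). By default they read
  // the real system properties and environment variables.
  def resolveOne[A](
      propName: String,
      envName: String,
      builder: String => A,
      props: String => Option[String] = sys.props.get(_),
      env: String => Option[String] = sys.env.get(_)
  ): Either[RuntimeException, A] =
    props(propName)
      .orElse(env(envName)) // consulted only when the system property is absent
      .map(builder)
      .toRight(new RuntimeException(
        s"Cannot find system property $propName or environment variable $envName"))
}
```

Passing the lookups as functions is only a convenience for this sketch; in the ZIO version the same role is played by the `system.System` environment.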

jeremyrsmith · 2020-12-28T19:26:22Z

build.sbt

    "com.vladsch.flexmark" % "flexmark-ext-yaml-front-matter" % "0.34.32",
-    "org.slf4j" % "slf4j-simple" % "1.7.25"
+    "org.slf4j" % "slf4j-simple" % "1.7.25",
+    "net.java.dev.jets3t" % "jets3t" % "0.7.1" // same as Spark one


Thanks for using the same thing as Spark, but I wonder if this will cause conflicts across different versions of Spark.

jeremyrsmith · 2020-12-28T19:32:18Z

This is a tricky one... I think in the Spark case (which I'd estimate is the majority) ~~we'd want to use Hadoop APIs to access S3~~ in order to avoid dependency conflicts. But:

We haven't implemented a plugin hook for notebook storage providers yet (though this would probably be the time to do that! Let me know if you're interested in adding that!), so it wouldn't be possible to define a NotebookFilesystem in polynote-spark.
It would limit S3 notebook support to Spark (or at least Hadoop) users. I think this would probably be OK?

Edit: Actually, I think we don't have to worry about conflicts, with some of the packaging/deployment changes I'm working on. But still, I think I'd rather see S3 support be a plug-in, which means we have to at least figure out the plugin hook for filesystems and get the repository mechanism to support URIs.

jeremyrsmith · 2020-12-28T19:37:20Z

polynote-server/src/main/scala/polynote/server/repository/fs/S3Filesystem.scala

+
+  override def init(path: Path): RIO[BaseEnv, Unit] = ZIO.unit
+
+  def getBucket(p: Path): RIO[BaseEnv, S3Bucket] = {


Should the bucket (and optionally a base path) be part of the notebook repository instead? Then paths would just be paths. You'd parameterize the filesystem with the bucket and base path, and create a new type of NotebookRepository which is configured with the base S3 URI and passes the bucket and base path to the filesystem (or, alternatively, adapt FileBasedRepository to accept a base URI instead of a base path; some of the plumbing for this exists, but is unused).
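
A minimal sketch of that split, assuming the repository is configured with a base URI of the form s3://bucket/base/path. `S3Location`, `parse`, and `keyFor` are hypothetical names for illustration, not part of Polynote; only `java.net.URI` from the JDK is used.

```scala
import java.net.URI
import scala.util.Try

object S3LocationSketch {
  // Hypothetical value type: what a repository could pass to the filesystem.
  final case class S3Location(bucket: String, basePath: String)

  // Split a base URI such as s3://bucket/base/path into bucket and base path.
  def parse(base: String): Either[String, S3Location] =
    Try(new URI(base)).toEither.left.map(_.getMessage).flatMap { uri =>
      val scheme = Option(uri.getScheme).map(_.toLowerCase)
      // getAuthority is used instead of getHost, because getHost can be null for
      // bucket names that are not valid hostnames (e.g. ones containing underscores).
      val bucket = Option(uri.getAuthority).filter(_.nonEmpty)
      (scheme, bucket) match {
        case (Some("s3"), Some(b)) =>
          val path = Option(uri.getPath).getOrElse("")
          Right(S3Location(b, path.stripPrefix("/").stripSuffix("/")))
        case _ =>
          Left(s"Not an S3 URI with a bucket: $base")
      }
    }

  // Object key for a notebook path that is relative to the repository's base path.
  def keyFor(loc: S3Location, relativePath: String): String =
    (loc.basePath.split('/') ++ relativePath.split('/')).filter(_.nonEmpty).mkString("/")
}
```

With this shape, notebook paths stay relative ("paths would just be paths"), and only the repository knows about the bucket.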

tmnd1991 · 2020-12-28T22:36:12Z

If I have your "blessing", I would rework the NotebookFilesystem API to use URIs instead of Paths, which would make working with S3 a lot simpler. Then I'm OK with moving the S3 implementation inside the Spark module and using whatever library ships with Spark (maybe Hadoop), but I'd love some hints on how to implement a plugin system for NotebookFilesystem.

tmnd1991 · 2021-01-06T18:20:22Z

Hi @jeremyrsmith, I pushed some changes. Are they going in the right direction?
I set up a ServiceLoader to back the "forScheme" method of the fs ZLayer (which has not been used so far). I added a props: Map[String, String] argument so that filesystems can be configured in some way (e.g. credentials, or the bucket name in the case of S3). Some other changes may be needed to make it "usable". I think an FS (in the case of S3 at least) should be a ZManaged (so that it is managed and eventually closed), and/or there should be some "cache" of filesystems (i.e. it would be nice not to instantiate a new one every time we call forScheme). Let me know what you think :) Thanks in advance 😄
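
A rough sketch of the ServiceLoader-plus-cache idea, using only the JDK and the Scala standard library. The `Filesystem`, `FilesystemProvider`, and `Registry` types here are hypothetical stand-ins, not Polynote's actual NotebookFilesystem API (which is ZIO-based), and the sketch does not address closing filesystems, which is what ZManaged would be for.

```scala
import java.util.ServiceLoader
import scala.collection.concurrent.TrieMap

object FilesystemPluginSketch {
  // Hypothetical stand-ins for illustration only.
  trait Filesystem { def scheme: String }
  trait FilesystemProvider {
    def scheme: String
    def create(props: Map[String, String]): Filesystem
  }

  // Providers are discovered from META-INF/services entries on the classpath.
  // With no such entries, the result is simply empty.
  def discover(): List[FilesystemProvider] = {
    val it = ServiceLoader.load(classOf[FilesystemProvider]).iterator()
    val found = List.newBuilder[FilesystemProvider]
    while (it.hasNext) found += it.next()
    found.result()
  }

  final class Registry(providers: List[FilesystemProvider]) {
    // Cache keyed by scheme and props, so repeated lookups reuse one instance.
    private val cache = TrieMap.empty[(String, Map[String, String]), Filesystem]

    // None when no provider is registered for the scheme.
    def forScheme(scheme: String, props: Map[String, String]): Option[Filesystem] =
      providers.find(_.scheme == scheme).map { provider =>
        cache.getOrElseUpdate((scheme, props), provider.create(props))
      }
  }
}
```

A caller would build the registry once, for example `new FilesystemPluginSketch.Registry(FilesystemPluginSketch.discover())`, and then look filesystems up by URI scheme.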

jeremyrsmith · 2021-01-07T01:17:34Z

@tmnd1991 absolutely I think a lot of things should be moved to ZManaged... I had a branch which moved in that direction, will be slowly trying to pick things from it over time.

I think configuration of plugin-based filesystems would ideally work similarly to how configuration of plugin-based authentication providers works. To be honest we're discussing this internally right now, so that might be something to wait for before putting a lot more effort here 😞

tmnd1991 · 2021-01-07T07:14:18Z

I’m in no rush :) If there’s some way to “watch” the discussion, I’d be really interested. Anyway, I’ll hold off until you’ve made a decision about those changes 😄

jeremyrsmith reviewed Dec 28, 2020

agilelab-tmnd1991 added 4 commits January 6, 2021 19:04

Incomplete draft (465c2e0)

wip (f096b58)

use service loader (4c17032)

Revert minor changes (14e9fb9)

tmnd1991 force-pushed the feature/674 branch from aa16ae9 to 14e9fb9 Compare January 6, 2021 18:06

Apply suggested changes, and remove unwanted ones (52d6bc0)
