Draft: Use AWS S3 as Storage Backend #1010
Conversation
```scala
def resolveOne[A](propName: String,
                  envName: String,
                  builder: String => A): ZIO[system.System, RuntimeException, A] = {
  zio.system.properties.flatMap { map =>
```
You can use something like:

```scala
zio.system.property(propName).some
  .orElse(zio.system.env(envName).some)
  .orElseFail(new RuntimeException(s"Cannot find system property $propName or environment variable $envName"))
```
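Applied to the method above, the reviewer's suggestion could shape `resolveOne` roughly like this. This is only a sketch, assuming the ZIO 1.x `System` service, where `.some` turns an `Option` result into a failure when the lookup is empty:

```scala
import zio.ZIO
import zio.system

// Sketch: try the system property first, fall back to the environment
// variable, and fail with a descriptive error if neither is set.
def resolveOne[A](propName: String,
                  envName: String,
                  builder: String => A): ZIO[system.System, RuntimeException, A] =
  zio.system.property(propName).some
    .orElse(zio.system.env(envName).some)
    .orElseFail(new RuntimeException(
      s"Cannot find system property $propName or environment variable $envName"))
    .map(builder)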
build.sbt
```diff
  "com.vladsch.flexmark" % "flexmark-ext-yaml-front-matter" % "0.34.32",
- "org.slf4j" % "slf4j-simple" % "1.7.25"
+ "org.slf4j" % "slf4j-simple" % "1.7.25",
+ "net.java.dev.jets3t" % "jets3t" % "0.7.1" // same as Spark one
```
Thanks for using the same thing as Spark, but I wonder if this will cause conflicts across different versions of Spark.
This is a tricky one... I think in the Spark case (which I'd estimate is the majority)
Edit: Actually, I think we don't have to worry about conflicts, with some of the packaging/deployment changes I'm working on. But still, I think I'd rather see S3 support be a plug-in, which means we have to at least figure out the plugin hook for filesystems and get the repository mechanism to support URIs.
```scala
override def init(path: Path): RIO[BaseEnv, Unit] = ZIO.unit

def getBucket(p: Path): RIO[BaseEnv, S3Bucket] = {
```
Should the bucket (and optionally a base path) be part of the notebook repository instead? Then, paths would just be paths. You'd parameterize the filesystem with the bucket and base path, and create a new type of `NotebookRepository` which is configured with the base S3 URI and passes the bucket and base path to the filesystem (or instead, adapt `FileBasedRepository` to accept a base URI instead of a base path... some of the plumbing for this exists, but is unused).
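The arrangement the reviewer describes could be sketched as follows. All names here (`S3Location`, `NotebookFilesystemLike`, `S3NotebookRepository`) are illustrative placeholders, not Polynote's actual API; the point is only that the bucket and base path live in the repository's configuration, so the filesystem never has to parse them out of each path:

```scala
// Hypothetical sketch: the repository owns the bucket and base path.
final case class S3Location(bucket: String, basePath: String)

// Stand-in for a filesystem abstraction that works with (bucket, key) pairs.
trait NotebookFilesystemLike {
  def read(bucket: String, key: String): String
}

// Configured with the base S3 location; translates plain notebook paths
// into (bucket, key) pairs in one place.
class S3NotebookRepository(loc: S3Location, fs: NotebookFilesystemLike) {
  private def key(notebookPath: String): String =
    s"${loc.basePath.stripSuffix("/")}/${notebookPath.stripPrefix("/")}"

  def loadNotebook(path: String): String = fs.read(loc.bucket, key(path))
}
```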
If I have your "blessing", I would rework the `NotebookFilesystem` API to use URIs instead of `Path`s, which would simplify working with S3 a lot. Then I'm OK with moving the S3 implementation inside the Spark module and using whatever library is shipped with Spark (maybe Hadoop), but I'd love to have some hints on how to implement a plugin system.
Hi @jeremyrsmith, I pushed some changes, are they going in the right direction?
@tmnd1991 absolutely, I think a lot of things should be moved. I think configuration of plugin-based filesystems would ideally work similarly to how configuration of plugin-based authentication providers works. To be honest, we're discussing this internally right now, so that might be something to wait for before putting a lot more effort in here 😞
I have no rush :) If there's some way to "watch" the discussion, that would really be interesting to me. Anyway, I'll hold off until you've made a decision about those changes 😄
It's very drafty and incomplete, but I would love for any maintainer to have a look and point out any "big" mistakes. To me the most controversial part is that `java.nio.file.Path` is not ergonomic at all for S3 buckets and keys.
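A small illustration of why `java.nio.file.Path` is awkward here: `Paths.get(URI)` throws `FileSystemNotFoundException` for an `s3://` URI unless a matching `FileSystemProvider` is installed, whereas a plain `java.net.URI` already separates the bucket (authority) from the object key (path):

```scala
import java.net.URI

// java.nio.file.Paths.get(new URI("s3://...")) would throw
// FileSystemNotFoundException without an installed S3 FileSystemProvider.
// A bare URI keeps both pieces accessible:
val uri       = new URI("s3://my-bucket/notebooks/example.ipynb")
val bucket    = uri.getAuthority            // "my-bucket"
val objectKey = uri.getPath.stripPrefix("/") // "notebooks/example.ipynb"
```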