@james-willis
Collaborator

What changes were proposed in this pull request?

Adding some more LDBC graph500 benchmark datasets to make the utils more useful

Why are the changes needed?

I'm testing regressions in connected components and need larger datasets

Collaborator

@SemyonSinchenko left a comment

My only concern is that you may run into problems downloading and unpacking tens of gigabytes of tar.zst archives from inside the JVM.

@james-willis
Collaborator Author

i think i fucked up my commit message, stand by!

@james-willis changed the title from "typo" to "add more LDBC benchmark datasets" on Dec 29, 2025
@james-willis
Collaborator Author

> My only concern is you may face some problems with downloading and unpacking tens of gigabytes of tar.zst archives from inside the JVM.

Yeah, but I suppose it's up to the user to allocate more resources.

@rjurney
Collaborator

rjurney commented Dec 29, 2025

Is it possible to stream to disk? Then the cluster might not need all the RAM to download? Not sure I understand the issue, just popping my head up from a Lyft :)

@james-willis
Collaborator Author

Sem's original code does stream the compressed file to disk:

      val connection = ldbcURL(name).openConnection()
      val inputStream = connection.getInputStream
      val outputStream = Files.newOutputStream(archivePath)
      val buffer = new Array[Byte](bufferSize)
      var bytesRead = 0
      while ({ bytesRead = inputStream.read(buffer); bytesRead } != -1) {
        outputStream.write(buffer, 0, bytesRead)
      }
      inputStream.close()
      outputStream.close()
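
For readers following along, here is a minimal sketch of the same buffer-at-a-time copy. It closes both streams even if the transfer fails partway, which the snippet above does not do. It is not the code in the repository: `StreamingDownload`, `copyToDisk`, and `download` are hypothetical names, and it assumes Scala 2.13+ for `scala.util.Using`. Heap usage stays at roughly one buffer regardless of archive size, though the sketch covers only the download, not unpacking the archive.

```scala
import java.io.InputStream
import java.net.URL
import java.nio.file.{Files, Path}
import scala.util.Using

// Hypothetical helper, for illustration only (assumes Scala 2.13+).
object StreamingDownload {

  // Copies `in` to `target` one buffer at a time and returns the number of
  // bytes written. Only `bufferSize` bytes are held in memory at once.
  // The caller remains responsible for closing `in`.
  def copyToDisk(in: InputStream, target: Path, bufferSize: Int = 8192): Long =
    Using.resource(Files.newOutputStream(target)) { out =>
      val buffer = new Array[Byte](bufferSize)
      var total = 0L
      var bytesRead = in.read(buffer)
      while (bytesRead != -1) {
        out.write(buffer, 0, bytesRead)
        total += bytesRead
        bytesRead = in.read(buffer)
      }
      total
    }

  // Opens the URL and streams its body to `target`; the input stream is
  // closed even when the copy throws.
  def download(url: URL, target: Path): Long =
    Using.resource(url.openConnection().getInputStream) { in =>
      copyToDisk(in, target)
    }
}
```

The JDK's `Files.copy(in, target)` performs the same kind of streaming copy and could replace the explicit loop if the byte count and buffer size do not need to be controlled.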

@james-willis merged commit 2c80448 into graphframes:main on Dec 29, 2025
5 checks passed
@james-willis deleted the add-ldbc-datasets branch on December 29, 2025 22:34
3 participants