james-willis · 2025-12-29T21:54:21Z

What changes were proposed in this pull request?

Adding more LDBC graph500 benchmark datasets to make the utils more useful.

Why are the changes needed?

I'm testing for regressions in connected components and need larger datasets.

SemyonSinchenko

My only concern is that you may run into problems downloading and unpacking tens of gigabytes of tar.zst archives from inside the JVM.

james-willis · 2025-12-29T22:03:03Z

I think I fucked up my commit message, stand by!

james-willis · 2025-12-29T22:04:18Z

> My only concern is that you may run into problems downloading and unpacking tens of gigabytes of tar.zst archives from inside the JVM.

Yeah, but I suppose it is up to the user to allocate more resources.

rjurney · 2025-12-29T22:07:45Z

Is it possible to stream to disk? Then the cluster might not need all the RAM to download? Not sure I understand the issue, just popping my head up from a Lyft :)

james-willis · 2025-12-29T22:14:05Z

Sem's original code does stream the compressed file to disk:

      val connection = ldbcURL(name).openConnection()
      val inputStream = connection.getInputStream
      val outputStream = Files.newOutputStream(archivePath)
      val buffer = new Array[Byte](bufferSize)
      var bytesRead = 0
      while ({ bytesRead = inputStream.read(buffer); bytesRead } != -1) {
        outputStream.write(buffer, 0, bytesRead)
      }
      inputStream.close()
      outputStream.close()
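
A minimal sketch of the same buffered copy, written so that both streams are closed even if a read or write throws. The object name `StreamToDiskSketch` and the helper `streamToDisk` are hypothetical and are not part of GraphFrames. The `main` method uses an in-memory stream instead of a real LDBC URL so the sketch runs without network access. Heap usage stays near `bufferSize` bytes regardless of how large the archive is.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.file.{Files, Path}

object StreamToDiskSketch {
  // Hypothetical helper: copies `in` to `target` through a fixed-size buffer
  // and returns the number of bytes written.
  def streamToDisk(in: InputStream, target: Path, bufferSize: Int = 8192): Long = {
    try {
      val out = Files.newOutputStream(target)
      try {
        val buffer = new Array[Byte](bufferSize)
        var total = 0L
        var bytesRead = in.read(buffer)
        while (bytesRead != -1) {
          out.write(buffer, 0, bytesRead)
          total += bytesRead
          bytesRead = in.read(buffer)
        }
        total
      } finally {
        out.close() // runs even if read or write throws
      }
    } finally {
      in.close() // runs even if the output file could not be opened
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for connection.getInputStream: 100,000 bytes held in memory.
    val payload = Array.tabulate[Byte](100000)(i => (i % 251).toByte)
    val target = Files.createTempFile("ldbc-sketch", ".bin")
    val copied = streamToDisk(new ByteArrayInputStream(payload), target, bufferSize = 4096)
    println(s"copied $copied bytes to $target")
    Files.delete(target)
  }
}
```

With a real download, the input stream would come from `connection.getInputStream` as in the snippet above. Only the download step is covered here; unpacking the archive is a separate concern.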

james-willis requested a review from SemyonSinchenko December 29, 2025 21:54

SemyonSinchenko approved these changes Dec 29, 2025

add more LDBC benchmark datasets

52757ef

james-willis force-pushed the add-ldbc-datasets branch from 7ba4969 to 52757ef December 29, 2025 22:03

james-willis changed the title ~~typo~~ add more LDBC benchmark datasets Dec 29, 2025

james-willis merged commit 2c80448 into graphframes:main Dec 29, 2025
5 checks passed

james-willis deleted the add-ldbc-datasets branch December 29, 2025 22:34

add more LDBC benchmark datasets #764

james-willis merged 1 commit into graphframes:main from james-willis:add-ldbc-datasets
