Feature/shellstruct #38

Open
IanChenUIUC wants to merge 9 commits into Ladybug-Memory:main from IanChenUIUC:feature/shellstruct

Conversation

@IanChenUIUC

Implementation of ShellStruct as described in issue #37.

IanChenUIUC marked this pull request as draft on June 10, 2026, 18:52

@adsharma

Contributor

@IanChenUIUC Updating the GitHub workflow to install the necessary libraries on Windows, Linux, and macOS should fix the problem. See https://github.com/LadybugDB/ladybug/blob/main/.github/workflows/precompiled-bin-workflow.yml for an example of how to do it.

@adsharma

adsharma commented Jun 10, 2026

Contributor

A simpler solution is to store the ShellStruct in Arrow memory and then write the Arrow data to Parquet from Python.

That way we don't introduce a Parquet dependency here.

IanChenUIUC marked this pull request as ready for review on June 11, 2026, 01:18

return std::make_shared<arrow::UInt64Array>(length, buffer);
}
Graph::Graph(count n, bool directed, std::vector<node> outIndices, std::vector<index> outIndptr,

Contributor

I feel this is overkill. If users are already writing in CSR format, they may as well write in Arrow.

Contributor

The motivation behind Arrow support is zero-copy.

Author

I think that makes sense. Was there a reason why this constructor was declared (I did not add it)?

Contributor

I don't see it on main

Author

Contributor

One potential use case is to load an immutable graph and then mutate it. We should encourage immutable graphs, since they're so much more efficient, but not forbid mutability.

Recommended pattern:

GraphR gr(...); // build from csr arrays
GraphW gw(gr); 

The base class Graph should be an interface.
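
The recommended pattern might look roughly like the sketch below. It is not the library's actual class layout: only the names Graph, GraphR, and GraphW come from the comment above, while the constructor signature and the methods numberOfNodes, degree, and addEdge are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using node = std::uint64_t;

// Hypothetical read-only interface; the real Graph base class may differ.
class Graph {
public:
    virtual ~Graph() = default;
    virtual std::size_t numberOfNodes() const = 0;
    virtual std::size_t degree(node u) const = 0;
};

// Immutable graph backed by CSR arrays (indptr has n + 1 entries).
class GraphR final : public Graph {
public:
    GraphR(std::vector<node> indices, std::vector<std::size_t> indptr)
        : indices_(std::move(indices)), indptr_(std::move(indptr)) {}

    std::size_t numberOfNodes() const override {
        return indptr_.empty() ? 0 : indptr_.size() - 1;
    }
    std::size_t degree(node u) const override { return indptr_[u + 1] - indptr_[u]; }

    const std::vector<node> &indices() const { return indices_; }
    const std::vector<std::size_t> &indptr() const { return indptr_; }

private:
    std::vector<node> indices_;
    std::vector<std::size_t> indptr_;
};

// Mutable graph: copies the CSR data into adjacency lists once, then allows edits.
class GraphW final : public Graph {
public:
    explicit GraphW(const GraphR &gr) : adj_(gr.numberOfNodes()) {
        for (std::size_t u = 0; u < adj_.size(); ++u) {
            const auto first = static_cast<std::ptrdiff_t>(gr.indptr()[u]);
            const auto last = static_cast<std::ptrdiff_t>(gr.indptr()[u + 1]);
            adj_[u].assign(gr.indices().begin() + first, gr.indices().begin() + last);
        }
    }

    std::size_t numberOfNodes() const override { return adj_.size(); }
    std::size_t degree(node u) const override { return adj_[u].size(); }
    void addEdge(node u, node v) { adj_[u].push_back(v); }

private:
    std::vector<std::vector<node>> adj_;
};
```

In this sketch the copy into GraphW happens once, at construction, so read-only callers can keep using the compact GraphR and only code that mutates pays for the adjacency lists.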

@aheev aheev Jun 11, 2026

Contributor

@@ -0,0 +1,42 @@
#!/usr/bin/env python3

Contributor

Can you add some documentation here?


std::set<node> ShellStruct::expandOneCommunity(const std::set<node> &s) {
    if (!built)
        throw std::invalid_argument("Need to build() or load() the shell struct");

Contributor

This is not an invalid_argument error: nothing is wrong with the argument, the object just has not been built yet.
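
One possible way to address this is sketched below. Calling a method before the object is ready is a violation of the object's state rather than a bad argument, so std::logic_error (or a project-specific exception) may be a closer fit. The Detector class is a hypothetical stand-in, not the PR's ShellStruct, and its method body is a placeholder.

```cpp
#include <set>
#include <stdexcept>

using node = unsigned long long;

// Hypothetical stand-in for a detector that must be built before use.
class Detector {
public:
    void build() { built_ = true; }

    std::set<node> expandOneCommunity(const std::set<node> &seeds) const {
        // The caller's argument is fine; the object is in the wrong state.
        if (!built_)
            throw std::logic_error("call build() or load() before expandOneCommunity()");
        return seeds; // placeholder: a real implementation would grow the seed set
    }

private:
    bool built_ = false; // default member initializer, so the flag is never read uninitialized
};
```

Because std::invalid_argument derives from std::logic_error, callers that already catch std::logic_error would keep working after such a change.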

}

static parquet::WriterProperties::Builder writerPropsBuilder(const std::string &compression) {
    parquet::WriterProperties::Builder b;

Contributor

@IanChenUIUC IanChenUIUC Jun 11, 2026

Author

I can make the fix. Would you like me to revert the Parquet dependency as well?

Contributor

Yup.

}
}

ShellStruct::ShellStruct(const Graph &g) : SelectiveCommunityDetector(g) {}

Contributor

Please initialise built in the constructor.

tree = GraphR(next_id, true, treeIndices, treeIndptr);
lca = std::make_unique<NetworKit::LeastCommonAncestor>(*tree, root);

std::vector<int64_t> offsets(tree_v_indptr.begin(), tree_v_indptr.end());

Contributor

why are we duplicating tree_v_indptr?

Author

Arrow expects signed integers for the offsets. I considered casting, but although I think most compilers support it, it is technically undefined behavior. A cleaner solution is to make tree_v_indptr signed to begin with, which avoids this copy. I will make the fix!
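
If a conversion is still needed somewhere, a checked copy is one way to make it explicit. The sketch below assumes the offsets are held as std::uint64_t and converts them to the std::int64_t values the diff builds; the helper name toSignedOffsets is hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts unsigned offsets to signed ones, rejecting
// values that do not fit instead of letting them wrap.
std::vector<std::int64_t> toSignedOffsets(const std::vector<std::uint64_t> &offsets) {
    constexpr auto maxSigned =
        static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    std::vector<std::int64_t> out;
    out.reserve(offsets.size());
    for (std::uint64_t v : offsets) {
        if (v > maxSigned)
            throw std::overflow_error("offset does not fit in int64_t");
        out.push_back(static_cast<std::int64_t>(v));
    }
    return out;
}
```

Storing the offsets as signed integers from the start, as proposed above, removes the need for this copy altogether.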

ARROW_RETURN_NOT_OK(builder.AppendNull());
std::shared_ptr<arrow::Array> indices;
ARROW_RETURN_NOT_OK(builder.Finish(&indices));
auto table = arrow::Table::Make(schema, {

Contributor

This should fail: the column array lengths are not equal.

Author

If there are N nodes in the tree, then there are N+1 indptr entries and N-1 indices. I truncated the indptr array and padded the indices to size N, but I am very open to cleaner solutions.

Contributor

@adsharma should we add a helper function in icebug-format to convert an icebug-memory graph to icebug-disk?
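
The length mismatch discussed in this thread can be reproduced with a small sketch. It assumes the tree is given as a parent array with -1 marking the root; the Csr struct and the treeToCsr helper are hypothetical names, not part of the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Csr {
    std::vector<std::int64_t> indptr;  // n + 1 entries
    std::vector<std::int64_t> indices; // one entry per edge, n - 1 for a tree
};

// Builds a parent -> child CSR from a parent array; parent[root] == -1.
Csr treeToCsr(const std::vector<std::int64_t> &parent) {
    const std::size_t n = parent.size();
    Csr csr;
    csr.indptr.assign(n + 1, 0);
    for (std::int64_t p : parent)
        if (p >= 0)
            ++csr.indptr[static_cast<std::size_t>(p) + 1]; // count children per node
    for (std::size_t i = 0; i < n; ++i)
        csr.indptr[i + 1] += csr.indptr[i]; // prefix sum turns counts into offsets
    csr.indices.resize(static_cast<std::size_t>(csr.indptr[n]));
    // next[u] is the position where the next child of u will be written.
    std::vector<std::int64_t> next(csr.indptr.begin(), csr.indptr.end() - 1);
    for (std::size_t child = 0; child < n; ++child) {
        const std::int64_t p = parent[child];
        if (p < 0)
            continue; // the root has no incoming edge
        const auto slot = static_cast<std::size_t>(next[static_cast<std::size_t>(p)]++);
        csr.indices[slot] = static_cast<std::int64_t>(child);
    }
    return csr;
}
```

For a four-node tree the two arrays have 5 and 3 entries, so they cannot share one table as equal-length columns without the truncation and padding described above, or without being stored separately.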

class LeastCommonAncestor {
private:
    const Graph *g;
    node root;

Contributor

not used

root = v;

GraphR nkTree(n, true, treeIndices, treeIndptr);
lca = std::make_unique<NetworKit::LeastCommonAncestor>(nkTree, root);

Contributor

nkTree is a local variable, so the pointer that LeastCommonAncestor keeps to it will dangle.
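
One way to avoid the dangling pointer is to let the owning object keep the tree alive for as long as the LCA structure exists. The sketch below uses hypothetical stand-ins (Tree, Lca, Owner) for GraphR, LeastCommonAncestor, and the class that builds them; it shows the ownership idea only, not the PR's code.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the tree graph.
struct Tree {
    std::vector<std::size_t> parent;
};

// Hypothetical stand-in for the LCA structure: it points at a tree it does not own.
class Lca {
public:
    explicit Lca(const Tree &t) : tree_(&t) {}
    std::size_t parentOf(std::size_t v) const { return tree_->parent[v]; }

private:
    const Tree *tree_;
};

class Owner {
public:
    void build(std::vector<std::size_t> parent) {
        // The tree lives on the heap and is held by a member, so it outlives
        // this function and keeps a stable address even if Owner is moved.
        tree_ = std::make_unique<Tree>(Tree{std::move(parent)});
        lca_ = std::make_unique<Lca>(*tree_);
    }
    const Lca &lca() const { return *lca_; }

private:
    std::unique_ptr<Tree> tree_; // declared first, so it is destroyed after lca_
    std::unique_ptr<Lca> lca_;
};
```

Members are destroyed in reverse declaration order, which is why the tree is declared before the structure that points at it.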

}

node LeastCommonAncestor::Query(std::span<const node> nodes) {
    if (nodes.empty())

@aheev aheev Jun 11, 2026

Contributor

Can you add bounds checks too, in case nodes is corrupt?

Author

Will do.
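
A bounds check of the kind requested could be sketched as follows. The helper name checkNodes and its upperBound parameter are hypothetical, and a std::vector stands in for the std::span used in the PR so that the sketch also compiles under older language standards. How an empty input should be handled is left to the caller, since the diff does not show it.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using node = std::uint64_t;

// Hypothetical validation helper, meant to run at the top of a query.
// Valid node ids are in the range [0, upperBound).
void checkNodes(const std::vector<node> &nodes, node upperBound) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i] >= upperBound)
            throw std::out_of_range("node id at position " + std::to_string(i) +
                                    " is out of range");
    }
}
```

Validating the whole input before any lookup means a corrupt id is reported as an exception rather than indexing past the end of an internal array.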



3 participants