Feature/shellstruct#38
@IanChenUIUC Updating the GitHub workflow to install the necessary libraries on Windows/Linux/macOS should fix the problem. You can probably look at https://github.com/LadybugDB/ladybug/blob/main/.github/workflows/precompiled-bin-workflow.yml to see how to do it.
A simpler solution is to use Arrow memory to store the ShellStruct and then write the Arrow data to Parquet in Python. That way we don't introduce a Parquet dependency here.
    return std::make_shared<arrow::UInt64Array>(length, buffer);
}
Graph::Graph(count n, bool directed, std::vector<node> outIndices, std::vector<index> outIndptr,
I feel this is overkill. If users are writing in CSR format, they may as well write in Arrow.
The motivation behind Arrow support is zero-copy.
I think that makes sense. Was there a reason why this constructor was declared (I did not add it)?
One potential use case is to load an immutable graph and then mutate it. We should encourage immutable graphs, since they're much more efficient, but not forbid mutability.
Recommended pattern:
GraphR gr(...); // build from csr arrays
GraphW gw(gr);
The base class Graph should be an interface.
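A minimal sketch of the recommended pattern, assuming a pure-virtual `Graph` interface with a CSR-backed `GraphR` and an adjacency-list `GraphW`. The member functions (`numberOfNodes`, `degree`, `neighbor`, `addNode`, `addEdge`) and the `sketch` namespace are hypothetical and far smaller than icebug's real `Graph` API; the `directed` flag is omitted for brevity.

```cpp
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

namespace sketch {
using node = std::uint64_t;
using count = std::uint64_t;
using index = std::uint64_t;

// Hypothetical read-only interface; icebug's real Graph exposes far more.
class Graph {
public:
    virtual ~Graph() = default;
    virtual count numberOfNodes() const = 0;
    virtual count degree(node u) const = 0;
    virtual node neighbor(node u, index i) const = 0;  // i-th out-neighbor of u
};

// Immutable graph that keeps the caller's CSR arrays as-is.
class GraphR : public Graph {
public:
    GraphR(count n, std::vector<node> indices, std::vector<index> indptr)
        : n_(n), indices_(std::move(indices)), indptr_(std::move(indptr)) {
        if (indptr_.size() != n_ + 1 || indptr_.back() != indices_.size())
            throw std::invalid_argument("malformed CSR arrays");
    }
    count numberOfNodes() const override { return n_; }
    count degree(node u) const override { return indptr_[u + 1] - indptr_[u]; }
    node neighbor(node u, index i) const override { return indices_[indptr_[u] + i]; }

private:
    count n_;
    std::vector<node> indices_;
    std::vector<index> indptr_;
};

// Mutable adjacency-list graph, seeded from any Graph implementation.
class GraphW : public Graph {
public:
    explicit GraphW(const Graph& g) : adj_(g.numberOfNodes()) {
        for (node u = 0; u < g.numberOfNodes(); ++u)
            for (index i = 0; i < g.degree(u); ++i)
                adj_[u].push_back(g.neighbor(u, i));
    }
    count numberOfNodes() const override { return adj_.size(); }
    count degree(node u) const override { return adj_[u].size(); }
    node neighbor(node u, index i) const override { return adj_[u][i]; }
    node addNode() { adj_.emplace_back(); return adj_.size() - 1; }
    void addEdge(node u, node v) { adj_[u].push_back(v); }  // directed edge u -> v

private:
    std::vector<std::vector<node>> adj_;
};
}  // namespace sketch
```

Because `GraphW` depends only on the interface, it can be seeded from a `GraphR` (or any other implementation) without knowing its storage layout, while read-only code keeps taking `const Graph&`.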
https://github.com/Ladybug-Memory/icebug/blob/main/include/networkit/graph/Graph.hpp#L739-L752
It's a dangling declaration: the constructor is declared but never defined.
@@ -0,0 +1,42 @@
#!/usr/bin/env python3
std::set<node> ShellStruct::expandOneCommunity(const std::set<node> &s) {
    if (!built)
        throw std::invalid_argument("Need to build() or load() the shell struct");
This is not an invalid_argument error.
}
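A minimal sketch of one alternative, using a hypothetical `ShellStructLike` class reduced to its `built` flag. Calling the method before `build()` violates a precondition on the object's state rather than on the argument, so `std::logic_error` is one reasonable choice here (a library-specific exception type would also work).

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>

namespace sketch {
using node = std::uint64_t;

// Hypothetical stand-in for ShellStruct, reduced to the state check.
class ShellStructLike {
public:
    void build() { built = true; }

    std::set<node> expandOneCommunity(const std::set<node>& s) const {
        // Calling this before build()/load() breaks a precondition on the
        // object's state, not on `s`, so std::logic_error is used instead of
        // std::invalid_argument.
        if (!built)
            throw std::logic_error("Need to build() or load() the shell struct");
        return s;  // placeholder: the real expansion logic is omitted
    }

private:
    bool built = false;
};
}  // namespace sketch
```

Since `std::invalid_argument` derives from `std::logic_error`, callers that already catch `std::logic_error` keep working after the switch.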
static parquet::WriterProperties::Builder writerPropsBuilder(const std::string &compression) {
    parquet::WriterProperties::Builder b;
I can make the fix. Would you like me to revert the parquet dependency as well?
    }
}
ShellStruct::ShellStruct(const Graph &g) : SelectiveCommunityDetector(g) {}
tree = GraphR(next_id, true, treeIndices, treeIndptr);
lca = std::make_unique<NetworKit::LeastCommonAncestor>(*tree, root);
std::vector<int64_t> offsets(tree_v_indptr.begin(), tree_v_indptr.end());
Why are we duplicating tree_v_indptr?
Arrow expects signed integers for the offsets. I considered casting instead, but although I think most compilers accept it, it is technically undefined behavior. A cleaner solution is to make tree_v_indptr signed to begin with, which avoids this copy. I will make the fix!
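A sketch of building the tree's child lists with a signed `int64_t` indptr from the start, assuming the tree is given as a parent array. The helper `childrenFromParents` and the `TreeCSR` struct are hypothetical, and the helper does not check `parent` for cycles.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch {
// Child lists of a rooted tree in CSR form. indptr is signed from the start,
// matching Arrow's signed offsets, so no widening copy is needed later.
struct TreeCSR {
    std::vector<std::int64_t> indptr;    // n + 1 entries
    std::vector<std::uint64_t> indices;  // n - 1 entries (every node but the root)
};

// Hypothetical helper: builds child lists from a parent array (parent[root]
// is ignored) with a counting sort, in O(n). Does not check for cycles.
TreeCSR childrenFromParents(const std::vector<std::uint64_t>& parent, std::uint64_t root) {
    const std::size_t n = parent.size();
    if (n == 0 || root >= n) throw std::out_of_range("root out of range");
    TreeCSR t;
    t.indptr.assign(n + 1, 0);
    for (std::size_t v = 0; v < n; ++v) {
        if (v == root) continue;
        if (parent[v] >= n) throw std::out_of_range("parent id out of range");
        ++t.indptr[parent[v] + 1];  // count the children of each parent
    }
    for (std::size_t i = 0; i < n; ++i) t.indptr[i + 1] += t.indptr[i];  // prefix sum
    t.indices.resize(n - 1);
    // Next free slot for each parent's children.
    std::vector<std::int64_t> next(t.indptr.begin(), t.indptr.end() - 1);
    for (std::size_t v = 0; v < n; ++v) {
        if (v == root) continue;
        t.indices[static_cast<std::size_t>(next[parent[v]]++)] = v;
    }
    return t;
}
}  // namespace sketch
```

With signed offsets already in place, the vector should be wrappable as an Arrow offsets buffer without a conversion copy (for instance via `arrow::Buffer::Wrap`, which does not take ownership, so the vector must outlive the buffer).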
ARROW_RETURN_NOT_OK(builder.AppendNull());
std::shared_ptr<arrow::Array> indices;
ARROW_RETURN_NOT_OK(builder.Finish(&indices));
auto table = arrow::Table::Make(schema, {
This should fail: the column arrays have different lengths.
If there are N nodes in the tree, there are N+1 indptr entries and N-1 indices. I truncated indptr and padded indices to size N, but I am very open to cleaner solutions.
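One possible cleaner layout, sketched under the assumption that a list-typed column is acceptable: an offsets array plus a values array (the layout behind Arrow's `ListArray`, which uses 32-bit offsets, and `LargeListArray`, which uses 64-bit ones) with N+1 offsets over N-1 child values describes exactly N rows, one child list per node. That column then has the same length as the other per-node columns, with no truncation or padding. The hypothetical helper `listRowCount` checks the invariants this relies on and returns the row count.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch {
// Hypothetical helper: checks the list-layout invariants used here and returns
// how many list rows (offsets, values) encode. For simplicity it requires
// offsets[0] == 0 and offsets.back() == numValues, as for a freshly built,
// unsliced list column.
std::size_t listRowCount(const std::vector<std::int64_t>& offsets, std::size_t numValues) {
    if (offsets.empty() || offsets.front() != 0)
        throw std::invalid_argument("offsets must start at 0");
    for (std::size_t i = 1; i < offsets.size(); ++i)
        if (offsets[i] < offsets[i - 1])
            throw std::invalid_argument("offsets must be non-decreasing");
    if (static_cast<std::size_t>(offsets.back()) != numValues)
        throw std::invalid_argument("last offset must equal the number of values");
    return offsets.size() - 1;  // k + 1 offsets describe k rows
}
}  // namespace sketch
```

For the tree case, `listRowCount(indptr, indices.size())` returns N, matching the other per-node columns.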
@adsharma Should we add a helper function in icebug-format to convert an icebug-memory graph to icebug-disk?
class LeastCommonAncestor {
private:
    const Graph *g;
    node root;
    root = v;
GraphR nkTree(n, true, treeIndices, treeIndptr);
lca = std::make_unique<NetworKit::LeastCommonAncestor>(nkTree, root);
nkTree is a local variable, so the pointer that LeastCommonAncestor keeps to it dangles once this scope ends.
}
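A minimal sketch of one fix, with hypothetical `Tree`, `LCA`, and `Owner` types standing in for `GraphR`, `LeastCommonAncestor`, and `ShellStruct`: the object that owns the `LCA` also owns the tree it points to, so the tree lives at least as long as the `LCA`.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {
using node = std::uint64_t;

struct Tree {  // hypothetical stand-in for GraphR
    std::vector<node> parent;
};

// Keeps a non-owning pointer, like `const Graph *g` in LeastCommonAncestor.
class LCA {
public:
    LCA(const Tree& t, node root) : t_(&t), root_(root) {}
    node parentOf(node v) const { return t_->parent.at(v); }
    node root() const { return root_; }

private:
    const Tree* t_;
    node root_;
};

class Owner {  // hypothetical stand-in for ShellStruct
public:
    void build(std::vector<node> parent, node root) {
        // Buggy version: `Tree local{...}; lca = std::make_unique<LCA>(local, root);`
        // leaves lca pointing at `local` once build() returns.
        tree = std::make_unique<Tree>(Tree{std::move(parent)});
        lca = std::make_unique<LCA>(*tree, root);
    }
    const LCA& getLca() const { return *lca; }

private:
    // The tree is declared before lca, so it is destroyed after lca, and it
    // lives on the heap, so its address survives moves of Owner.
    std::unique_ptr<Tree> tree;
    std::unique_ptr<LCA> lca;
};
}  // namespace sketch
```

Holding the tree behind a `std::unique_ptr` keeps its address stable even if the owner is moved, which a plain `Tree` member would not guarantee.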
node LeastCommonAncestor::Query(std::span<const node> nodes) {
    if (nodes.empty())
Can you add bounds checks too, in case nodes is corrupt and contains out-of-range node IDs?
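A sketch of where the bounds checks could go, using a hypothetical `NaiveLCA` over a parent array. The real `LeastCommonAncestor` likely uses a different algorithm and takes `std::span<const node>`; the sketch takes `const std::vector<node>&` only to stay C++17-compatible. Every node ID is validated before any tree walk, and the constructor also rejects out-of-range parents and cycles.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

namespace sketch {
using node = std::uint64_t;

// Hypothetical naive LCA over a parent array; parent[root] is ignored.
class NaiveLCA {
public:
    NaiveLCA(std::vector<node> parent, node root)
        : parent_(std::move(parent)), root_(root), depth_(parent_.size()) {
        const std::size_t n = parent_.size();
        if (root_ >= n) throw std::out_of_range("root out of range");
        for (node v = 0; v < n; ++v) {
            // Walk up to the root to get v's depth (O(n * height); fine for a sketch).
            node u = v;
            std::uint64_t d = 0;
            while (u != root_) {
                if (parent_[u] >= n) throw std::out_of_range("parent id out of range");
                if (++d > n) throw std::invalid_argument("parent array contains a cycle");
                u = parent_[u];
            }
            depth_[v] = d;
        }
    }

    node query(const std::vector<node>& nodes) const {
        if (nodes.empty()) throw std::invalid_argument("query needs at least one node");
        // Bounds-check every ID up front so corrupt input never indexes past the arrays.
        for (node v : nodes)
            if (v >= parent_.size()) throw std::out_of_range("node id out of range");
        node acc = nodes[0];
        for (std::size_t i = 1; i < nodes.size(); ++i) acc = lcaOfPair(acc, nodes[i]);
        return acc;
    }

private:
    node lcaOfPair(node a, node b) const {
        while (depth_[a] > depth_[b]) a = parent_[a];  // lift the deeper node
        while (depth_[b] > depth_[a]) b = parent_[b];
        while (a != b) { a = parent_[a]; b = parent_[b]; }  // climb in lockstep
        return a;
    }

    std::vector<node> parent_;
    node root_;
    std::vector<std::uint64_t> depth_;
};
}  // namespace sketch
```

Validating the whole input before walking means a bad ID is reported as `std::out_of_range` instead of reading past the end of `parent_` partway through a query.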
Implementation of ShellStruct as described in issue #37.