LGA

This library helps to compute semi-supervised leasing methods[1] and personal pagerank method for large graphs. The implementation could work with any kind of graph which could be implemented from fr.inria.maestro.lga.graph.model.IGraph interface. We supposed that graph without loops and multiedges.

The library provides as well:

the convenient way to work with graphs in BVGraph[2], Pajek Net format and simple readable format: "nodeFrom nodeTo weight", let us name it as matlab format, the numeration of the nodes starts from 1
the namers to make the nodes with names
the semi-supervised learning classification, including computation of the presonal pagerank method
the expert nodes/seed nodes format to load/save them
the estimations to get the precision, recall, modularity and confusion matrix
the various seed choosers

[1] Generalized Optimization Framework for Graph-based Semi-supervised Learning. SDM 2012: 966-974 [2] The WebGraph Framework I: Compression Techniques, by Paolo Boldi and Sebastiano Vigna, in Proc. of the Thirteenth World Wide Web Conference, pages 595-601, 2004, ACM Press.

1)The formats The library provides the convenient way to work with graphs in BVGraph[2], Pajek NET format and simple readable format: "nodeFrom nodeTo weight" for weighted graph and "nodeFrom nodeTo" for unweighted.

BVGraph format To load file in BVGraph format: String file = "/yourpath/yourBVGraph";// without suffix IGraph graph = new BVGGraphLoader().load(file); To save file in BVGraph format: BVGGraphSaver().save(graph, file); If files "/yourpath/yourBVGraphlabels" exists the weghted graph will be loaded.

Pajek Net format One of the common used formats. The first line represents number of vertices, the next lines represents node labels. The edges defined as a pair of two nodes. After line "*Edges" exactly as in matlab format. ILoader loader = new PajekNetLoader().load(file); new PajekNetSaver().save(graph, file); *Vertices 9 1 "Name1" 9 "Name" *Edges 1 2 0.5 1 9 0.1

Matlab format We call it matlab format, because it could be easily loaded in matlab as sparse matrix. This is a simple edge list with or without weights. The numeration of the nodes starts form 1. File file = new File("/yourpath/yourmatlab.txt"); graph = MatlabLoader().load(file); MatLabSaver().save(graph, file);

Any type of graphs could be used, if they implements fr.inria.maestro.lga.graph.model.IGraph interface. See section about how to implement your own graph realization.

2)The namers The library provides the simple realization of namer. The "yourGraphNamer.txt" contains a column of names like: Name1 Name2 ... NameN The number of line corresponds to node id. Note:To make it compatible with BVGraph library the names should be sorted in alphabetical order, not necessary for our library.

String pathToNamer = "yourGraphNamer.txt"; INodeNamer namer = NodeNamerImpl.load(new File(pathToNamer)); or you can use IdenticalIdNodeNamer as your nodeId. To setup the namer to graph: graph.getProperties().setValue(IGraph.NODE_NAMER, namer);

The algorithms The library provides the implementation of semi-supervised methods in general case with parameter sigma. Also the library provides the implementations of particular algorithms: PageRank , Standatd Laplacian, Normalized Laplacian methods, which could be significantly faster for large graphs in BVG format. BVG format compress graph and any operation to access the data could take time.

Mainly use the general implementation to compute methods, if it is slow than try the particular implementations of the methods.

The personal pagerank calculation (the implementation of particular method): List listOfNodes;//the list of nodes ids IPersonalVector pV = new PersonalVector(listOfNodes,graph.numNodes()); CalculatorInputData iD = new CalculatorInputData(alpha); //has different setups ICalculator pc = CalculatorFactory.getPageRankCalculator(graph, iD); double[] ppr = pc.compute(pV.getPersonalVector()); //personalVector is a stohastic vector The CalculatorInputData has defaults values as marked in Constants: epsilon 0.000001, max number of iterations 100, alpha 0.5.

The general personal pagerank calculation (based on sigma parameter): CalculatorInputDataSigma iD = new CalculatorInputDataSigma(sigma, alpha); //has different setups //iD.setWeighted(true); to compute pagerank as for weighted graph, by default is false IPagerankCalculator pc = CalculatorFactory.getGeneralCalculator(graph, iD); double[] gppr = pc.compute(personalVector); //personalVector is a stohastic vector

The semi-supervised learning classification takes as input seeds SemiSupervisedLearningMethod sslm = new SemiSupervisedLearningMethod(); int[] nodeToClassId = sslm.run(seedsInput, pc); // node I classified as nodeToClassId[I] class, if class equals to -1 means unclassified

The expert/seeds files Provides a convinient way to work with seeds/expert files. The format for seeds/expert files: "Short explanation about file. Could be empty line.

Class Name A NodeName1 NodeName2

Class Name B NodeName3 "

IClustering seeds = ClusteringUtils.loadFromFile(new File(seedsName)); //load file ISeedsInput seedsInput = ClusteringConverter.clustering2SeedsInput(seeds, namer, graph.numNodes()); //convert to seedInput object IClustering resultClassification = ClusteringConverter.array2Custering(nodeToClassId, namer, seeds); //convert to clustering object from array
ClusteringUtils.saveToFile(resultClassification, new File("someName.txt"));//save to file

The estimations Provides methods to get precision, recall and confusion matrix. \IClustering expert; IClustering alg; ClusteringPreprocessor algClustering = new ClusteringPreprocessor(alg); PrecisionQuantity prec = new PrecisionQuantity(new ClusteringPreprocessor(expert)); Pair<Double, Double> res = prec.getPrecisionRecall(algClustering); //res.getFirst() -- precision, res.getSecond() -- recall prec.printConfusionMatrix(algClustering, new PrintWriter(System.out)) //to print Confusion Matrix
How to write your own graph implementation The library could work with any kind of graph which could be implemented from fr.inria.maestro.lga.graph.model.IGraph interface. IGraph is a simple interface:
int numNodes(); //number of nodes long numArcs(); //number of arcs (not nessecary for our library) int outdegree(int node); //the outdegree of node int[] successorArray(int node); //the neighbours of node, for the compatibility with webgraph library //the returned array may contain more entries than the outdegree of node, // the successors should be sorted in increasing order. IProperties getProperties(); //some graph properties NODE_NAMER and UNDERLYING_GRAPH. Also see AbstractGraph implementation.

for weighted graphs IArcWeightedGraph has one additional method double[] arcWeights(int node);// the appropriate weight with successorArray;

Note: In BVGNameFilelabels.properties could be written the absolute path to the graph and the name of class which implement Labels. If you have some problem just modify this file manualy.

graphclass = it.unimi.dsi.webgraph.labelling.BitStreamArcLabelledImmutableGraph underlyinggraph = absolute_path//BVGNameFile labelspec = fr.inria.maestro.lga.graph.io.utils.WebGraphLabelsFactory$MyLabel()

class is written in fr.inria.maestro.lga.graph.io.utils.WebGraphLabelsFactory.MyLabel

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
src		src
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LGA

About

Uh oh!

Releases

Packages

Languages

License

sokolm/LGA

Folders and files

Latest commit

History

Repository files navigation

LGA

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages