infranodus/datasets

Sample Datasets for Data and Text Analysis

InfraNodus is an AI-powered text network analysis tool. You can use it to reveal patterns in text data.

Here we provide some of the sample datasets you can use to try out various workflows.

Keyword Stats Datasets

This folder contains data on Google Search volumes for various keywords. Typically, you analyze the column with the keyword combinations to find recurring patterns and use the metadata in the other columns (e.g. search volume, difficulty, location) for filtering.
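As a rough illustration of that workflow outside InfraNodus, the sketch below filters rows by a metadata column and then counts recurring adjacent word pairs in the keyword column. The column names (`keyword`, `search_volume`, `difficulty`) and the sample rows are assumptions made for this example; the actual files in the folder may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; real files in the folder may use different column names.
SAMPLE_CSV = """keyword,search_volume,difficulty
text analysis tool,1200,35
network analysis software,800,42
text network analysis,300,20
free text analysis tool,90,15
"""

def load_keywords(csv_text, min_volume=0):
    """Return keyword strings from rows whose search volume is at least min_volume."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["keyword"] for row in reader if int(row["search_volume"]) >= min_volume]

def count_word_pairs(keywords):
    """Count adjacent word pairs across all keyword phrases."""
    pairs = Counter()
    for phrase in keywords:
        words = phrase.lower().split()
        pairs.update(zip(words, words[1:]))
    return pairs

# Filter on metadata first, then look for recurring patterns in the text column.
keywords = load_keywords(SAMPLE_CSV, min_volume=100)
pairs = count_word_pairs(keywords)
print(pairs.most_common(3))
```

Counting adjacent pairs is only a simple stand-in for the co-occurrence patterns a text network would surface; it is not the method InfraNodus itself uses.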

Open-Ended Survey Datasets

This folder contains samples of open-ended surveys. Usually one or more columns contain the responses, while the other columns contain metadata about the survey participants. This metadata can be used for filtering: e.g. what people from a certain location or background said about a particular topic, or their sentiment.

Listing Datasets

This folder contains samples of listings. Such listings often contain a column with the title and description of a listing, as well as several other columns with categories that can be used for filtering.

Network Graphs Datasets

This folder contains network graph data in GEXF format. GEXF is an XML-based format that encodes nodes, relations, and related metadata.

  • Diseasosome / diseasosome-diseases.gexf: A network graph of diseases and their connections based on the “Human Disease Network” study, which describes the links between diseases and their associated genes. For simplicity, we’ve removed the gene associations and kept only the connections between diseases. Two diseases are linked if at least one gene mutation is correlated with both of them. Use the network graph workflow to see how to analyze this file step by step.

  • Related Artists / related-artists.gexf: A network of related classic rock artists extracted from Spotify, provided by Ifeanyi Idiaye. You can see which artists are central to the field (because they are listened to with the most diverse set of artists) and which artists form clusters of interconnected communities.

  • C Elegans / celegans.gexf: The C. elegans connectome of neurons. C. elegans is a relatively simple organism: its adult hermaphrodite form has 302 neurons. This network shows how those neurons are connected, which are the most central, and which form clusters.

  • Yeast / yeast.gexf: A yeast molecular interaction network that shows which proteins are the most central and which form clusters.

Also check out our separate archive of network analysis datasets.
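To give a feel for what a GEXF file holds, here is a minimal sketch that reads nodes and edges with Python's standard library and ranks nodes by degree. The tiny inline graph is made up for illustration and is not taken from any of the files above. Degree is used only as a simple proxy for how central a node is; it is not necessarily the measure InfraNodus reports.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written GEXF document used only for illustration.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="a" label="A"/>
      <node id="b" label="B"/>
      <node id="c" label="C"/>
      <node id="d" label="D"/>
    </nodes>
    <edges>
      <edge id="0" source="a" target="b"/>
      <edge id="1" source="a" target="c"/>
      <edge id="2" source="a" target="d"/>
      <edge id="3" source="b" target="c"/>
    </edges>
  </graph>
</gexf>
"""

def local_name(tag):
    """Strip the XML namespace so the code does not depend on the GEXF version."""
    return tag.rsplit("}", 1)[-1]

def parse_gexf(xml_text):
    """Return (labels, edges): a node id -> label dict and a list of (source, target)."""
    root = ET.fromstring(xml_text)
    labels, edges = {}, []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            labels[element.get("id")] = element.get("label", element.get("id"))
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return labels, edges

def degree_ranking(labels, edges):
    """Rank nodes by the number of edges that touch them, highest first."""
    degree = Counter({node_id: 0 for node_id in labels})
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return [(labels.get(node_id, node_id), count) for node_id, count in degree.most_common()]

labels, edges = parse_gexf(SAMPLE_GEXF)
ranking = degree_ranking(labels, edges)
print(ranking)
```

To try this on one of the files above, read the file's contents into a string and pass it to `parse_gexf`. If NetworkX is installed, its `read_gexf` function is a more complete way to load these files.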

Knowledge Graph Datasets

This folder contains knowledge graphs that show relations between different types of entities.

Datasets Extracted from Databases

This folder contains extracts from various interesting databases. For example, an extract of research paper titles and abstracts from arXiv up to 2025.

It also contains a Python script you can freely reuse (MIT license) to filter the long JSON files into shorter versions that InfraNodus can digest (up to the 10 MB limit).
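The following is not the script from the folder, only a minimal sketch of the general idea: keep a few fields from each record and stop once the output would exceed a size budget. It assumes the input is in JSON Lines form (one JSON object per line) with `title` and `abstract` fields; the function name `trim_json_lines` and the sample records are hypothetical.

```python
import json

def trim_json_lines(lines, fields, max_bytes):
    """Keep only `fields` from each JSON record until the output would exceed max_bytes."""
    kept, size = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        slim = {key: record[key] for key in fields if key in record}
        encoded = json.dumps(slim, ensure_ascii=False)
        line_size = len(encoded.encode("utf-8")) + 1  # +1 for the newline
        if size + line_size > max_bytes:
            break  # adding this record would go over the budget
        kept.append(encoded)
        size += line_size
    return kept

# Hypothetical records standing in for a much longer source file.
sample = [
    '{"id": "1", "title": "Paper one", "abstract": "First abstract.", "authors": "A"}',
    '{"id": "2", "title": "Paper two", "abstract": "Second abstract.", "authors": "B"}',
    '{"id": "3", "title": "Paper three", "abstract": "Third abstract.", "authors": "C"}',
]

# A deliberately small budget so that the cut-off is visible on three records.
trimmed = trim_json_lines(sample, fields=("title", "abstract"), max_bytes=120)
print("\n".join(trimmed))
```

For a real file, pass an open file handle as `lines` and set `max_bytes` to something like `10 * 1024 * 1024`. Because the handle is read line by line, the whole source file never has to fit in memory.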

License

All datasets are provided as-is and are subject to the license of the original source.

Try them out with https://infranodus.com.

Use these examples with our InfraNodus tutorials: https://support.noduslabs.com
