InfraNodus is an AI-powered text network analysis tool. You can use it to reveal patterns in text data.
Here we provide sample datasets you can use to try out various workflows.
This folder contains Google Search volume data for various keywords. Typically, you analyze the column with the keyword combinations to find recurring patterns and use the metadata in the other columns (e.g., search volume, difficulty, location) for filtering.
- Keyword Stats / google_us_ai-tools_matching-terms_2025-07-08_20-05-24.csv: Google Search volumes for keywords related to "ai tools". Use the matching keyword workflow to see how to analyze this file step by step.
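As a rough illustration of the filtering step, here is a minimal Python sketch (standard library only) that keeps only the keywords at or above a search-volume threshold before you bring them into InfraNodus. The column names `Keyword` and `Volume` are assumptions about the export's header row, so check the CSV and adjust them; the function name `filter_keywords` is hypothetical and not part of InfraNodus.

```python
import csv
from typing import Iterable, List


def filter_keywords(
    lines: Iterable[str],
    keyword_col: str = "Keyword",  # assumed header name, check your file
    volume_col: str = "Volume",  # assumed header name, check your file
    min_volume: int = 100,
) -> List[str]:
    """Return the keywords whose search volume is at least min_volume."""
    kept = []
    for row in csv.DictReader(lines):
        # Volumes may be exported with thousands separators, e.g. "1,200".
        raw = (row.get(volume_col) or "").replace(",", "").strip()
        if not raw.isdigit():
            continue  # skip rows with a missing or non-numeric volume
        if int(raw) >= min_volume:
            kept.append(row[keyword_col])
    return kept
```

To use it on the sample file, open google_us_ai-tools_matching-terms_2025-07-08_20-05-24.csv with `newline=""` and pass the file object as `lines`, e.g. `filter_keywords(f, min_volume=1000)`. The same pattern works for any other numeric metadata column, such as difficulty.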
This folder contains samples of open-ended surveys. Usually, one or more columns contain the responses, while the other columns contain metadata about the survey participants. This metadata can be used for filtering, e.g., to see what people from a certain location or background said about a particular topic, or what their sentiment was.
- Open Ended Surveys / OSMI-2019-Mental-Health-Tech-Modified.csv: Open Source Mental Health Initiative (OSMI) 2019 Mental Health Tech Survey. Use the open-ended survey workflow to see how to analyze this file step by step.
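To compare what different groups of participants said, one simple approach is to group the answers by a metadata column and analyze each group separately. The sketch below assumes a CSV with one response column and one metadata column; the function name `group_responses` is hypothetical, and the column names you pass must match the file's header row exactly (copy them from the CSV).

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def group_responses(
    lines: Iterable[str], response_col: str, group_col: str
) -> Dict[str, List[str]]:
    """Group non-empty answers in response_col by the value of group_col."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(lines):
        answer = (row.get(response_col) or "").strip()
        if not answer:
            continue  # skip blank answers
        # Participants with no value in the metadata column go to "unknown".
        group = (row.get(group_col) or "").strip() or "unknown"
        groups[group].append(answer)
    return dict(groups)
```

Each list in the result can then be analyzed on its own, so that, for example, the responses from one country can be compared with those from another.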
This folder contains sample listings. Such listings usually contain a title column and a description column, as well as several other columns with categories that can be used for filtering.
- Listings / ec_europa_data.csv: European Commission Open Data Portal. Use the listings workflow to see how to analyze this file step by step.
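A listing is often easier to analyze when its title and description are merged into a single statement. The following sketch does that, optionally keeping only one category. The header names `title` and `description` and the function name `listing_texts` are assumptions for illustration; replace them with the actual column names in ec_europa_data.csv.

```python
import csv
from typing import Iterable, List, Optional


def listing_texts(
    lines: Iterable[str],
    title_col: str = "title",  # assumed header name
    desc_col: str = "description",  # assumed header name
    category_col: Optional[str] = None,
    category: Optional[str] = None,
) -> List[str]:
    """Build one 'title: description' line per listing, optionally for one category."""
    texts = []
    for row in csv.DictReader(lines):
        if category_col is not None and (row.get(category_col) or "").strip() != category:
            continue  # listing belongs to another category
        title = (row.get(title_col) or "").strip()
        desc = (row.get(desc_col) or "").strip()
        if title and desc:
            texts.append(f"{title}: {desc}")
        elif title or desc:
            texts.append(title or desc)  # keep whichever part exists
    return texts
```

Keeping the title in front of the description means the words of both end up in the same statement, so they are connected to each other in the resulting text network.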
This folder contains network graph data in GEXF format, an XML-based format that encodes nodes, edges (relations), and their associated metadata.
- Diseasosome / diseasosome-diseases.gexf: A network graph of diseases and their connections based on the “Human Disease Network” study, which contains information about the links between different diseases and their associated genes. For simplicity, we’ve removed the gene associations and kept only the connections between diseases. Two diseases are linked if at least one gene mutation is correlated with both of them. Use the network graph workflow to see how to analyze this file step by step.
- Related Artists / related-artists.gexf: A network of related classic rock artists extracted from Spotify, provided by Ifeanyi Idiaye. You can see which artists are central to the field (because they are listened to with the most diverse set of artists) and which artists form clusters of interconnected communities.
- C Elegans / celegans.gexf: The C. elegans connectome of neurons. C. elegans is a relatively simple organism: its adult hermaphrodite form has 302 neurons, and this network shows how those neurons are connected, which ones are the most central, and which form clusters.
- Yeast / yeast.gexf: A yeast molecular interaction network that shows which proteins are more central, which form clusters, etc.
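To get a quick feel for which nodes are central before loading a file into InfraNodus, you can parse the GEXF XML directly. This sketch uses only Python's standard library and computes plain degree centrality (the number of edges touching each node), which is the simplest centrality measure; other measures, such as betweenness centrality, can rank nodes differently. The function name `top_nodes_by_degree` is hypothetical, and the sketch ignores edge direction, edge weights, and nested (hierarchical) nodes.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Tuple, Union


def _local_name(tag: str) -> str:
    """Drop the XML namespace: '{http://gexf.net/1.3}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def top_nodes_by_degree(gexf: Union[str, bytes], k: int = 5) -> List[Tuple[str, int]]:
    """Return (label, degree) for the k nodes with the most edge endpoints."""
    root = ET.fromstring(gexf)
    labels: Dict[str, str] = {}
    degree: Counter = Counter()
    for el in root.iter():
        name = _local_name(el.tag)
        if name == "node":
            node_id = el.get("id")
            labels[node_id] = el.get("label", node_id)
            degree[node_id] += 0  # register isolated nodes with degree 0
        elif name == "edge":
            # Direction and weight are ignored: each edge adds 1 to both endpoints.
            degree[el.get("source")] += 1
            degree[el.get("target")] += 1
    return [(labels.get(n, n), d) for n, d in degree.most_common(k)]
```

For example, read celegans.gexf in binary mode (`open(path, "rb").read()`) and pass the bytes to `top_nodes_by_degree`; reading bytes lets the parser honor the encoding declared in the file.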
Also check out our separate archive of network analysis datasets.
This folder contains knowledge graphs that show relations between different types of entities.
- Knowledge Graphs / similar-sites.md: A text file that can be uploaded to InfraNodus to analyze similar sites in the SEO sphere.
This folder contains extracts from various interesting databases, for example, an extract of research paper titles and abstracts from Arxiv up to 2025.
It also contains a Python script that you can freely reuse (MIT license) to filter the large JSON files into shorter versions that InfraNodus can ingest (within its 10 MB limit).
- Arxiv Research Papers contains a list of research papers on graphs extracted from https://www.kaggle.com/datasets/Cornell-University/arxiv. You can generate your own extract from the Arxiv file with our Python script filter_graph_papers.py, which prompts you for the categories and keywords to look for in that file. By default, the script looks for the file arxiv-metadata-oai-snapshot.json, which is the default name of the file provided by Cornell University in their Kaggle dataset archive. Edit the script if you'd like to filter a file with a different name.
- Visual Text Analysis Companies: this CSV file contains a list of the companies operating in the visual text analysis field, their USPs, strengths and weaknesses, as well as the keywords related to their expertise. It can be used for competitive analysis as described in this tutorial: https://support.noduslabs.com/hc/en-us/articles/22905603668636-Competitive-Analysis-Mapping-How-to-Visualize-Expertise-Networks-and-Find-Strategic-Gaps
- Trump Administration Personnel: this CSV file contains information about the individuals who are part of President Trump's administration, listing their skills, background, affiliation, etc. It can be used for social network analysis as described in this tutorial: https://support.noduslabs.com/hc/en-us/articles/22947832720412-Beyond-Organizational-Skills-Matrix-Social-Expertise-Network-Analysis
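The Arxiv extract described above can be approximated with a short streaming filter. This is not the filter_graph_papers.py script itself, only a hypothetical sketch (`filter_papers` is an illustrative name) of the same idea: the Kaggle snapshot stores one JSON object per line with `title`, `abstract`, and a space-separated `categories` field, so the file can be read line by line without loading it into memory. The byte cap assumes the 10 MB limit means 10 × 1024 × 1024 bytes; adjust `max_bytes` if needed.

```python
import json
from typing import Iterable, Iterator, Sequence


def filter_papers(
    lines: Iterable[str],
    categories: Sequence[str],
    keywords: Sequence[str],
    max_bytes: int = 10 * 1024 * 1024,  # assumed reading of the "10 MB" limit
) -> Iterator[str]:
    """Yield one 'title: abstract' line per matching paper, stopping at max_bytes."""
    wanted = set(categories)
    needles = [k.lower() for k in keywords]
    used = 0
    for line in lines:
        if not line.strip():
            continue
        paper = json.loads(line)  # one JSON object per line
        # "categories" is a space-separated string such as "cs.LG stat.ML".
        if wanted and not wanted.intersection(paper.get("categories", "").split()):
            continue
        # Collapse the line breaks and repeated spaces found in titles and abstracts.
        text = " ".join(f"{paper.get('title', '')}: {paper.get('abstract', '')}".split())
        if needles and not any(n in text.lower() for n in needles):
            continue  # no keyword found in the title or abstract
        size = len(text.encode("utf-8")) + 1  # +1 for the newline written after it
        if used + size > max_bytes:
            break
        used += size
        yield text
```

To use it, open arxiv-metadata-oai-snapshot.json with `encoding="utf-8"`, pass the file object as `lines`, and write each yielded string followed by a newline to the output file. An empty `categories` or `keywords` list disables that filter.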
All datasets are provided as-is and are subject to the license of the original source.
Try them out with https://infranodus.com.
Use these examples with our InfraNodus tutorials: https://support.noduslabs.com