diskover is a file system crawler that indexes your file metadata in Elasticsearch and visualizes your disk usage in Kibana. It crawls and indexes files on a local computer or on a remote server over NFS or CIFS.
File metadata is bulk added and streamed into Elasticsearch, allowing you to search and visualize your files in Kibana without having to wait until the crawl is finished. diskover is written in Python and runs on Linux, OS X/macOS and Windows.
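To illustrate the general approach (this is a simplified sketch, not diskover's actual code; the index name and field names below are assumptions for the example), a crawler of this kind walks the directory tree with scandir and streams metadata to Elasticsearch with the bulk helper:

```python
# Simplified sketch of crawl-and-bulk-index, NOT diskover's implementation.
# The index name and field names are illustrative assumptions.
import time

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

try:
    from scandir import scandir  # backport for Python 2.7
except ImportError:
    from os import scandir       # built in since Python 3.5

es = Elasticsearch(hosts=['localhost:9200'])
INDEX = 'diskover-example'

def walk_metadata(rootdir):
    """Yield one bulk action with basic metadata per regular file under rootdir."""
    dirs = [rootdir]
    while dirs:
        current = dirs.pop()
        for entry in scandir(current):
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                yield {
                    '_index': INDEX,
                    '_type': 'file',  # mapping type, still required in Elasticsearch 5.x
                    '_source': {
                        'filename': entry.name,
                        'path_parent': current,
                        'filesize': st.st_size,
                        'last_modified': int(st.st_mtime),
                        'indexing_date': int(time.time()),
                    },
                }

# Documents are streamed in batches, so they become searchable in Kibana
# while the crawl is still running.
bulk(es, walk_metadata('/path/you/want/to/crawl'))
```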
diskover aims to help you manage your storage by identifying old and unused files and giving better insight into file duplication and wasted space. It was originally designed for the VFX community to help manage large amounts of data growth.
- Kibana dashboards / saved searches and visualizations (included in diskover download)
- diskover-web (diskover's web file manager and file system search engine)
- Gource visualization support (see videos below)

- Linux, OS X/macOS or Windows (tested on OS X 10.11.6, Ubuntu 16.04 and Windows 7)
- Python 2.7 or Python 3.5 (tested on Python 2.7.10, 2.7.12, 3.5.3)
- Python elasticsearch client module elasticsearch (tested on 5.3.0, 5.4.0)
- Python requests module requests
- Python scandir module scandir (included in Python 3.5)
- Elasticsearch (local or AWS ES Service, tested on Elasticsearch 5.3.0, 5.4.2)
- Kibana (tested on Kibana 5.3.0, 5.4.2)
- diskover-web (diskover's web file manager for searching/tagging files)
- X-Pack (for graphs, reports, monitoring and http auth)
- Gource (for Gource visualizations of diskover Elasticsearch data)
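As a quick sanity check of the required Python modules and the Elasticsearch connection before crawling (this assumes Elasticsearch is listening on localhost:9200; adjust the host for AWS ES Service), something like this should run cleanly:

```python
# Quick check that the required modules import and Elasticsearch is reachable.
# Assumes a local Elasticsearch on localhost:9200.
import requests
from elasticsearch import Elasticsearch

try:
    from scandir import scandir  # backport for Python 2.7
except ImportError:
    from os import scandir       # included since Python 3.5

es = Elasticsearch(hosts=['localhost:9200'])
print('Elasticsearch reachable: %s' % es.ping())
print('Elasticsearch version: %s' %
      requests.get('http://localhost:9200').json()['version']['number'])
```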
$ git clone https://github.com/shirosaidev/diskover.git
$ cd diskover
You need at least Python 2.7 or Python 3.5, with the required Python dependencies installed using pip.
$ sudo pip install -r requirements.txt
Start diskover as root user with:
$ cd /path/you/want/to/crawl
$ sudo python /path/to/diskover.py
On Windows, run a Cygwin terminal as administrator and then run diskover.
By default (no flags), the crawl only indexes files that are 5 MB or larger and were modified more than 30 days ago. Use -h to see the CLI options.
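After a crawl finishes, you can also verify what was indexed directly from Python; in this sketch the index name matches the one in the sample output below, and the filesize field is an assumption about the index mapping:

```python
# Check crawl results (index name from the sample run below; 'filesize' field assumed).
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['localhost:9200'])
index = 'diskover-2017.04.22'

print('Documents indexed: %d' % es.count(index=index)['count'])

# Ten largest indexed files, sorted on the assumed 'filesize' field
res = es.search(index=index, body={
    'size': 10,
    'sort': [{'filesize': {'order': 'desc'}}],
})
for hit in res['hits']['hits']:
    print(hit['_source'])
```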
A successful crawl should look like this:
___ ___ ___ ___ ___ ___ ___ ___
/\ \ /\ \ /\ \ /\__\ /\ \ /\__\ /\ \ /\ \
/::\ \ _\:\ \ /::\ \ /:/ _/_ /::\ \ /:/ _/_ /::\ \ /::\ \
/:/\:\__\ /\/::\__\ /\:\:\__\ /::-"\__\ /:/\:\__\ |::L/\__\ /::\:\__\ /::\:\__\
\:\/:/ / \::/\/__/ \:\:\/__/ \;:;-",-" \:\/:/ / |::::/ / \:\:\/ / \;:::/ /
\::/ / \:\__\ \::/ / |:| | \::/ / L;;/__/ \:\/ / |:\/__/
\/__/ \/__/ \/__/ \|__| \/__/ v1.0.12 \/__/ \|__|
https://github.com/shirosaidev/diskover
2017-05-17 21:17:09,254 [INFO][diskover] Connecting to Elasticsearch
2017-05-17 21:17:09,260 [INFO][diskover] Checking for ES index: diskover-2017.04.22
2017-05-17 21:17:09,262 [WARNING][diskover] ES index exists, deleting
2017-05-17 21:17:09,340 [INFO][diskover] Creating ES index
Crawling: [100%] |########################################| 8570/8570
2017-05-17 21:17:16,972 [INFO][diskover] Finished crawling
2017-05-17 21:17:16,973 [INFO][diskover] Directories Crawled: 8570
2017-05-17 21:17:16,973 [INFO][diskover] Files Indexed: 322
2017-05-17 21:17:16,973 [INFO][diskover] Elapsed time: 7.72081303596
Read the wiki for more documentation on how to use diskover.
For discussions or questions about diskover, please ask on the Google Group.
For bugs about diskover, please use the issues page.
See the license file.
