⚠️ This repository is no longer maintained by Lukas Martinelli.
Tools to find the most frequently used C++ algorithms on GitHub.

You can look at the results of 3869 analyzed C++ repos in my Google Spreadsheet or use the results.csv directly.
| algorithm | total count | avg per repo |
|---|---|---|
| swap | 108363 | 28 |
| find | 81006 | 21 |
| count | 60306 | 16 |
| move | 57595 | 15 |
| copy | 48050 | 12 |
| sort | 33317 | 9 |
| max | 28848 | 7 |
| equal | 27467 | 7 |
| min | 21720 | 6 |
| unique | 18484 | 5 |
| lower_bound | 15017 | 4 |
| remove | 13972 | 4 |
| replace | 13262 | 3 |
| upper_bound | 11835 | 3 |
| for_each | 11518 | 3 |
## Usage
Because the tools pass data to each other through Unix pipes, you should disable Python's input and output buffering:

```bash
export PYTHONUNBUFFERED=true
```
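A toy illustration of why this matters: each stage writes results to stdout while the next stage reads from stdin, so buffered output would hold results back until a buffer fills. This minimal pass-through filter shows the same idea directly in Python:

```python
# Toy pass-through filter: flushing stdout after every line lets the next
# process in the pipeline consume results immediately instead of waiting
# for a full buffer (same effect as PYTHONUNBUFFERED=true or `python3 -u`).
import sys

for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
```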
Analyze the top C++ repos on GitHub and create a CSV file:

```bash
./top-github-repos.py | ./algostat.py | ./create-csv.py > results.csv
```
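In case you are curious how the first stage might work, here is a minimal sketch of a `top-github-repos.py`-style script. It assumes the script prints one `owner/name` slug per line via the GitHub search API; the real script may paginate, authenticate, and filter differently:

```python
# Hypothetical sketch: list the most starred C++ repositories via the GitHub
# search API and print one "owner/name" slug per line, ready to be piped
# into ./algostat.py. Unauthenticated requests are heavily rate-limited.
import requests

def top_cpp_repos(pages=3):
    for page in range(1, pages + 1):
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": "language:cpp", "sort": "stars", "page": page},
        )
        resp.raise_for_status()
        for repo in resp.json()["items"]:
            yield repo["full_name"]  # e.g. "opencv/opencv"

if __name__ == "__main__":
    for slug in top_cpp_repos():
        print(slug)
```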
Analyze all C++ repos listed in GHTorrent:

```bash
cat cpp_repos.txt | ./algostat.py | ./create-csv.py > results.csv
```
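Both pipelines run the same analysis step. Conceptually it is simple: shallow-clone each repo and count calls to standard algorithms in the C++ sources. A minimal sketch of what `algostat.py` might do per repo (function names, file patterns, and the regex are illustrative assumptions, not the script's actual internals):

```python
# Hypothetical sketch of the analysis step: shallow-clone a repo, scan its
# C++ sources and count explicit std:: algorithm calls. Real-world counting
# is fuzzier (unqualified calls via ADL or "using namespace std" etc.).
import re
import subprocess
import tempfile
from collections import Counter
from pathlib import Path

ALGORITHMS = ["swap", "find", "count", "move", "copy", "sort",
              "max", "min", "equal", "unique", "for_each"]
CALL_RE = re.compile(r"std::(" + "|".join(ALGORITHMS) + r")\s*\(")

def count_algorithms(repo_slug):
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "clone", "--depth", "1",
                        "https://github.com/" + repo_slug, tmp], check=True)
        counts = Counter()
        for pattern in ("*.cpp", "*.cc", "*.h", "*.hpp"):
            for path in Path(tmp).rglob(pattern):
                counts.update(CALL_RE.findall(path.read_text(errors="ignore")))
        return counts

if __name__ == "__main__":
    print(count_algorithms("opencv/opencv"))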
### Distributed usage with a Redis queue

Use a Redis queue to distribute jobs among workers and then fetch the results. You need to provide the ALGOSTAT_RQ_HOST and ALGOSTAT_RQ_PORT environment variables to each process so it can reach the Redis server:

```bash
export ALGOSTAT_RQ_HOST="localhost"
export ALGOSTAT_RQ_PORT="6379"
```
Now fill the job queue with the top GitHub repos and the repos listed in GHTorrent, sorting out duplicates:

```bash
./top-github-repos.py >> jobs.txt
cat cpp_repos.txt >> jobs.txt
sort -u jobs.txt | ./enqueue-jobs.py
```
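For illustration, `enqueue-jobs.py` presumably just pushes each slug from stdin onto a Redis list. A minimal sketch using `redis-py`, where the list name `"jobs"` is my assumption, not taken from the real source:

```python
# Hypothetical sketch of enqueue-jobs.py: read repo slugs from stdin and
# push them onto a Redis list that the workers pop jobs from.
import os
import sys
import redis

r = redis.Redis(host=os.environ["ALGOSTAT_RQ_HOST"],
                port=int(os.environ["ALGOSTAT_RQ_PORT"]))

for line in sys.stdin:
    repo = line.strip()
    if repo:
        r.rpush("jobs", repo)  # "jobs" list name is an assumption
```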
On your workers, tell algostat.py to fetch the jobs from the Redis queue and store the counts in a results queue:

```bash
./algostat.py --rq | ./enqueue-results.py
```
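Conceptually the worker is a blocking pop-process-push loop. A minimal sketch, with a stand-in for the counting function from the earlier sketch; the list names and the JSON wire format are assumptions for illustration:

```python
# Hypothetical sketch of the worker loop behind
# "./algostat.py --rq | ./enqueue-results.py".
import json
import os
import redis

r = redis.Redis(host=os.environ["ALGOSTAT_RQ_HOST"],
                port=int(os.environ["ALGOSTAT_RQ_PORT"]))

def count_algorithms(repo_slug):
    """Stand-in for the analysis shown in the earlier sketch."""
    return {"sort": 1, "find": 2}

while True:
    _, repo = r.blpop("jobs")                 # blocks until a job is available
    counts = count_algorithms(repo.decode())  # analyze one repo
    r.rpush("results", json.dumps(counts))    # hand the counts to the results list
```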
After that, aggregate the results into a single CSV file:

```bash
./fetch-results.py | ./create-csv.py > results.csv
```
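The aggregation end could look like the sketch below, matching the assumed wire format from the worker sketch. Note that popping from the list is destructive, which is why this step cannot be repeated:

```python
# Hypothetical sketch of "./fetch-results.py | ./create-csv.py": drain the
# results list, sum the per-repo counts and emit one CSV row per algorithm.
# List name and JSON format match the worker sketch above (assumptions).
import csv
import json
import os
import sys
import redis
from collections import Counter

r = redis.Redis(host=os.environ["ALGOSTAT_RQ_HOST"],
                port=int(os.environ["ALGOSTAT_RQ_PORT"]))

totals, repos = Counter(), 0
while True:
    raw = r.lpop("results")  # destructive read: this empties the list
    if raw is None:
        break
    totals.update(json.loads(raw))
    repos += 1

writer = csv.writer(sys.stdout)
writer.writerow(["algorithm", "sum", "avg"])
for name, total in totals.most_common():
    writer.writerow([name, total, total // max(repos, 1)])
```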
## Installation

- Make sure you have Python 3 installed
- Clone the repository
- Install the requirements with `pip install -r requirements.txt`
## Run distributed with Docker

You can use Docker to run the application in a distributed setup. First, run the Redis server:

```bash
docker run --name redis -p 6379:6379 -d sameersbn/redis:latest
```
Get the IP address of your Redis server (for example via `docker inspect redis`) and assign it to the ALGOSTAT_RQ_HOST env variable in all of the following `docker run` commands. In this example we will work with 104.131.5.11.
I have already set up an automated build, lukasmartinelli/algostat, which you can use:

```bash
docker pull lukasmartinelli/algostat
```

Or you can clone the repo and build the Docker image yourself:

```bash
docker build -t lukasmartinelli/algostat .
```
Fill the job queue with the repos to analyze:

```bash
docker run -it --rm --name queue-filler \
  -e ALGOSTAT_RQ_HOST=104.131.5.11 \
  -e ALGOSTAT_RQ_PORT=6379 \
  lukasmartinelli/algostat bash -c "cat cpp_repos.txt | ./enqueue-jobs.py"
```
Assign as many workers as you like:

```bash
docker run -it --rm --name worker1 \
  -e ALGOSTAT_RQ_HOST=104.131.5.11 \
  -e ALGOSTAT_RQ_PORT=6379 \
  lukasmartinelli/algostat bash -c "./algostat.py --rq | ./enqueue-results.py"
```
Finally, aggregate the results. Note that this step is not repeatable: once you have aggregated the results, the Redis list will be empty.

```bash
docker run -it --rm --name result-aggregator \
  -e ALGOSTAT_RQ_HOST=104.131.5.11 \
  -e ALGOSTAT_RQ_PORT=6379 \
  lukasmartinelli/algostat bash -c "./fetch-results.py | ./create-csv.py"
```