Unfortunately, Google Scholar does not support exporting results... I needed the most cited papers for a research project, and after trying an imperfect script I decided to write my own.
Important note: The spiders send no more than 2 requests per second to Google Scholar. Nobody likes solving CAPTCHAs, so it's better to wait a little and act like a human. Changing your IP address from time to time is also a good idea... 😩
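To illustrate the 2-requests-per-second cap described above, here is a minimal throttling sketch. It is not taken from this project's code: the `RateLimiter` class, its `wait` method, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for the example.

```python
import time


class RateLimiter:
    """Hypothetical helper: space calls so at most `max_per_second` happen per second."""

    def __init__(self, max_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive calls (0.5 s for 2 requests/second).
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self):
        """Sleep just long enough to keep the minimum gap since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        return now
```

Calling `limiter.wait()` right before each request would keep a crawler under the cap. The clock and sleep functions are injectable only so the behaviour can be checked without real delays.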
- Supports multiple languages
- Customizable date range
- Sorts by number of citations
- Sorts by year
- Searches for articles
- Searches for case law
- Searches in a profile by ID
- Graphical interface
Install the dependencies:

    pip install -r requirements.txt

Run the scraper just by typing the keyword:

    python core.py "cryptography"

Customize the date range:

    python core.py "metaverse" -s 1997 -e 2018

Limit the languages to one or more:

    python core.py "medical" -l en es zh-tw fr

Set the output file path:

    python core.py "machine learning" -s 2002 -o exports/most_cited_ml_articles_since_2002.csv

Sort the output by year:

    python core.py "oceanography" -y

Search for case law:

    python core.py "privacy" -c

Get the articles of a specific profile by user ID:

    python core.py "nms69lqaaaaj" -p -o jeff_dean_articles.csv

Make the program quiet:

    python core.py "philosophy" -e 1234 -q

Here are some example exports so you can see whether the scraper meets your needs.
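As a recap of the flags used in the commands above, the sketch below shows one way such a command line could be declared with `argparse`. It is an illustration only, not the actual contents of `core.py`: only the short flags (`-s`, `-e`, `-l`, `-o`, `-y`, `-c`, `-p`, `-q`) come from this README, while `build_parser`, the `dest` names, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the short flags shown in the usage commands."""
    parser = argparse.ArgumentParser(prog="core.py")
    # Positional argument: the search keyword, or a user ID when -p is given.
    parser.add_argument("keyword", help="search keyword or profile user ID")
    # Date range (dest names are assumptions).
    parser.add_argument("-s", dest="start_year", type=int, help="first year of the range")
    parser.add_argument("-e", dest="end_year", type=int, help="last year of the range")
    # One or more language codes, e.g. -l en es zh-tw fr
    parser.add_argument("-l", dest="languages", nargs="+", help="language codes")
    parser.add_argument("-o", dest="output", help="output CSV path")
    # Boolean switches.
    parser.add_argument("-y", dest="sort_by_year", action="store_true", help="sort by year")
    parser.add_argument("-c", dest="case_law", action="store_true", help="search case law")
    parser.add_argument("-p", dest="profile", action="store_true", help="treat keyword as user ID")
    parser.add_argument("-q", dest="quiet", action="store_true", help="suppress output")
    return parser
```

For example, `build_parser().parse_args(["metaverse", "-s", "1997", "-e", "2018"])` would yield a namespace with `start_year` set to 1997 and `end_year` set to 2018.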
This project is licensed under the MIT license found in the LICENSE file in the root directory of this repository.