Thanks to visit codestin.com
Credit goes to github.com

Skip to content

❗ This is a read-only mirror of the CRAN R package repository. easyScieloPack — Easy Interface to Search 'SciELO' Database. Homepage: https://github.com/Programa-ISA/easyScieloPack Report bugs for this package: https://github.com/Programa-ISA/easyScieloPack/issues

License

Notifications You must be signed in to change notification settings

cran/easyScieloPack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

easyScieloPak R badge

easyScieloPak is an R package that allows you to search and access academic articles from SciELO programmatically.

Objective

The main goal of easyScieloPak is to simplify the process of querying SciELO from R by:

  • Making queries readable and reproducible.
  • Allowing filters like year, collection (country), language, journal, and subject category.
  • Handling pagination, data parsing, and cleaning automatically.
  • Providing clear and validated feedback when a query is incorrect.
  • Minimizing errors due to anti-scraping measures (e.g., 403 HTTP errors).

Features

  • Build custom search queries for SciELO.
  • Filter results by year, collection, language, and more.
  • Retrieve article metadata including title, authors, publication year, and link.
  • Designed with intuitive syntax.
  • Intelligent request handling to avoid triggering SciELO's anti-bot protection.

Installation

You can install the development version of easyScieloPak from GitHub using either devtools or remotes:

Using devtools

install.packages("devtools") devtools::install_github("https://github.com/PabloIxcamparij/easyScieloPack.git")

Or using remotes

install.packages("remotes") remotes::install_github("https://github.com/PabloIxcamparij/easyScieloPack.git")

library(easyScieloPak)

Create a query

library(easyScieloPak)

df <- search_scielo("salud ambiental", collections = "Ecuador", languages = "es", n_max = 5) head(df)

df <- search_scielo("ecology", collections = "Chile", languages = "en", n_max = 8)

View(df) # View results in RStudio

Current Limitations

  • Each filter only supports one value at a time (e.g., only one country, language, journal, or category).

  • Web scraping may be sensitive to structural changes in the SciELO website.

  • The number of fetched articles is limited by n_max (default fallback is 100).

  • No official API is available, so the package depends on website scraping.

  • Rate-limiting / Blocking (403 errors): In some cases, SciELO may detect automated access and temporarily block the search, resulting in a 403 HTTP error. This is a common limitation of scraping. If this occurs, try the following:

    • Wait a few minutes before retrying.
    • Restart your R session or switch IP/network.
    • Avoid sending too many queries in a short period.

    Note: Reinstalling the package has no direct effect on the block.

-Default fallback limit: If the total number of available results cannot be determined, the query will default to fetching a maximum of 100 articles.

Recent Improvements -Rotating User-Agents: Each request uses a different User-Agent string (Chrome, Firefox, Safari variants) to appear more like a real browser and avoid blocking.

-Random delays between requests reduce server load and minimize scraping detection.

-Retry logic: If a request fails, the package retries automatically with a different User-Agent.

Planned Improvements

The current version of easyScieloPak is fully functional for basic academic exploration through SciELO. However, the following enhancements are planned for future versions:

  • Support for multiple filter values: Currently, each filter (e.g., language, category, journal) only accepts a single value. Future versions aim to support multiple values for broader and more flexible queries (e.g., languages("es", "en", "pt")).

  • Improved scraping resistance: We plan to implement smarter mechanisms to reduce the chances of triggering SciELO's anti-scraping protections (e.g., rotating user agents, request throttling, caching mechanisms).

  • Caching and offline mode: Possibility to cache previous search results locally for offline use or repeated queries.

  • Enhanced error diagnostics: Provide clearer messages and helper functions when 403 or parsing issues occur.

  • Journal/code normalization functions: Automatic mapping of journal names to their normalized internal identifiers.

About SciELO

SciELO is a multidisciplinary open-access platform hosting scientific journals from over 15 countries. It plays a vital role in disseminating research output from Latin America and beyond.

This package provides a lightweight, unofficial method to interact with SciELO’s search interface.

Disclaimer

  • This package is not affiliated with or endorsed by SciELO.
  • It relies on web scraping, and therefore may break if the HTML structure of SciELO changes.
  • Use this tool responsibly, especially for automated queries.
  • Please review SciELO’s terms of use before running large queries: https://www.scielo.org/en/sobre-o-scielo/

Contributing

Feel free to open issues or submit pull requests to improve functionality, usability, or documentation.

About

❗ This is a read-only mirror of the CRAN R package repository. easyScieloPack — Easy Interface to Search 'SciELO' Database. Homepage: https://github.com/Programa-ISA/easyScieloPack Report bugs for this package: https://github.com/Programa-ISA/easyScieloPack/issues

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages