susm: A scraper for discovering and unpacking source maps

susm recursively crawls a website, following HTML links, scripts, stylesheets, and sitemaps. For each file it encounters that references or includes a source map (e.g., JavaScript bundles, CSS files), it attempts to locate and download that map, extract any source code files, and write them to disk, preserving their relative paths as defined in the map.
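As context for what susm looks for: a JavaScript or CSS bundle normally points to its source map through the standard sourceMappingURL comment, and the map's sources field lists the original file paths. The snippet below is purely illustrative; main.js.map and src/index.ts are placeholder names.

//# sourceMappingURL=main.js.map

If main.js.map lists src/index.ts among its sources and embeds its contents, susm would write that file to disk at the relative path src/index.ts.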

Building

First, clone the repository:

git clone https://github.com/dixslyf/susm.git
cd susm

To build the scraper, run:

cargo build --release

The compiled binary will be available at target/release/susm (assuming Cargo's default target directory).
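Once built, the binary can be run directly; for example, to show the help text mentioned under Usage below:

./target/release/susm --help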

Nix

This repository provides a Nix flake.

To build the scraper with Nix, run:

nix build github:dixslyf/susm

To run the scraper:

nix run github:dixslyf/susm
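Arguments for susm itself go after a -- separator, as usual with nix run. For example, to crawl a site (https://example.com is a placeholder URL, and the site subcommand is described under Usage below):

nix run github:dixslyf/susm -- site https://example.com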

Usage

susm has two primary modes of operation:

  • Crawl a website and unpack discovered source maps:

    susm site <URL> [OPTIONS]
  • Unpack a single local source map file:

    susm file <PATH> [OPTIONS]
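For example, to unpack a local map produced by a build (the path here is purely illustrative):

susm file ./dist/main.js.map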

For additional options, run:

susm --help

Rate Limiting and robots.txt

susm applies a polite crawling policy by default.

Requests are rate-limited per host to avoid overloading servers. By default, susm waits 500 milliseconds between requests with a slight random jitter. The request interval can be adjusted with the --request-interval (-i) flag.
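For example, to increase the delay between requests to the same host to about one second (assuming the flag takes the interval in milliseconds, matching the 500-millisecond default described above; https://example.com is a placeholder URL):

susm site https://example.com --request-interval 1000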

susm also respects the robots.txt exclusion standard. Before crawling, it retrieves and parses the site’s robots.txt file (if present) and skips any paths disallowed for its user agent.
