Scrappy

Extract rich metadata from URLs.

Installation

npm install scrappy --save

Usage

Starting from extractFromUrl, scrappy makes an HTTP request (scrapeUrl) and streams the response into the scraper (scrapeStream). The scraper extracts metadata according to several specifications and standards, including HTML, RDFa, JSON-LD, Microdata, Open Graph and oEmbed. Once all the relevant metadata is collected, extract selects the appropriate snippet. If you need snippets in a different format, you can write your own extraction method that accepts the scraped metadata.

import { scrapeUrl, scrapeStream, extract, extractFromUrl } from 'scrappy'

const url = 'https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254#.a0wjf4ltt'

extractFromUrl(url).then(function (snippet) {
  // {
  //   "type": "summary",
  //   "imageUrl": "https://cdn-images-1.medium.com/max/1200/1*QOMaDLcO8rExD0ctBV3BWg.png",
  //   "contentUrl": "https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254",
  //   "originalUrl": "https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254#.a0wjf4ltt",
  //   "encodingFormat": "html",
  //   "headline": "Everything you ever wanted to know about unfurling but were afraid to ask /or/ How to make your… — Slack Platform Blog",
  //   "caption": "Let’s start with the most obvious question first. This is what an “unfurl” is:",
  //   "siteName": "Medium",
  //   "author": "Matt Haughey",
  //   "publisher": "https://www.facebook.com/medium",
  //   "apps": {
  //     "iphone": {
  //       "id": "828256236",
  //       "name": "Medium",
  //       "url": "medium://p/e64b4bb9254"
  //     },
  //     "ipad": {
  //       "id": "828256236",
  //       "name": "Medium",
  //       "url": "medium://p/e64b4bb9254"
  //     },
  //     "android": {
  //       "id": "com.medium.reader",
  //       "name": "Medium",
  //       "url": "medium://p/e64b4bb9254"
  //     }
  //   }
  // }
})
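
A custom extraction method is just a function from the scraped metadata to whatever snippet shape you need. The sketch below is illustrative only: the ScrapedMetadata and LinkPreview interfaces and the extractLinkPreview function are hypothetical names, and the metadata fields are assumptions rather than scrappy's actual result format, so check the real scrape output before relying on any of them.

```typescript
// Hypothetical shape of the scraped metadata (an assumption for this sketch;
// the object scrappy actually produces may be structured differently).
interface ScrapedMetadata {
  contentUrl: string
  html?: { title?: string, description?: string }
  openGraph?: { [key: string]: string | undefined }
}

// Hypothetical compact snippet format produced by the custom extractor.
interface LinkPreview {
  url: string
  title?: string
  description?: string
}

// Custom extraction method: prefer Open Graph values, fall back to plain HTML.
function extractLinkPreview (meta: ScrapedMetadata): LinkPreview {
  const og = meta.openGraph || {}
  const html = meta.html || {}

  return {
    url: og['og:url'] || meta.contentUrl,
    title: og['og:title'] || html.title,
    description: og['og:description'] || html.description
  }
}

// Example input written by hand to stand in for real scraped metadata.
const preview = extractLinkPreview({
  contentUrl: 'https://example.com/post',
  html: { title: 'Example post' },
  openGraph: { 'og:description': 'A short description.' }
})

console.log(preview)
```

In practice the argument would be the metadata scraped for a URL instead of a hand-written object. Keeping the extractor a pure function of that metadata makes it easy to test against saved scrape results.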

Development

# Build the fixtures directory with raw content.
node scripts/fixtures.js
# Scrape the metadata results from fixtures.
node scripts/scrape.js
# Extract the snippets from the previous results.
node scripts/extract.js

License

Apache 2.0
