
Battle-tested multi-currency price extraction from text. Supports PLN, EUR, GBP, USD with Polish and English number formats.


justi/price_scanner


PriceScanner


Installation

Add this line to your Gemfile, then run bundle install:

gem "price_scanner"

Usage

Parse a single price

PriceScanner.parse("1.299,00 zł")
# => { amount: 1299.0, currency: "PLN", text: "1.299,00 zł" }

PriceScanner.parse("£49.99")
# => { amount: 49.99, currency: "GBP", text: "£49.99" }

Extract all prices from text

PriceScanner.scan("Was £49.99 Now £29.99")
# => [{ amount: 49.99, currency: "GBP", text: "£49.99" },
#     { amount: 29.99, currency: "GBP", text: "£29.99" }]

Check if text contains a price

PriceScanner.contains_price?("Only 99,00 zł")  # => true
PriceScanner.contains_price?("No price here")   # => false

Advanced API

For finer control, use the Detector and Parser modules directly.

Detect prices in text

PriceScanner::Detector.contains_price?("see price: 49,00 zł")  # => true

PriceScanner::Detector.extract_prices_from_text("Was 49,00 zł, now 29,00 zł")
# => [{ text: "49,00 zł", value: 49.0, position: 4 },
#     { text: "29,00 zł", value: 29.0, position: 18 }]

PriceScanner::Detector::PRICE_PATTERN  # => Regexp matching prices

Parse and normalize prices

PriceScanner::Parser.normalized_price("1.299,00 zł")  # => 1299.0
PriceScanner::Parser.normalized_price("$49.99")        # => 49.99

PriceScanner::Parser.extract_currency("49,00 zł")  # => "PLN"
PriceScanner::Parser.extract_currency("€120")      # => "EUR"

Strip price mentions from text

PriceScanner::Parser.strip_price_mentions("Buy for 49,00 zł or 59,00 zł", "49,00 zł", "59,00 zł")
# => "Buy for or"
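The result above can be approximated in plain Ruby. This is a sketch that assumes strip_price_mentions removes each given string literally and then collapses the whitespace left behind; strip_mentions is a hypothetical name, not part of PriceScanner.

```ruby
# Hypothetical approximation of strip_price_mentions (not the gem's code):
# remove each mention literally (a String pattern in gsub is not a regex),
# then collapse the double spaces left behind and trim the ends.
def strip_mentions(text, *mentions)
  mentions.reduce(text) { |acc, mention| acc.gsub(mention, "") }
          .squeeze(" ")
          .strip
end

strip_mentions("Buy for 49,00 zł or 59,00 zł", "49,00 zł", "59,00 zł")
# => "Buy for or"
```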

Build a regex for a specific price value

PriceScanner::Parser.price_regex_from_value("1.299,00 zł")
# => Regexp matching variations like "1 299,00 zł", "1299,00zł", etc.
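One way to build such a separator-tolerant pattern is sketched below. It assumes a PLN amount with comma decimals and only varies the thousands separator (dot, space, NBSP or none) and the space before "zł"; flexible_price_regex is a hypothetical helper, not the gem's implementation.

```ruby
# Hypothetical sketch, PLN-only: build a Regexp from a price such as
# "1.299,00 zł" that also matches other spellings of the same amount.
def flexible_price_regex(price)
  number = price[/\d[\d. \u00a0]*,\d{2}/] or raise ArgumentError, "no PLN amount in #{price.inspect}"
  integer, decimals = number.split(",")
  groups = integer.scan(/\d+/)   # "1.299" -> ["1", "299"]
  separator = '[.\s\u00a0]?'     # optional dot, space or NBSP between digit groups
  # (?<!\d) stops the pattern from matching inside a longer number.
  Regexp.new('(?<!\d)' + groups.join(separator) + ",#{decimals}" + '\s?zł')
end

re = flexible_price_regex("1.299,00 zł")
re.match?("1 299,00 zł")  # => true
re.match?("1299,00zł")    # => true
re.match?("11299,00 zł")  # => false
```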

Supported currencies

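| Currency | Symbol | Code | Example input | Parsed |
|----------|--------|------|---------------|--------|
| PLN      | zł, zl | PLN  | 1.299,00 zł   | 1299.0 |
| EUR      | €      | EUR  | €99,00        | 99.0   |
| USD      | $      | USD  | $1,019.00     | 1019.0 |
| GBP      | £      | GBP  | £49.99        | 49.99  |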

Currency symbols and codes are matched case-insensitively (pln, PLN, Pln all work).
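Case-insensitive matching can be sketched by lowercasing the input and looking tokens up in a lowercase table. The CURRENCY_TOKENS map and detect_currency helper below are hypothetical illustrations, not PriceScanner internals, and the sketch has no word-boundary check (so "Europe" would count as EUR).

```ruby
# Hypothetical lookup table: every token is stored lowercase and maps to an
# ISO 4217 code, so lowercasing the input makes matching case-insensitive.
CURRENCY_TOKENS = {
  "zł" => "PLN", "zl" => "PLN", "pln" => "PLN",
  "€" => "EUR", "eur" => "EUR",
  "$" => "USD", "usd" => "USD",
  "£" => "GBP", "gbp" => "GBP"
}.freeze

# Regexp.union escapes special characters such as "$".
CURRENCY_RE = Regexp.union(CURRENCY_TOKENS.keys)

# Returns the code for the first currency token in the text, or nil.
def detect_currency(text)
  match = text.downcase.match(CURRENCY_RE)
  match && CURRENCY_TOKENS[match[0]]
end

detect_currency("99,00 Pln")  # => "PLN"
detect_currency("€120")       # => "EUR"
```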

Supported number formats

| Format                                  | Example           | Parsed |
|-----------------------------------------|-------------------|--------|
| Dot thousands, comma decimal (Polish)   | 1.299,00 zł       | 1299.0 |
| Space thousands, comma decimal          | 1 299,00 zł       | 1299.0 |
| NBSP thousands, comma decimal           | 1\u00a0299,00 zł  | 1299.0 |
| Comma thousands, dot decimal (English)  | $1,299.00         | 1299.0 |
| No thousands separator                  | 799,00 zł         | 799.0  |
| Integer (no decimals)                   | £150              | 150.0  |
| Currency before amount                  | zł 248,86         | 248.86 |
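All of the formats above can be reduced to a Float with a small separator heuristic. The sketch below is one such heuristic under stated assumptions, not PriceScanner's actual rules; normalize_amount is a hypothetical name, and the heuristic would misread an English "1.299" meant as one point two nine nine.

```ruby
# Hypothetical helper (not PriceScanner's code): converts the numeric part of
# a price string to a Float using these assumed rules:
#   * everything except digits, "." and "," is dropped (symbols, spaces, NBSP);
#   * if both "." and "," appear, whichever comes last is the decimal mark;
#   * a lone separator type followed by exactly three final digits is read as
#     a thousands separator, otherwise as the decimal mark.
def normalize_amount(raw)
  digits = raw.gsub(/[^\d.,]/, "")
  last_dot = digits.rindex(".")
  last_comma = digits.rindex(",")
  sep = last_dot ? "." : (last_comma ? "," : nil)

  decimal_mark =
    if last_dot && last_comma
      last_dot > last_comma ? "." : ","
    elsif sep && !digits.match?(/#{Regexp.escape(sep)}\d{3}\z/)
      sep
    end

  # Drop thousands separators, then turn the decimal mark into ".".
  cleaned = digits.delete(([".", ","] - [decimal_mark]).join)
  cleaned = cleaned.tr(decimal_mark, ".") if decimal_mark
  Float(cleaned) # raises ArgumentError if the result is not a valid number
end

normalize_amount("1.299,00 zł")  # => 1299.0
normalize_amount("$1,299.00")    # => 1299.0
normalize_amount("zł 248,86")    # => 248.86
```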

Smart filtering

Prices that match the following patterns are automatically excluded from results:

Negative prices

Prices preceded by a hyphen (-) or a minus sign (−, U+2212) are treated as discount badges and filtered out.

PriceScanner.scan("449,00 zł -100,00 zł 349,00 zł")
# => [{ amount: 449.0, ... }, { amount: 349.0, ... }]
# -100,00 zł is excluded
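A minimal version of this filter only needs to look at the character just before each match. The sketch below assumes PLN amounts with comma decimals and checks only the immediately preceding character, so a spaced " - " between two prices is left for range handling; SIMPLE_PLN_AMOUNT and non_negative_prices are hypothetical names.

```ruby
# Hypothetical sketch, PLN-only: find amounts, then drop any whose
# immediately preceding character is a hyphen or U+2212 minus sign.
SIMPLE_PLN_AMOUNT = /\d+,\d{2}\s?zł/
MINUS_SIGNS = ["-", "\u2212"].freeze

def non_negative_prices(text)
  kept = []
  text.scan(SIMPLE_PLN_AMOUNT) do
    m = Regexp.last_match
    preceding = m.begin(0).zero? ? nil : text[m.begin(0) - 1]
    kept << m[0] unless MINUS_SIGNS.include?(preceding)
  end
  kept
end

non_negative_prices("449,00 zł -100,00 zł 349,00 zł")
# => ["449,00 zł", "349,00 zł"]
```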

Price ranges

Two prices connected by an en-dash (–) or a spaced hyphen ( - ) are recognized as a range, and both prices are removed.

PriceScanner.scan("Size S–XL, £3.29 – £92.71, buy now for £49.99")
# => [{ amount: 49.99, currency: "GBP", text: "£49.99" }]
# range £3.29 – £92.71 is excluded
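Range removal can be sketched as "delete the joined pair first, then collect what is left". For brevity the sketch assumes "£"-prefixed prices only; GBP_PRICE, GBP_RANGE and prices_outside_ranges are hypothetical names. Note that "S–XL" survives because it contains no price.

```ruby
# Hypothetical sketch, GBP-only: a price is "£" plus digits and optional pence.
GBP_PRICE = /£\d+(?:\.\d{2})?/
# A range is two prices joined by an en-dash (spacing optional) or " - ".
GBP_RANGE = /#{GBP_PRICE}\s*–\s*#{GBP_PRICE}|#{GBP_PRICE} - #{GBP_PRICE}/

def prices_outside_ranges(text)
  # Blank out every range, then collect the prices that remain.
  text.gsub(GBP_RANGE, " ").scan(GBP_PRICE)
end

prices_outside_ranges("Size S–XL, £3.29 – £92.71, buy now for £49.99")
# => ["£49.99"]
```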

Savings amounts

When 3+ prices are detected and one equals the difference between two others (within ±2% tolerance), the savings amount is removed.

PriceScanner.scan("Was 449,00 zł, now 349,00 zł. You save 100,00 zł!")
# => [{ amount: 449.0, ... }, { amount: 349.0, ... }]
# 100,00 zł is excluded (449 - 349 = 100)
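In the example, both 100 and 349 equal the difference of the other two amounts, so some tie-break is needed. The sketch below resolves it by assuming the savings amount appears after the two prices it is derived from; that ordering rule, and measuring the ±2% tolerance relative to the candidate, are assumptions, and drop_savings is a hypothetical name.

```ruby
# Hypothetical sketch of the savings rule. Assumptions not confirmed by the
# README: amounts arrive in text order, a savings amount appears after the
# two prices it was derived from, and the 2% tolerance is relative to it.
def drop_savings(amounts, tolerance: 0.02)
  return amounts.dup if amounts.size < 3

  kept = amounts.each_with_index.reject do |candidate, i|
    # Compare the candidate with every pair of amounts that precede it.
    amounts.first(i).combination(2).any? do |a, b|
      ((a - b).abs - candidate).abs <= tolerance * candidate
    end
  end
  kept.map(&:first)
end

drop_savings([449.0, 349.0, 100.0])  # => [449.0, 349.0]
drop_savings([449.0, 349.0, 99.0])   # => [449.0, 349.0] (within 2%)
```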

Per-unit prices

Prices followed by a unit indicator are filtered out.

Supported units: kg, g, mg, l, ml, szt, m, m², m³, cm, mm, op, opak, pcs, pc, unit, each, ea, kaps, tabl, tab

Recognized prefixes: / (slash) and za (Polish "per").

PriceScanner.scan("32,74 zł/kg — buy 500g for 16,37 zł")
# => [{ amount: 16.37, currency: "PLN", text: "16,37 zł" }]
# 32,74 zł/kg is excluded
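The per-unit check can be expressed as a regex applied to the text right after each price. This sketch assumes PLN amounts with comma decimals and covers only the ASCII unit names, so a plain \b word boundary is enough; UNITS, UNIT_SUFFIX, UNIT_PLN_PRICE and whole_item_prices are hypothetical names.

```ruby
# Hypothetical sketch, PLN-only: a price is per-unit when the text right
# after it is "/" or "za" followed by one of these (ASCII-only) unit names.
UNITS = %w[kg g mg l ml szt m cm mm op opak pcs pc unit each ea kaps tabl tab].freeze
UNIT_SUFFIX = %r{\A\s*(?:/|za\s+)\s*(?:#{UNITS.join("|")})\b}i
UNIT_PLN_PRICE = /\d+,\d{2}\s?zł/

def whole_item_prices(text)
  kept = []
  text.scan(UNIT_PLN_PRICE) do
    m = Regexp.last_match
    # post_match is everything after the price, e.g. "/kg, buy ...".
    kept << m[0] unless m.post_match.match?(UNIT_SUFFIX)
  end
  kept
end

whole_item_prices("32,74 zł/kg, buy 500g for 16,37 zł")  # => ["16,37 zł"]
```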

Deduplication

If the same price value appears multiple times, only one occurrence is kept.
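Deduplicating by numeric value (rather than by text) means "49,00 zł" and "49 zł" collapse into one entry. In the sketch below, Price is an illustrative struct rather than PriceScanner's return type, and keeping the earliest occurrence is an assumption; Array#uniq with a block keeps the first element for each key.

```ruby
# Hypothetical sketch: Price is illustrative, not the gem's result type.
Price = Struct.new(:text, :value, :position)

# Array#uniq keeps the first element seen for each value.
def dedupe_by_value(prices)
  prices.uniq(&:value)
end

prices = [
  Price.new("49,00 zł", 49.0, 0),
  Price.new("49 zł", 49.0, 20),
  Price.new("29,00 zł", 29.0, 35)
]
dedupe_by_value(prices).map(&:position)  # => [0, 35]
```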

Features

  • Zero dependencies
  • Case-insensitive currency matching
  • Handles regular spaces, non-breaking spaces (NBSP), and mixed whitespace
  • Tracks position of each price in the source text
  • Ignores numbers directly preceded by a letter, avoiding false positives from product codes (e.g. DKA2zł)
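The product-code guard from the last feature can be sketched with a negative lookbehind. The lookbehind also rejects a preceding digit; otherwise "A12zł" would still yield "2zł" by starting the match mid-number. The pattern assumes PLN amounts, and GUARDED_PLN is a hypothetical name.

```ruby
# Hypothetical sketch, PLN-only: a match may not start right after a letter
# (\p{L} covers any Unicode letter, e.g. "ł") or a digit, so the "2" in a
# product code like "DKA2zł" is never read as a price.
GUARDED_PLN = /(?<![\p{L}\d])\d+(?:,\d{2})?\s?zł/

"DKA2zł costs 19,99 zł".scan(GUARDED_PLN)  # => ["19,99 zł"]
"A12zł".scan(GUARDED_PLN)                  # => []
```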

Used by

  • snipe.sale — price tracking service processing thousands of product pages daily

License

MIT License. See LICENSE.
