Brief History of Web Scraping

Web scraping originated in 1989 with the creation of the World Wide Web by Tim Berners-Lee, initially aimed at facilitating information sharing among scientists. Over the years, it evolved from basic web crawling tools like the Wanderer and JumpStation to more sophisticated software like BeautifulSoup and visual web scrapers, enabling users to extract data easily. Today, web scraping is a vital method for businesses to gain competitive advantages and is expected to continue growing alongside advancements in technology and data accessibility.

Uploaded by

Jimmy Teng

May 14, 2021

Data, web scraping

Web scraping is becoming a more widely known term. Most people associate it with web data extraction, an efficient and simple way of copying large chunks of information online. However, did you know that web scraping was born for a completely different purpose, and that it took almost two decades to become the web scraping we are familiar with now?

Here is the timeline:

The birth of the World Wide Web

The origins of very basic web scraping date back to 1989, when British scientist Tim Berners-Lee invented the World Wide Web. The original idea was a platform where information could be shared automatically between scientists at universities and institutes all around the world. However, the World Wide Web also brought three important features that remain key elements of every web scraping tool today:

• URLs, which we now use to point a scraper at a specific website;
• embedded hyperlinks, which let us navigate through that website;
• web pages containing various types of data: text, images, audio, video, and more.

First web browser

Continuing his work two years later, Tim Berners-Lee released the very first web browser and the first web page, served over HTTP from his NeXT computer. This gave people a way to access and interact with the World Wide Web.

The Wanderer

Not long after, in 1993, the concept of web crawling was born. The Wanderer, more precisely the World Wide Web Wanderer, was developed by Matthew Gray at the Massachusetts Institute of Technology. It was the first crawler of its kind, written in Perl, and its sole purpose was to measure the size of the web. That same year, the Wanderer was used to generate an index called Wandex. Although Gray himself did not make the claim, the Wanderer and Wandex together had the potential to become the first general-purpose World Wide Web search engine.
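The Wanderer measured the web by following links from page to page, and that idea can be sketched as a breadth-first crawl. The example below is a minimal Python illustration, not Gray's original Perl code. It walks a small hypothetical in-memory link graph (`TOY_WEB`) instead of fetching real pages, so it runs offline. The helper name `measure_web` is also invented for this sketch.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy web: each URL maps to the URLs its page links to.
# A real crawler would fetch each page over HTTP and extract its links instead.
TOY_WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/"],
    "http://c.example/": [],
    "http://orphan.example/": ["http://a.example/"],  # nothing links here
}


def measure_web(seed: str, links: dict[str, list[str]]) -> tuple[int, set[str]]:
    """Breadth-first crawl from `seed`; return the page count and distinct hosts reached."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:  # visit each page only once
                seen.add(target)
                queue.append(target)
    hosts = {urlparse(u).netloc for u in seen}
    return len(seen), hosts


pages, hosts = measure_web("http://a.example/", TOY_WEB)
print(pages, sorted(hosts))  # prints: 4 ['a.example', 'b.example', 'c.example']
```

A crawler only finds pages reachable from its seeds, so `http://orphan.example/` is never counted here. For the same reason, any crawl-based count of the web is a lower bound.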

JumpStation

In the same year, 1993, JumpStation was born. It was the technology that laid the groundwork for big names such as Google, Bing, Yahoo, and the other search tools on the web today, and it became the first crawler-based web search engine. As search engines went on to index millions of web pages, the internet turned into an openly searchable source of data in many forms.

BeautifulSoup

A little more than a decade later, in 2004, came BeautifulSoup, an HTML parsing library written in the Python programming language. BeautifulSoup helped programmers make sense of a site's structure and parse the contents of its HTML containers, saving them hours of work. By then the internet had become an immense, easily searchable source of information that anyone with a computer and an internet connection could access, and people started taking advantage of this by extracting the information available to them. For a while, websites did not prohibit downloading their content, but that slowly changed. Given the amount of data being downloaded, manual copy-pasting was not an option, so other ways of obtaining the information were bound to be developed.
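The job BeautifulSoup simplified was parsing a page's HTML containers to pull out the parts you need. A short sketch can show this by extracting a page's title and links. To avoid third-party packages, it uses Python's standard-library `html.parser.HTMLParser` as a stand-in for BeautifulSoup. The class `LinkAndTitleParser`, the helper `parse_page`, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every <a href> link, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no link target
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:  # text may arrive in several chunks
            self.title += data


def parse_page(html: str, base_url: str) -> tuple[str, list[str]]:
    """Feed the HTML to the parser and return (title, absolute links)."""
    parser = LinkAndTitleParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical sample page, for illustration only.
SAMPLE = """<html><head><title>Price list</title></head>
<body><a href="/item/1">One</a> <a href="https://other.example/x">Two</a> <a>No link</a></body></html>"""

title, links = parse_page(SAMPLE, "https://shop.example/catalog/")
```

With BeautifulSoup installed, you get roughly the same result from `BeautifulSoup(html, "html.parser")` followed by `soup.title.string` and `soup.find_all("a", href=True)`, in far less code. The event-driven stdlib parser is used here only because it makes the underlying steps visible.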

Rise of visual web scrapers

Soon after, web scraping as we know it was born. Stefan Andresen launched the visual web scraping software Web Integration Platform version 6.0. It allowed users to highlight the necessary information on a web page and structure that data into a usable Excel file or database. This gave non-programmers an opportunity to join in and easily extract data from the web.

Nowadays, as technologies and industries progress, companies are looking to gain an advantage over their competition. Because the amount of information available on the internet keeps growing rapidly, web scraping is becoming one of the most prominent and widely used methods of acquiring data at scale across various industries and business sectors.

Future of web scraping

Web scraping has grown immensely in recent years and is almost certain to keep growing. Currently, commercial web scraping is mostly used to gain a competitive advantage through collecting leads, scraping competitors, monitoring prices, and so on. As technologies such as artificial intelligence develop, and as data becomes even more accessible and crucial to different aspects of life, web scraping will advance with them. It will produce new and remarkable applications that we look forward to experimenting with.
