How to serve
2500 requests per second
SAPO text Ads
Miguel Mascarenhas Filipe @ codebits, November 2010
Who am I?
Team lead of
Anúncios Sapo (text ads) team
Team of 5 (+ me) software engineers,
designing & developing the
text ads platform
How to serve VS How we serve
Are we a role-model?
Are there recipes?
Should there be a how-to?
Summary
● Project Overview
● Full System Overview
● Serving Text Ads
● Latency & Speed
● Scaling on the Front
● Backend & Backend Services
● Availability, Reliability & Fault Tolerance
● Scalability Issues
● Monitoring & Logging
● Programming Languages & Technologies
Anúncios SAPO
● Text Advertisement System
● Search Engines:
  ● BING ( http://www.bing.pt )
  ● SAPO's Search ( http://sl.pt )
● SAPO & Partners' Content Network
  ● ~200 web sites
  ● Impresa, Publico, Cofina, IOL, controlinveste
● Peaks at 90M ad requests per day, 2500/sec
Serving
Text Ads
Concepts
● Pay Per Click business model
● CPC - Cost Per Click
● QPS - Queries Per Second
● CTR - Click-Through Rate (clicks / impressions)
Serving text-ads ..
Major features:
● choose & serve ads
● register requests, impressions, clicks, conversions
● keep user budgets up to date
● quickly reflect changes in the ad pool
Serving text-ads ...
elect_get_ads() {
    if (words)
        ads = get_ads_keywords()
    else if (crawled_site)
        ads = get_ads_crawler()
    else
        ads = get_ads_site_keywords()

    site_ads = get_ads_site_targeting()
    return merge_ads(ads, site_ads)
}
Serving text-ads ...
Election of ads:
● requires an index:
  ads['word'] -> [ad1, ad2, ad3, ...]
● ads ordered by:
  'score' -> f(CTR, CPC, quality)
● pricing based on a Generalized Second-Price auction
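A minimal sketch of that ranking and pricing, assuming score = CPC × CTR; the field names and numbers are illustrative, not SAPO's actual schema:

```python
# Sketch of a generalized second-price (GSP) auction, simplified:
# ads are ranked by score = CPC * CTR; each winner pays the minimum
# CPC needed to keep its rank above the next-best ad.

def gsp_auction(ads, slots):
    """ads: list of dicts with 'cpc' and 'ctr'; returns (ad, price) pairs."""
    ranked = sorted(ads, key=lambda a: a["cpc"] * a["ctr"], reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # pay just enough to beat the next ad's score
            price = (nxt["cpc"] * nxt["ctr"]) / ad["ctr"]
        else:
            price = 0.01  # reserve price when there is no competitor
        results.append((ad, round(price, 2)))
    return results

ads = [
    {"id": "ad1", "cpc": 1.00, "ctr": 0.05},
    {"id": "ad2", "cpc": 0.80, "ctr": 0.04},
    {"id": "ad3", "cpc": 0.50, "ctr": 0.06},
]
winners = gsp_auction(ads, slots=2)
```

Note the GSP property: each winner pays less than its own bid, but enough to hold its slot.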
Serving text-ads..
Other essential features:
contextualization of webpages/sites
blacklisting of ads per site
reporting information
scoring quality of ads
anti-fraud systems/fraud detectors
LATENCY & SPEED
.. and their importance
Latency
Low latency is required:
Search pages (BING, SAPO, ..) have to:
  search ads (that's us!)
  search results
  and merge the results together.
«ads added last» - site developers put the ad-request code
at the end of the page (usually the last thing to load)
Latency
Without good latency
ads are slow to appear and
users have moved on...
Latency
Slow ads → low CTR → BAD!
Latency has a BIG impact
on REVENUE.
Latency Service Level Agreement
99.9% of reqs under:
150 milliseconds
Average response time is:
20 milliseconds
Never take more than 1 second.
serve blank ads in that case
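One way to enforce that hard deadline could look like the following sketch, with the election running in a worker thread; `elect_get_ads` is a stand-in, not the real implementation:

```python
# Sketch: enforce a hard 1-second deadline on ad selection and fall
# back to a blank ad on timeout, per the SLA above.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

BLANK_AD = {"html": ""}  # served when we cannot answer in time

def elect_get_ads():
    time.sleep(0.005)              # pretend the lookup takes ~5 ms
    return {"html": "<a>ad</a>"}

def serve(timeout=1.0):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(elect_get_ads)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return BLANK_AD        # never keep the page waiting

ad = serve()
```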
How to keep Latency low?
Pre-computing everything is essential
Fast contextualization lookup
Handle lack of information gracefully
(turning essential into optional)
How to keep Latency low?
Decouple (and postpone) everything
that isn't essential to serve ads
.. such as DB writes & other side effects of
serving ads.
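A sketch of that decoupling with a simple in-process queue; the real system uses SAPO Broker for this, and the names here are illustrative:

```python
# Sketch: the request path only enqueues side effects (impressions,
# clicks); a background worker drains them into the DB later.
import queue
import threading

events = queue.Queue()
stored = []  # stands in for the real DBMS

def record_impression(ad_id):
    events.put(("impression", ad_id))   # O(1), no DB write on request path

def worker():
    while True:
        item = events.get()
        if item is None:                # shutdown sentinel
            break
        stored.append(item)             # here: compact & write to the DB

t = threading.Thread(target=worker)
t.start()
record_impression("ad42")
events.put(None)
t.join()
```

The request path never blocks on the database; only the worker does.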
Fast word lookups - LiveDB
Fast word/site lookup (inverted index of ads)
● cache it in local RAM (memcached)
● 'persistent' backing store is RAM
Fast word lookups - LiveDB
Offline creation of index:
ads['word'] -> [ ad1, ad2, ad3, ad4, ... ]
Lots of details, need to compute additional
information for each tuple: (word, ad, CPC):
CTR, Evaluation Score
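A toy version of that offline build, with a placeholder score standing in for f(CTR, CPC, quality):

```python
# Sketch: build the inverted index ads['word'] -> [ads by score]
# from (word, ad, CPC, CTR) bid tuples. Data and score are illustrative.
from collections import defaultdict

bids = [
    ("shoes", "ad1", 0.50, 0.04),   # (word, ad, CPC, CTR)
    ("shoes", "ad2", 0.30, 0.08),
    ("books", "ad3", 0.20, 0.05),
]

def score(cpc, ctr):
    return cpc * ctr                # placeholder for f(CTR, CPC, quality)

index = defaultdict(list)
for word, ad, cpc, ctr in bids:
    index[word].append((score(cpc, ctr), ad))
for word in index:
    index[word].sort(reverse=True)  # best-scoring ads first
```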
Fast word lookups - LiveDB
We chose MySQL for:
● fast 'inverted index' creation (using stored procedures & replication)
● fast index lookups, given MySQL's reputation for speed in simple workloads
● replication for free, using MySQL's master-slave replication
Fast word lookups - LiveDB
Workload is almost read-only.
(in fact, we can make it read-only with some tricks)
Storage engines FAST for read-only workloads:
MySQL MEMORY
MySQL MyISAM
Very, very similar
MEMORY has more restrictions & limitations
MySQL MEMORY
Extremely fast lookup.
data is guaranteed to be in RAM (or in swap..)
Benchmarked MySQL Memory engine:
.. avg response time was around 10-20 msecs,
.. well within our needs!
Constraints:
• There is a maximum table size in bytes
• VARCHAR is in fact.. CHAR()
MySQL MyISAM
.. After months in production use,
MEMORY engine proved problematic..
Evaluated MyISAM, did benchmarks:
same speed, lower standard deviation.
Speed
Speed is .. ?
Queries per second ?
Sequential or Concurrently (Throughput) ?
Speed
Speed is ..
Queries per second.
Sequential or Concurrently (Throughput) ?
Throughput is obviously what matters in this
case..
Speed!
avg time is 20 msecs = 50 QPS
but... it's a totally parallel workload,
and most of the time is IO wait on the index lookup.
1 server CPU can do ~6x this: ~300 QPS
.. current servers: ~1200 QPS ..
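The arithmetic above, spelled out; the 6× overlap factor and the 4-core count are assumptions read off the slide's numbers:

```python
# Back-of-the-envelope throughput, using the slide's figures.
avg_latency_ms = 20
sequential_qps = 1000 // avg_latency_ms   # 50 QPS if requests were serial
per_cpu_qps = sequential_qps * 6          # ~300 QPS: latency is mostly IO wait,
                                          # so one CPU overlaps many requests
server_qps = per_cpu_qps * 4              # ~1200 QPS, assuming a 4-core server
```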
Scaling on the Front..
We scale horizontally because:
● We can add more Frontends to handle more QPS
● We can add more LiveDB slaves to handle more SQL queries
Backend
Message queueing system:
SAPO BROKER
Backend Services
'compact' & apply operations to the DBMS
runs anti-fraud system
runs contextualizer & crawler
runs semantic analyser
runs reports & suggestions system
Building the LiveDB
MySQL is the ACID DBMS
MySQL is the non-ACID LiveDB.
Python & Stored Procedures create LiveDB
in a MySQL DBMS slave,
MySQL replication pushes
to the read-only slaves
Availability & Reliability
(no downtime please..)
Reliability & Fault Tolerance
Almost every service/component is redundant.
Currently there are only 2 single points of
failure:
Master DB server
Master LiveDB server
And even if BOTH FAIL,
we keep serving ads...
Reliability & Fault Tolerance
Failure in Master LiveDB server:
We have a hot spare,
Can switch masters in approx. 5 to 10 minutes
Failure in Master DB:
● Data starts to pile up on the backend services
● Backoffices are unable to operate (no new ads)
● If the failure lasts a long time, we might serve ads without budget
● Electing a new master is performed manually
This has happened before, with no ad-serving downtime.
Scalability Issues
We can scale horizontally in all but two
components currently:
Master DBMS MySQL server
(but we are far from saturating it..)
we currently don't plan to 'solve' this
Master LiveDB server
...
Scalability Issues
Building LiveDB doesn't scale:
● We build a full new LiveDB every time
● It isn't distributed, nor is it easily made parallel
● Build time is proportional to the number of active bids
The LiveDB should be updated only with recent changes in the ad pool.
That's impossible with the current main DB data model
and the current LiveDB design.
We are currently investing heavily in a solution to this:
LiveCouchDB
Monitoring & Logging
(is everything okay?)
Monitoring & Logging
Bad things happen:
Log it, deal with it...
We need to know about it:
monitor logs
trigger alarm if errors on log..
Monitoring & Alarmistics
frontend code failures
intercept error
serve blank ad
log error
trigger alarm
Monitoring & Alarmistics
network failures
reconnect with exponential backoff
log error
trigger alarm?
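A sketch of that reconnect loop with capped exponential backoff; `connect` here is a stand-in that fails twice before succeeding:

```python
# Sketch: reconnect with exponential backoff (1, 2, 4, ... seconds,
# capped), logging each failure instead of hammering the network.
import itertools

attempts = {"n": 0}

def connect():
    attempts["n"] += 1
    if attempts["n"] < 3:               # simulate two failures
        raise ConnectionError("broker unreachable")
    return "connected"

def reconnect(max_delay=30.0):
    delays = []
    for retry in itertools.count():
        try:
            return connect(), delays
        except ConnectionError:
            delay = min(2 ** retry, max_delay)  # capped exponential backoff
            delays.append(delay)
            # real code: log the error, time.sleep(delay), maybe alarm

status, waited = reconnect()
```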
Monitoring & Alarmistics
machine failures
replication & redundancy
save state to disk
Monitoring & Alarmistics
software bugs..
bad (or lack of) data
radio silence
log error
trigger alarm
Programming Languages
.. and software used
Programming Languages
Python (backend)
Perl (frontend code)
C (1 app only)
Java (broker & reporting)
PHP (backoffices)
SQL
Javascript
Software used
Linux
memcached
MySQL
squid
nginx
Currently Evaluating
Languages:
  Hadoop PIG
Technologies:
  Hadoop
  CouchDB
Questions?