# Python Cache: How to Speed Up Your Code with Effective Caching

This article will show you how to use caching in Python with your web
scraping tasks. You can read the [full article](https://oxylabs.io/blog/python-cache-how-to-use-effectively)
on our blog, where we delve deeper into the different caching
strategies.

## How to implement a cache in Python

There are different ways to implement caching in Python for different
caching strategies. Here we’ll see two methods of Python caching for a
simple web scraping example. If you’re new to web scraping, take a look
at our [step-by-step Python web scraping guide](https://oxylabs.io/blog/python-web-scraping).

### Install the required libraries

We’ll use the [requests library](https://pypi.org/project/requests/) to make HTTP requests
to a website. Install it with [pip](https://pypi.org/project/pip/) by entering the
following command in your terminal:

```bash
python -m pip install requests
```

Other libraries we’ll use in this project, specifically `time` and
`functools`, are part of Python’s standard library (we’re using Python 3.11.2),
so you don’t have to install them.

### Method 1: Python caching using a manual decorator

A [decorator](https://peps.python.org/pep-0318/) in Python is a
function that accepts another function as an argument and outputs a new
function. A memoizing (caching) decorator uses this mechanism to store the
results of previous function calls, saving them in the cache for future use.

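To make the idea concrete before we build the caching version, here's a minimal, purely illustrative decorator (the `announce` name and the `add` function are made up for this example) that wraps a function and prints a message before delegating to it:

```python
# A minimal, illustrative decorator: it takes a function and returns
# a new function that prints a message before calling the original.
def announce(func):
    def wrapper(*args, **kwargs):
        print(f'Calling {func.__name__} with {args} {kwargs}')
        return func(*args, **kwargs)
    return wrapper


@announce
def add(a, b):
    return a + b


print(add(2, 3))  # Prints the announcement line, then 5
```

The `memoize` decorator we build below follows the same pattern, except that its wrapper stores results instead of printing messages.
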
Let’s start by creating a simple function that takes a URL as a function
argument, requests that URL, and returns the response text:

```python
def get_html_data(url):
    response = requests.get(url)
    return response.text
```

Now, let's move toward creating a memoized version of this function:

```python
def memoize(func):
    cache = {}

    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result

    return wrapper


@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text
```

The `wrapper` function determines whether the current input arguments have
been previously cached and, if so, returns the previously cached result.
If not, the code calls the original function and caches the result
before returning it. In this case, we define a `memoize` decorator that
generates a `cache` dictionary to hold the results of previous function
calls.

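One thing to keep in mind: this simple `memoize` keys the cache only on positional arguments, and those arguments must be hashable. If you also want to cache calls that use keyword arguments, a possible variant (an illustrative sketch, not the version used in the rest of this article) builds the key from both `args` and `kwargs`:

```python
import functools


def memoize_kwargs(func):
    cache = {}

    @functools.wraps(func)  # Preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # Build a hashable cache key from positional and keyword arguments
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper
```
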
By adding `@memoize` above the function definition, we apply the memoize
decorator to the same request logic as `get_html_data`. This gives us a new
memoized function that we’ve called `get_html_data_cached`. It only makes
a single network request for each URL and then stores the response in the
cache for further requests.

Let’s use the `time` module to compare the execution speeds of the
`get_html_data` function and the memoized `get_html_data_cached` function:

```python
import time

start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)

start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
```

Here’s what the complete code looks like:

```python
# Import the required modules
import time
import requests


# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text


# Memoize function to cache the data
def memoize(func):
    cache = {}

    # Inner wrapper function to store the data in the cache
    def wrapper(*args):
        if args in cache:
            return cache[args]
        else:
            result = func(*args)
            cache[args] = result
            return result

    return wrapper


# Memoized function to get the HTML Content
@memoize
def get_html_data_cached(url):
    response = requests.get(url)
    return response.text


# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)

# Get the time it took for a memoized function (manual decorator)
start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
print('Time taken (memoized function using manual decorator):', time.time() - start_time)
```

And here’s the output:

![](images/output_normal_memoized.png)

Notice the time difference between the two functions. Both take almost
the same time on the first run, but the real benefit of caching shows up on
re-access: the memoized function returns the cached response almost
instantly. As you increase the number of calls to these functions, the
time difference will significantly increase (see
[Performance Comparison](#performance-comparison)).

### Method 2: Python caching using LRU cache decorator

Another method to implement caching in Python is to use the built-in
`@lru_cache` decorator from `functools`. This decorator implements caching
using the least recently used (LRU) strategy: the cache has a fixed size,
and once it’s full, it discards the entries that haven’t been used recently.

To use the `@lru_cache` decorator, we can create a new function for
extracting HTML content and place the decorator name at the top. Make
sure to import the `functools` module before using the decorator:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text
```

In the above example, the `get_html_data_lru` method is memoized using the
`@lru_cache` decorator. The cache can grow indefinitely when the `maxsize`
option is set to `None`.

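If you'd rather bound the cache, pass a number as `maxsize`; once the limit is reached, the least recently used entries are evicted. Here's a small, self-contained sketch (the `square` function is just a stand-in, unrelated to the scraping code) that uses the decorator's built-in `cache_info()` to inspect hits, misses, and the current cache size:

```python
from functools import lru_cache


@lru_cache(maxsize=2)  # Keep at most two results in the cache
def square(n):
    return n * n


square(1)                   # Miss: computed and cached
square(2)                   # Miss: computed and cached
square(3)                   # Miss: evicts the entry for 1 (least recently used)
square(1)                   # Miss again, since 1 was evicted
print(square.cache_info())  # CacheInfo(hits=0, misses=4, maxsize=2, currsize=2)
```
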
To use the `@lru_cache` decorator, just add it above the `get_html_data_lru`
function. Here’s the complete code sample:

```python
# Import the required modules
from functools import lru_cache
import time
import requests


# Function to get the HTML Content
def get_html_data(url):
    response = requests.get(url)
    return response.text


# Memoized using LRU Cache
@lru_cache(maxsize=None)
def get_html_data_lru(url):
    response = requests.get(url)
    return response.text


# Get the time it took for a normal function
start_time = time.time()
get_html_data('https://books.toscrape.com/')
print('Time taken (normal function):', time.time() - start_time)

# Get the time it took for a memoized function (LRU cache)
start_time = time.time()
get_html_data_lru('https://books.toscrape.com/')
print('Time taken (memoized function with LRU cache):', time.time() - start_time)
```

This produced the following output:

![](images/output_normal_lru.png)

### Performance comparison

In the following table, we’ve determined the execution times of all
three functions for different numbers of requests to these functions:

As the number of requests to the functions increases, you can see a
significant reduction in execution times using the caching strategy. The
following comparison chart depicts these results:

![](images/comparison-chart.png)

The comparison results clearly show that using a caching strategy in
your code can significantly improve overall performance and speed.
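
If you'd like to reproduce a comparison like this yourself, here's a minimal sketch, assuming `get_html_data`, `get_html_data_cached`, and `get_html_data_lru` from the earlier samples are defined in the same script; it times a batch of repeated calls to each function:

```python
import time


def time_calls(func, url, n):
    # Call func with the same URL n times and return the total elapsed time
    start = time.time()
    for _ in range(n):
        func(url)
    return time.time() - start


url = 'https://books.toscrape.com/'
for n in (1, 10, 100):
    print(f'{n} call(s):')
    print('  Normal function:  ', time_calls(get_html_data, url, n))
    print('  Manual decorator: ', time_calls(get_html_data_cached, url, n))
    print('  LRU cache:        ', time_calls(get_html_data_lru, url, n))
```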