AJAX
----

You may find that some elements are missing from the HTML fetched by pyspider or [wget](https://www.gnu.org/software/wget/). When you open the page in a browser, some elements appear only after the page has loaded, sometimes preceded by a 'loading' animation or message. For example, suppose we want to scrape all Dota 2 channels from [http://www.twitch.tv/directory/game/Dota%202](http://www.twitch.tv/directory/game/Dota%202).

But you may find none of those channels in the fetched page.

While resources are being loaded, you will see a table of the requested resources.

AJAX uses the [XMLHttpRequest](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest) object, generally shortened to "XHR", to send and retrieve data. Use the Filter (funnel icon) to show only the XHR requests, then glance over each request using the preview.

To determine which one is the key request, you can use a filter to reduce the number of requests, guess the purpose of each request from its path and parameters, and then view the response contents to confirm. Here we found the request: [http://api.twitch.tv/kraken/streams?limit=20&offset=0&game=Dota+2&broadcaster_language=&on_site=1](http://api.twitch.tv/kraken/streams?limit=20&offset=0&game=Dota+2&broadcaster_language=&on_site=1)
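
Guessing what a request does is easier once its path and query parameters are split apart. A minimal sketch using only Python's standard `urllib.parse`; the helper name `describe_request` is hypothetical and not part of pyspider:

```python
from urllib.parse import parse_qsl, urlsplit


def describe_request(url):
    """Split a request URL into its path and a list of (name, value) query pairs.

    Hypothetical helper for inspecting a request copied from the Network panel.
    """
    parts = urlsplit(url)
    # keep_blank_values=True keeps empty parameters such as "broadcaster_language="
    params = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, params


if __name__ == "__main__":
    url = ("http://api.twitch.tv/kraken/streams"
           "?limit=20&offset=0&game=Dota+2&broadcaster_language=&on_site=1")
    path, params = describe_request(url)
    print(path)
    for name, value in params:
        # parse_qsl decodes "+" to a space, so game prints as 'Dota 2'
        print(f"{name} = {value!r}")
```

Laid out this way, `limit` and `offset` look like paging controls and `game` selects the directory, which is the kind of guess described above; the response contents are still what confirms it.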

You can find the "Create" button at the bottom right of the dashboard. Click it and name your project.

Next, change the crawl URL in the `on_start` callback.
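
The crawl URL is the API request found in the Network panel rather than the directory page. As a sketch, assuming the `offset` parameter advances in steps of `limit` (the parameter names suggest this, but it is not verified here), the URLs could be built with the standard library; in pyspider, `on_start` would then pass each one to `self.crawl(url, callback=self.index_page)`. The names `streams_url` and `page_urls` are hypothetical:

```python
from urllib.parse import urlencode

# Endpoint taken from the XHR request identified in the browser's Network panel.
API_BASE = "http://api.twitch.tv/kraken/streams"


def streams_url(offset=0, limit=20, game="Dota 2"):
    """Build one page of the streams API URL (hypothetical helper)."""
    # A list of pairs keeps the parameter order; urlencode encodes spaces as "+".
    query = urlencode([
        ("limit", limit),
        ("offset", offset),
        ("game", game),
        ("broadcaster_language", ""),
        ("on_site", 1),
    ])
    return f"{API_BASE}?{query}"


def page_urls(pages, limit=20):
    """URLs for the first `pages` pages, ASSUMING offset steps by `limit`."""
    return [streams_url(offset=page * limit, limit=limit) for page in range(pages)]


if __name__ == "__main__":
    for url in page_urls(3):
        print(url)
```

With the defaults, `streams_url()` reproduces the single request seen in the browser; generating further pages is an optional extension that depends on the paging assumption above.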

Click the green `run` button. You should see a red 1 above `follows`; switch to the `follows` panel and click the green play button.

Index Page
----------

pyspider provides a tool called `CSS selector helper` that makes it easier to generate a selector pattern for an element you click. Enable it by clicking its button and switching to the `web` panel.

The element under the mouse pointer is highlighted in yellow. When you click it, a pre-selected CSS selector pattern is shown in the bar above. You can edit the features used to locate the element, then add the pattern to your source code.

Note that the `CSS selector helper` may not always work. You can also write a selector pattern manually with tools such as [Chrome Dev Tools](https://developer.chrome.com/devtools).

You don't need to write every ancestor element in the selector pattern; the elements that distinguish the target from unwanted elements are enough. However, it takes some experience with scraping or web development to know which attributes are important and can serve as locators. You can also test a CSS selector in the JavaScript Console using `$$`, for example `$$('[itemprop="director"] span')`.
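
To make the descendant-selector idea concrete, here is a standard-library-only sketch of what `[itemprop="director"] span` matches: the text of `span` elements anywhere inside an element whose `itemprop` is `director`, however many ancestors lie in between. It is only an illustration; in pyspider, `response.doc` evaluates real CSS selectors for you. The class `DescendantTextCollector`, the helper `select_text` and the sample HTML are hypothetical, and the sketch simplifies: a matching `span` nested inside another match is folded into the outer match instead of being reported separately, and malformed markup is not handled carefully.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DescendantTextCollector(HTMLParser):
    """Collect text of `target_tag` elements nested anywhere inside an element
    whose attribute `attr` equals `value`, roughly [attr="value"] target_tag.
    Hypothetical helper, not part of pyspider."""

    def __init__(self, attr, value, target_tag):
        super().__init__()
        self.attr, self.value, self.target_tag = attr, value, target_tag
        self.stack = []     # (tag, is_anchor, opens_match) per open element
        self.results = []   # stripped text, one entry per outermost match
        self.buffer = None  # text pieces of the match currently open

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        is_anchor = dict(attrs).get(self.attr) == self.value
        inside_anchor = any(entry[1] for entry in self.stack)
        # Only the outermost matching element collects; nested matches feed it.
        opens_match = (tag == self.target_tag and inside_anchor
                       and self.buffer is None)
        if opens_match:
            self.buffer = []
        self.stack.append((tag, is_anchor, opens_match))

    def handle_endtag(self, tag):
        # Close back to the most recent open element with this tag name;
        # end tags with no matching open element (e.g. from <br/>) are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                if any(entry[2] for entry in self.stack[i:]):
                    self.results.append("".join(self.buffer).strip())
                    self.buffer = None
                del self.stack[i:]
                return

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


def select_text(html, attr, value, target_tag):
    parser = DescendantTextCollector(attr, value, target_tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample markup, loosely shaped like a movie credits block.
SAMPLE_HTML = """
<div class="credit_summary_item">
  <h4 class="inline">Director:</h4>
  <span itemprop="director" itemscope>
    <a href="/name/nm0000001/"><span itemprop="name">Jane Doe</span></a>
  </span>
</div>
<div class="credit_summary_item">
  <h4 class="inline">Stars:</h4>
  <span itemprop="actors"><a href="/name/nm0000002/"><span>John Roe</span></a></span>
</div>
"""

if __name__ == "__main__":
    print(select_text(SAMPLE_HTML, "itemprop", "director", "span"))
```

Because the locator is the `itemprop` attribute rather than a full chain of ancestors, the match survives wrappers such as the `<a>` tag being added or removed, which is the robustness the paragraph above argues for.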