Commit 64dbde1

Create coreyms_com.py
1 parent b743be5 commit 64dbde1


1 file changed: +37 -0 lines changed


Web Scraping/coreyms_com.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+from bs4 import BeautifulSoup
+import requests
+import csv
+
+source = requests.get('http://coreyms.com').text
+
+soup = BeautifulSoup(source, 'lxml')
+
+csv_file = open('cms_scrape.csv', 'w')
+
+csv_writer = csv.writer(csv_file)
+csv_writer.writerow(['headline', 'summary', 'video_link'])
+
+for article in soup.find_all('article'):
+    headline = article.h2.a.text
+    print(headline)
+
+    summary = article.find('div', class_='entry-content').p.text
+    print(summary)
+
+    try:
+        vid_src = article.find('iframe', class_='youtube-player')['src']
+
+        vid_id = vid_src.split('/')[4]
+        vid_id = vid_id.split('?')[0]
+
+        yt_link = f'https://youtube.com/watch?v={vid_id}'
+    except Exception as e:
+        yt_link = None
+
+    print(yt_link)
+
+    print()
+
+    csv_writer.writerow([headline, summary, yt_link])
+
+csv_file.close()
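
The two steps in this file that are easiest to get subtly wrong are the video ID extraction and the CSV handling, so here is a possible standalone sketch of both using only the standard library. It is not part of the commit. The helper names `youtube_watch_link`, `write_rows`, and `read_rows` are hypothetical, and the sketch assumes the iframe `src` has the form `https://www.youtube.com/embed/<id>?...`, which is what the `split('/')[4]` indexing in the committed code implies but the commit does not state.

```python
import csv
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def youtube_watch_link(vid_src: Optional[str]) -> Optional[str]:
    """Build a watch link from an embed URL, or return None.

    Assumes (hypothetically) that vid_src looks like
    https://www.youtube.com/embed/<id>?<query>.
    """
    if not vid_src:
        return None
    # urlparse drops the query string from .path, so no split('?') is needed.
    path_parts = [part for part in urlparse(vid_src).path.split('/') if part]
    if len(path_parts) < 2 or path_parts[0] != 'embed':
        return None
    return f'https://youtube.com/watch?v={path_parts[1]}'


def write_rows(path: str, rows: Iterable[List[Optional[str]]]) -> None:
    """Write the header used in the commit plus the given rows."""
    # The with-block closes the file even if a write raises, and
    # newline='' is what the csv module documentation recommends.
    with open(path, 'w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['headline', 'summary', 'video_link'])
        csv_writer.writerows(rows)


def read_rows(path: str) -> List[List[str]]:
    """Read the CSV back as a list of rows, for checking the output."""
    with open(path, newline='', encoding='utf-8') as csv_file:
        return list(csv.reader(csv_file))
```

In the scraping loop, `youtube_watch_link(iframe['src'] if iframe else None)` could stand in for the `try`/`except` block, where `iframe` is the result of `article.find('iframe', class_='youtube-player')`. That way a missing player yields `None` without also hiding unrelated errors. Note that `csv.writer` writes `None` as an empty field.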
