CLOUD COMPUTING – 20CSA303
Lab Project on
COVID-19 Data Querying using Python and AWS
submitted by
Rahul V MY.SC.U3BCA21216
Siddharth A Shetty MY.SC.U3BCA21231
Esha Kumar M MY.SC.U3BCA21260
BACHELOR OF COMPUTER APPLICATIONS
DATA SCIENCE
SUBMITTED TO
Mr. Harshit C
Assistant Professor
Department of Computer Science
Amrita Vishwa Vidyapeetham
Mysuru Campus
PYTHON
# Import libraries
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
import numpy as np
import matplotlib.pyplot as plt

# Return the text of every cell in a row, with newlines removed
extract_contents = lambda row: [x.text.replace('\n', '') for x in row]

URL = 'https://www.mohfw.gov.in/'
SHORT_HEADERS = ['SNo', 'State',
                 'Indian-Confirmed(Including Foreign Confirmed)',
                 'Cured', 'Death']
# Download the page and parse the HTML
response = requests.get(URL).content
soup = BeautifulSoup(response, 'html.parser')
header = extract_contents(soup.tr.find_all('th'))

# Collect the data cells of every table row
stats = []
all_rows = soup.find_all('tr')
for row in all_rows:
    stat = extract_contents(row.find_all('td'))
    if stat:
        if len(stat) == 4:
            # last row (totals) has no serial number, so pad it
            stat = ['', *stat]
            stats.append(stat)
        elif len(stat) == 5:
            stats.append(stat)

# Label the final row as the overall total
stats[-1][0] = len(stats)
stats[-1][1] = "Total Cases"
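# State names become the bar labels
objects = []
for row in stats:
    objects.append(row[1])

y_pos = np.arange(len(objects))

# Confirmed counts for every state row
performance = []
for row in stats[:-1]:
    performance.append(int(row[2]))

# Drop the last character of the total (assumed to be a trailing marker)
performance.append(int(stats[-1][2][:-1]))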
table = tabulate(stats, headers=SHORT_HEADERS)
print(table)
plt.barh(y_pos, performance, align='center', alpha=0.5,
         color=(234/256.0, 128/256.0, 252/256.0),
         edgecolor=(106/256.0, 27/256.0, 154/256.0))
plt.yticks(y_pos, objects)
plt.xlim(1, performance[-1] + 1000)
plt.xlabel('Number of Cases')
plt.title('Corona Virus Cases')
plt.show()
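The script above depends on the live page layout and on the total row ending in exactly one extra character. As a minimal sketch, assuming the cell text has already been extracted row by row, the row padding and number cleaning could be moved into small functions that can be checked against fixed input. The names parse_count, normalise_rows, and SAMPLE_ROWS are hypothetical and do not appear in the original script; the sample figures are made up for illustration.

```python
import re


def parse_count(text):
    """Convert a cell such as '1,045' or '1165#' to an int.

    Every non-digit character is dropped, so the result does not depend
    on how many marker characters trail the number.
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


def normalise_rows(rows):
    """Keep 5-cell rows, pad 4-cell rows with an empty serial number,
    and skip everything else (header rows, empty rows)."""
    cleaned = []
    for cells in rows:
        if len(cells) == 4:
            cleaned.append(["", *cells])
        elif len(cells) == 5:
            cleaned.append(list(cells))
    return cleaned


# Made-up cell text standing in for the output of extract_contents
SAMPLE_ROWS = [
    [],                                   # header row has no <td> cells
    ["1", "State A", "120", "30", "2"],
    ["2", "State B", "1,045", "210", "11"],
    ["Total", "1165#", "240", "13"],      # total row lacks a serial number
]

stats = normalise_rows(SAMPLE_ROWS)
counts = [parse_count(row[2]) for row in stats]
print(counts)  # prints [120, 1045, 1165]
```

Stripping every non-digit character is a deliberate choice here: it handles thousands separators and trailing markers alike, at the cost of silently accepting cells that contain unexpected text around a number.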
AWS
The datasets available for querying on AWS cover the following:
Global Coronavirus (COVID-19) Data: tracks confirmed COVID-19 cases in provinces, states, and countries across the world, with a breakdown to the county level in the US.
Confirmed cases and deaths in the US by state and county.
The number of people tested, pending tests, and positive and negative tests for COVID-19.
Data on hospital beds and their utilization in the US.
A collection of over 45,000 research articles (over 33,000 with full text) about COVID-19, SARS-CoV-2, and related coronaviruses. AWS has preprocessed and enriched these with annotations extracted from Amazon Comprehend Medical.
Lookup tables to support visualizations.
The following query returns the growth of confirmed cases for the past 7 days joined
side-by-side with hospital bed availability, broken down by US county:
SELECT
    cases.fips,
    admin2 as county,
    province_state,
    confirmed,
    growth_count,
    sum(num_licensed_beds) as num_licensed_beds,
    sum(num_staffed_beds) as num_staffed_beds,
    sum(num_icu_beds) as num_icu_beds
FROM
    "covid-19"."hospital_beds" beds,
    ( SELECT
        fips,
        admin2,
        province_state,
        confirmed,
        last_value(confirmed) over (partition by fips order by last_update)
          - first_value(confirmed) over (partition by fips order by last_update) as growth_count,
        first_value(last_update) over (partition by fips order by last_update desc) as most_recent,
        last_update
      FROM
        "covid-19"."enigma_jhu"
      WHERE
        from_iso8601_timestamp(last_update) > now() - interval '7' day AND
        country_region = 'US') cases
WHERE
    beds.fips = cases.fips AND last_update = most_recent
GROUP BY cases.fips, confirmed, growth_count, admin2, province_state
ORDER BY growth_count desc
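To make the logic of the query easier to follow, the sketch below reproduces its main steps on a few in-memory rows: the growth per county is the latest confirmed count minus the earliest one in the window, the bed counts are summed over all hospitals in a county, and the two are matched on fips. The function name county_growth and all sample figures are hypothetical, only one of the three bed columns is shown, and the 7-day and country filters from the query are assumed to have been applied already.

```python
from collections import defaultdict


def county_growth(case_rows, bed_rows):
    """Mimic the query: growth per county joined with summed bed counts.

    case_rows: (fips, last_update, confirmed) tuples, already filtered
               to the time window of interest.
    bed_rows:  (fips, num_licensed_beds) tuples, one per hospital.
    """
    # Group the confirmed counts by county
    by_fips = defaultdict(list)
    for fips, last_update, confirmed in case_rows:
        by_fips[fips].append((last_update, confirmed))

    # Sum the beds of all hospitals in each county
    beds = defaultdict(int)
    for fips, licensed in bed_rows:
        beds[fips] += licensed

    result = []
    for fips, updates in by_fips.items():
        if fips not in beds:
            continue  # inner join: counties without bed data are dropped
        # ISO 8601 strings in one format and time zone sort chronologically
        updates.sort()
        first_confirmed = updates[0][1]
        last_confirmed = updates[-1][1]
        result.append({
            "fips": fips,
            "confirmed": last_confirmed,
            "growth_count": last_confirmed - first_confirmed,
            "num_licensed_beds": beds[fips],
        })

    # Largest growth first, as in ORDER BY growth_count desc
    result.sort(key=lambda r: r["growth_count"], reverse=True)
    return result


# Made-up sample data
cases = [
    ("01001", "2020-04-01T00:00:00", 10),
    ("01001", "2020-04-07T00:00:00", 25),
    ("01003", "2020-04-01T00:00:00", 40),
    ("01003", "2020-04-07T00:00:00", 100),
    ("01005", "2020-04-07T00:00:00", 5),   # no bed data, so dropped
]
hospital_beds = [("01001", 80), ("01001", 20), ("01003", 150)]

rows = county_growth(cases, hospital_beds)
for row in rows:
    print(row)
```

With this sample input, county 01003 comes first with a growth of 60 and 150 licensed beds, followed by 01001 with a growth of 15 and 100 beds.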
The following screenshot shows the results of this query.
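Results like these can also be fetched from Python rather than read from the console. The sketch below assumes the tables are queried through Amazon Athena and that a boto3 Athena client is passed in; the function name run_athena_query, the polling interval, and the S3 output location shown in the usage comment are placeholders, not part of the original project. Only the first page of results is read.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return rows as dicts.

    client is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")

    # First page only; the first row of a SELECT result holds column names
    page = client.get_query_results(QueryExecutionId=query_id)
    rows = [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in data]


# Usage (requires AWS credentials and the boto3 package; the bucket
# name is a placeholder):
#   import boto3
#   rows = run_athena_query(boto3.client("athena"), sql, "covid-19",
#                           "s3://your-bucket/athena-results/")
```

Passing the client in as an argument keeps the function independent of credentials, so it can be exercised with a stand-in object before being pointed at a real account. All values come back as strings, so numeric columns such as growth_count need converting before they are plotted.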