PittDataMining

University of Pittsburgh Data Mining Course Spring 2020

Homework 1 Concepts

Linear Regression and evaluation techniques
Logistic Regression and evaluation techniques
KNN and evaluation techniques

Homework 2 Concepts

Naive Bayes
Random Forests
Multinomial Logistic Regression
t-SNE
K-means clustering
Uniform Manifold Approximation and Projection (UMAP)

Python Lab

In this Python lab, four models were evaluated on data for surgical procedures conducted between June 2017 and June 2018. The goal was to develop a model that accurately predicts a patient’s risk of a post-surgery length of stay (LOS) greater than five days. See the final essay, essay_Python_lab.pdf, for lab details.
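
As a rough illustration of the prediction target described above, the sketch below turns a length of stay into a binary "more than five days" label and scores a set of predictions against it. The data, the predictions, and the helper names (`label_long_stay`, `confusion_counts`) are hypothetical and are not taken from the lab; the four models and the metrics actually used are described in the essay.

```python
# Hypothetical lengths of stay in days; NOT taken from the lab's data.
los_days = [2, 7, 5, 10, 3, 6]


def label_long_stay(days, threshold=5):
    """Return 1 when the stay is strictly longer than `threshold` days, else 0."""
    return int(days > threshold)


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Binary target: 1 means LOS greater than five days.
y_true = [label_long_stay(d) for d in los_days]

# Hypothetical model output, standing in for any one of the evaluated models.
y_pred = [0, 1, 1, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Note that a stay of exactly five days is labeled 0 here, because the stated target is a stay greater than five days.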

Homework 3 Concepts

XGBoost
LightGBM
CatBoost
Grid search techniques
LIME and Shapley Additive Explanations (SHAP)

Library Data Project: Digital Humanities Topic Modeling

Access the project web page

The Humanities Data Librarian for the University Library at the University of Pittsburgh, Terry Kapral, provided three data sets for an analysis of the digital collections in the Humanities department. This was an exploratory, unsupervised learning project. The high-level goal was to investigate which topics are present within the humanities digital collection, and how those topics vary over time. Specifically, Mrs. Kapral was interested in answers to the following questions about the data:

What are the latent topics across the digital items?
What items are related by topic?
How do topics change over time with respect to the time period covered by the items within each topic?
Are there any problems with the data?

These questions were answered through data exploration (including word embeddings and t-SNE plots) and topic modeling with Latent Dirichlet Allocation (LDA), an unsupervised learning algorithm. Data exploration revealed problems in the data, some of which were mitigated. The final LDA model identified 19 latent topics from the titles and abstracts in the metadata of the 124,517 digitized items that had a title.
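
To give a feel for what LDA does with titles and abstracts, here is a minimal collapsed Gibbs sampler written with the standard library only. It is a teaching sketch, not the project's implementation: the toy documents, the function name `lda_gibbs`, and the hyperparameters (`alpha`, `beta`, two topics) are assumptions chosen for illustration, and a real analysis would more likely rely on an established library.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per topic, per doc
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # tokens per topic overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to the conditional weight.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy "titles", already lowercased and tokenized (hypothetical data).
docs = [
    "steel mill workers strike".split(),
    "steel workers union mill".split(),
    "river bridge construction photograph".split(),
    "bridge river photograph view".split(),
]

doc_topic, topic_word = lda_gibbs(docs, n_topics=2)

for k, counts in enumerate(topic_word):
    print(f"topic {k}:", [w for w, _ in counts.most_common(3)])
```

The most frequent words in each topic give a rough label for it, and the largest entry in each row of `doc_topic` suggests which items are related by topic.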

lisaover/PittDataMining
