lisaover/PittDataMining

PittDataMining

University of Pittsburgh Data Mining Course Spring 2020

Homework 1 Concepts

  • Linear Regression and evaluation techniques
  • Logistic Regression and evaluation techniques
  • KNN and evaluation techniques

Homework 2 Concepts

  • Naive Bayes
  • Random Forests
  • Multinomial Logistic Regression
  • t-SNE
  • K-means clustering
  • Uniform Manifold Approximation and Projection (UMAP)

Python Lab

In this Python lab, four models were evaluated on surgical procedure data for procedures conducted between June 2017 and June 2018. The goal was to develop an algorithm that accurately predicts a patient’s level of risk for a post-surgery length of stay (LOS) greater than five days. See the final essay, essay_Python_lab.pdf, for lab details.
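As a rough illustration of the prediction target described above, the sketch below derives a binary "LOS greater than five days" label and scores a set of risk predictions with precision and recall. The records, the risk scores, and the 0.5 cutoff are all made up for illustration; the lab's actual data, models, and metrics are described in the essay, not here.

```python
# Hypothetical records: (length of stay in days, model risk score in [0, 1]).
# These values are invented for illustration only.
records = [(2, 0.10), (7, 0.80), (5, 0.40), (9, 0.65), (3, 0.55), (6, 0.30)]

LOS_THRESHOLD_DAYS = 5  # from the lab description: LOS greater than five days
RISK_CUTOFF = 0.5       # assumed decision cutoff, not taken from the lab

# Binary target: 1 if the stay exceeded the threshold, else 0.
labels = [int(los > LOS_THRESHOLD_DAYS) for los, _ in records]

# Binary prediction: 1 if the risk score reaches the cutoff.
preds = [int(score >= RISK_CUTOFF) for _, score in records]

# Confusion-matrix counts needed for precision and recall.
tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"precision={precision:.2f} recall={recall:.2f}")
```

Moving the cutoff trades precision against recall, which is one reason a risk model is usually compared across several thresholds rather than at a single one.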

Homework 3 Concepts

  • XGBoost
  • LightGBM
  • CatBoost
  • Grid search techniques
  • LIME and SHapley Additive exPlanations (SHAP)

Library Data Project: Digital Humanities Topic Modeling

Access the project web page

The Humanities Data Librarian for the University Library at the University of Pittsburgh, Terry Kapral, provided three data sets for an analysis of the digital collections in the Humanities department. This was an exploratory, unsupervised learning project. The high-level goal was to investigate which topics are present within the humanities digital collection, and how those topics vary over time. Specifically, Mrs. Kapral was interested in answers to the following questions about the data:

  1. What are the latent topics across the digital items?
  2. What items are related by topic?
  3. How do topics change over time with respect to the time period covered by the items within each topic?
  4. Are there any problems with the data?

These questions were answered through data exploration, including word embeddings and t-SNE plots, and through topic modeling with the unsupervised learning algorithm Latent Dirichlet Allocation (LDA). Data exploration revealed problems in the data, some of which were mitigated. The final LDA model revealed 19 latent topics from the titles and abstracts in the metadata for the 124,517 digitized items that had a title.
