- Microsoft
- Indianapolis, IN
- https://www.linkedin.com/in/leondragonzalez/
Lists (7)
✨ Inspiration
LLMs
⭐ Python EDA
Projects which mainly highlight Python for the purpose of EDA and descriptive analytics.
💥 Python ML & Deep Learning
Projects that include the use of machine learning and/or deep learning techniques.
🌀 R Statistical Modeling
Projects which feature the use of R for statistical modeling, hypothesis testing, and/or ML.
⚡ Spark & Big Data
Projects which significantly leverage big data frameworks, including Spark (e.g., Spark SQL, PySpark), Hive, Sqoop, Hadoop, etc.
SQL & Databases
Projects which predominantly use SQL.
Starred repositories
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
Examples and guides for using the OpenAI API
Finding similar, high-value users based on seed users. The model includes 1805 features built using HiveQL and AWS Redshift.
Python/Stan implementation of a multiplicative marketing mix model, with a deep dive into adstock (carry-over effect), ROAS, and mROAS
Re-Imagination of The Economist: Corruption v. Development
Data Science & Machine Learning Data Capstone based on Moneyball dataset
An exploratory analysis of the Kaggle bikeshare data set, applying linear regression models, which are not optimal for this particular problem of predicting the number of bikes rented.
"What Your Heart Is Telling You" Logit Model
My first attempt at a KNN model, classifying the purchase of caravan insurance in the Caravan data set (ISLR package).
My first attempt at building an SVM model, tuning the cost and gamma parameters via grid search with a Gaussian kernel.
My first attempt at implementing a neural network using the Boston housing data set from the MASS library.
This is a descriptive and exploratory data analysis project from DataCamp which aims to explore real data on every Chipotle location to identify franchising opportunities. The goal is to scout out …
Capstone Submission #1 for the Harvard University Professional Certificate in Data Science.
A cluster analysis leveraging the k-means algorithm to determine which degrees are likely to yield which levels of income based on historical data.
Capstone project #2 for the Harvard University Professional Certificate in Data Science
EDA project using SQL in Jupyter Notebooks, focusing on the history of games, broadcasts and performances for the National Football League
Association rule mining using the Apriori algorithm
Two A/B tests comparing 1) 1-day and 2) 7-day average player retention between the control (old player level) and the new version (new player level)
Python XGBoost model, using Amazon SageMaker, EC2 instances and S3 buckets. Used to prepare, partition, train, tune, predict and evaluate model. Project involves predicting customers who sign up fo…
Multi-touch attribution models, including Markov chains
Analysis of Disney's top grossing films (adjusted for inflation) in Python, using regression to attribute film genre to success. The project includes using regression on the data, as well as bootst…
Used NLP techniques (tokenization, stemming, vectorization for TF-IDF) and clustering algorithms (Kmeans and Hierarchical clustering) to mine the "similarities" between films based on their plots p…
Computer Vision project
An EDA of Walmart stock data using Databricks, Spark and PySpark.
Predicting the number of crew needed to man a Hyundai cruise ship from information such as the number of cabins and passengers, using linear regression. Leveraged SQL and PySpark.
Predicting whether a university is private or public using tree-based models (i.e., decision tree classifier, random forest classifier, and gradient-boosted tree classifier) with PySpark and Databricks.