daiphuongngo/README.md

Hi, I’m Dai-Phuong Ngo (Liam Ngo) 👋

Permanent Resident of Canada 🇨🇦

Contact: Main Email | LinkedIn | 2nd Email | Tableau Public | Alteryx stack | HackerRank

My Motto:

"Don't let what you think you can’t do interfere with what you can do."

Languages, Technologies, Skills:

| Criteria | Details |
| --- | --- |
| Programming | Certified SQL, Python (Pandas, Keras, SkLearn, PySpark, TensorFlow, PyTorch, OpenCV, H2O.ai, PyTesseract, PyMuPDF, OpenPyXL, EasyOCR, PyWin32), R, Spark, Spark SQL, KQL, Shell, Scala, Cypher, Java, C# |
| Viz | Certified Power BI, Tableau Desktop, Tableau Prep, Cognos, Qlik |
| Automation | Certified Alteryx Advanced Designer, Alteryx Designer Cloud Advanced, Alteryx Machine Learning Fundamentals, Alteryx Intelligence Suite, Certified Dataiku Machine Learning Practitioner, Dataiku Developer, KNIME, SPSS (Modeler, Statistics), SAS (Studio, Enterprise Miner) |
| Big Data | Certified Azure Data Fundamentals, Azure AI Fundamentals, Alteryx Server Administration, Databricks Accredited Lakehouse Fundamentals, Azure (AI Foundry, ML, Synapse, MS SQL, Fabric, Factory), AWS (Redshift, SageMaker, S3, Glue, Kinesis, Athena) & GCP (Vertex AI, BigQuery, GCS, Pub/Sub, CloudScheduler, Colab), Fivetran, Kafka, MySQL, MongoDB, Oracle, PostgreSQL, Hadoop (Hive, Zeppelin), Neo4j, Splunk |
| Data & AI Science | Predictive Analytics (Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, AI, Generative AI, LLMs), Causal Inference, Statistical Inference and Modelling (Sampling, A/B Testing, Bayesian), Financial Reporting, Tax Compliance and Recovery, UiPath, Excel (VBA, Pivot, Vlookup, Hlookup, Solver, GoalSeek, Macros), GDPR, ServiceNow |
| Languages | English 🇺🇲 (fluent), Vietnamese (native), French 🇨🇦🇨🇵 (basic overall, intermediate reading), German 🇩🇪 (basic overall, intermediate reading) |
| Others | Certified Six Sigma White Belt, Atlassian Confluence, Jira, Trello |

Education & Experience:

Jul 2025 - now - Manager, Data Analytics, Canadian Corporate Tax, Tax Technology - Asset Management Digital Solutions - KPMG Canada - Toronto, Ontario, Canada 🇨🇦

  • Architected an enterprise-grade Intelligent Document Processing (IDP) ecosystem for U.S. and Canadian tax forms, combining applied AI (OCR) for deep-learning detection, PDF structural parsing, recognition, classification, and multi-stage data engineering pipelines.
  • Spearheaded Azure-integrated ingestion and extraction workflows, orchestrated through Azure storage and services in Python, C#, and JavaScript, enabling automated object detection, file type classification (tax form vs. non-tax form, form template versions), and high-fidelity structured outputs for downstream analytics.
  • Modernized tax reporting automation by migrating fragmented Power BI-driven workflows to robust Alteryx pipelines, enabling scalable, multi-client, multi-form operations.

Oct 2024 - Apr 2025 - Finance Transformation Analyst, Finance & Controlling - Haventree Bank 🏦 - Toronto, Ontario, Canada 🇨🇦

  • Advanced Analytics & Reporting: Supported the risk analytics and credit risk teams by developing logic in Alteryx, Python, and Tableau (KPIs) to transform data, plot geospatial visuals, and analyze fire weather climate risk & OSFI metrics.
  • Data Automation & Workflow Optimization: Designed and deployed Alteryx, Python, SQL workflows to consolidate accounting logic and automate bank account reconciliation, matching mortgage, deposit, corporate & funding transactions between banking partners and the internal general ledger, integrating data from SharePoint, Excel, and SQL databases, improving reconciliation accuracy and efficiency by 90% and reducing manual reconciliation efforts by 70%.
  • Collaboration & Governance: Worked closely with accounting and finance teams to deliver data analytics and insights, and to translate their rules into rule-based logic for financial data automation and reconciliation in Alteryx Designer workflows deployed to Alteryx Server.

Apr 2023 - Oct 2024 - Analyst, Business Insights, Accounting, Tax & Finance - Hudson's Bay Company 🛍️ (HBC: Hudson’s Bay, The Bay, Saks Fifth Avenue, Saks Off Fifth) - Toronto, Ontario, Canada 🇨🇦 🇺🇸

  • SQL, APA Pipelines & Data Engineering: Built SQL pipelines and automated Alteryx and Dataiku workflows to streamline high-level reporting, reducing manual reconciliation efforts by 70-95%; integrated sources to enhance accuracy for tax & accounting teams; and worked directly with the VP – Tax and DVP – Indirect Tax on reports for the CEO, US 🇺🇸, Puerto Rican 🇵🇷 & Canadian 🇨🇦 state & provincial auditors, and Big 4 consulting firms (Deloitte, KPMG, EY).
  • Machine Learning & AI: Implemented ML models, NLP, Computer Vision in Python to classify tax codes in SKU items in both the US & Canada, identify features from PDF invoices, increasing tax compliance and reducing manual efforts by 60%.
  • Data Architecture & Analytics: Cooperated with Data Engineer, Architect to design and develop a Snowflake-based Data Hub to centralize tax data from Snowflake, Oracle for later reporting and analytics, optimizing ETL workflows and reducing reporting errors by 70%.
  • Data Visualization & Business Intelligence: Designed Tableau, Power BI dashboards with advanced LOD, DAX, MDX measures, improving engagement and decision-making insights to replace static dashboards provided by Big 4 and save US$20K annually.
  • Multiclass Classification & Few-shot LLM Prompting for Tax Code Mapping (e.g., concatenating product transactions to tax logic using OpenAI models, based on retail data) as one of my Classification Modelling layers.
  • Tax Slip PDF Signature Detection using pretrained models like fasterrcnn_resnet50_fpn (the Faster R-CNN architecture originated at Microsoft Research; the pretrained model is distributed in PyTorch's torchvision), helping replicate manual audit marking with computer vision and replace manual tasks on unstructured data files.

Jan 2024 - Dec 2026 - Master of Liberal Arts (ALM), Extension Studies, Data Science, Graduate Student - Harvard University (online part-time evening) - Cambridge, Massachusetts, USA 🇺🇸

From Clusters to Retrieval: Hybrid BERT-Based Taxonomy and Similarity Search for Medical Chatbot Questions (CSCI E-108 Data Mining, Discovery & Exploration):

  • Designed and evaluated a multi-version clustering and retrieval pipeline on ~47.5k medical chatbot questions, moving from TF-IDF/LSA + KMeans to BERT+tags with Nyström spectral clustering. Built a production-style hybrid similarity search and reranking stack (BERT dense vectors + BM25 + tag Jaccard) to support intent discovery, taxonomy building, and downstream RAG use cases.
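
As a rough illustration of how a hybrid reranking stack like the one above can combine its three signals, here is a minimal sketch in plain Python. It is not the project's actual code: the weights, the min-max normalization of BM25 scores, the `hybrid_rerank` function name, and the candidate dictionary layout are all assumptions made for the example, and the dense vectors and BM25 scores are taken as precomputed inputs.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(tags_a, tags_b):
    """Jaccard overlap between two tag collections."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def min_max(scores):
    """Rescale raw scores (e.g. BM25) to [0, 1] so they can be mixed."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rerank(query_vec, query_tags, candidates,
                  w_dense=0.5, w_bm25=0.3, w_tags=0.2):
    """Rank candidates by a weighted sum of dense, lexical, and tag scores.

    Each candidate is a dict with hypothetical keys:
    "id", "vec" (dense embedding), "bm25" (raw score), "tags".
    The weights are illustrative defaults, not tuned values.
    """
    if not candidates:
        return []
    bm25_scaled = min_max([c["bm25"] for c in candidates])
    scored = []
    for cand, bm25 in zip(candidates, bm25_scaled):
        score = (w_dense * cosine(query_vec, cand["vec"])
                 + w_bm25 * bm25
                 + w_tags * jaccard(query_tags, cand["tags"]))
        scored.append((cand["id"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage with made-up candidates.
candidates = [
    {"id": "a", "vec": [1.0, 0.0], "bm25": 2.0, "tags": ["fever", "cough"]},
    {"id": "b", "vec": [0.0, 1.0], "bm25": 5.0, "tags": ["rash"]},
]
ranked = hybrid_rerank([1.0, 0.0], ["fever"], candidates)
```

In this toy case candidate "a" ranks first because its dense and tag similarity outweigh the higher BM25 score of "b". A weighted sum is only one possible fusion rule; rank-based fusion is a common alternative when score scales differ widely.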

Auto-Tagging Medical Questions with Multi-Label Learning: A Comparative Analysis of NLP-Based Deep Learning Models (CSCI E-89B Natural Language Processing):

  • Developed a model training pipeline to preprocess, embed, and analyze text data and to predict labels for each incoming conversational phrase efficiently and precisely, using statistical metrics for unsupervised evaluation.

🛰️🛸 Satellite & UAV Aerial Image Semantic Segmentation (CSCI S-89 Deep Learning):

  • Developed a deep learning project for multi-class semantic segmentation on aerial imagery using fine-tuned PSPNet, UNet, DeepLabV3+, applied to three datasets: UAVID, modified Bhuvan Land Cover, and Dubai semantic tile datasets, regardless of the complexity of ground object types.

📈 Predicting Market Movements and Building Smart Portfolios with SVR, Random Forest, and LSTM Models: Evidence from Five Major Canadian Banks (CSCI S-278 Applied Quantitative Finance and Machine Learning):

  • Developed a multi-stage machine learning pipeline combining econometrics, supervised learning (SVR, Random Forest), Deep Learning (RNN, LSTM) and Reinforcement Learning to forecast, classify, and dynamically optimize a portfolio of five major Canadian bank stocks for enhanced returns and controlled risk against stocks' volatility.

🧠 Brain Tumor MRI Image Segmentation & Detection (CSCI E-25 Computer Vision):

  • Designed & fine-tuned Deep Learning pipelines (Keras, PyTorch) for MRI image segmentation, leveraging CNNs, U-Net, DeepLab V3+ for high-precision tumor detection regardless of brain artifacts' complexity.

💳 Scalable Cloud-Based Credit Card Fraud Detection GCP AWS (CSCI E-192 Modern Data Analytics):

  • Applied modern data analytics and machine learning techniques to detect fraudulent credit card transactions using Google Cloud Platform & AWS. Built on BigQuery, Vertex AI, Dataproc, GCS, EMR, Athena, S3 via PySpark, SQL, the pipeline includes data preprocessing, model training with Random Forest, model evaluation, and deployment for real-time prediction while handling imbalanced data.

📈 Causal Aware Stock Prediction Integrating LSTM and Causal Inference for Tech Sector Asset Evaluation (CSCI S-278 Applied Quantitative Finance and Machine Learning):

  • Integrated causal inference and deep learning in Python to improve stock prediction by combining LSTM forecasts with heterogeneous treatment effect models, enabling more confident, personalized trading decisions.

⚕️ Scalable Cloud-Based NLP Text Classification for Clinical Examination (CSCI E-192 Modern Data Analytics):

  • Built a real-time Natural Language Processing feedback processing platform using Python, PySpark, SQL integrated with AWS SageMaker, Redshift, Glue, GCP Vertex AI, BigQuery, helping doctors determine medical specialties for patients.

🏡 Housing Affordability Statistical Inferences (CSCI E-83 Fundamentals of Data Science in Python):

  • Applied Bayesian causal inference models (pooled, unpooled, hierarchical), Linear Regression, and Maximum Likelihood Estimation (MLE) to analyze key housing affordability indicators and posterior distributions.

🏨 Hotel Daily Room Rate & Booking Cancellation Prediction (STAT E-109 Statistical Modeling in R):

  • Implemented XGBoost, Random Forest, and Deep Neural Networks (DNNs) to predict ADR (Average Daily Rate) and booking cancellation probability, and applied logistic regression, hypothesis testing, and ensemble models, improving revenue forecast accuracy through grid search hyperparameter tuning.

Jan 2023 - Apr 2023 - Alteryx Administrator, AWS Cloud Ops Data Migration - Billennium IT Inc 🖥️ for Roche ⚗️ (Swiss BioTech), Data Engineering - Integration, Data Services & Insights Foundational Domain - Mississauga, Ontario, Canada 🇨🇦

  • Data Governance & Log Analysis: Monitored and analyzed IT log data from MongoDB on Alteryx Designer to track user activities, workflow hubs, and Alteryx Server performance across Roche’s North America & Europe operations.
  • Automation & Performance Optimization: Collaborated with the team leader to develop automatable Alteryx flows that enhance users’ workflow performance and identify bottlenecks in data processing.
  • System Monitoring & Security: Evaluated user authentication, server logs, and data access patterns to ensure compliance with Roche’s data protection standards and global security policies.
  • Alteryx Server Administration: Optimized server configurations, managed workflow execution, and collaborated with IT teams to troubleshoot high-performance computing issues.

Jan 2021 - Aug 2022 - Business Insights & Analytics Post-Graduate Program - Humber College - Toronto, Ontario, Canada 🇨🇦

  • 🏦 IEEE-CIS Fraud Detection (Capstone, Humber College): Preprocessed data in Python, designed the solution architecture, and compared ML classifiers to find the best performers on the imbalanced dataset: Balanced Random Forest (ROC AUC around 0.9) and Random Forest (ROC AUC and Precision around 0.9).
  • 👮🚓 Safe Roads 2022 Competition - Toronto Police Service: Used Power BI, Python, and Azure Machine Learning to analyze geospatial datasets, provide interpretation, conduct A/B testing, identify contributing factors, and make recommendations on road conditions, awareness, and the top fatal intersections to enhance traffic safety and prevent fatal accidents; a Random Forest model reached ROC AUC & Precision around 0.8.

May 2022 - Aug 2022 - Data Science Intern (remote) - Cohost AI 🏨 - Toronto, Ontario, Canada 🇨🇦

  • Data Pipeline Automation: Automated data ingestion from multiple APIs & databases into Python, SQL, enabling real-time financial reporting, improving analytics accuracy with domain expertise by 40-60%.
  • Visualization & Reporting: Created domain-based KPIs and embedded them in interactive Power BI dashboards built with advanced DAX to support revenue decision-making, boosting user engagement by 10-25%.

Jan 2022 - Apr 2022 - Product Data Analyst Intern - iRestify Inc. 🏢👷 - Toronto, Ontario, Canada 🇨🇦

  • Geospatial & Business Intelligence Analytics: Built GIS-based Power BI dashboards to analyze revenue performance by location, optimizing territory-based pricing and operations.
  • Data Cleansing & Feature Engineering: Applied Python & SQL for data wrangling, increasing accuracy by 20% for key KPIs.
  • Data Automation: Designed workflow automations in Power BI’s DAX and MDX, reducing manual reporting efforts by 30%.

Aug 2021 - Dec 2021 - Data Engineering & Analytics Intern (remote) - Center of Talent in AI (CoTAI) 🤖 - Toronto, Ontario, Canada 🇨🇦

  • Big Data Engineering: Managed 4M+ data records, optimizing ETL pipelines between Vietnam & North America for Sentiment Analysis & behavior detection.
  • Sentiment Analysis & Target Detection: Developed NLP-based classification models in Python to detect sentiment and reaction from customer feedback on e-commerce platforms.
  • Visualization & Predictive Insights: Designed Tableau dashboards to track consumer sentiment trends and signals.
  • Machine Learning Classification: Compiled Machine & Deep Learning classifiers tackling imbalanced datasets to detect target customers for Banking’s Marketing Targets.

Jun 2017 - Jun 2019 - Sales Executive & Sales Coordinator - Sofitel Saigon Plaza 🏨 - Ho Chi Minh City, Viet Nam

  • Revenue Forecasting: Prepared, consolidated financial Excel & Power BI reports to track sales performance and forecast departmental revenue targets, supporting executive decision-making and driving quarterly sales growth by 1-10% per account.
  • Revenue Generation: Managed key accounts, segments, and markets, consistently meeting and exceeding team & personal revenue targets for approximately 16 months, contributing to 65% of sales duration while consulting with the Revenue team on target settings.

Projects:

Topic more projects available on GitHub & Tableau Public
From Clusters to Retrieval: Hybrid BERT-Based Taxonomy and Similarity Search for Medical Chatbot Questions (CSCI E-108 Data Mining, Discovery & Exploration) - Designed and evaluated a multi-version clustering and retrieval pipeline on ~47.5k medical chatbot questions, moving from TF-IDF/LSA + KMeans to BERT+tags with Nyström spectral clustering. Built a production-style hybrid similarity search and reranking stack (BERT dense vectors + BM25 + tag Jaccard) to support intent discovery, taxonomy building, and downstream RAG use cases.
Auto-Tagging Medical Questions with Multi-Label Learning: A Comparative Analysis of 7 NLP-Based Deep Learning Models (CSCI E-89B Natural Language Processing)
📈 Predicting Market Movements and Building Smart Portfolios with SVR, Random Forest, and LSTM Models: Evidence from Five Major Canadian Banks (CSCI S-278 Applied Quantitative Finance and Machine Learning) - Developed a multi-stage machine learning pipeline combining econometrics, supervised learning (SVR, Random Forest), Deep Learning (RNN, LSTM) and Reinforcement Learning to forecast, classify, and dynamically optimize a portfolio of 5 major Canadian bank stocks for enhanced returns and controlled risk.
🛰️ Satellite & UAV Aerial Image Semantic Segmentation (CSCI S-89 Deep Learning) - Developed a deep learning project for multi-class semantic segmentation on aerial imagery using fine-tuned PSPNet, UNet, and DeepLabV3+, applied to three datasets: UAVID, modified Bhuvan Land Cover, and Dubai semantic tile datasets.
📈 Causal Aware Stock Prediction Integrating LSTM and Causal Inference for Tech Sector Asset Evaluation (CSCI S-278 Applied Quantitative Finance and Machine Learning) - Integrated causal inference and deep learning in Python to improve stock prediction by combining LSTM forecasts with heterogeneous treatment effect models, enabling more confident, personalized trading decisions.
🧠 Brain Tumor MRI Image Segmentation & Detection (CSCI E-25 Computer Vision) - Designed deep learning pipelines (Keras, PyTorch) for MRI image segmentation, leveraging CNNs, U-Net for high-precision tumor detection.
💳 Scalable-Cloud-Based-Credit-Card-Fraud-Detection-Vertex-AI-on-Cloud-Platforms-GCP-AWS (CSCI E-192 Modern Data Analytics) - Applied modern data analytics and machine learning techniques to detect fraudulent credit card transactions using Google Cloud Platform & AWS. Built on BigQuery, Vertex AI, Dataproc, EMR, Athena, the pipeline includes data preprocessing, model training with multiple classifiers (Random Forest, XGBoost), evaluation, and deployment for real-time prediction while handling imbalanced data.
⚕️ Scalable Cloud-Based NLP Text Classification for Clinical Examination (CSCI E-192 Modern Data Analytics) - Built a real-time Natural Language Processing feedback processing platform using Python, PySpark, SQL integrated with AWS SageMaker, Redshift, Glue, GCP Vertex AI, BigQuery, helping doctors determine medical specialties for patients.
🏡 Housing Affordability Statistical Inferences (CSCI E-83 Fundamentals of Data Science) - Applied Bayesian models (pooled, unpooled, hierarchical), Linear Regression, and Maximum Likelihood Estimation (MLE) to analyze key housing affordability indicators and posterior distributions.
🏨 Hotel Daily Room Rate & Booking Cancellation Prediction (STAT E-109 Statistical Modeling in R) - Implemented XGBoost, Random Forest, and Deep Neural Networks (DNNs) to predict ADR (Average Daily Rate) and booking cancellation probability, and applied logistic regression, hypothesis testing, and ensemble models, improving revenue forecast accuracy through grid search hyperparameter tuning.
🏦 IEEE-CIS Fraud Detection (Capstone, Humber College) - Preprocessed data in Python, designed the solution architecture, and compared ML classifiers to find the best performers on the imbalanced dataset: Balanced Random Forest (ROC AUC around 0.9) and Random Forest (ROC AUC and Precision around 0.9).
👮🚓 Safe Roads 2022 Competition - Toronto Police Service - Used Power BI, Python, and Azure Machine Learning to analyze geospatial datasets, provide interpretation, conduct A/B testing, identify contributing factors, and make recommendations on road conditions, awareness, and the top fatal intersections to enhance traffic safety and prevent fatal accidents; a Random Forest model reached ROC AUC & Precision around 0.8.
⚗️ Pharma Portfolio Predictive Analysis - Coded in Python and AzureML to analyze time-series pharmaceutical sales data, forecast sales of the key pharma product, and predict future patterns.
🛍️ Sentiment Analysis of E-commerce Clients - Conducted Sentiment Analysis on customers’ comments & analyzed system-generated data using Natural Language Processing through an API on Fan Pages’ dialogs about diet products & participated in Data Operations, ETL in Python, SQL in MySQL, Azure, Visualization in Tableau to determine top customers, most efficient fan pages, most crucial intentions & demand entities, peak effective contact hours, peak periods of confirmations, common complaints.
🏦 Banking Dataset – Marketing Targets - Used ML and DL classification methods in Python to more accurately predict the marketing target while avoiding overfitting on an imbalanced dataset - RUS Boost had the highest Balanced Accuracy, Geometric Mean, F1 scores & best Confusion Matrix among classifiers.
🚗 Porto Seguro’s Safe Driver Prediction - Used ML and DL classification methods in Python to more accurately predict the probability that auto insurance policy holders file a claim, while avoiding overfitting on an imbalanced dataset - RUS Boost had the highest Balanced Accuracy, Geometric Mean, F1 scores & best Confusion Matrix among classifiers.
💸 Income Analysis & Classification - Preprocessed, analyzed the Income background of all records in Python, SQL & visualized key variables in Tableau / Power BI to determine highlights, trends & predictions of Income types with ML, DL Classifiers.
🏨 Hotels & Resorts Analysis - Created a Sales Incentive Plan in Java: input, check password, calculate Salespersons, Revenues & export reports, calculated Hotel Revenue’s metrics in Excel to analyze, visualize different types of KPIs - Designed Database and inserted sample data into tables of hotels, guests, employees & bookings in SQL queries.
🏫 University Admission - Led a team & built a Java program (< 150 coding lines) to store information of the newly admitted students, prompted user to enter the student name & high school grades, calculated GPA & assigned to the University’s schools.
📋 Investment Analysis of Shopify and Lightspeed in Canada - Managerial Finance & Accounting Report
Governance & Ethics in Data - Gained the highest grade of 95% in all Professor's classes analyzing ethics & governance models about data manipulated in Cybersecurity, COVID-19, Vaccination, etc. - Analyzed 3 aspects of the ethics model, data governance to mitigate potential challenges in the chosen context
🏦 TD Bank's Porter’s Value Chain Analysis (available for being shown only in a section) - Conducted an analysis of TD Bank over history, vision, mission, strategic and financial objectives, External environment based on PESTEL and Five Forces analysis, Internal environment based on SWOT-analysis, resource and capability analysis, and a value chain analysis, the current strategic approach and its various strategic actions, the staffing practices and strategy execution, Organizational structure.
🌎 Better Working World - EY, NASA, Microsoft - Used Python, Machine Learning, Azure Studio, Azure Machine Learning in 3 challenges over 3 months to help locate and protect the biodiversity of frogs by discovering and counting local and global frogs on weather data sampled over space and time (spatiotemporal sampling) with given preliminary F1 score.
💊 US Medicaid Pharmacy Pricing Analysis - Established tables by nodes and Graph on Neo4j in Cypher, and on Azure in SQL to predict future prices/quantities and important pharmaceutical products of US Medicaid datasets in Python, AzureML.
🏦 Home Credit Default Risk - Connected, transformed datasets, conducted EDA in SQL, Scala on Hive, Zeppelin on customized datasets to analyze the loan applicants' background and help expand coverage to those unable to access financial services - Determined on Zeppelin/ Tableau/ Power BI the most significant background check of applicants who got most loan approvals.
SQL Murder Mystery - Identified the exact murderer and the person who planned the killing with the shortest-possible SQL queries, from basic to intermediate querying skills & approaches using: INNER/LEFT JOIN, GROUP BY, WITH, WHERE, Sub-Queries.
Acquisition & Merger Analysis - Compared techniques between loading dataset in Python’s SQLAlchemy to MySQL & loading it in SQL to Hadoop, investigated & identified organizations for the most profitable merger and acquisition by examining accumulated data sets in terms of Sales, Revenue, Product Line in SQL on Zeppelin, visualized charts in Tableau, Power BI.
Annual Sales Analysis & Visualization - Applied EDA in Python, visualized 200K datapoints to answer Revenue questions - Visualized & compared results between charts in Tableau & Power BI to determine the variables behind the highest Sales Value: December, San Francisco, peak ordering hours, top-selling products, and the correlation between Prices & Volumes.
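
The Banking Dataset and Porto Seguro entries in the list above compare classifiers by Balanced Accuracy and Geometric Mean rather than plain accuracy. The sketch below shows, under simple assumptions (binary labels, a hypothetical helper named `imbalance_metrics`), why those metrics are more informative on imbalanced data. It is an illustration only, not the code used in those projects.

```python
from math import sqrt


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, balanced accuracy, and G-mean for binary labels.

    Balanced accuracy is the mean of sensitivity and specificity;
    G-mean is their geometric mean. Both penalize a model that
    ignores the minority class.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": sqrt(sensitivity * specificity),
    }


# A classifier that always predicts the majority class on a 20/80 split.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
metrics = imbalance_metrics(y_true, always_negative)
```

Here plain accuracy looks respectable at 0.8, while balanced accuracy falls to 0.5 and the G-mean to 0, which exposes that no positive case was found.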

Harvard University - Master, Data Science - Academic Progress:

At Harvard University, the highest grade is an "A" (93-100%), equivalent to 4.00 on the 4-point scale.

| Courses | Progress | Grade |
| --- | --- | --- |
| CSCI E-104 Advanced Deep Learning | Spring'26 (graduate) 🚀 | |
| CSCI E-94 Fundamentals of Cloud Computing and OpenAI with Microsoft Azure | Spring'26 (non-credit) 🚀 | |
| CSCI E-89B Natural Language Processing in Python | Fall'25 ✅ | A (93-100%) |
| CSCI E-108 Data Mining, Discovery, and Exploration in Python | Fall'25 ✅ | A (93-100%) |
| CSCI E-597 Pre-Capstone & CSCI E-599A Capstone | Summer'26 & Fall'26 | |
| CSCI S-89 Deep Learning in Python | ✅ | A (93-100%) |
| CSCI S-278 Applied Quantitative Finance and Machine Learning in Python | ✅ | A (93-100%) |
| CSCI E-25 Computer Vision in Python with Deep Learning, Deep CNN, Transfer Learning, Generative Models | ✅ | A (93-100%) |
| CSCI E-192 Modern Data Analytics with Spark Core, Spark SQL, Spark MLLib, GraphX, NLP, AWS, GCP, Python | ✅ | A (93-100%) |
| CSCI E-83 Fundamentals in Data Science in Python (computational statistical inference with maximum likelihood, modern resampling methods, and Bayesian models) | ✅ | A (93-100%) |
| STAT E-109 Introduction to Statistical Modelling in R | ✅ | A- (90-92%) |
| CSCI E-101 Foundations of Data Science & Engineering in Python, SQL, Tableau | ✅ | A (93-100%) |


Humber College - Post Graduate, Business Insights and Analytics - Academic Progress:

| Courses | Details |
| --- | --- |
| Data Analytics Tools ✅ | SAS, SPSS Modeler, SPSS, Excel, Cognos |
| Managerial Finance & Accounting ✅ | Excel (Investment Analysis of Shopify and Lightspeed in Canada) |
| Big Data ✅ | Hadoop, R, Neo4j, Cypher, Graph |
| Quantitative Research Methods I & II ✅ | Descriptive & Inferential Statistics, Probability, Normal Distribution, Estimation, Hypothesis Testing |
| Database & SQL ✅ | SQL, ERD, Normalization |
| Governance & Ethics in Data ✅ | Reflection & Integration of Knowledge: Governance & Ethics of Analytics in Data, AI & Technology - only available from hyperlink in my Resume - (graded 95/100, with feedback from Professor Kathleen Mcginn: "My goodness Phuong, Thank you for sharing this with me. It is indeed a very deep, intelligent and meaningful piece of writing that deserves an excellent grade - 95 (!) - the highest grade I have given so far. Congratulations - you have truly earned it.") |
| Canadian Business & Strategy ✅ | TD Bank's Porter’s Value Chain Analysis & Nucor Corporation Analysis |
| Marketing ✅ | |
| Predictive Analytics ✅ | linear and multiple regression, decision trees, linear programming, factor analysis, cluster analysis, modelling |
| Machine Learning and Programming 1 & 2 ✅ | Python: Data Mining, Data Science, Data Visualization, Dimension Reduction, CRM, Evaluating Predictive Performance, Multiple Linear Regression, K-NN, Naive Bayes Classifier, Classification, Regression Trees, Logistic Regression, Cluster Analysis |
| Communication & Data Visualization ✅ | Excel, Tableau |
| Business Intelligence ✅ | Power BI |
| Machine Learning and Programming 2 ✅ | Python: Time Series Forecasting, Market Basket Analysis, Natural Language Processing |
| Capstone Course ✅ | IEEE-CIS Fraud Detection (Capstone, Humber College) |
| Project Management ✅ | Boeing Aviation Case Report of Sales and Supply Boost |

Contact BI ETL / Automation Cloud / ML Hackathon Server E-Learning
Main Email Microsoft Certified: Power BI Data Analyst Associate Alteryx Certified: Advanced Designer Databricks Certified: Data Analyst SQL Certified Advanced HackerRank Alteryx Certified: Server Implementation Alteryx 9-Comet & Completed Challenges
LinkedIn Tableau Certified: Desktop Specialist Alteryx Certified: Advanced Designer Cloud Dataiku Certified: Machine Learning Practitioner Python Certified Problem Solving Intermediate HackerRank Alteryx Certified: Server Administration GitHub
2nd Email Databricks Certified: Fundamentals of Databricks Lakehouse Platform Google Certified: TensorFlow Developer & Machine Learning Practitioner (soon) R Certified Intermediate HackerRank (soon) Tableau Public
Alteryx Certified: Core Designer Microsoft Certified: Azure Data Fundamentals HackerRank Credly
Alteryx Certified: Designer Cloud Core Microsoft Certified: Azure AI Fundamentals CodeSignal Six Sigma Certified White Belt
Microsoft Certified: Azure Fabric Data Engineer (soon) Alteryx Certified: Machine Learning Fundamentals SAS Safe Roads 2022 Competition Participant

Other Certificates:

| Earned 🏅 | Details |
| --- | --- |
| ProtonX | TensorFlow Developer (Statistics, Probability, Algebra, Machine Learning, Deep Learning, AI) |
| Center of Talent in AI | Python, Machine Learning, Deep Learning, AI, Reinforcement Learning |
| Nordic Coder | Python, Tableau |
| DataCamp | SQL Intermediate |
| Microsoft Office Specialist | Word, Excel, PowerPoint |
| Udemy | Power BI for Business Intelligence |

Pinned

  1. Analyzing-Housing-Affordability-with-Statistical-Inference-using-OLS-Regression-Bayesian-Models (Public)
  2. Brain-Tumor-Deep-Learning-Detection-VGG-ResNet-EfficientNet-ConvNeXt-Segmentation-SAM-UNet-DeepLabV3 (Public)
  3. Mortgage-Accounting-and-Risk-Analytics (Public, Jupyter Notebook)
  4. Scalable-Cloud-Based-Credit-Card-Fraud-Detection-Vertex-AI-on-Google-Cloud-Platform-GCP (Public, Python)
  5. Predicting-Market-Movements-Building-Smart-Portfolios-with-ML-DL-RL-from-5-Major-Canadian-Banks (Public)
  6. Satellite-UAV-Aerial-Image-Semantic-Segmentation-DeepLabV3-UNet-PSPNet (Public)