Aakash Dwivedi
Sr. Data Scientist
Email: [email protected]
Phone: +1 412 376 7866
GitHub
------------------------------------------------------------------------------------------------------------------------------------------------------
Professional Snapshot:
Over 8 years of experience as a Data Scientist in AI, NLP, Machine Learning, Computer Vision, Inferential Statistics, Graph Theory, and Probabilistic Graphical Models.
Strong understanding of the product development lifecycle, from ideation to launch, and the ability to manage projects from start to finish.
Expertise in translating business objectives into analysis designs and using multiple data assets to assess opportunities.
Proficiency in Python, NumPy, Scikit-Learn, Gensim, NLTK, TensorFlow, Keras, BERT, Prophet, Seaborn, and Plotly.
Experienced in solving business problems such as credit risk assessment, fraud detection, finance and reporting, and marketing analysis using machine learning techniques.
Skilled in collaborating with executives to make strategic business decisions and manage internal teams.
Knowledge of advanced algorithms and predictive modeling to make data-driven decisions that improve operational efficiency, reduce financial risks, and increase revenue.
Skilled in Advanced Regression Modelling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools, and the application of statistical concepts.
Proficient in designing and implementing experimental tests for marketing campaigns and measuring their success.
Experienced in performing data profiling, data quality, and data reporting.
Knowledge of version control and cloud technologies such as Git, AWS (SageMaker, Lambda, Redshift, Glue, S3), and Azure.
Experienced in automating reports to drive process improvement.
Skilled in using MS Office tools, advanced Excel techniques, Tableau, and Power BI to create monthly executive management presentations.
Possess strong interpersonal skills and effectively communicate within the team.
Capable of adapting to new domains, technologies, concepts, and environments.
Technical Skills
Languages: Python, R, Scala, Java, SQL, PL/SQL, T-SQL, ASP, Visual Basic, SQL Server, C, C++, UNIX, Perl
Machine Learning Libraries: Spark ML, Spark MLlib, Scikit-Learn, BERT, PyTorch, NLTK, Stanford NLP
Deep Learning Frameworks: TensorFlow, Google Dialogflow, Keras
Big Data Frameworks: Apache Spark, Apache Hadoop, Kafka, MongoDB, Cassandra
Project Management Tools: Figma, Jira, Rally, Aha!
Machine Learning Algorithms: Regression, Classification, Clustering, Association, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Logistic Regression, Decision Trees, Random Forest, K-Nearest Neighbors (K-NN), Kernel SVM
Big Data Distributions: Amazon EMR, Redshift, Glue, Cloudera
Web Technologies: Flask, Django, Spring MVC
Front-End Technologies: JSP, HTML5, Ajax, jQuery, XML
Web Servers: Apache2, Nginx, WebSphere, Tomcat
Visualization Tools: Tableau, Domo, Power BI, Apache Zeppelin, Matplotlib, Seaborn, Plotly
Databases: Oracle 11g/12c, MySQL, PostgreSQL, MS Access, SQL Server 2012/2014, Sybase, DB2, Teradata 14/15, Hive, Amazon S3
NoSQL: MongoDB, Cassandra
Operating Systems: Linux, Windows
Scheduling Tools: Airflow, Oozie
Education
Master of Science in Business Analytics, Oklahoma State University, US (July 2018 - May 2020), GPA: 4.0
Bachelor of Technology in ECE, GGSIPU, India (July 2010 - June 2014), GPA: 3.7
CERTIFICATIONS:
IBM Professional Program certificate in Data Science (LQSZTFD6CJFD) through Coursera.
Machine Learning from Duke University (RLBQKABU3B57) through Coursera.
Tableau Desktop Specialist certificate issued by Tableau.
SAS Visual Text Analytics in SAS Viya certificate issued by SAS.
SAS Certified Base Programmer for SAS 9 issued by SAS.
Work History & Key Projects
Company : Mastercard, USA
Designation : Analytics Manager/Sr. Data Scientist
Industry : US Finance
Tenure : August 2022- Present
Business Vertical : Global Delivery Enablement – Dashboard and Reporting
Description:
The Global Delivery Enablement team at Mastercard is responsible for facilitating the delivery of high-quality products and services to customers globally. The team provides end-to-end project management and delivery support to ensure that Mastercard's products and services are delivered on time, within budget, and with the required level of quality. The team maintained a dashboard tracking various KPIs for project delivery across the US and Europe.
Roles and Responsibilities:
Made strategic decisions to break down the Tableau dashboards by user type.
Developed surveys for three personas to gather requirements and understand current pain points.
Created a project plan covering dashboard development, the requirement intake process, and the handoff process.
Created design roadmaps to iterate over dashboard versions with different views, KPIs, and filters.
Worked with a team of analysts and developers to create an MVP for the second round of reviews.
Designed the UI for the requirements ingestion process for all three dashboards and worked with developers to deploy it.
Created and successfully completed the handoff of all dashboards to the GDE team.
Technologies used: Python (Pandas, NumPy, EDA, Seaborn, Plotly), Hadoop, HDFS, HBase, Hive, Pig, Hive-SQL, NoSQL, Tableau, Alteryx
Business Vertical : Merchants Operation and Technology – Anomaly Detection
Description:
The Merchant O&T team manages the merchant databases across the globe. They process incoming transactions from various products worldwide to maintain a merchant database. For this they use GME (Global Merchant Engine), which consists of rules that process the transactions and create new merchants. GME contains 12MM rules, a number that is still growing, and any anomaly seen in the process is handled by adding new rules to GME. Anomalies were identified with manual scripts run by analysts; our task was to replace this manual process with an ML solution that could also report severity.
Roles and Responsibilities:
Defined the scope: determined what types of anomalies would be detected and which data sources would be used.
Collected data: gathered data from various sources such as transaction, acquirer, and location data.
Preprocessed data: performed ETL jobs using AWS S3 and Redshift to create and preprocess training data for the models.
Analyzed data: used statistical and machine learning techniques to analyze the data and identify patterns and outliers.
Trained anomaly detection models: used unsupervised learning techniques (FB Prophet, IQR, and K-means) to train models that detect anomalies in the data; a simplified sketch follows this list.
Tested and validated the models: tested the models on a subset of the data and validated the accuracy of the models' predictions.
Implemented and monitored the model: deployed the model in production and continuously monitored the results in AWS Redshift to ensure the models work effectively.
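As an illustration of the general approach (not Mastercard's proprietary GME pipeline), the sketch below shows interval-based anomaly flagging with Prophet plus a simple IQR check in Python; the column names ('ds', 'y') follow Prophet's convention, and the severity definition is an assumption made for this example.

import pandas as pd
from prophet import Prophet  # pip install prophet

def prophet_anomalies(df: pd.DataFrame, interval_width: float = 0.99) -> pd.DataFrame:
    """Flag points that fall outside Prophet's uncertainty interval.
    Expects columns 'ds' (timestamp) and 'y' (metric value)."""
    model = Prophet(interval_width=interval_width)
    model.fit(df)
    forecast = model.predict(df[["ds"]])
    out = df.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    out["anomaly"] = (out["y"] < out["yhat_lower"]) | (out["y"] > out["yhat_upper"])
    # Severity (assumed definition): distance outside the interval, scaled by its width.
    band = (out["yhat_upper"] - out["yhat_lower"]).clip(lower=1e-9)
    overshoot = (out["y"] - out["yhat_upper"]).clip(lower=0)
    undershoot = (out["yhat_lower"] - out["y"]).clip(lower=0)
    out["severity"] = (overshoot + undershoot) / band
    return out

def iqr_anomalies(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Classic interquartile-range outlier rule for a single metric."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)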
Technologies used: Python (Pandas, NumPy, EDA, Seaborn, Plotly), Hadoop, HDFS, HBase, Hive, Pig, Hive-SQL, NoSQL, AWS (SageMaker, Redshift, Glue, S3), Jira, Figma, time series analysis, deep learning (PyTorch, Prophet)
Company : Momentum Financial Services Group, USA
Designation : Data Scientist
Industry : US Finance
Tenure : March 2021- July 2022
Business Vertical : Credit Risk – Policy and Strategies
Description:
The Credit Risk department aims to evaluate the creditworthiness of potential borrowers and assess the level of risk associated with lending money to them. The department develops
credit risk policies, procedures, and models to guide lending decisions, continuously monitors the loan portfolio, and minimizes the risk of loan defaults and potential losses. The objective
is to maintain a healthy loan portfolio while ensuring loans are provided to creditworthy borrowers who are likely to repay their debts on time.
Roles and Responsibilities:
As a point person, collaborated with VPs of Analytics, Credit Risk, and Securitization to address all analytics needs for decision-making.
Created Tableau dashboards to monitor and report on the performance of all loan products and support risk analysis.
Analyzed the performance of the loan book for various products and formulated data-driven strategy changes to credit policies.
Oversaw development efforts for new credit risk models, improving identification of delinquent customers by 12% (a simplified sketch of this type of model follows this list).
Maintained regular communication with senior leadership to provide updates on performance, risk trends, and strategy changes.
Conducted regular reviews of risk management processes and implemented changes to improve efficiency and effectiveness.
Developed and delivered training programs to educate teams on data analysis, risk management, and reporting.
Maintained a comprehensive understanding of industry trends and emerging technologies to ensure the organization remains ahead of competitors.
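The production credit risk models are proprietary, so the sketch below is only a hypothetical baseline for delinquency scoring: a scikit-learn pipeline with a logistic regression classifier. Every column name (loan_amount, delinquent_90d, and so on) is invented for illustration.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; the real model's inputs are not public.
NUMERIC = ["loan_amount", "income", "utilization", "months_on_book"]
CATEGORICAL = ["product_type", "state"]

def train_delinquency_model(df: pd.DataFrame) -> Pipeline:
    """Fit a baseline delinquency classifier and report holdout AUC."""
    X, y = df[NUMERIC + CATEGORICAL], df["delinquent_90d"]  # assumed label column
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    pipe = Pipeline([
        ("prep", ColumnTransformer([
            ("num", StandardScaler(), NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ])),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])
    pipe.fit(X_tr, y_tr)
    print("holdout AUC:", roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1]))
    return pipe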
Technologies used: Python (NumPy, Pandas, Seaborn), PostgreSQL, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS, Jira, Confluence
Company : EXL Services, USA
Designation : Analytics Consultant II
Industry : Consulting Services
Tenure : September 2020- February 2021
Client Name : PNC Bank
Business Vertical : Fraud and Operations
Description:
As one of the largest banks in the United States, PNC Bank offers a broad range of financial services to businesses and individuals. However, like many other banks, PNC Bank faced a significant challenge in identifying and preventing internal teller fraud, a type of fraud in which bank employees steal money from customer accounts. The challenge was particularly complex due to the large number of employees involved in processing transactions and the difficulty of identifying fraudulent activity. To address this problem, PNC Bank engaged EXL Services to create a fraud detection system that uses advanced analytics and machine learning algorithms to identify patterns of fraudulent behavior.
Roles and Responsibilities:
Worked as a consultant on a project focused on detecting and preventing internal teller check fraud for a major banking and financial services client based in Pittsburgh.
Managed and coordinated efforts between various workstreams involved in developing internal AI systems and detecting check fraud.
Performed extract, transform, and load (ETL) jobs using PySpark to combine data from various sources and create a feature store for modeling.
Developed an advanced anomaly detection model using machine learning algorithms to replace the existing rule-based system for detecting fraudulent activity (a hypothetical sketch of this kind of pipeline follows this list).
The new model was successful in identifying fraudulent activity, saving the client approximately $800,000 in losses.
Implemented various data validation and quality checks to ensure the accuracy and reliability of the model's outputs.
Provided ongoing support and maintenance for the system, including troubleshooting and resolving issues as they arose.
Worked closely with stakeholders across the organization to ensure that the model aligned with their needs and requirements.
Conducted regular audits and assessments of the system's performance to identify areas for improvement and optimization.
Continuously monitored the latest developments in fraud detection and machine learning to ensure that the system remained up-to-date and effective.
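The client's actual schema and model are confidential; the sketch below is a hypothetical illustration of the pattern described above: a PySpark ETL step aggregating teller-level behavioral features into a feature store, scored with an unsupervised detector (scikit-learn's IsolationForest, chosen here for illustration). The S3 path, column names, and contamination rate are all assumptions.

from pyspark.sql import SparkSession, functions as F
from sklearn.ensemble import IsolationForest

spark = SparkSession.builder.appName("teller-fraud-features").getOrCreate()

# Hypothetical input: one row per teller transaction (placeholder path).
txns = spark.read.parquet("s3://example-bucket/teller_transactions/")

# Aggregate per-teller behavioral features for the feature store.
features = (
    txns.groupBy("teller_id")
    .agg(
        F.count("*").alias("txn_count"),
        F.avg("amount").alias("avg_amount"),
        F.stddev("amount").alias("std_amount"),
        F.sum(F.when(F.col("reversed"), 1).otherwise(0)).alias("reversal_count"),
    )
    .na.fill(0.0)
)

# Score tellers: higher anomaly_score means more unusual behavior.
pdf = features.toPandas()
cols = ["txn_count", "avg_amount", "std_amount", "reversal_count"]
iso = IsolationForest(contamination=0.01, random_state=0).fit(pdf[cols])
pdf["anomaly_score"] = -iso.score_samples(pdf[cols])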
Technologies used: PAE (Python Anaconda Environment), PySpark, Spark SQL, Rally, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS, Jira, Confluence
Company : Oklahoma State University, USA
Designation : Research Analyst
Industry : Energy
Tenure : Jan 2019- August 2020
Business Vertical : Industrial Assessment Centre (Department of Energy)
Description:
The organization's mission is to provide industrial assessments to clients aimed at improving energy, waste, and productivity management while educating and training the next
generation of professionals. Their focus is on addressing various plant-related concerns, including those related to energy, water, waste, and productivity to increase their clients'
efficiency and productivity. The organization offers comprehensive industrial assessments to identify areas of improvement and recommend solutions, as well as training programs for
industry professionals. The goal is to reduce energy consumption and waste while increasing productivity for clients, and to support the growth and development of professionals in the industry. By providing clients with specialized assessments, the organization aims to help them optimize their processes, save resources, and ultimately contribute to a more sustainable future.
Roles and Responsibilities:
Worked on an energy-saving project for commercial workshops in the Midwest region with the US Department of Energy.
Segmented commercial workshops into groups based on their power consumption needs using clustering techniques.
Developed regression models to identify the key factors driving power consumption across various industries (a simplified sketch of this segmentation-and-regression approach follows this list).
Created Power BI reports to visualize recommendations for implementing energy-saving measures in different geographic locations.
Conducted assessments to identify areas of energy waste and provided recommendations to reduce energy consumption.
Collaborated with clients to implement energy-saving strategies and monitor their effectiveness.
Analyzed data from smart meters to identify patterns of energy usage and waste.
Conducted cost-benefit analyses of potential energy-saving measures to identify the most cost-effective solutions.
Provided training and educational resources to clients to promote energy-saving practices and awareness.
Monitored energy usage and waste over time to track progress and identify areas for further improvement.
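A minimal sketch of the segmentation-and-regression approach, assuming a pandas DataFrame of workshop assessments; the feature names are hypothetical stand-ins for the actual assessment variables.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical columns; the assessment dataset itself is not public.
FEATURES = ["annual_kwh", "peak_demand_kw", "operating_hours", "floor_area_sqft"]
DRIVERS = ["peak_demand_kw", "operating_hours", "floor_area_sqft"]

def segment_and_model(df: pd.DataFrame, n_clusters: int = 4):
    """Cluster workshops by consumption profile, then fit a per-cluster
    regression of annual consumption on the remaining drivers."""
    X = StandardScaler().fit_transform(df[FEATURES])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    df = df.assign(cluster=labels)
    models = {
        c: LinearRegression().fit(grp[DRIVERS], grp["annual_kwh"])
        for c, grp in df.groupby("cluster")
    }
    return df, models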
Technologies used: Python, SQL, Power BI, Alteryx, DB2, Teradata, SQL Server 2008, Informatica 9.1, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word, and Access.
Company : Capgemini, India
Designation : Senior Data Analyst
Industry : Consulting Services
Tenure : March 2015- April 2018
Client Name : Walmart
Business Vertical : Operations and Reporting
Description:
Walmart is a global retail giant operating in 27 countries. Walmart wanted to create customer value campaigns aimed at increasing sales among high-value customers. However, accurately identifying the customers who account for a large share of sales is a challenge. Walmart engaged Capgemini to manage the entire campaign, from designing the offer strategy to executing, monitoring, and managing results. Capgemini's analysts used data mining techniques such as the RFM model to segment customers based on their purchasing behavior. The campaigns were designed to encourage high-value customers to continue shopping at Walmart, increase their spending, and ultimately increase the company's revenue. The analysis used SQL against RDBMS databases to retrieve customer data, and reports were generated using ODS in SAS, Excel, and Tableau to continually improve the customer value campaigns.
Roles and Responsibilities:
Collected and analyzed large sets of data from various sources to derive insights and trends.
The role involved managing end-to-end customer value campaigns, including designing offer strategies, execution, monitoring, and results management.
The campaigns aimed to increase spending among household IDs that accounted for a high percentage of sales at Checkout Markets.
An RFM model was used to segment customers based on their purchasing behavior, from most valuable to least valuable, to identify the best customers (a simplified sketch of RFM scoring follows this list).
SQL queries were written to retrieve customer data from RDBMS databases for Ad hoc analytics requests and performance reporting of sales and other KPIs as needed.
Reports were generated using ODS in SAS, Excel, and Tableau.
The role required expertise in SQL, RDBMS databases, RFM models, and data visualization tools such as SAS, Excel, and Tableau.
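RFM scoring itself is a standard technique; a minimal pandas sketch is shown below, assuming a transactions table with household_id, date, and amount columns (names invented for the example).

import pandas as pd

def rfm_scores(txns: pd.DataFrame, asof: pd.Timestamp) -> pd.DataFrame:
    """Compute Recency/Frequency/Monetary quintile scores per household.
    Expects columns household_id, date, amount (assumed names)."""
    rfm = txns.groupby("household_id").agg(
        recency=("date", lambda d: (asof - d.max()).days),
        frequency=("date", "count"),
        monetary=("amount", "sum"),
    )
    # Quintile scores, 5 = best; rank(method="first") avoids duplicate bin edges.
    rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[5, 4, 3, 2, 1]).astype(int)
    rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    rfm["segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
    return rfm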
Technologies used: SAS, Tableau, Excel, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
Client Name : GoPro
Business Vertical : Strategic Decision and Development
Description:
GoPro is a technology company that designs and produces cameras and related accessories. The company's products are popular among adventure sports enthusiasts, allowing them to capture high-quality video footage while engaging in extreme activities. As a data-driven company, GoPro faces several data challenges, including the need to manage and analyze large volumes of data from various sources. The company also needs to ensure data quality and accuracy, as well as the security of its data. In addition, GoPro faces the challenge of turning data into actionable insights that can drive business decisions for product development.
Roles and Responsibilities:
Collected and analyzed large sets of data from various sources to derive insights and trends.
Collaborated with cross-functional teams to identify key business questions and data requirements.
Designed and implemented data models and databases to store and manage large datasets.
Developed and maintained dashboards and reports using visualization tools such as Power BI and Looker to help stakeholders make data-driven decisions.
Conducted ad-hoc analyses and research to support business decisions and drive growth.
Identified opportunities for process improvement and optimization based on data insights.
Created and maintained data quality processes and data governance policies to ensure accuracy and consistency of data.
Communicated complex data insights and technical findings to stakeholders in a clear and concise manner.
Mentored junior data analysts and provided guidance on best practices and technical skills.
Technologies used: SAS, Power BI, Looker, Excel, SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
ACHIEVEMENTS
Awarded 1st place in the 'From Research to App' competition conducted by Oklahoma State University for both Phase I and Phase II.
Awarded a tuition fee waiver for excellent graduate assistant research work with faculty at Oklahoma State University.
Awarded 'Best Employee' for the year 2016 by GoPro during tenure at Capgemini.