Ruchi Pandey
[email protected] | (551) 430-9837 | linkedin.com/in/ruchi-pandey-83b677142 | Seattle, WA
EDUCATION
Stevens Institute of Technology, Hoboken, NJ, USA May 2023
Master of Science in Computer Science GPA: 3.8/4
SKILLS AND EXPERTISE
Programming Languages: Python, R, HiveQL, SQL, PL/SQL, Unix Shell Scripting
Databases: MySQL, MongoDB, Oracle, Teradata, PostgreSQL, Snowflake
Cloud Technology: Amazon Web Services (EC2, Athena, S3, Glue, Lambda, Step Functions, RDS, Redshift, Airflow), Databricks
Big Data: Apache Hadoop, HDFS, MapReduce, Hive, HBase, Spark (PySpark), Sqoop
Scheduling & CI/CD Tools: Airflow, AppWorx, GitLab, Jenkins
Other Tools: PyCharm, Alteryx, Informatica, Tableau, Kibana, Jira, ServiceNow, Spring Boot, ODI, OBIEE, Excel
EXPERIENCE
Gusto, USA Dec 2023 – Present
Data Engineer
Integrated a Slack notification library, packaged as a Python wheel, to streamline team communication and boost team productivity by 75%
Optimized SQL query transformations used to forecast monthly data, reducing processing time by 60% and improving decision-making
Leveraged the PyDeequ framework to rigorously assess data quality, identifying and resolving integrity issues for an 80% improvement in data reliability
SMCP, NY, USA Jun 2022 – Dec 2022
Data Engineer
Built a custom data pipeline in Python and SQL that provides real-time tracking of brand progress for the Finance department
Optimized a complex SQL transformation query, achieving a 50% reduction in processing time through query restructuring
Automated the daily sales report using web scraping tools, saving 60 hours of manual effort per month
Managed and optimized relational databases (e.g., MySQL, PostgreSQL) for high-performance data storage and retrieval
Integrated Tableau with existing data infrastructure, ensuring seamless data connectivity and real-time updates
Developed a data visualization dashboard that provides real-time insights into sales performance, enabling data-driven sales strategies
Established data pipelines to extract data from Redshift and deliver it to downstream systems via S3 and SFTP
Western Union, Hyderabad, India Sep 2018 – Apr 2021
Data Engineer
Engineered a real-time data ingestion pipeline using Apache Kafka, reducing data latency by 30% and enabling timely decision-making for the analytics team
Orchestrated end-to-end ETL processes to extract, transform, and load data from diverse sources into a centralized data warehouse, improving data consistency
Investigated data quality issues through exploratory data analysis (EDA) using SQL, Python, and pandas
Streamlined the data validation process, decreasing error rates in reports by 15% and improving overall data quality
Deployed Apache Airflow for orchestrating complex data workflows, enhancing workflow monitoring and scheduling capabilities
Maintained 99.8% data pipeline uptime while overseeing the ingestion of streaming and transactional data from 8 distinct primary sources, using Spark, Redshift, S3, and Python for efficient data processing
Improved data quality by 90% through a data quality framework covering data profiling, cleansing, and validation
Migrated a data warehouse to AWS, improving data access and reducing costs by 26%
PROJECTS
COVID-19 Sentiment Analysis Apr 2022
Architected a sentiment analysis system that used NLP and machine learning techniques to classify COVID-19 tweets about various policies; improved classification accuracy by 15% and reduced false alarms by 25%
Market and Data Analysis for University Admissions Aug 2022
Collected and cleaned data from multiple universities' admission databases using SQL, ran advanced SQL analyses to identify key predictors, and developed a Tableau dashboard with demographic-specific trend analysis to support data-driven admissions decisions
CERTIFICATIONS
AWS Certified Cloud Practitioner
Oracle Database 11g: Program with SQL Release 2