
Name: TEJA SREE

Phone No: 469-269-0419

Email: [email protected]

Professional Summary
Data Engineer with 10 years of hands-on experience in deploying robust data solutions across diverse industries, including
Banking, Retail, and Healthcare. Proficient in utilizing AWS cloud services to design and implement scalable architectures.
Expertise in end-to-end development of data pipelines using advanced tools like Apache Spark, AWS Glue and Amazon Redshift.

 Skilled in designing scalable architectures using AWS services including Amazon EMR, AWS Glue, Amazon S3, Amazon
Redshift, AWS Lambda, Amazon RDS, Amazon Kinesis and Amazon DynamoDB to meet the data processing and
analytical needs of various industries.
 Developed comprehensive end-to-end data pipelines using Python, PySpark, SQL, Shell Scripting and various AWS
services, including AWS Glue, Amazon S3 and Amazon RDS, to automate data ingestion, transformation, and loading
processes for large-scale datasets.
 Implemented and managed data warehouses and analytical systems on Amazon Redshift to support reporting, business
intelligence, and data analytics needs, enabling data-driven decision-making.
 Enforced data governance policies in AWS, establishing access controls and security measures using AWS IAM and AWS
KMS to ensure data privacy, confidentiality, and compliance with regulations such as GDPR and HIPAA.
 Expertise in optimizing Snowflake architecture for efficient data management and querying, and in delivering
comparable data warehousing capabilities on Amazon Redshift.
 Proficient in real-time data processing using Amazon Kinesis and AWS Glue Streaming, handling streaming data for
timely analytics and reporting.
 Experienced in developing and implementing Big Data Management Platforms using AWS technologies including
Amazon EMR, Apache Spark, Hadoop, Hive, and Pig for large-scale data processing.
 Comprehensive understanding of Hadoop architecture and management of Amazon EMR, including configuration and
optimization of cluster resources for efficient data processing.
 Designed and managed Hadoop clusters on Amazon EMR using Hive partitioning and tools like Apache Zeppelin to
analyze extensive transactional data.
 Experienced in migrating ETL tasks from Teradata to Amazon Redshift for enhanced data warehousing capabilities and
improved analytics performance.
 Utilized Terraform to automate the deployment and scaling of AWS resources, including Amazon EC2 instances, Amazon S3,
and other storage solutions for processing large-scale data streams.
 Developed data quality audit systems using AWS Lambda, Amazon CloudWatch, and Amazon S3 to monitor table health
and ensure data integrity across pipelines; a minimal sketch of this pattern follows this list.
 Streamlined weekly jobs on AWS EMR by fine-tuning Apache Spark configurations, reducing runtime by 30% and
improving resource utilization.
 Proficient in implementing cross-account data sharing using Amazon S3 and AWS IAM with secure connections, facilitating
collaboration across teams and departments.
 Hands-on experience with NoSQL databases including Amazon DynamoDB and Amazon DocumentDB, enabling flexible
data storage and retrieval solutions for various applications.
 Experience in coding MapReduce/YARN programs using Java, Scala, and Python, building data pipelines with Big Data
technologies on Amazon EMR.
 Proficient in data visualization tools like Tableau and Amazon QuickSight for creating insightful reports and dashboards,
facilitating data-driven insights for stakeholders.
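
As a concrete illustration of the data quality audit pattern mentioned above, the following is a minimal sketch of an
AWS Lambda handler that counts freshly landed objects under an S3 prefix and publishes the count as a custom CloudWatch
metric. The bucket, prefix, namespace, and metric names are hypothetical and used only for illustration.

import boto3

s3 = boto3.client("s3")
cloudwatch = boto3.client("cloudwatch")

# Hypothetical locations; a real pipeline would pass these through environment variables.
BUCKET = "example-curated-data"
PREFIX = "daily_loads/transactions/"


def lambda_handler(event, context):
    """Count objects under a prefix and emit the count as a custom CloudWatch metric."""
    paginator = s3.get_paginator("list_objects_v2")
    object_count = 0
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        object_count += len(page.get("Contents", []))

    # A CloudWatch alarm on this metric can flag empty or partial loads before downstream jobs run.
    cloudwatch.put_metric_data(
        Namespace="DataQuality/Example",
        MetricData=[{
            "MetricName": "LandedObjectCount",
            "Value": object_count,
            "Unit": "Count",
        }],
    )
    return {"object_count": object_count}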

Technical Skills
Amazon Web Services (AWS): S3, Glue, Kinesis, MSK, Redshift, Lambda, EMR, Athena, Databricks, SageMaker Studio,
CloudWatch, DynamoDB, RDS, IAM Policies, CloudFormation Templates, Jupyter Notebooks
Big Data Technologies: Apache Hadoop (HDFS, MapReduce, YARN, Hive, Pig), Apache Spark (Spark Core, Spark SQL, Spark
Streaming), Hudi, CDC, Oozie, NiFi, Airflow, and Sqoop
NoSQL Databases: Apache HBase, Apache Cassandra, MongoDB, Amazon DynamoDB
Programming Languages: Python, Scala, Java
Data Visualization: Tableau, Microsoft Power BI, Looker
DevOps Tools: Jenkins, Terraform, Docker and Kubernetes
RDBMS: Microsoft SQL Server, Oracle, MySQL, and PostgreSQL
Network Protocols: TCP/IP, HTTP, FTP, and SOAP
Web Services: REST (JAX-RS), SOAP (JAX-WS)
Version Control Tools: Git, GitHub, Bitbucket
Other Tools: Apache Kafka, Apache Flume, Snowflake, Control-M, Informatica, Talend, SQL Server Integration Services (SSIS),
InfluxDB, Grafana and Qlik
Professional Experience

Client: BOFA, Plano, Texas Jan 2022 – Present
Role: Senior Data Engineer

Responsibilities:
 Created and managed data extraction pipelines in AWS Glue and AWS Step Functions to collect and integrate data from
diverse sources, including transactional databases, APIs, and streaming platforms (via Amazon Kinesis Data Streams),
ensuring compliance with banking regulations and data governance policies.
 Implemented data ingestion processes utilizing AWS services such as Amazon S3 (for object storage), Amazon RDS (for
relational databases), Amazon Redshift (for data warehousing), and Amazon Kinesis (for real-time streaming data),
supporting real-time analytics and reporting for banking operations, thereby improving data access speed by 40%.
 Designed and developed ETL (Extract, Transform, Load) processes using AWS Glue and AWS Lambda to cleanse,
validate, and transform collected banking data into standardized formats for regulatory reporting and analytical
purposes, which reduced data processing time by 30%.
 Utilized Amazon EMR (Elastic MapReduce) and Amazon S3 to process both structured and unstructured data from
transaction logs, customer interactions, and external market feeds, enhancing the bank’s ability to derive actionable
insights for risk management.
 Provisioned Spark clusters on Amazon EMR, employing IAM roles to securely access Amazon S3 for efficient processing
of large datasets, facilitating data-driven decision-making in risk assessment and customer segmentation, resulting in a
25% improvement in predictive accuracy.
 Built on-demand data warehouses using Amazon Redshift and Amazon EMR to process high volumes of financial data,
providing datasets for data scientists to conduct predictive analytics for credit scoring and fraud detection.
 Programmed in Hive, Spark SQL, and Python (PySpark) within Amazon EMR to streamline data processing and build
robust data pipelines that generate insights for improving customer service and optimizing product offerings, resulting
in a 20% increase in customer satisfaction scores.
 Orchestrated data pipelines using AWS Glue Workflows and AWS Step Functions, managing the flow of data and
scheduling regular processing tasks to ensure timely updates for risk management and regulatory compliance reporting,
aligning with GDPR and PCI-DSS standards.
 Stored raw banking data in optimized formats such as ORC and Parquet within Amazon S3 and queried it using Amazon
Athena to facilitate efficient retrieval and processing for analytical purposes.
 Imported data from various sources into Amazon S3 using AWS Glue and AWS Database Migration Service (DMS),
creating internal and external tables in AWS Glue Data Catalog for data organization and analysis, supporting various
banking applications, including loan processing and customer analytics.
 Developed shell scripts and automated workflows for incremental loads, Hive, and Spark jobs using AWS Lambda,
Amazon EC2, and Amazon CloudWatch Events to improve operational efficiency in data handling, resulting in a 15%
reduction in manual processing time.
 Optimized Hive performance through techniques like partitioning, bucketing, and implementing AWS Glue Crawlers and
custom SerDes for efficient data retrieval and processing, ensuring high availability of data for critical banking
applications.
 Developed Spark applications using Python (PySpark) for seamless data processing and analytics, leveraging Apache
Spark on Amazon EMR to handle large-scale data in the AWS environment, ultimately improving reporting speed and
accuracy by 35%.

Environment: Amazon S3, AWS Glue, Amazon EMR, Amazon Redshift, Amazon Kinesis, AWS Lambda, AWS RDS, Amazon EC2,
Spark, Spark SQL, ORC, GDPR, PCI-DSS, Python (PySpark), SQL, Scala, Parquet, ETL, Hive.
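
For illustration, here is a minimal PySpark sketch of the kind of cleansing and standardization job run on Amazon EMR
in this role, reading raw Parquet from Amazon S3 and writing a date-partitioned curated copy. The S3 paths and column
names are hypothetical placeholders, not actual project values.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transaction-standardization").getOrCreate()

# Hypothetical S3 locations and column names used only for illustration.
raw = spark.read.parquet("s3://example-raw-banking/transactions/")

cleaned = (
    raw.dropDuplicates(["transaction_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("txn_date", F.to_date("event_ts"))
)

# Partitioning by date keeps Athena and Redshift Spectrum scans limited to the relevant days.
(cleaned.write
        .mode("overwrite")
        .partitionBy("txn_date")
        .parquet("s3://example-curated-banking/transactions/"))

The same DataFrame could also be registered as a temporary view for Spark SQL queries or loaded into Amazon Redshift
through a JDBC or Redshift connector, depending on the cluster setup.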

Client: Cisco Systems, Dallas, Texas Feb 2021 – Dec 2021


Role: Big Data Engineer
Responsibilities:
 Utilized AWS Glue for complex ETL workflows, enabling multi-source data integration, data cleansing, and
transformation within scalable architectures tailored for retail analytics.
 Created and managed intricate ETL workflows to extract data from various sources (databases, files, APIs), transform
it according to business rules, and load it into target systems using AWS Glue integrated with Amazon S3 and
Amazon Redshift.
 Designed and implemented scalable data solutions on AWS, leveraging services such as Amazon S3, AWS Glue, Amazon
EMR, and Amazon Redshift to address the complex data processing needs of the retail industry, improving inventory
management and sales forecasting by 25%.
 Orchestrated end-to-end data ingestion and processing workflows using AWS Glue and Amazon Kinesis, ensuring
seamless integration of retail data streams from diverse sources while adhering to industry compliance standards (e.g.,
PCI-DSS).
 Implemented Change Data Capture (CDC) techniques for real-time synchronization of retail data, enhancing data
accuracy and timeliness for dynamic retail operations, including inventory tracking and sales analytics.
 Developed error handling and recovery mechanisms to manage ETL process failures, ensuring minimal data loss and
enabling quick recovery of critical retail metrics and operational insights.
 Utilized AWS Lambda for serverless compute solutions, automating data processing tasks to optimize resource
utilization and meet performance requirements for retail data analytics workflows.
 Managed and orchestrated data workflows using AWS EMR, AWS Step Functions, and AWS CloudFormation, ensuring
reliability, scalability, and fault tolerance in processing large-scale retail datasets.
 Integrated Hadoop ecosystem tools such as HDFS, Hive, and Pig on Amazon EMR to efficiently manage and process
large retail datasets, supporting key analytics initiatives, including customer behavior analysis and sales trends
prediction.
 Ensured the protection of sensitive data, including Personally Identifiable Information (PII), by implementing encryption
protocols (e.g., AWS KMS) and stringent access controls (via IAM roles) to maintain data confidentiality in compliance
with Federal Data Privacy Laws.
 Developed automated processes to detect and mask PII in data pipelines, enhancing data privacy and regulatory
compliance within retail applications.
 Implemented Python and shell scripts for data processing and automation tasks, incorporating retail-specific business
logic and regulatory compliance rules to ensure data integrity and consistency.
 Collaborated closely with retail analysts and stakeholders to analyze requirements and design data solutions addressing
unique challenges, such as inventory optimization and customer segmentation, resulting in a 30% improvement in
inventory turnover rates.
 Created interactive dashboards and visualizations using Amazon QuickSight to provide actionable insights into retail
data, supporting decision-making processes related to sales and marketing strategies, leading to a 20% increase in sales
performance.
 Documented technical specifications, architectural designs, and deployment procedures for retail-specific data
solutions, ensuring alignment with regulatory requirements and industry best practices for data governance.
 Actively contributed to AWS community forums and knowledge-sharing platforms, sharing insights and best practices
with the broader cloud data engineering community, enhancing collaborative learning within the retail data landscape.

Environment: Amazon S3, AWS Glue, Amazon EMR, Amazon Redshift, Amazon Kinesis, AWS Lambda, AWS CloudFormation,
AWS Step Functions, AWS KMS, Amazon QuickSight, Python, Shell scripting, Apache Spark, Apache Hudi, HDFS, Hive, Pig, Oracle,
MySQL, PostgreSQL, SQL Server, Tableau.
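
The PII masking described in this role could follow a pattern like the minimal PySpark sketch below, which one-way
hashes sensitive columns before data leaves the curated zone. The column list and S3 paths are hypothetical and stand
in for whatever the data classification catalog would actually flag.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pii-masking-example").getOrCreate()

# Hypothetical PII columns; a real list would come from the data classification catalog.
PII_COLUMNS = ["email", "phone_number", "loyalty_card_id"]

orders = spark.read.parquet("s3://example-retail-raw/orders/")

masked = orders
for col_name in PII_COLUMNS:
    if col_name in masked.columns:
        # SHA-256 keeps values joinable across tables without exposing the raw data.
        masked = masked.withColumn(col_name, F.sha2(F.col(col_name).cast("string"), 256))

masked.write.mode("overwrite").parquet("s3://example-retail-curated/orders_masked/")

Running a step like this before publishing to shared buckets keeps downstream consumers working with masked values
only.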

Client: City of New Orleans, New Orleans, Louisiana Sep 2018 - Jan 2021
Role: Data Engineer

Responsibilities:
 Designed and built scalable, secure data solutions using AWS managed services to support data ingestion,
transformation and reporting tailored for healthcare data, including medical records, claims and operational data,
resulting in a 30% reduction in data processing time.
 Extracted, transformed and loaded (ETL) large volumes of data from diverse healthcare systems into AWS data storage
services, utilizing AWS Glue and Amazon S3 and loading processed data into Amazon Redshift as the primary data
warehouse for advanced analytics and regulatory reporting, processing over 5 million patient records monthly.
 Developed data pipelines in AWS Glue, leveraging jobs, crawlers and data catalog features to automate and optimize
ETL workflows that load data directly into Amazon Redshift, significantly improving operational efficiency and patient
data management.
 Orchestrated data workflows across multiple AWS services, including Amazon S3, Amazon RDS, Amazon EMR and
Amazon Redshift, ensuring seamless integration and real-time access to patient records and clinical trial data, which
improved data availability for healthcare professionals by 40%.
 Utilized AWS EMR with PySpark and Spark SQL for large-scale data processing, enabling high-performance analysis on
healthcare claims and treatment outcomes, and facilitating data movement to Amazon Redshift for comprehensive
reporting.
 Implemented data security and governance policies in compliance with healthcare regulations (e.g., HIPAA, GDPR) to
ensure patient data privacy and compliance. Established robust access controls and encryption mechanisms using AWS
IAM and AWS Key Management Service (KMS) to safeguard sensitive data.
 Developed Spark applications to process healthcare data in various formats (e.g., ORC, Parquet) and stored it in Amazon
S3, facilitating fast retrieval and aggregation of health-related metrics for analytics.
 Used AWS Glue and AWS Database Migration Service (DMS) for incremental data loading from electronic health record
(EHR) systems, ensuring timely updates to health data stored in Amazon Redshift.
 Optimized performance of AWS EMR clusters for real-time processing of large healthcare datasets, ensuring efficient
memory utilization and resource management, which decreased operational costs by 25%.
 Automated data processing and transformation workflows using AWS Glue, SQL scripts and AWS CloudFormation
templates for infrastructure deployment, reducing manual intervention and enhancing operational efficiency in health
data reporting.
 Implemented advanced data governance practices for healthcare data, ensuring compliance with HIPAA and GDPR,
applying encryption and access control to sensitive health information in Amazon Redshift, thus maintaining data
integrity and confidentiality.
 Worked with healthcare data formats including HL7, FHIR, JSON, and CSV for processing and ingestion into Amazon
RDS, Amazon S3 and Amazon Redshift, supporting various healthcare applications and analytics use cases.
 Used Apache Airflow on Amazon MWAA for scheduling and managing data pipeline tasks related to patient records, lab
results and treatment plans, ensuring consistent data availability and operational continuity.
 Developed user-defined functions (UDFs) in PySpark for specific healthcare business logic, enhancing data processing
capabilities for reporting on patient outcomes and operational metrics, contributing to improved care delivery.
 Worked extensively with AWS CodePipeline and AWS CodeCommit for continuous integration and deployment (CI/CD)
of data solutions, ensuring smooth releases of healthcare data applications into production environments while
maintaining high standards of code quality and documentation.
 Leveraged Amazon Redshift as a centralized data warehouse for healthcare analytics, enabling healthcare professionals
to run complex queries and perform ad-hoc analysis on large datasets, leading to actionable insights into patient
outcomes, operational metrics, and regulatory compliance.
 Utilized Redshift's features such as concurrency scaling and data sharing to enhance collaborative analytics across
departments, ensuring timely and accurate reporting that supports clinical decision-making and operational efficiency.

Environment: AWS (Amazon S3, AWS Glue, Amazon EMR, Amazon Redshift, AWS Lambda, AWS RDS, AWS Key Management
Service, AWS IAM, Amazon MWAA), SQL Server Integration Services (SSIS), Shell scripting, Python, SQL, PySpark, Spark SQL,
Scala, ORC, Parquet, Avro formats, Apache Airflow, Power BI.
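
A minimal Airflow DAG of the kind run on Amazon MWAA for the pipeline scheduling described in this role might look like
the sketch below; the DAG id, schedule, and task callables are hypothetical, and the actual load logic is omitted.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_lab_results(**_):
    # Placeholder for the Glue/Redshift load step; the real implementation is omitted.
    pass


def refresh_reporting_views(**_):
    # Placeholder for a downstream Redshift refresh step.
    pass


with DAG(
    dag_id="patient_records_pipeline",  # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="load_lab_results", python_callable=load_lab_results)
    refresh = PythonOperator(task_id="refresh_reporting_views", python_callable=refresh_reporting_views)

    ingest >> refresh

On MWAA the DAG file is uploaded to the environment's S3 dags folder, and the managed scheduler handles scheduling and
retries.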

Client: Innovative Software Solutions, Bengaluru, India Jul 2016 - Aug 2017
Role: Big Data Developer

Responsibilities:
 Developed and maintained Hadoop-based data processing applications, leveraging technologies such as Hadoop
MapReduce, HDFS and Hive.
 Designed and implemented data ingestion pipelines to load large volumes of data from diverse sources into Hadoop
clusters using tools like Sqoop and Flume.
 Implemented custom MapReduce jobs in Java to process and analyze structured and unstructured data, extracting
meaningful insights and patterns.
 Utilized Hive for data querying and analysis, optimizing queries by leveraging Hive partitions and bucketing for improved
performance.
 Integrated and utilized additional data processing frameworks and libraries such as Apache Spark, Pig and Cascading to
enhance data processing capabilities.
 Worked with diverse data formats including Avro, Parquet, and ORC, optimizing data storage and retrieval efficiency
within Hadoop clusters.
 Collaborated with Data Architects and Data Scientists to understand data requirements and design data models and
schemas for efficient data processing.
 Developed and maintained Oozie workflows to orchestrate and schedule Hadoop jobs, ensuring reliable and timely
execution of data processing tasks.
 Implemented data security measures in Hadoop clusters including authentication, authorization and encryption to
ensure data privacy and compliance.
 Optimized Hadoop cluster performance by tuning various parameters such as heap size, block size and replication
factor based on workload characteristics.
 Implemented data lineage and metadata management solutions using tools like Apache Atlas, enabling data traceability
and governance.
 Developed and maintained monitoring and alerting mechanisms using tools like Nagios, Ganglia, and Ambari, ensuring
the health and performance of Hadoop clusters.
 Integrated Hadoop clusters with other data storage and processing systems such as relational databases and cloud
storage for seamless data integration and analysis.
 Implemented data backup and disaster recovery solutions for Hadoop clusters ensuring data availability and business
continuity in case of system failures.
 Kept up to date with emerging technologies and trends in the big data ecosystem continuously exploring new tools and
frameworks to enhance data processing capabilities.
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, Apache Spark, Pig, Cascading, Avro, Parquet, ORC, Java, Oozie,
Apache Atlas, Nagios, Ganglia, Ambari.
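
As an illustration of the MapReduce-style cleaning jobs described above, here is a minimal Hadoop Streaming mapper in
Python; the Java jobs referenced in this role would follow the same map/shuffle/reduce pattern. The tab-delimited
layout and field positions are assumptions made only for this sketch.

#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper: drop malformed rows and emit (customer_id, amount).
# Assumes tab-delimited input with customer_id in field 0 and amount in field 2.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        continue  # skip malformed records as part of basic data cleaning
    customer_id, amount = fields[0], fields[2]
    try:
        float(amount)  # keep only rows with a numeric amount
    except ValueError:
        continue
    print(f"{customer_id}\t{amount}")

A companion reducer would aggregate amounts per customer_id; Hadoop Streaming performs the shuffle and sort between the
mapper and reducer scripts.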

Client: Bodhtree, Bengaluru, India May 2014 - Jun 2016

Role: Hadoop Developer

Responsibilities:

 Configured, supported and monitored Hadoop clusters using Cloudera distribution.
 Worked in an Agile Scrum development model, analyzing Hadoop clusters and various big data analytics tools, including
MapReduce, Pig, Hive, Flume, Oozie and Sqoop.
 Developed MapReduce jobs in Java for data cleaning and preprocessing, configuring Hadoop MapReduce and HDFS.
 Established custom MapReduce programs to analyze data and utilized Pig Latin for data cleansing and preprocessing.
 Created Hive tables and wrote Hive queries that were executed internally in a MapReduce manner.
 Implemented partitioning, dynamic partitions and bucketing in Hive to enhance performance.
 Loaded and transformed datasets in various formats, including structured and semi-structured data.
 Scheduled Oozie workflows to automate job execution.
 Implemented NoSQL database solutions such as HBase for storing and processing diverse data formats.
 Collaborated with business stakeholders for user testing and coordinated testing activities.
 Conducted unit testing and delivered comprehensive unit test plans and results documentation.
Environment: Apache Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, HBase, UNIX shell scripting, Zookeeper, Java, Eclipse.

Education

 Master’s in Computer Science, UNT, Denton, Texas

 Bachelor’s in Electrical and Electronics Engineering, VRSEC, India.
