Data Engineering Interview Prep

This document contains over 100 interview questions related to data engineering. The questions are grouped into sections covering topics like SQL databases, the cloud, Linux, big data, Kafka, coding, NoSQL databases, Hadoop, Lambda architecture, Python, data warehousing, APIs, Apache Spark, MapReduce, Docker and Kubernetes, data pipelines, Airflow, data visualization, security and privacy, distributed systems, Apache Flink, GitHub, DevOps, and development methodologies. The interview questions range from beginner to advanced levels and cover a wide breadth of essential data engineering concepts and tools.

Uploaded by

kamalnadhank


Part V

1001 Data Engineering Interview Questions

33 All Interview Questions

The interview questions are roughly structured like the sections in the “Basic Data
Engineering Skills” part. This makes it easier to navigate this document. I still need to sort
them accordingly.

SQL DBs

• What are windowing functions?

• What is a stored procedure

• Why would you use them?

• What are atomic attributes

• Explain the ACID properties of a database

• How to optimize queries

• What are the different types of JOIN (CROSS, INNER, OUTER)

• What is the difference between Clustered Index and Non-Clustered Index - with
examples?

The Cloud

• What is serverless

• What’s the difference between IaaS, PaaS and SaaS

• How do you move from the ingest layer to the consumption layer? (In serverless)

• What’s the difference between cloud, edge and on-premises

• What is edge computing

Linux

• What is crontab

Big Data

• What are the 4 V’s

• Which one is most important?

Kafka

• What is a topic

• How to ensure FIFO

• How do you know if all messages in a topic have been fully consumed

• What are brokers

• What are consumer groups

• What is a producer

Coding

• What’s the difference between an object and a class

• Explain immutability

• What are AWS Lambda functions and why would you use them

• Difference between library, framework and package

• How to reverse a linked list

• Difference between *args and **kwargs

• Difference between OOP and functional programming
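
For the linked list question above, one common answer is the iterative reversal that walks the list once and flips each pointer. The following is a minimal sketch, not a reference answer; the `Node` class and the helper names `reverse`, `from_list` and `to_list` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One element of a singly linked list (hypothetical helper type)."""
    value: int
    next: Optional["Node"] = None


def reverse(head: Optional[Node]) -> Optional[Node]:
    """Reverse the list in place and return the new head."""
    prev = None
    current = head
    while current is not None:
        nxt = current.next    # remember the rest of the list
        current.next = prev   # flip the pointer backwards
        prev = current        # advance prev
        current = nxt         # advance current
    return prev


def from_list(values):
    """Build a linked list from a Python list (helper for the example)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    """Collect the node values into a Python list (helper for the example)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

The loop visits every node once and uses constant extra memory. A recursive version is also possible but uses one stack frame per node.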

NoSQL DBs

• What’s a key/value (rowstore) store

• What’s a columnstore

• Difference between a row store and a column store

• What’s a document store

• Difference between Redshift and Snowflake

Hadoop

• What File Formats can you use in Hadoop

• What’s the difference between a NameNode and a DataNode

• What is HDFS

• What is the purpose of YARN

Lambda Architecture

• What is streaming and batching

• What is the upside of streaming vs batching

• What’s the difference between lambda and kappa architecture

• Can you sync the batch and streaming layer and if yes how

Python

• Difference between list, tuple and dictionary
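
As a starting point for the question above, the short sketch below contrasts the three built-in types. The variable names are made up for this illustration.

```python
# A list is an ordered, mutable sequence: it can grow and change in place.
stages = ["ingest", "transform"]
stages.append("load")

# A tuple is an ordered, immutable sequence: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# A dictionary maps hashable keys to values and can be updated in place.
row_counts = {"orders": 120}
row_counts["customers"] = 45

# A tuple of hashable items is itself hashable, so it can be a dictionary key.
# A list cannot be used this way.
grid = {point: "sensor-a"}
```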

Data Warehouse & Data Lake

• What is a data lake?

• What is a data warehouse

• Are there data lake warehouses?

• Two data lakes within a single warehouse?

• What is a data mart?

• What is a slowly changing dimension (types)

• What is a surrogate key and why use them?

APIs (REST)

• What does REST mean?

• What is idempotency

• What are common REST API frameworks (Jersey and Spring)

Apache Spark

• What’s an RDD

• What is a dataframe

• What is a dataset

• How is a dataset typesafe

• What is Parquet

• What’s Avro

• Difference between Parquet and Avro

• Tumbling Windows Vs. Sliding Windows

• Difference between batch and stream processing

• What are microbatches
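
To make the tumbling vs. sliding window question above concrete, here is a small framework-independent sketch in plain Python. It does not use the Spark API; the function names `tumbling_windows`, `sliding_windows` and `assign` are hypothetical, and windows are treated as half-open intervals `[start, end)` by assumption.

```python
def tumbling_windows(size, start, end):
    """Fixed-size, non-overlapping windows: each one begins where the last ended."""
    return [(s, s + size) for s in range(start, end, size)]


def sliding_windows(size, slide, start, end):
    """Fixed-size windows that begin every `slide` units, so they overlap when slide < size."""
    return [(s, s + size) for s in range(start, end, slide)]


def assign(timestamp, windows):
    """Return every window [start, end) that contains the timestamp."""
    return [w for w in windows if w[0] <= timestamp < w[1]]


tumbling = tumbling_windows(size=10, start=0, end=30)
sliding = sliding_windows(size=10, slide=5, start=0, end=30)

# An event at t=12 falls into exactly one tumbling window,
# but into two sliding windows because they overlap.
in_tumbling = assign(12, tumbling)
in_sliding = assign(12, sliding)
```

With a tumbling window every event is counted once; with a sliding window the same event contributes to every window that overlaps its timestamp.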

MapReduce

• What’s a use case of MapReduce

• Write pseudocode for word count

• What is a combiner
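
The word count question above is usually answered by naming the map, shuffle and reduce phases. The sketch below simulates those phases in a single Python process. It is an illustration only, not Hadoop code, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the values collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

A combiner would apply the same summing step to each mapper's local output before the shuffle, which reduces the amount of data sent over the network.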

Docker & Kubernetes

• What is a container

• Difference between a Docker container and a virtual machine

• What’s the easiest way to learn Kubernetes fast

Data Pipelines

• What is an example of a serverless pipeline

• What’s the difference between at-most-once, at-least-once and exactly-once

• What systems provide transactions

• What is an ETL pipeline

Airflow

• What is a DAG (in the context of Airflow/Luigi)

• What are hooks / what is a hook

• What are Operators

• How to branch?

Data Visualization

• What’s a BI tool

Security/Privacy

• What is Kerberos

• What is a firewall

• What’s GDPR?

• What’s anonymization

Distributed Systems

• How do clusters reach consensus? (The answer was to use consensus protocols like
Paxos or Raft.) Good thing I didn’t have to explain Paxos.

• What is the CAP theorem / explain it (What factors should be considered when
choosing a DB?)

• How to choose right storage for different data consumers? It’s always a tricky
question

Apache Flink

• What is Flink used for

• Flink vs Spark?

GitHub

• What are branches

• What are commits

• What’s a pull request

Dev/Ops

• What is continuous integration

• What is continuous deployment

• Difference between CI and CD

Development / Agile

• What is Scrum

• What is OKR

• What is Jira and what is it used for
