Intelligence:
● What are the projects you have worked on?
● What challenges have you faced in your projects, and how did you handle them?
● What are your strengths and weaknesses?
● Interest areas, objective of internship and reason for application
● Which projects are you interested in?
● ML, Cloud, or trading?
Python:
● Difference between Python and Java
● Libraries worked on
● List comprehensions, lambda functions, pass, map and reduce
● Type conversion
● Explain the logic to reverse a number/string in Python
● Palindrome/perfect number/factorial (recursion)
● Difference between dictionary and set
● Difference between loc and iloc
● What types of loops are available in Python? Please explain. Which loop would you prefer,
and why?
● Functionality of NumPy and Pandas library
● What is block scope?
● What is a docstring in Python?
● Important steps in model building.
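The coding questions above (list comprehension, lambda, map and reduce, reversing a number or string, palindrome, perfect number, recursive factorial) could be answered along the lines of the sketch below. It is one possible approach rather than a model answer. All function names are illustrative, and the handling of negative numbers in reverse_number is an assumption.

```python
from functools import reduce


def reverse_string(text: str) -> str:
    # Slicing with a step of -1 walks the string from the end to the start.
    return text[::-1]


def reverse_number(n: int) -> int:
    # Peel off the last digit with % 10 and append it to the result.
    # Assumption: the sign is kept and trailing zeros are dropped (1230 -> 321).
    sign = -1 if n < 0 else 1
    n = abs(n)
    result = 0
    while n > 0:
        result = result * 10 + n % 10
        n //= 10
    return sign * result


def is_palindrome(text: str) -> bool:
    # A palindrome reads the same forwards and backwards.
    return text == text[::-1]


def factorial(n: int) -> int:
    # Recursive definition: n! = n * (n - 1)!, with 0! = 1! = 1.
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    return 1 if n <= 1 else n * factorial(n - 1)


def is_perfect_number(n: int) -> bool:
    # A perfect number equals the sum of its proper divisors (e.g. 6 = 1 + 2 + 3).
    if n < 2:
        return False
    return sum(d for d in range(1, n // 2 + 1) if n % d == 0) == n


# List comprehension, lambda with map, and reduce on the same small input.
numbers = [1, 2, 3, 4]
squares = [x * x for x in numbers]
doubled = list(map(lambda x: 2 * x, numbers))
total = reduce(lambda acc, x: acc + x, numbers, 0)
```

In an interview it usually helps to state edge cases aloud (negative input, empty string, zero) before writing the loop or the recursion.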
Data Engineering:
● What big data tools have you worked on?
● Which tool is preferred for real-time analytics?
● What is the difference between a data lake, a data warehouse, and a data mart?
● Difference between SQL and NoSQL
● Difference between primary key and foreign key
● Why are wildcards used in SQL?
● Explain the different types of joins in SQL.
● Difference between cross join and self join.
● What are entities and relationships?
● What are supervised and unsupervised learning?
● What is Spark?
● What is the difference between ETL and ELT?
● What is Hadoop?
● What is meant by indexing?
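The SQL questions above (primary and foreign keys, wildcards, join types, cross join versus self join) can be practised with Python's built-in sqlite3 module. The sketch below is a minimal illustration. The dept and emp tables and their rows are made up for the example, and SQLite does not enforce foreign keys unless PRAGMA foreign_keys = ON is set.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (
    id   INTEGER PRIMARY KEY,            -- primary key: unique row identifier
    name TEXT NOT NULL
);
CREATE TABLE emp (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dept_id    INTEGER REFERENCES dept(id),  -- foreign key to dept
    manager_id INTEGER REFERENCES emp(id)    -- foreign key to the same table
);
INSERT INTO dept VALUES (1, 'Data'), (2, 'Cloud');
INSERT INTO emp VALUES
    (1, 'Asha', 1, NULL),
    (2, 'Ben',  1, 1),
    (3, 'Chen', NULL, 1);
""")


def rows(sql: str) -> list:
    # Small helper: run a query and return all result rows.
    return conn.execute(sql).fetchall()


# Inner join: only employees whose dept_id matches a department.
inner = rows("SELECT e.name, d.name FROM emp e "
             "JOIN dept d ON e.dept_id = d.id ORDER BY e.id")

# Left join: every employee, with NULL where no department matches.
left = rows("SELECT e.name, d.name FROM emp e "
            "LEFT JOIN dept d ON e.dept_id = d.id ORDER BY e.id")

# Cross join: every employee paired with every department (3 x 2 rows).
cross = rows("SELECT e.name, d.name FROM emp e CROSS JOIN dept d")

# Self join: the emp table joined to itself to pair employees with managers.
self_join = rows("SELECT e.name, m.name FROM emp e "
                 "JOIN emp m ON e.manager_id = m.id ORDER BY e.id")

# Wildcard: % matches any sequence of characters in a LIKE pattern.
starts_with_a = rows("SELECT name FROM emp WHERE name LIKE 'A%'")
```

The cross join has no join condition, so its row count is the product of the two table sizes, while the self join is an ordinary join in which both sides happen to be the same table.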
Spark/Hadoop:
● YARN architecture
● How is Spark different from MapReduce? (2-3 points)
● RDDs
● How to convert an RDD to a DataFrame
Cloud:
● What is Cloud Computing?
● Name some of the major cloud providers. Are you familiar with the services offered by
them?
● What are the different types of cloud services? (IaaS, PaaS, SaaS)
● What are microservices?
● What are some of the limitations of cloud computing?
● EC2, Scaling, Load balancer, VPC, S3 lifecycle policies
● What is meant by serverless?
● What are managed and serverless cloud services?
● How do you secure a cloud service, e.g. EC2?
● What is data governance and how to achieve it?
● How does AWS achieve high availability and low latency?
● Docker, Kubernetes
● CI/CD - meaning, use case