https://seekhobigdata.
com/
What is SQL, and why is it essential?
SQL (Structured Query Language) is a domain-specific language used for managing and manipulating
relational databases. It allows users to create, retrieve, update, and delete data from a relational
database management system (RDBMS). SQL is essential because it provides a standardized way to
interact with databases, making it easier to store, retrieve, and manipulate data efficiently.
Differentiate between SQL and NoSQL databases.
SQL databases are relational databases that use structured schemas and tables to store data. They
are suitable for applications with complex relationships between data.
NoSQL databases are non-relational and often use flexible schemas like JSON or XML. They are
ideal for applications that require scalability and handle unstructured or semi-structured data.
Explain ACID properties in the context of SQL databases.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
These properties ensure the reliability of database transactions:
Atomicity: Ensures that a transaction is treated as a single, indivisible unit. Either all its changes are
applied, or none.
Consistency:Guarantees that a transaction takes the database from one consistent state to another,
maintaining data integrity.
Isolation: Ensures that multiple transactions can run concurrently without interfering with each
other.
Durability: Ensures that once a transaction is committed, its changes are permanent and survive
system failures.
https://seekhobigdata.com/
What is a primary key, and why is it important?
A primary key is a unique identifier for each record in a table. It ensures that each row in
the table is uniquely identifiable. Primary keys are important because they:
Enforce data integrity by preventing duplicate or null values.
Facilitate efficient data retrieval and JOIN operations.
Serve as a reference for foreign keys in related tables, establishing relationships.
What is a foreign key, and how does it relate to a primary key?
A foreign key is a field in one table that refers to the primary key in another table. It
establishes relationships between tables in a relational database. It enforces referential
integrity, ensuring that data in related tables remains consistent.
Define normalization and its benefits.
Normalization is the process of organizing data in a database to reduce redundancy and
improve data integrity. It involves breaking down large tables into smaller, related tables.
Benefits include: Reducing data duplication and storage requirements. Preventing data
anomalies, such as update anomalies.
Simplifying data maintenance and improving database performance.
Explain denormalization and when it might be useful.
Denormalization is the process of intentionally introducing redundancy into a database to
improve query performance. It can be useful when read-heavy operations are common,
and the trade- off between storage and query performance is acceptable
https://seekhobigdata.com/
Describe the differences between INNER JOIN, LEFT
JOIN, and RIGHT JOIN.
INNER JOIN returns only the rows with matching values in both tables.
LEFT JOIN returns all rows from the left table and the matched rows from the right table.
Unmatched rows on the left will have NULL values.
RIGHT JOIN is similar to LEFT JOIN but returns all rows from the right table and the
matched rows from the left table.
What is a self-join, and when would you use it?
A self-join is a type of join where a table is joined with itself. It is useful when you have
hierarchical data or when you need to create relationships within the same table, such as
when working with organizational charts or friend networks.
What is a subquery, and how is it different from a JOIN?
A subquery is a query nested inside another query. It can be used to retrieve data that will
be used as a condition in the outer query. A JOIN, on the other hand, combines data from
two or more tables into a single result set. Subqueries are often used for complex filtering
conditions or to retrieve aggregated data.
Explain the difference between UNION and UNION ALL.
UNION combines the result sets of two or more SELECT statements into a single result set
while removing duplicate rows.
UNION ALL, however, includes all rows from each SELECT statement, even if they are
duplicates. Use UNION when you want to eliminate duplicate rows and UNION ALL when
you want to retain them.
https://seekhobigdata.com/
How does the SQL injection attack work, and how can you
prevent it?
SQL injection is a type of cyberattack where malicious SQL code is inserted into an input
field, often resulting in unauthorized access or data manipulation. To prevent SQL
injection, you can:
Use parameterized queries or prepared statements.
Implement input validation and sanitization.
Avoid dynamic SQL queries with concatenated user input.
Apply the principle of least privilege to database accounts.
What is the difference between GROUP BY and HAVING
clauses?
GROUP BY is used to group rows that have the same values in specified columns, typically
for aggregation purposes (e.g., SUM, COUNT). HAVING is used to filter the results of a
GROUP BY query based on a condition applied to the aggregated values. HAVING is used
after GROUP BY to filter groups, while WHERE filters individual rows.
What is a correlated subquery, and when would you use one?
A correlated subquery is a subquery that depends on the values from the outer query. It
executes once for each row processed by the outer query. Correlated subqueries are used
when you need to reference values from the outer query within the subquery, typically for
filtering or aggregation purposes.
https://seekhobigdata.com/
Explain the difference between a stored procedure and a
function.
A stored procedure is a set of SQL statements that can be executed as a single unit. It can
have input and output parameters and may not return a value directly. A function, on the
other hand, always returns a value and is primarily used for calculations or
transformations. Functions can be used in SELECT statements, whereas stored
procedures cannot.
How do you optimize a slow-performing SQL query?
Optimizing a slow query involves various strategies, including:
Analyzing query execution plans.
Adding appropriate indexes.
Reducing the number of rows returned.
Rewriting complex queries.
Adjusting database configuration settings.
Using caching mechanisms.
Considering hardware upgrades.
What is an index, and why is it important in a database?
An index is a database structure that improves the speed of data retrieval operations on a
table. It works like an ordered list of values, allowing the database system to quickly locate
the rows that match a query's WHERE clause. Indexes are crucial for improving query
performance, especially on large datasets.
https://seekhobigdata.com/
What is a clustered index and a non-clustered index?
A clustered index determines the physical order of data rows in a table. Each table can
have only one clustered index, and it directly affects the table's storage order. In contrast,
a non-clustered index does not affect the physical order of data and can be created on
multiple columns. Tables can have multiple non-clustered indexes.
Describe the differences between a candidate key, primary. key, and super key.
A candidate key is a column or set of columns that can uniquely identify each row in a
table.
The primary key is a specific candidate key chosen as the main means of uniquely
identifying rows in a table. It enforces data integrity and must be unique and not contain
NULL values. A super key is a set of one or more columns that can uniquely identify rows
in a table. It may include extra columns beyond what's needed for a candidate key.
What is a trigger in SQL, and how is it used?
A trigger is a database object that automatically executes a specified set of SQL
statements when a specific event occurs, such as an INSERT, UPDATE, DELETE operation.
Triggers are often used for enforcing data integrity, auditing changes, or implementing
complex business logic when certain database events occur.
https://seekhobigdata.com/
Explain the concept of database transactions.
A database transaction is a sequence of one or more SQL operations that are treated as a
single unit of work. Transactions ensure data consistency by adhering to the ACID
properties (Atomicity, Consistency, Isolation, Durability). Either all operations within a
transaction are executed (committed) together, or none of them are (rolled back).
What is the difference between a unique constraint and a
primary key constraint?
Both unique constraints and primary key constraints enforce uniqueness in columns, but
there are key differences:
A primary key constraint uniquely identifies each row in a table and does not allow NULL
values.
A unique constraint enforces uniqueness but allows for NULL values in the constrained
columns.
How do you handle NULL values in SQL?
NULL represents the absence of a value in SQL. To handle NULL values:
Use the IS NULL and IS NOT NULL operators to check for NULL values.
Use the COALESCE() or ISNULL() function to replace NULL with a default value.
Ensure that columns that should not contain NULL values are properly defined with
constraints.
https://seekhobigdata.com/
What is the purpose of the SQL CASE statement?
The SQL CASE statement is used for conditional logic in queries. It allows you to return
different values or perform different actions based on specified conditions. It can be used
in SELECT, WHERE, and other clauses to customize query results.
Explain the concept of a view in SQL.
A view is a virtual table generated by a query. It consists of a SELECT statement that
retrieves data from one or more underlying tables. Views are used for security, simplifying
complex queries, and providing a logical representation of data without storing it
physically.
How does the EXISTS clause work, and when would you
use it?
The EXISTS clause is used to check for the existence of rows in a subquery. It returns TRUE
if the subquery returns any rows, otherwise, it returns FALSE. EXISTS is often used with
correlated subqueries when you need to check for the presence of related data.
What is the purpose of the SQL COUNT() function?
The COUNT() function is used to count the number of rows that meet a specified
condition in a table. It is often used with the GROUP BY clause to perform aggregate
counting operations.
https://seekhobigdata.com/
What is the purpose of the SQL CASE statement?
The SQL CASE statement is used for conditional logic in queries. It allows you to return
different values or perform different actions based on specified conditions. It can be used
in SELECT, WHERE, and other clauses to customize query results.
Explain the concept of a view in SQL.
A view is a virtual table generated by a query. It consists of a SELECT statement that
retrieves data from one or more underlying tables. Views are used for security, simplifying
complex queries, and providing a logical representation of data without storing it
physically.
How does the EXISTS clause work, and when would you
use it?
The EXISTS clause is used to check for the existence of rows in a subquery. It returns TRUE
if the subquery returns any rows, otherwise, it returns FALSE. EXISTS is often used with
correlated subqueries when you need to check for the presence of related data.
What is the purpose of the SQL COUNT() function?
The COUNT() function is used to count the number of rows that meet a specified
condition in a table. It is often used with the GROUP BY clause to perform aggregate
counting operations.
https://seekhobigdata.com/
Describe the differences between the CHAR and
VARCHAR data types.
CHAR and VARCHAR are both used to store character strings, but they differ in storage
and behavior:
CHAR has a fixed length, and it pads values with spaces to the specified length.
VARCHAR has a variable length and only uses the space needed for the actual value,
saving storage space.
Explain the difference between the UNION and JOIN
operations.
UNION combines rows from two or more tables into a single result set, removing
duplicates.
JOIN combines columns from two or more tables into a single result set based on a
specified condition, allowing you to retrieve related data from multiple tables.
What is the purpose of the SQL DDL and DML
statements?
Data Definition Language (DDL) statements are used to define, alter, or drop database
objects like tables, indexes, and views.
Data Manipulation Language (DML) statements are used to manipulate data in the
database, including inserting, updating, and deleting records.
https://seekhobigdata.com/
What are CTEs (Common Table Expressions), and how are
they used?
Common Table Expressions (CTEs) are temporary result sets that you can reference within
a SELECT, INSERT, UPDATE, or DELETE statement. CTEs make complex queries more
readable and maintainable. They are defined using the WITH keyword and can be
recursive or non-recursive.
How do you perform data paging in SQL?
To implement data paging in SQL, you typically use the OFFSET and FETCH clauses
(available in SQL Server, PostgreSQL, and some other databases) or the LIMIT and OFFSET
clauses (common in MySQL and SQLite). These clauses allow you to retrieve a subset of
rows from a result set, useful for pagination.
Explain the concept of SQL cursors.
SQL cursors are database objects used to iterate over the result set of a SELECT query,
one row at a time. Cursors are typically used within stored procedures or triggers when
you need to process rows sequentially, performing actions on each row.
What is the purpose of the SQL TRUNCATE statement?
The TRUNCATE statement is used to remove all rows from a table quickly. Unlike the
DELETE statement, which removes rows one at a time and logs each deletion, TRUNCATE
deallocates the entire data storage and is faster. However, it cannot be rolled back, and it
does not trigger any DELETE triggers.
https://seekhobigdata.com/
How can you prevent and handle deadlocks in a
database?
Deadlocks occur when two or more transactions block each other by holding locks on
resources needed by others. To prevent and handle deadlocks, you can:
Use appropriate isolation levels.
Design transactions to access resources in a consistent order.
Implement deadlock detection and resolution mechanisms.
Adjust database timeouts and retry logic in application code.
What is the difference between a left outer join and a
right outer join?
A left outer join returns all rows from the left table and the matching rows from the right
table. If no match is found in the right table, NULL values are returned.
A right outer join returns all rows from the right table and the matching rows from the left
table. If no match is found in the left table, NULL values are returned.
Explain the purpose of the SQL ROLLBACK statement.
The ROLLBACK statement is used to undo changes made in a transaction and return the
database to its state before the transaction began. It is typically used in response to an
error or when you want to cancel the effects of a transaction that should not be
committed.
https://seekhobigdata.com/
How can you optimize a SQL query using indexes?
You can optimize a query using indexes by:
Ensuring that columns in the WHERE clause are indexed.
Avoiding functions or operations on indexed columns in WHERE clauses.
Using covering indexes that include all columns needed for a query.
Periodically rebuilding or reorganizing indexes for maintenance.
What is a natural join, and when would you use it?
A natural join is a type of join that combines tables based on columns with the same name
in both tables. It is used when you want to join tables using columns that share identical
names without explicitly specifying the columns in the JOIN clause.
Explain the differences between the CHARINDEX and PATINDEX functions.
CHARINDEX returns the starting position of a substring within a string, or 0 if the
substring is not found.
PATINDEX searches for a pattern within a string using wildcard characters. It returns the
starting position of the first occurrence of the pattern, or 0 if no match is found.
https://seekhobigdata.com/
Let’s build your Data
Engineering journey
together!
📩 Call us directly at: 9989454737
https://seekhobigdata.com/