Credit goes to datadriven.io

Resources

This is the short list of what's on the site, organized by the decision you're trying to make. Not everything is here, and that is on purpose: the deeper material lives on its own page, and the cheat sheets and one-off references are linked from the relevant guide.

If you're new and want one place to start, the interview prep pillar is the right page. If you're two weeks out and want one thing to practice, the SQL problem bank is the highest-yield use of an hour.

Prepare for the interview
01 / Open invite

Know the patterns before the interviewer asks them.

A system design query, the same shape a screen would give you. The diff against expected. Where ties broke. What you missed.

    source → bronze → silver → gold
    ingest    : CDC + Kafka
    transform : dbt + Airflow
    serve     : Snowflake

The five round walkthroughs

Every data engineering loop tests the same five rounds. Each guide walks the round end to end with the patterns interviewers score on and the failure modes that lose the round: SQL, Python, data modeling, system design, and behavioral. If you read one round guide and find it useful, the others are written in the same voice.

Practice catalogs

Live problem banks with real graders. Free, no login required to start.

Pillar guides for the four domains

Long reads on each domain, written as the version you'd hand to a friend before their loop: SQL interview questions with fifteen worked solutions, Python interview questions, data modeling, and pipeline architecture. If you want the hundred questions weighted by interview frequency, that's the top 100.

Company interview guides

The loops that bend the standard template enough to be worth knowing in advance. Round structure, what's actually asked, where the loop deviates.

Decision guides

High-intent comparisons for the role-and-tool decisions that change what you should study. Data engineer vs data analyst for the role question. ETL vs ELT, batch vs streaming, dbt vs Airflow, Snowflake vs Databricks, Kafka vs Kinesis for the tool questions interviewers actually pose. If you're switching in from another role, the analyst-to-engineer transition guide is the one to read first.

Live Viewers, Live Billing

> We run a live video platform where creators broadcast to thousands of viewers at once. The product team wants real-time viewer counts and chat activity for creators, and the ads team needs accurate impression data for billing. Design a data pipeline for our livestream events.
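One way to start on a prompt like this is to separate the two consumers by their accuracy needs: viewer counts can be approximate and windowed, while billing needs each impression counted exactly once. The sketch below is a minimal in-memory illustration of that split, not a production design. The event fields (`event_id`, `type`, `stream_id`, `viewer_id`, `ad_id`, `ts`), the `heartbeat` event type, and the 60-second window are all assumptions made for the example; none of them come from the prompt.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def viewers_per_window(events):
    """Approximate viewer counts: distinct viewers per (stream, window start)."""
    seen = defaultdict(set)
    for e in events:
        if e["type"] == "heartbeat":
            window_start = e["ts"] - e["ts"] % WINDOW_SECONDS
            seen[(e["stream_id"], window_start)].add(e["viewer_id"])
    return {key: len(viewers) for key, viewers in seen.items()}


def billable_impressions(events):
    """Impression counts for billing: drop redelivered events by event_id first."""
    counted_ids = set()
    per_ad = defaultdict(int)
    for e in events:
        if e["type"] != "impression" or e["event_id"] in counted_ids:
            continue
        counted_ids.add(e["event_id"])
        per_ad[e["ad_id"]] += 1
    return dict(per_ad)


# Hypothetical sample events; "i1" arrives twice to simulate a queue redelivery.
events = [
    {"event_id": "a1", "type": "heartbeat", "stream_id": "s1", "viewer_id": "v1", "ts": 5},
    {"event_id": "a2", "type": "heartbeat", "stream_id": "s1", "viewer_id": "v2", "ts": 30},
    {"event_id": "a3", "type": "heartbeat", "stream_id": "s1", "viewer_id": "v1", "ts": 65},
    {"event_id": "i1", "type": "impression", "stream_id": "s1", "viewer_id": "v1", "ad_id": "ad9", "ts": 10},
    {"event_id": "i1", "type": "impression", "stream_id": "s1", "viewer_id": "v1", "ad_id": "ad9", "ts": 10},
    {"event_id": "i2", "type": "impression", "stream_id": "s1", "viewer_id": "v2", "ad_id": "ad9", "ts": 40},
]

print(viewers_per_window(events))
print(billable_impressions(events))
```

In a real answer the same split usually maps to a streaming path for the creator dashboard and a deduplicated, auditable path for billing; the in-memory sets here stand in for whatever state store the design would actually use.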

Career artifacts

The boring but high-leverage stuff: the resume guide with examples by level, the roadmap for what to learn and in what order, salary by level, and portfolio projects that actually move the needle in a recruiter screen. The recruiter screen is the easiest part of the loop to win and the easiest to underestimate.

Tool-specific references

Open these only when you know the company's stack and you're targeting that specific dialect: PySpark, Kafka, Airflow, dbt, Snowflake, Databricks. The full hub is the tools page. Most candidates over-study these; the loop tests them less often than the JDs suggest.

02 / Why practice

Stop reading. Solve one problem.

  1. Active recall beats re-reading by 50%

     A cognitive-science review (Dunlosky et al., 2013) ranks practice testing as a top-tier study technique, while re-reading and highlighting rank near the bottom.

  2. 76% of hiring managers reject on the coding task, not the resume

     From HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.

  3. Five problem shapes cover 80% of data engineer loops

     Dedup, sessionization, top-N-per-group, slowly-changing dimensions, partition tricks. Writing the shapes by hand turns the unfamiliar into pattern recognition.
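To make three of those shapes concrete, here is a minimal sketch of dedup, sessionization, and top-N-per-group in plain Python. The function names, parameters, and row layout (lists of dicts) are hypothetical and chosen only for illustration; in an interview the same shapes would more often be written in SQL with window functions.

```python
from collections import defaultdict
from operator import itemgetter


def dedup_latest(rows, key, order_by):
    """Dedup: keep the row with the largest order_by value for each key."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


def sessionize(timestamps, gap):
    """Sessionization: start a new session when the time since the previous event exceeds gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def top_n_per_group(rows, group_key, sort_key, n):
    """Top-N-per-group: the n rows with the largest sort_key in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        g: sorted(members, key=itemgetter(sort_key), reverse=True)[:n]
        for g, members in groups.items()
    }
```

For example, `sessionize([0, 10, 100, 105, 300], gap=30)` groups the events into three sessions: `[0, 10]`, `[100, 105]`, and `[300]`.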