Resources
The short list of what's on the site that's worth your time, organized by the decision you're trying to make.
Not everything on the site is listed here, and that is deliberate. The deeper material lives on its own pages; the cheat-sheets and one-off references are linked from the relevant guide.
If you're new and want one place to start, the interview prep pillar is the right page. If you're two weeks out and want one thing to practice, the SQL problem bank is the highest-yield use of an hour.
Know the patterns before the interviewer asks them.
The five round walkthroughs
Every data engineering loop tests the same five rounds: SQL, Python, data modeling, system design, and behavioral. Each guide walks one round end to end, covering the patterns interviewers score on and the failure modes that lose the round. If you read one round guide and find it useful, the others are written in the same voice.
Practice catalogs
Live problem banks with real graders. Free, no login required to start.
Graded against a real Postgres instance with randomized fixtures.
Draw the pipeline, pick the tools, get scored against the SLA.
Schema design problems with grain-first rubrics.
Data-engineering-shaped Python, not LeetCode algorithms.
Pillar guides for the four domains
Long reads on each domain, written as the version you'd hand to a friend before their loop: SQL interview questions with fifteen worked solutions, Python interview questions, data modeling, and pipeline architecture. If you want the hundred questions weighted by interview frequency, that's the top 100.
Company interview guides
The loops that bend the standard template enough to be worth knowing in advance. Round structure, what's actually asked, where the loop deviates.
SQL- and modeling-heavy. E3 to E7 leveling.
System design depth. Second design round at senior.
Delta Lake and PySpark are graded strictly here.
Streaming and large-scale event processing.
Metrics and experimentation rounds.
Heavy data modeling. Live schema critique.
Big tech standard with BigQuery dialect bias.
Leadership Principles plus L4-L7 technical bar.
SQL optimization and Snowflake-specific syntax.
Decision guides
High-intent comparisons for the role-and-tool decisions that change what you should study. Data engineer vs data analyst for the role question. ETL vs ELT, batch vs streaming, dbt vs Airflow, Snowflake vs Databricks, Kafka vs Kinesis for the tool questions interviewers actually pose. If you're switching in from another role, the analyst-to-engineer transition guide is the one to read first.
Career artifacts
The boring but high-leverage stuff: the resume guide with examples by level, the roadmap for what to learn and in what order, salary by level, and portfolio projects that actually move the needle in a recruiter screen. The recruiter screen is the easiest part of the loop to win and the easiest to underestimate.
Tool-specific references
Stop reading. Solve one problem.
- 01
Active recall beats re-reading by 50%
Dunlosky et al. (2013), a review of ten common study techniques, rates practice testing as high utility and rates re-reading and highlighting as low utility.
- 02
76% of hiring managers reject on the coding task, not the resume
Source: HackerRank's 2024 Developer Skills Report. Candidates who look strong on paper still fail the live screen if they haven't done timed, executable practice.
- 03
Five problem shapes cover 80% of data engineer loops
Deduplication, sessionization, top-N-per-group, slowly changing dimensions, and partition tricks. Writing each shape by hand turns unfamiliar problems into pattern recognition.
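To make three of those shapes concrete, here is a minimal sketch in plain Python. It is an illustration, not a problem from the site's banks: the function names, the `user`, `ts`, and `amount` fields, and the 30-minute session gap are all assumptions chosen for the example. In an interview the same logic would more often be written in SQL with window functions.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample events; "ts" is seconds since an arbitrary start.
events = [
    {"user": "a", "ts": 0, "amount": 5},
    {"user": "a", "ts": 600, "amount": 9},
    {"user": "a", "ts": 4000, "amount": 2},
    {"user": "b", "ts": 100, "amount": 7},
]


def dedup_latest(rows, key, order_by):
    """Dedup: keep one row per key, the one with the greatest order_by value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


def top_n_per_group(rows, group, order_by, n):
    """Top-N-per-group: the n rows with the largest order_by value in each group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row)
    result = []
    for g in sorted(buckets):
        ranked = sorted(buckets[g], key=itemgetter(order_by), reverse=True)
        result.extend(ranked[:n])
    return result


def sessionize(events, gap_seconds=1800):
    """Sessionization: number each user's sessions, starting a new one
    whenever the gap since that user's previous event exceeds gap_seconds."""
    out = []
    ordered = sorted(events, key=itemgetter("user", "ts"))
    for _, user_events in groupby(ordered, key=itemgetter("user")):
        session, prev_ts = 0, None
        for e in user_events:
            if prev_ts is None or e["ts"] - prev_ts > gap_seconds:
                session += 1
            out.append({**e, "session": session})
            prev_ts = e["ts"]
    return out


if __name__ == "__main__":
    print(dedup_latest(events, "user", "ts"))
    print(top_n_per_group(events, "user", "amount", 1))
    print(sessionize(events))
```

All three follow the same pattern: partition by a key, order within the partition, then pick or label rows. That is why they map so directly onto `ROW_NUMBER()` and `LAG()` over a `PARTITION BY` clause.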