In the Pipeline

Pinned

Dave Flynn

dbt best practices in action at Cal-ITP’s data-infra project

Cal-ITP uses a standardized PR template and an automated report for comprehensive PR review. See their process in action with example PRs.

8 min read · Apr 18, 2024

Dave Flynn

‘Thoughtful PR Review’ is now a requirement for data jobs

PR review is the ‘Point of No Return’: the last checkpoint before code is merged and prod data is changed. How do you review yours?

5 min read · 5 days ago

Douenergy

Analytical SQL Tips Series — Qualify Clause

In SQL, the QUALIFY clause is essential for filtering results from WINDOW functions, serving a similar purpose as the HAVING clause does…

2 min read · Apr 19, 2024

Dave Flynn

So, you think you’ve got dbt test bloat?

After a certain threshold, alert fatigue becomes an issue, especially when upstream issues trigger hundreds of downstream alerts

5 min read · Apr 18, 2024

Douenergy

From Zero to dbt: How to Analyze and Build Data Models from Spotify’s Million Playlist Data

Part 1: Analyze the 30GB JSON dataset with DuckDB and jq, then convert it to Parquet to prep for dbt

10 min read · Apr 12, 2024

Douenergy

Analytical SQL Tips Series — Filter Clause

In SQL there’s often more than one way to skin a cat, er… filter an aggregate. Here’s a SQL tip for those of you using DuckDB or Postgres

2 min read · Apr 9, 2024

Douenergy

Analytical SQL Tips Series — ORDER BY ALL

ORDER BY ALL streamlines sorting by removing the need to manually specify which columns to sort by

3 min read · Apr 3, 2024

Douenergy

Analytical SQL Tips Series — GROUP BY ALL

Google BigQuery has recently added support for a new syntax called GROUP BY ALL. Find out how this makes SQL querying more convenient…

2 min read · Mar 22, 2024

Dave Flynn

Histogram overlay charts for data impact assessment in dbt just got a whole lot easier

The best way to diff data profile stats like histogram and top-k is to plot them on a single chart, overlaid with shared axes

4 min read · Mar 20, 2024

Dave Flynn

When does proper data validation become a ‘must have’ for dbt projects

There’s a threshold at which proper data validation goes from a ‘nice to have’ to a ‘must have’

4 min read · Mar 18, 2024

Articles for data and analytics engineers
Editors
Dave Flynn
Technical Advocate @ DataRecce.io — the data modeling validation toolkit for dbt data projects
Even Wei
CL Kao
clkao — Building PipeRider. geek. father. Co-founder @ {InfuseAI, g0v.tw}. worked on version control and decentralized data sync in previous lives.