tl;dr PipeRider now supports dbt state! You can profile and run data reliability tests on only the tables that have been modified as part of your dbt state
One of the great features of dbt is node selection and the ‘state’ method. Using state, you are able to specify a subset of models (or other resources) to work on. For instance, you might use state:modified
to only build your dbt models that have changed.
The logic behind state is that there’s no point rebuilding everything if you’ve only changed part of your project. Likewise, if you’re using a data quality tool with your dbt project (and you should), then the same logic should apply - you only want to run your data quality and reliability tests against the modified models.
The latest version of PipeRider (0.14), the open-source data reliability tool, adds exactly this feature. If you’re a dbt user and you’re not using PipeRider, you really should check it out. PipeRider adds profiling and data assertions to a dbt project with zero config.
Here’s a video that shows it in action, or scroll down for the relevant commands.
Run data profiling and assertions on dbt state
It’s really straightforward to use. Let’s say you’ve modified a dbt model and then built your project specifying only the affected models:
dbt build --select state:modified+ --state target
Now, when you run PipeRider, all you need to do is specify the dbt state using the --dbt-state
option:
piperider run --dbt-state target
PipeRider will only profile and test models that are included in the specified dbt state, saving you time and computing resources.
Compare data profiles
PipeRider also has a handy feature to quickly compare data profile reports:
piperider compare-reports --last
You’ll be prompted to select the reports to compare or, if you specify the --last
option, it’ll automatically compare the last two reports - great!
Taking it a step further, you can specify to only compare tables that appear either in the base or target report. This ties in nicely to dbt state because you might not want to include all of the unmodified tables in the comparison report.
To specify which report’s tables to limit the comparison to, use the --tables-from
option.
piperider compare-reports --last --tables-from target-only
This command auto compares the last two reports, and only compares tables that appeared in the target report. Which, in this case, would be the report that only includes tables modified as per the dbt state. This is perfect if you want a report that only focuses on the comparison of modified models before-and-after your latest changes.
Get started with PipeRider
As I mentioned above, if you’re a dbt user then PipeRider should be your go-to data reliability tool. If you already have a dbt project, all you need to do is initialize PipeRider inside the project and your data source settings will be automatically detected:
# inside your dbt project folder
piperider init
InfuseAI is solving data quality issues
InfuseAI makes PipeRider, the open-source data reliability CLI tool that adds data profiling and assertions to data warehouses such as BigQuery, Snowflake, Redshift and more. Data profile and data assertion results are provided in an HTML report each time you run PipeRider.