Relationships with Tableau Prep? Now it is possible! (in beta)

Share
Tableau Prep flow creating two published data sources which then get used in a relationship!
Tableau Prep flow creating two published data sources which then get used in a relationship!

For years, Tableau developers have lived in an awkward middle ground.

Tableau developers have done great work only to hear, “That calculation should really be in the data source” or maybe, “Why is this dashboard so slow?”

To which the answer, in frustration, is often, "I would… if I had access to the database.”

After performance recorder sessions, developers often find the workbook is spending half its life recalculating COUNTDs, FIXED LODs, string cleanups, and table calculations over and over again.

That tension has always existed in Tableau.

We want:

  • clean reusable logic
  • performant dashboards
  • flexible filtering
  • natural-grain data
  • reusable semantic models

But historically, we had to compromise somewhere. And then Tableau Prep came along in 2018 and helped us out in a big way.

The Missing Piece of Tableau Prep

Tableau Prep has always been powerful for:

  • cleaning messy data
  • standardizing dimensions
  • building row-level calculations
  • creating aggregate tables
  • scaffolding datasets
  • generating reusable metrics

The problem was never Prep itself. The problem was what happened after Prep.

Prep could publish data sources into Tableau Cloud or Server… but those published data sources were effectively dead ends.

Relationships came along in 2020, making true data modeling available in Tableau. However, we haven't been able to extend Prep models using relationships, because it wasn't possible to create relationships between published data sources. (If you set your Prep outputs to database tables instead of published data sources, you could create relationships between those tables. However, I expect that most people who had that level of database access weren't using Prep.)

So if you pushed logic into Prep, you often lost the flexibility that modern Tableau data modeling provides.

That created a tradeoff:

  • push calculations down for performance
  • or keep them in the workbook for flexibility

This is a trade-off I previously wrote about on The Flerlage Twins blog in 2023.

Now, composable data sources change that. The trade-offs mostly disappear. I think this is a much bigger deal than most people realize.

Why This Matters

A lot of Tableau performance problems are really modeling problems.

We ask Tableau to do expensive work repeatedly at runtime:

  • string calculations
  • FIXED LODs
  • COUNTDs
  • ranking logic
  • table calc gymnastics

Every interaction becomes another query cost.

As Candi Munroe and I discussed in our TC26 presentation on this topic, Tableau does not generally materialize calculations in workbooks or published data sources. Those calculations are frequently re-executed as users interact with dashboards.

That means:

  • slower dashboards
  • more warehouse compute
  • harder-to-maintain workbooks
  • logic duplicated everywhere

The natural answer has always been to push the logic down. But if pushing logic down forced you into giant flattened joins, you lost many of the benefits relationships introduced in the first place.

Relationships Changed Tableau Modeling

Relationships solved several massive problems:

  • keeping data at its natural grain
  • avoiding exploding joins
  • reducing duplication
  • simplifying multi-fact analysis
  • improving filtering behavior
  • enabling “less data, more answers”

But there was still a gap. Prep outputs and relationship-based semantic modeling were living in separate worlds.

Composable data sources finally bridge that gap.

So What Are Composable Data Sources and How Do They Enable Relationships with Prep?

In simple terms:

You can now relate published data sources to other published data sources.

Among many other use cases, it means Prep-generated outputs can become reusable building blocks inside larger semantic models.

You can now:

  • create reusable aggregate tables in Prep
  • publish reusable cleaned dimensions
  • build certified metric layers
  • relate them together later
  • extend published data sources without rebuilding everything

This effectively turns Tableau into a much more modular semantic modeling platform.

The New Opportunity

The real breakthrough here is not just “Prep supports relationships.” The breakthrough is that we can finally separate modeling responsibilities properly.

For example:

In Tableau Prep

  • string cleanup
  • row-level calculations
  • reusable FIXED-style logic
  • aggregate tables
  • ranking and other compute-intensive, table-scan calculations
  • data quality work

In the Workbook

Keep:

  • interactive aggregations
  • flexible filtering
  • lightweight calculations
  • visualization logic
  • table calculations where they actually make sense

That’s a far cleaner architecture and one that scales organizationally.

A Practical Example

Let's look at the same fictional call center example we used in the TC26 presentation.

Executives are looking for:

  • employee ranking by manager by month
  • time since employee took their first call
  • drill-through to detailed calls

Let's look at the transactional table that captures calls:

This is a simple table, but it shows the basis of what a lot of these tables could look like. (A "real" table might not be one row per call but have status updates - Prep could help solve that "grain" issue too. For now, we will focus on this more simple, near real-world example to make the case.)

Right away, we can see some of the problems with bringing a table like this into Tableau Desktop (or as a published data source):

  • a messy text field
  • the need for relationships with employee and manager dimension tables (or blow up this table by joining in those string fields)
  • Resolved being neither binary (for math) nor a true boolean (yes/no not true/false)
  • the need for a multi-pass query (or in Tableau's case, a nasty table calc) to get the rank by employee by manager by month to work
  • the need for an LOD to get the first date an employee took a call

How can we solve these with Prep? First, let's get the first date each employee took their first call (this assumes the table goes back to the beginning of calls, of course):

This calculation interface is so good! You can see the impact of the calculation visually, without needing to type anything.

This is the equivalent of { FIXED [Employee ID] : MIN([Date]) } in Tableau Desktop/published data source. In Prep, it is both faster as the calculation will now be materialized as a column that doesn't need to be recalculated unnecessarily on each sheet and easier to write!

Next up we have to clean up that Resolved Calls field to make it easier to do math on it:

Let's clean up the messy text field in two steps. First, using the machine learning algorithm on Group --> Pronunciation:

Then we can manually group "Prod Defect" and "Prod Not Working" into a single field by manually typing "Product Defect" over each:

We now have a great new fact table (a detailed table of measures with keys for dimension tables) that we can use in Tableau without needing to do the LOD, text cleanup, and boolean-to-binary calculations in Tableau. If we think about these calculations, they all happen at the row level. That means they cannot be filtered "lower" than this level and therefore make sense to be calculated "before" bringing them into a workbook.

Let's close this step by creating an output to a published data source on the same Tableau Cloud site:

So far, we have addressed a lot of our data cleanup and work that is going to help with dashboard performance.

However, trying to get the bump chart of rank of employee by manager by month for resolution rate would be very complicated in Tableau. And frankly, if this had real-world levels of detail, it just won't work. It would first be slow, then it would either error out, finally "locking up" your browser. The reason is that RANK is a table calculation and for a table calculation to work at this level, Tableau would have to bring in a lot of data to the browser memory... again, in this case, too much data.

A similar use case was created as a Workout Wednesday challenge by Lorna Brown last year. It is not the exact same but shows the complexity to do this with table calculations, even assuming the amount of data we had would work.

Prep makes this so much easier! First, we are going to create a fork from our flow and create an aggregate table. This allows for the pre-calculations of complex aggregations:

Aggregations are easy from this point. It's all drag-and-drop and reads like a sentence:

This UI has grouped fields (dimensions) and aggregated fields (measures). The way this reads is "Group these measures by these dimensions." In our case, "Sum the resolved calls and number of rows by manager and employee truncated to the level of month."

Now you can see what I created the binary field. I can just simply sum up the ones and divide it by the row count (i.e. all calls) to get my resolution rate calculation, which we will do by adding a clean step:

OK, we have our resolution rate calculated but we still need the rank of each employee by manager and month. We don't want to try to do this in the workbook for the reason mentioned above. Tableau Prep also makes this easy:

Which gives us then brilliant UI:

This also reads like a sentence: For every employee, rank the resolution rate by (or restarting) manager and month. We even get a nice histogram that shows the distribution of rank!

Let's finish up the flow by creating a second output from the same flow:

To this point, everything we have done in Prep is not new. These features have been in Prep for years. What is new is not even in Prep - it is in the way we can bring these outputs together. In this case, the brilliant thing it allows is speedy (and simple) answers on rank while still allowing for drill down to details!

Before we start to build this data source in a workbook, imagine that we also have three published data sources that we will also use: an employee table, a manager table, and a date table.

Let's open a new workbook and connect to the resolution rate data source (after publishing the flow and running the outputs):

On our data canvas, we see the new feature - Add Published Data Source. Let's do that and add the cleaned up fact table:

So there is no way to directly create relationships between these tables. That makes complete sense. What ties them together are shared dimension tables. (If this is not clear, I recommend going back to the post on multi-fact relationships.) As we already have published data sources for employee, manager, and date, we can add those one by one and then created the appropriate relationships with each of our base tables, until our final model looks like this:

One thing to remember... the resolution rate rank table is at the month level so the relationship on the Date Dimension Table should be (and remember to watch the date level if comparing across tables):

One of the great things about this feature is we could either decide to publish this as a new data source (making it available to others, including agents) or we can keep it in our workbook, avoiding an explosion of published data source, yet keeping calculations where they should be - in certified data sources available to others.

Finally, we can now create our dashboard that:

  • is highly performant
  • allows drill-down to details
  • either the detailed or aggregate table can be used independently or brought together in the same worksheet (at the month or higher level)
  • no trade offs of pre-aggregated tables eliminating details just to get performance - the best of both worlds

Here is what the dashboard could look like:

A monthly rank of employees filtered by manager. Interactive with dashboard actions. The top chart comes from the aggregated table, the bottom two from details. Dashboard actions allow it to be super responsive as we are always filtering the details by employee and month!

This is a Natural Extension of Where Tableau Has Been Heading

When you zoom out, this fits perfectly with Tableau’s broader evolution:

  • blends
  • published data sources
  • a data Prep (ETL) tool
  • relationships
  • multi-fact models and shared dimensions
  • VizQL Data Service
  • view data model
  • better answers from LLMs, internal and external to the platform (coming posts on this)

The platform has been steadily moving toward reusable composable semantic assets, if slower than we would have hoped.

Moving away from giant monolithic data sources.

Getting logic down the stack instead of trapped in workbooks - not duplicated calculations everywhere.

Now we have certified building blocks. Governed, trusted, reusable data with the flexibility of using Tableau Desktop with flat files.

My Recommendation Going Forward

I increasingly think teams should consider building and certifying:

Fact Tables

Clean transactional detail

Aggregate Metric Tables

Reusable pre-calculated business metrics

Dimension Tables

Clean conformed dimensions

Then let analysts and dashboard developers compose their own semantic models from those pieces.

That’s a much more scalable future than:

  • massive workbook logic
  • giant flattened extracts
  • duplicated calculations across hundreds of dashboards

Final Thought

For a long time, Tableau Prep and relationships were these two great capabilities that we frustratingly couldn't bring together.

Prep was for shaping data. Relationships were for modeling data.

Composable data sources finally connect these together.

And I suspect we are only at the beginning of what this enables.