Tableau Prep Builder: Unions & Joins (Part 7 of 7)
If you're coming from Excel and still calling them VLOOKUPs, here's how joins and unions actually work in Tableau Prep.
- Unions stack tables on top of each other, combining rows from data sets with similar column layouts, while joins add columns by matching records side by side via a join clause.
- Tableau Prep colour-codes columns by their source data, so you can visually trace which rows or fields come from which branch and check the proportions in the summary view.
- The union step flags mismatched fields when column names don't align, letting you manually pair columns to merge them.
- Choosing the wrong join type or omitting part of the join clause can explode your row count beyond the input total, an immediate alarm bell for accidental duplication.
- You can run a flow and output to a Tableau hyper file (fast and compressed) or CSV for Excel, and run it from the output button, the output step, or the top toolbar with caching speeding up reruns.
0:00Okay, welcome back to the final video in
0:03this series, Tableau Prep for Excel users.
0:08This is, just looking back at my notes here
0:10, I think this is the one, two, three, four,
0:12five, six, seventh video in the series,
0:15seventh and final video in the series as
0:17well.
0:18And so today what we're going to be doing
0:20is covering joins and unions in Tableau
0:23Prep.
0:23Now in Excel, joins might be more commonly
0:26known as VLOOKUPs, but fundamentally I'm
0:29going
0:29to stick to calling them what they actually
0:31are because in the database world, or if
0:33you're
0:33working with a team of data engineers, this
0:35is the terminology they're going to be
0:37using.
0:38And Tableau Prep uses the similar concept.
0:41Let's open up a sample flow.
0:44And you'll have seen in Tableau Prep that
0:46there are two types of tools that we have
0:48not yet covered in the video series.
0:51And so we've left those till last.
0:55If I click on this tool, you'll see that
0:58this tool here is called the join tool.
1:01And the tool just before it looks very
1:04similar.
1:04Well, it doesn't look similar at all, but
1:06the interface looks slightly similar, but
1:09it's called the union tool.
1:11And you might have also noticed that if I
1:16was to remove these two steps, and remove
1:20this step as well, and I was to maybe just
1:24copy this step here and paste it to the
1:27left
1:28and drop that there, whenever you drag a
1:32tool onto another tool, you always get
1:35three options.
1:37And that is the ability to add it, which
1:39draws a line between the two tools.
1:41But notice you also get this union and join
1:44options.
1:45Let me just zoom in again.
1:46There you can see it, union and join.
1:49And that's deliberately why I've left this
1:51particular feature until last, because it
1:53requires a little bit of explanation.
1:55I'm going to link in the description a
1:57really good blog post by Diego Parker from
1:59the Information
2:00Lab Data School, where he talks about when
2:03you should use joins and when you should
2:05use
2:05a union instead.
2:07And he actually has some visual examples
2:10here, showing you the input files and the
2:12outcome
2:13that comes from that.
2:14So you can actually have a look at this and
2:16understand how the data works.
2:18So I highly encourage you to hop on out to
2:20that video and have a look at it.
2:22I'll go through something similar in Table
2:25au Prep, but I just want to sort of call
2:27that
2:27out as a further reading that you can use
2:29to sort of enhance your understanding.
2:31Okay, so now I'm back in Tableau Prep.
2:34I'm going to go ahead and remove everything
2:36up until this final point.
2:38If I click on my clean step, the first
2:40thing to do, I'm going to create a scenario
2:43here
2:43where we have a need to union our data.
2:47Okay?
2:48So what I'm going to do is I'm simply going
2:52to add a step on the top.
2:55I'm going to move that up, and I'm going to
2:57add another branch here at the bottom.
3:00So what this does is it creates a duplicate
3:02of our data.
3:03By adding that branch, we basically create
3:06two streams from one input.
3:07Okay?
3:09And what I'm going to do to make it easier
3:10to differentiate between the different
3:13columns,
3:13I'm going to just create a calculated field
3:16, and I'm going to call this branch two.
3:18Okay?
3:19I'm not going to do anything more than that
3:21.
3:21And I'm going to call this branch.
3:23Okay?
3:24And save.
3:25And I'm going to go to the top connection,
3:27and then I'm going to also create another
3:29field.
3:30I'm going to call the same column name.
3:32They're going to come together.
3:33And I'm going to call this branch one.
3:36Okay?
3:38And click save.
3:40Now what's really interesting about union
3:43ing is in real terms, the best way to think
3:46of
3:46it is if you take Excel and you have a data
3:49source that's been given to you where each
3:52tab represents a month, what you'd
3:55typically want to do if you wanted to do
3:57some aggregations
3:58is you'd want the data to all be in one tab
4:01.
4:01So you can very easily hit a particular
4:03column and aggregate using that column.
4:06Well, union is almost a physical
4:08manifestation of exactly that.
4:10You are taking two tables or two views with
4:14similar columns.
4:16Okay?
4:17Usually they normally have an identical
4:19column layout.
4:20And you're simply putting one on top of the
4:22other.
4:23That's why the icon for it over here is
4:25exactly that.
4:26It's a little bit like Django.
4:27You're kind of stacking things on top of
4:29each other.
4:30So if I zoom back out, and what I'm going
4:33to do here is I'm going to drag the second
4:36item here, the second branch, on top of
4:40this particular item.
4:42You'll see you get that sort of hint again.
4:45And now this is the really sort of bit you
4:48have to be careful.
4:49I want to explicitly drop it on the union
4:52option here that sort of goes orange when
4:54I'm actually hovering above it.
4:56You can see that happening there like that.
4:58Okay?
4:59I want to drop it on the join.
5:00I want to drop it on the union.
5:02And when I do that, Tableau will do two
5:05things.
5:06It will create a branch between these two,
5:09and it will drop in the union step.
5:11And I'm just going to bring that here to
5:13the center to kind of please my OCD.
5:16Okay?
5:17And the other thing you will notice is that
5:20each of these branches had a different
5:23color
5:23for the steps.
5:24And actually Tableau is pretty good at this
5:26.
5:26It will use a color consistently up until
5:28there's a meaningful change.
5:30So in this union step, if I look at my
5:33summary, any row where the header is orange
5:36is coming
5:37from this step, and any row where the
5:40header is green is coming from this top
5:43step.
5:43So I can even see visually how much of my
5:46data is made up of the different
5:48proportions
5:48of my data.
5:50Okay?
5:51And what I've basically done is I've just
5:54duplicated my data.
5:55I've basically put one on top of the other.
5:58Now another thing Tableau will do is it
6:00will create a new column called table names
6:03.
6:03And this is basically just the name of the
6:06relevant files that basically go into this
6:10flow.
6:11It's trying to create some sort of separ
6:12ator.
6:13But we actually created our own called
6:15branch.
6:16And you can see here, if I drag that into
6:18the left here, so it just comes into view.
6:21If I click branch two, you can see that
6:23that only refers to the second data set.
6:26And branch one refers to that top one.
6:29They both represent 50% of the data.
6:31That's why none of these charts are
6:33actually changing, because it's exactly 50
6:36%.
6:36So the proportion is the same no matter
6:38what I click on.
6:40Okay?
6:41So that's a very simple example of a union.
6:44Now let's go back to the second step.
6:47And I actually want to change this days to
6:49ship to something else.
6:50Let's say I had a typo.
6:51So I'll put D, I'll miss the A, Y, S to
6:55ship.
6:56Okay?
6:57So this is what I'll do.
6:58Okay?
6:59Now when I change that, no error will come
7:01up.
7:02But if I go back into my union step, you'll
7:04see something new.
7:05There's this section here on the bottom
7:07left which shows mismatched fields.
7:09These are essentially columns from the two
7:11data sets where Tableau Prep has not been
7:13able to find a matching column.
7:16Okay?
7:17So what I can do is I can actually click on
7:19the first item, days to ship, and then
7:22click
7:22a plus on the second one to tell Tableau
7:25that these two are actually the same thing.
7:29Okay?
7:30And then Tableau will take that and merge
7:32them together.
7:33Now that's technically not a change in this
7:36sort of situation.
7:38It's essentially just creating a
7:39relationship between the two columns and
7:41then completing
7:42the step that it already knew about, which
7:44was the union.
7:45Okay?
7:46And that's pretty much how unions work.
7:48They're very, very simple.
7:49They're very, very straightforward.
7:52Now joins, on the other hand, are a little
7:54bit more complex and require a little bit
7:56more thought because depending on the
7:58outcome you want to achieve, you might need
8:00to do
8:01joins in a slightly different way.
8:03So what I'm going to do is I'm now going to
8:05lean on the pre-built example that we
8:07already
8:07have in our flow over here on the left-hand
8:09side.
8:10And the reason I want to do this is just so
8:12I can walk you through the anatomy of a
8:14join
8:14and you can see how it works.
8:16And then we'll have a go setting one up
8:19ourselves a little later.
8:21Let's just drag this window up.
8:22I appreciate it's covering the flow
8:25entirely, but just bear with me.
8:27The settings pane for the join tool is a
8:29little bit more involved because it has a
8:31lot of
8:32information to communicate.
8:34You also have this summary view like we've
8:37had for aggregate functions and pivoting
8:39functions.
8:40It has that sort of similar two-column
8:42setup that tells you a little bit about
8:44what's going
8:45on and then you have your summary view yet
8:47again.
8:48Now the thing to note here is just like the
8:50union tool, it's going to color the columns
8:53based on which data source they came from.
8:56And unlike the union tool where you're just
8:58stacking two rows of data on top of each
9:00other,
9:00with joins what you're doing is actually
9:02adding the number of columns.
9:03So you're stacking data side by side.
9:06And the relationship that defines that is
9:09called the join clause.
9:10Okay?
9:11So the join clause here in this example on
9:14the left-hand side is actually the product
9:16ID on both sides of the data and the order
9:19ID.
9:20Those are the unique items between both
9:23data sets that allows me to figure out how
9:26this
9:26works.
9:27And so if you look at this flow, what's
9:29actually happening is you have all your
9:31orders here
9:31at the top and then you have your returns
9:34in a separate data source.
9:36And the returns are basically cleansed in
9:38this bottom step.
9:40And using this relationship where we get
9:42the product ID and order ID from our
9:44returns table
9:45here in purple on the left-hand side, and
9:49then we join that to our main data flow,
9:52which
9:53is here in yellow, and we create this join
9:55which then turns this step into a green
9:58output.
9:59So again, you have that visual
10:00communication here.
10:01You have a join clause on your purple and
10:04yellow, and that results in this green.
10:07And so it's interesting to see here the
10:09result green goes all the way across the
10:11top of this
10:11join clause, join results, and then here
10:14you have a mismatch of colors as these bits
10:17of
10:17information are coming from different
10:19tables.
10:20Okay?
10:21Now, if I move on to the next step, which
10:24is the join type, depending on the type of
10:27relationship you want, there are a concept
10:31called left joins, inner join, and right
10:34joins.
10:35You can get more advanced with full outer
10:37joins, but I'll skip that out of this video
10:39.
10:39Okay?
10:40Now, in this particular video, a right join
10:43is being operated here because it's
10:45basically
10:46saying that I want to take not just the
10:49matching fields between my returns and
10:53orders, but
10:54also everything in my orders.
10:56You see, if I deselected that section, this
10:59would now only return only the items that
11:02meet the classification of a return in our
11:05purple set and an order in our yellow set.
11:08So I'd just be analyzing returns and their
11:11corresponding orders.
11:13If I click on that yellow section, what I'd
11:16now be analyzing is every single order as
11:18well as any possibility that they might or
11:20might not have been returned.
11:22You can see that sort of happening here as
11:24I click on this.
11:25As I click on that, you'll see that this
11:27data set here comes from just being
11:29returned.
11:30And if I click on that again, it just now
11:33has nulls and returns as well.
11:35Now I'm making a much smaller sort of
11:39proportion of the data.
11:42The last thing is I also get a
11:43visualization showing me exactly what's
11:46going on.
11:46So it actually shows me the name of the
11:49data sets.
11:49Okay?
11:50These are the two data sets.
11:52So here I have my returns.
11:54I have 272 records.
11:57And then my orders set, I have 16,301.
12:02And what this is telling me is that my data
12:05set for returns has three returns that don
12:09't
12:09have a corresponding order.
12:11Okay?
12:12That's what this is basically saying.
12:14And everything else is actually contained
12:17in all my orders.
12:18That's why this icon and this square are
12:20sort of matching above each other, because
12:23it's
12:23actually telling me that these number of
12:25records have matched and these three have
12:28been excluded
12:28and therefore have not matched.
12:31So there's a lot of communication going on
12:33here.
12:34And then the last thing is the result.
12:36Now depending on the number and what
12:39behavior you're expecting to see, the 16,
12:42301 is correct.
12:44I shouldn't have more records than I had
12:46coming in.
12:47If I do, then this number will explode.
12:50Let me just show you what that actually
12:52means in real terms.
12:53Let's say I forget to create the
12:56relationship between order IDs and I des
12:59elect that.
13:00What Tableau is now going to do is it's
13:02going to try and match every single product
13:05ID with
13:06every single product ID.
13:08And it's going to completely ignore the
13:09relationship of the orders.
13:11Okay?
13:12And so I actually end up getting 16,848
13:17rows rather than 16,301.
13:20Now if we think about that for a second, we
13:23were only analyzing orders and their
13:26returns.
13:27So how can I possibly have more records now
13:30than I've ever had orders, even if they had
13:33a return?
13:34And that should be immediately an alarm
13:37bell to you.
13:38Because this group of records, this here in
13:41yellow, I've just clicked on them and this
13:44is essentially the records that don't match
13:48.
13:49Or they match but they've been duplicated
13:51because they match multiple times.
13:53Okay?
13:54And so if I go back up to the top and I add
13:57that join clause of order ID, I just do
14:00this
14:00by typing order ID and order ID on the
14:04right-hand side.
14:07That yellow section now becomes much, much
14:09smaller.
14:10And I can click on these and actually see
14:13what's going on.
14:14This is the power of Tableau Prep.
14:16I'm doing something really, really advanced
14:18here and I'm seeing the outcome in real
14:20time.
14:21I'm actually able to see if I'm making
14:22mistakes as well.
14:23Rather than doing something like a VLOOKUP,
14:26making a mistake and having to reprocess
14:28that
14:28VLOOKUP again or manually fiddle the lookup
14:31again so that I can try and see what's
14:33going
14:33on.
14:34And last but not least, obviously, you can
14:37click on the results and see sort of the
14:39output.
14:40And then that output carries on.
14:42Okay?
14:43So we've covered unions and we've covered
14:46joins.
14:46Now, the key thing is this has been a very,
14:49very basic introduction.
14:51If you wanted to, you could actually spend
14:54half a day learning about unions and joins,
14:56especially if you're studying some topics
14:59like SQL where you're working with
15:01databases.
15:02These are some of the most fundamental
15:03principles that can really make or break
15:05your day if
15:06you work in business intelligence.
15:07Okay?
15:08So I really encourage you to take a deeper
15:10dive into these concepts, really understand
15:13the whole range of joins and sort of
15:15possibilities of what you can do and really
15:18understand how
15:19that works for you and your particular
15:21business question that you're trying to
15:24answer.
15:24Right.
15:25This is the last video.
15:27So it would hurt for me to not have shown
15:29you how to output a file in Tableau Prep.
15:32We haven't actually done that throughout
15:34the entire series.
15:35And so what we now need to do is we've got
15:37our flow here.
15:39We've got a union step.
15:40We've got a join step here.
15:42And what I'd like to do is actually add an
15:44output.
15:45So what I'm going to do is just going to
15:47click add output.
15:48And I'm just going to export this as an
15:50Excel file.
15:51Now you'll notice here on the bottom left-
15:53hand side that I've got some output options.
15:56I've got a file here.
15:57I'm going to try and save it to a specific
16:00location in my machine.
16:01I'll just zoom back out.
16:03I'll go to my desktop and I'll just put it
16:05in the SunTidal folder and I'll hit accept.
16:09Now the default file type is a hyperfile.
16:12This is the Tableau data format.
16:14It's really well optimized.
16:15It massively compresses the file.
16:17And if you're working in the Tableau
16:19platform, it's going to be the fastest way
16:21to work with
16:21any data that you export from Tableau Prep.
16:24However, if you need to open it something
16:26like Excel, select the comma separated
16:28values
16:29option and run the flow.
16:32When we run the flow, Tableau actually runs
16:34the entire flow from start to finish.
16:37And notice it's actually very fast.
16:38It's done all of those transformations in
16:41four seconds.
16:42Okay?
16:43And if I hit done, it goes back to the view
16:46.
16:46Now that's not the only way I can actually
16:49run the flow.
16:50If I zoom in here, you'll see there's a
16:52little play button here on the output
16:54option.
16:55If I click that, you'll see this new window
16:58pop up.
16:59And this is because I'm running this for
17:00the second time.
17:01It's now asking me if I want to overwrite
17:03that file.
17:04So if I hit replace, it will do exactly the
17:07same thing and it will rerun the data and
17:10it will actually run it a little bit faster
17:12.
17:12The reason it's a little bit faster is
17:14because Tableau is actually caching some of
17:16that information.
17:17So it's not actually having to run the
17:19entire flow from scratch.
17:21It's using some of the computations it's
17:23already done and just outputting the file a
17:25little
17:25bit faster.
17:27The last place you can run a flow is on the
17:29very, very top here.
17:30So if you click on this play icon at the
17:33top, you'll get the same outcome again.
17:36And here you'll see it's outputting it in
17:38three seconds.
17:39So there's three ways of running a flow.
17:41The output option itself, I'll hit cancel.
17:45If you click on the output, you've also got
17:46this option here on the bottom left to just
17:48run the flow.
17:49And then at the very top, you've got the
17:52ability to run more parts of the flow.
17:54Now if you add another step, let's say we
17:57add an output here because I want a copy of
18:00my data before it's been duplicated.
18:02You'll notice that whenever I add a second
18:05step, this play icon gets a dropdown menu.
18:08So if I go back, you'll see that it loses
18:11it.
18:11That dropdown menu disappears.
18:13If I go forward, it reappears.
18:16And now if I click on the dropdown, I can
18:18actually choose to run just one of the
18:20outputs.
18:21Or I can hit this play icon at the top, and
18:24it will actually generate both outputs.
18:27You can see here that it's got two outputs,
18:29and it's going along generating rows.
18:31And again, it did two files in four seconds
18:34.
18:34Okay?
18:35So that's basically it.
18:37That's Tableau Prep.
18:38If you're coming from Excel, hopefully you
18:40'll see that it's a much more flexible tool
18:42for
18:42doing what you do.
18:44If you work on the Tableau platform, this
18:45is going to become an invaluable tool in
18:47the
18:47future for making sure that your data is
18:50really, really strong and it's cleansed and
18:53ready
18:53for great visualizations in Tableau.
18:57Hopefully you found this series very useful
18:59.
18:59If you have, drop a comment below.
19:02Give us some comments.
19:03Let us know the kind of content you'd like
19:05to see.
19:05If you'd like some more help with Tableau
19:07Prep, let us know what kind of videos you'd
19:09like to see.
19:10It's a new product, so it's a little bit
19:12difficult to cover the new features because
19:14they come
19:14out extremely fast.
19:16So let me know what kind of content you'd
19:18like to see for Tableau Prep as time goes
19:20on, and hopefully I'll catch you in the
19:22next video.
19:22- Thank you.
Diego Parker’s Post: http://j.mp/3aWVooaIn this last video of the series, I walk you through how unions and joins work in Tableau Prep builder. In excel this would typically be a manual operation leaving you to human error if you incorrectly configure a V lookup or incorrectly union two data sets. Tableau prep gives you a fast easy and safe way to make sure the relationships in your data make sense and it makes it easy to spot scenarios where your matches don’t work the way they should. -----Join my Discord Server. https://discord.gg/shBuxXr it’s a little sparse at the moment but hang in there.