Wildcard union enhancements in Prep - New in Tableau 2022.2
This wildcard union enhancement is one of those features where you see it and just think, 'About time, thank God.'
- To union multiple files, connect to a file then go to the Tables tab and select 'Union multiple files' rather than the connections pane
- File filters offer five criteria: file name, file size, creation date, modification date, and the folder to search, each with range, relative or rank sub-options
- File name filters use wildcards (e.g. *historic*) and a match/doesn't-match toggle, so be precise as 'historic' won't catch 'historical'
- Excel files get an extra worksheet filter, letting you pull specific sheets like 'orders' from across multiple workbooks
- The feature only works for CSV, Excel and statistics files, not database tables, and you can confusingly connect to one file but pull data from another in the same folder
0:00In Tableau 22.2, Tableau Prep now allows
0:03you to use an increased list of file
0:05attributes
0:06to do wildcard unions inside of Tableau
0:08Prep. Let's get stuck in. Okay, so we're
0:11here inside
0:12of Tableau Prep 2022.2. I'm going to go
0:15ahead and connect to a CSV for the first
0:17example.
0:18Let's go ahead and select a text file over
0:20here on the connections pane on the left
0:22hand side,
0:23and for this I've actually got some crime
0:25data and we're going to connect to just one
0:26of these files
0:28and once we've done that we're going to see
0:30what this feature can do for us. Now, when
0:33you connect
0:33to a file in Tableau Desktop, when you
0:35connect to a CSV, typically you might be
0:38used to seeing the
0:39other CSV files that you find here on the
0:41left and you might think that's where you
0:43need to go
0:44to for this particular feature to see how
0:46to union other tables that are essentially
0:48in the same
0:49folder. Instead, you need to go to the
0:51tables tab just over here and select union
0:54multiple files and
0:55that enables this feature and you can see
0:57the new interface that we've got. You've
1:00essentially got
1:01two sets of filters. In this first example
1:03for CSV, we can only really see one of the
1:05filter
1:06types, that's file filters and that's
1:07essentially what this feature is about.
1:09Essentially, this
1:10filtering area allows you to add a criteria
1:13to tell Tableau Prep which of these CSVs
1:16you want
1:17to bring in. So let's start by playing
1:19around with this. First of all, let's go
1:21ahead and select
1:22add file filter and you can see that I get
1:25five criteria; file name, file size,
1:28creation date and
1:28modification date. The other criteria that
1:31we get, that's why I said five, is actually
1:34where to search
1:35up here. So those are essentially the
1:37criteria. You've got the four file
1:38attributes as well as the
1:40folder in which you're searching and that
1:42makes five. Now for each of these, you do
1:44get a subset
1:44as well. So for example, if we go back here
1:47to file size, I can do it by range or by
1:49rank. I can
1:50do it by creation range, relative and rank
1:52and same with modification date. So
1:54essentially, there's a
1:55type of filter type for each particular
1:58data attribute. So file size is just a
2:01range because
2:02it's looking at the size and dates have
2:04ranges relative and rank because you can
2:07sort using the
2:08date as well. So let's also look here at
2:10the search box at the top because when I
2:13select the
2:14search in folder, it actually shows me like
2:17a hierarchy of the folder structure that I
2:19'm actually
2:20currently in. So if I select crime, which
2:23is the subfolder I'm in, that crime folder
2:25is inside a
2:26data folder called prep sample data, which
2:29is in a folder called 2022-2 demo, which is
2:32on my desktop,
2:33which is on my computer, which is owned by
2:36me. So this criteria allows you to sort of
2:39go up and down
2:40the directory and really sort of customize
2:42things. The further up you go, the more of
2:44your computer
2:44you're searching. And in essence, the more
2:47trouble you could get into if you tell
2:49Tableau Prep to
2:50search all subfolders. I'll stick it to the
2:52crime folder now. And let's just have a
2:54look at our first
2:55filter. We'll call this file name and I'll
2:57show you how that works. And the file name
3:00filter has
3:00a match criteria at the top. So you can
3:02either find something that matches or doesn
3:05't match. So
3:06it allows you to search in both ways,
3:07define what you're looking for or define
3:09what you want to
3:10avoid. And that way it will pick up
3:12anything that matches that. So let's look
3:15at matches. And what
3:16we can see here is we've got a file that
3:18talks about crime data. And in essence, it
3:21's got a
3:21specific sort of naming structure for
3:23different types of files in different areas
3:26. So let's say
3:26I wanted to go and find all the historical
3:30crime data. There's essentially two here,
3:33one by LSOA
3:34and one by Ward. And so you might wonder,
3:36well, how do I just go find just the
3:38historical data
3:39files and look at them? I don't know if
3:41whether they're in the same data structure,
3:44I don't know
3:44anything about them, but I just want to
3:46bring them all together. Well, for this one
3:48, what I would
3:49probably do is type a star, then I would
3:52type historical or just historic, actually
3:55historic,
3:56and then enter another star. And what you
3:59're basically saying is go and find anything
4:01that has a term historic anywhere in the
4:04file name. And if I hit enter, you'll see
4:06that it
4:07returns those three files. And this one has
4:10historical, this one has historical,
4:13this one has historic. That's why I typed
4:15historic. If I'd typed historical, I would
4:18not
4:19have necessarily got back that last one. So
4:21let's go ahead and enter. And you can see
4:22the last one
4:23doesn't match, because the term it's
4:25looking for is not historic, it's
4:27historical. And that
4:28was only, you know, referencing historic.
4:31So that's how this files criteria works. In
4:34something
4:35like Alteryx, you're also able to say, look
4:37, go and get every CSV by just putting star.
4:40csv.
4:40In this particular case, I could just give
4:42it that criteria, I could just say star.csv
4:44,
4:44and hit enter, and it would show me
4:46everything in the folder. But that's
4:49basically as good as just
4:50not having a search criteria. So you don't
4:53need to do that, you can just go ahead,
4:55look at the
4:55folder. And if you're happy with everything
4:57, you can just hit apply, and it will go and
4:58pull
4:58everything in. Now, the thing to remember
5:00here is you're unioning these files. So you
5:03are going to
5:03need to make sure you've got a common data
5:05structure between them. Otherwise, you're
5:08going
5:08to create yourself a bit of a challenge
5:10inside of Tableau Prep to clean this up and
5:11make it work
5:12really, really well. So that's the first
5:14filter. That's a file name filter. The file
5:16size filter
5:17is really, really good. Again, this isn't
5:19rocket science, when you choose range of
5:21sizes, you also
5:22get the ability to use less than or less
5:24than or equal to, greater than or greater
5:26than or equal to.
5:27That's just part and parcel of it. What I
5:31like about this filter is that they do
5:33correctly
5:34sort of aggregate up the file specification
5:37for you. So kilobytes to megabytes to gig
5:39abytes to
5:40terabytes. But weirdly, when they show you
5:42the size down here, it's only shown in
5:44bytes. So I'm
5:45thinking this is a bug. This is something
5:47to do with launch day. If not, this would
5:49be a really
5:49nice sort of error of improvement. I can't
5:52do bytes to kilobytes to megabytes to gig
5:54abyte
5:55conversions in my head, specifically
5:57because on some computers, let's say if you
5:59're on a Mac,
6:00it counts megabytes and gigabytes
6:02differently to how a Windows computer
6:04counts it, which is
6:05in multiples of 1024. So it's just
6:08something to kind of bear in mind that this
6:10would be a really
6:11nice sort of thing to use. But nonetheless,
6:14the search criteria is pretty flexible, you
6:16can go
6:17ahead and say, "Look, bring me back
6:19anything that is going to be at the moment,
6:21it's zero to one
6:22terabytes." Again, I would love this to pre
6:24-select the largest file size that it can
6:26see. So for
6:27example, here, if I say in megabytes, let's
6:31say, "Give me back everything that is up to
6:3430 megabytes."
6:35Hit enter, you'll see that everything up to
6:3730 comes back. And it's quite dynamic, it's
6:39quite
6:39real. As you type, it doesn't update in
6:41real time, but when you hit enter, it
6:44updates again,
6:44and you can see sort of files sort of
6:46dropping off as I do that as well. So that
6:49's the criteria for
6:52file size. Let's go ahead and delete that.
6:54Add another one, creation date and
6:55modification date.
6:57These are just date ranges. They'll both do
6:58the same thing. So I'll just show you how
7:00one of them
7:00works. Range of dates allows you to pick a
7:03start date and an end date. Again, I don't
7:05understand
7:06why this starts at 1972. The hack here is
7:09not to do the drop down and then start sort
7:11of clicking
7:12through. You might come here and try and
7:14click on this to type it. You can't. The
7:16trick here is to
7:17just type the date in here if it's going to
7:19be faster. So you can just go ahead 2021,
7:22and then
7:22you start from a much, much more reasonable
7:24sort of timeframe, and you can get going.
7:27The table
7:28here gives you a high level overview of
7:30those dates if you want to be able to
7:31filter. So that's
7:32really, really nice. But in essence, you
7:35can go ahead and choose whatever search
7:37criteria works
7:38for you. Just remember that everything that
7:41gets found here is always coming back. When
7:44you go
7:44ahead and hit apply, this will essentially
7:46pull everything in the view back into your
7:49prep flow,
7:50as well as the file that you basically
7:52selected. So that's a really nice touch, a
7:56really nice way
7:56of working. Now, the other thing you can do
7:59, and it's probably not as obvious, and this
8:02is probably
8:02one of those places where this little left
8:04hand side interface is a little bit small
8:06now. So what
8:07I might do is expand it a lot larger. I don
8:10't think I can change the middle table
8:12height. I'd
8:13love to sort of move that down, but it
8:15feels a little bit cramped. So one trick I
8:17do is I just
8:18minimize Tableau prep by hitting control
8:20minus or command minus on a Mac. What that
8:23does is because
8:24Tableau prep is a web application, it
8:25treats it like a browser, and it just gives
8:27you this sort of
8:29illusion of creating more space because it
8:30makes everything smaller and spreads it out
8:32. So now it
8:33looks like I'm in a higher resolution
8:35screen, which isn't probably good for this
8:37video.
8:37And you following along, but hopefully now
8:39you can see I've got a bit more space. I
8:42can go back
8:42to that more compressed view and I can
8:44continue to add more filters, whether it's
8:46by file name
8:47or creation date. And you can see these are
8:50just stacking up. And as they stack up, you
8:53basically
8:54reduce the list, a list in this case, to
8:56nothing. So nothing will come through. But
9:00nonetheless,
9:00that's how it works. Okay. So I think I've
9:02really labored that point really, really
9:05hard. That was
9:05the CSV. Let me show you what happens when
9:07we connect to an Excel file. Okay, so let's
9:10connect
9:10to an Excel file. I'll go back here to the
9:12top, I'll hit plus, and we'll go ahead and
9:15select a
9:15Microsoft Excel file. And when we do this,
9:17let's just go back one. I think we need to
9:19go back to
9:20the prep sample data. And in this
9:22particular case, I've got three files, Last
9:25Efem Data, Bookshop,
9:26and Sample Superstore Sales. I'm going to
9:29select Sample Superstore Sales for this
9:31particular one,
9:32not the data everyone loves, but I'm going
9:34to drag it in anyway. And we're going to
9:36connect to it,
9:36go to tables, go to the same option, select
9:39union multiple tables. And now you can see
9:42the options that we have. Let me make this
9:43larger again. Now, the reason I want to
9:46show you this is
9:47because an Excel file has an additional
9:49search criteria that other files won't. So
9:53here you can
9:53see there's a worksheet filter as well as a
9:56file filter. So the file filters are what I
9:58've showed
9:59you already, file name, file size, created,
10:01modified date, all that jazz. Worksheet
10:04filters,
10:05specifically for Excel files, allow you to
10:08also search a criteria for worksheet naming
10:10. And in
10:11this particular case, this will only have
10:13the worksheet name in this little dropdown.
10:16So I can
10:17then go ahead and put in specific workshe
10:18ets. So if I just want to bring in the
10:20orders table,
10:21I can just go ahead and type that and it
10:23will only find the orders table in any
10:25worksheet if it's
10:26included in that Excel file. Okay. Now, the
10:29other thing this did when we came to this
10:31option,
10:32when you connect to an Excel file, this is
10:34going to, I think, catch a few people out,
10:36is that it actually prefilled the file name
10:39options here. So at the moment, what we're
10:41looking at is just one file. And you might
10:44think, "Oh, I can't go and look and open
10:46seven Excel
10:47files at the same time." That's actually
10:50not true. If you go ahead and clear this
10:52file search filter,
10:53you'll see that it thinks about it for a
10:55little bit longer because what it now has
10:57to do is go
10:57into the directory and file every Excel
10:59file. And once it's found every Excel file,
11:02look at
11:02every sheet in every Excel file, then pull
11:04through the data. And that's what you see
11:07here. You can
11:07actually pull through the different sheets
11:09inside of multiple Excel files. And that's
11:12really, really
11:12nice. So you can see here, I have some data
11:15for this bookshop and it's actually got
11:18sales Q1,
11:19Q2, and Q3, and Q4. That's actually what I
11:22'm looking for. So let's go ahead and add a
11:24worksheet
11:25name criteria. And we'll just say anything
11:28that starts with a capital sales and ends
11:31however we
11:32want and hit search. And this is a little
11:35bit slower because I think it has to
11:38dynamically
11:39search those files again and again and
11:40again. And now you can see I've pulled the
11:42sales data from
11:44the beta bookshop, even though I connected
11:46to the sample superstore sales. And now it
11:49's going to be
11:50able to work. So let's hit apply. And once
11:52we've hit apply, it's going to go ahead and
11:55bring all
11:55those in. And you'll see Tableau Prep
11:57update here on the right hand side. And now
11:59we've just got the
12:00sales and it's sort of pruned everything
12:02down. And when we go ahead and look at the
12:04orders,
12:05now it's connecting to that beta bookshop
12:06file, even though I'm connected to sample
12:08superstore
12:09sales. It's a really, really weird sort of
12:13scenario to be in. I have found that this
12:15takes a little
12:16bit longer because again, it's having to do
12:19this by talking to the files. And it does
12:21what you'd
12:22expect Prep to do, which is to give you the
12:24table names and the file that the file is
12:26coming from.
12:27So this is something that we'll do every
12:29single time. Even if it's a CSV, it will
12:32tell you the
12:32table name, which is essentially going to
12:34be the file for CSV. But for this in Excel,
12:37you're going
12:37to get all the different file names if you
12:39do connect to multiple files over here on
12:42the right
12:42hand side. So that's a really, really nice
12:44touch. Now let's say you want to connect to
12:46a database.
12:46I've actually jumped ahead here and I've
12:48connected to my Snowflake instance. You
12:51might think you get
12:52these abilities with tables in a database
12:54like this. So you might want to maybe
12:56dynamically
12:57search for multiple tables like here, union
12:59them together and bring them in. This is
13:01actually not a
13:02currently supported feature. So this
13:05feature only works for CSV files, Excel
13:08files and statistic
13:09files. Those are basically the only file
13:11types that this will work for. Hopefully in
13:13the future,
13:13you get this capability with other tables
13:15inside of a database, but it's
13:17predominantly targeted at
13:19flat files essentially. So that's going to
13:21be the only parameters where this works. It
13:23won't work
13:24as you think with a database here. I'm
13:26connected to Snowflake with some sample
13:29data and you see,
13:30I don't get any option to bring it in. I
13:33just get the old sort of sampling method
13:37capabilities and
13:38I can sort of make some changes to this
13:40particular window here on the right hand
13:43side. But in terms
13:44of settings, this is really all I'm getting
13:46. I don't have much more than just the
13:48ability to
13:49select one thing at a time. And so that's
13:51pretty much it for this particular feature
13:53with databases.
13:54So unfortunately not much to add. A couple
13:57of things just to be sort of aware of. When
14:00you do
14:00connect to something using this feature and
14:02you get it working and everything's good,
14:05you do get
14:05this little plus icon that lets you know
14:07that that is actually bringing in multiple
14:09tables and
14:10unioning them together. You don't get that
14:13if you don't do that. So just as a simple
14:15example,
14:16if I select on this file and I was to go
14:18back up to the top and I was to select a
14:21single table,
14:22you'll see that the icon disappears. And if
14:25I go ahead and select that option there,
14:28only once I've
14:28actually successfully hit apply, does the
14:32icon appear here for you to see. So that's
14:35an icon
14:36that only happens again when you
14:37successfully complete the operation. Nice
14:39little quirk to be
14:40aware of. Another thing I just want to
14:42highlight is that it's actually possible to
14:44, with this
14:45feature, connect to one file but pull in
14:47data from a completely different file,
14:49which makes
14:50troubleshooting a little bit more tricky. I
14:52've actually already done that in front of
14:54you and
14:54you maybe didn't realize. You can see here
14:56that I actually originally connected to the
14:58sample
14:59EU Superstore Excel file. But then inside
15:02of this search criteria, I actually went
15:04and pulled the
15:05files from the bookshop.xlx file, which is
15:08in the same folder as the Superstore files.
15:11And then I
15:12pulled in cells, the Q1, Q2, Q3, Q4. That's
15:15where that was actually coming from. And so
15:17if you go
15:17look at the prep output and actually look
15:20at this, you'll see that there's nothing
15:23coming from
15:24my Superstore cells. You can actually see
15:26the bookshop data here just above my head.
15:28It's the only file that's coming in. So
15:30that to me is like one of those weird mindm
15:34azes, right?
15:35Like how is it that you can connect to a
15:36file that you never intended to connect to
15:38in the
15:38first place? Well, this sort of scenario
15:40exists. And so if you're having to troubles
15:42hoot your prep
15:43flow and this is what's going on, this is
15:45exactly what's happened. Essentially,
15:47someone's connected
15:47to one file, then decided to pull something
15:50from another file. And that's basically
15:52where
15:52it's going. Now, what is interesting about
15:55this is if I go in and edit, what I'm not
15:57sure about
15:58is whether it's going to be expecting the
16:00beta bookshop.xlx or the sample EU Super
16:03store store.xlx.
16:05In essence, what I'm going to do now is I'm
16:07going to actually switch the connection and
16:09see what
16:10that does. So if I double click on that,
16:12and we'll let it sort of do its thing, I
16:15think the search
16:16criteria will still apply itself to that.
16:19And you can actually see that it's actually
16:21working as you
16:22would expect. It's intelligent. You can see
16:25it's highlighting, first of all, the four
16:28things that
16:28are pulling through, even though I wasn't
16:30connected to that in the first place. And
16:32now this is still
16:34obviously running completely fine. So it's
16:36good that it degrades beautifully, and it
16:39still shows
16:40you what it's supposed to do. If you're
16:42just not connected to the specific file in
16:44question,
16:45you won't see that over here on the left-
16:47hand side. I guess that's the sort of
16:49downfall of this.
16:50Connect to the right file, you'll see the
16:51right things. If you don't connect to the
16:53right file,
16:54subsequently pull in something from another
16:56file, then you won't see it here on the
16:58left
16:58when you really should because you want to
17:00know that cells Q1, Q2, Q3, and Q4 are
17:03being brought
17:04into the flow as well. And that's also a
17:05nice touch. I don't believe I saw that
17:07previously with
17:08the Excel files because again, we didn't
17:10see this tabular view. But again, if you're
17:12using Excel,
17:13it's really nice that you get these four
17:15files here highlighted in blue to tell you
17:18that they're
17:18actually being used in the flow. So that
17:20was another quirk that I sort of discovered
17:22. I
17:24didn't really know where in the demo to
17:25show you this, but definitely something to
17:27highlight
17:28in this video. Yeah, this is it. This is
17:30the feature pretty much in a nutshell. Let
17:34me know
17:34what you think of this. I think this is a
17:36long time coming. This is one of those
17:38features where
17:38you see it and you're like, "About time.
17:41Thank God." It's one of those kinds of
17:43features that
17:43just makes that little bit easy to work
17:46with specific use cases where you have to
17:48handle
17:49a lot of unwieldy files. Together with some
17:51of the previous changes as well, you know,
17:54prefixes and suffixes as well, it's just
17:56starting to feel like Tableau Prep is now
17:58starting to get
17:59into its stride and can be seriously
18:01considered for at least light to medium
18:04data prep. Nothing
18:05too heavy just yet. All the heavy stuff I'd
18:08still do and all tricks. But a lot of the
18:10time,
18:10I am starting off in Tableau Prep and it's
18:13doing the job perfectly fine. In fact, it's
18:15a lot nicer
18:16to use than other tools as well that I'm
18:18comfortable with. So I'm actually quite
18:20enjoying
18:20it. Let me know what you think of the
18:21feature below. Let me know in the comments.
18:23Let me know
18:24what you'd like to see done with this
18:26feature. Let me know what ideas you come up
18:28with. It's
18:29always great to have people watching the
18:31videos comment, give us use cases and share
18:33their ideas
18:34with others. Sometimes even I forget things
18:36as well. So that's a really nice way to add
18:38to the
18:39discussion. Thanks for watching and I'll
18:41catch you in the next video.
18:42[0-1]
18:43[0-1]
18:44[0-1]
18:45[0-1]
18:46[ Silence ]
In Tableau Prep 22.2, you can now select a subset of files from a directory based on a new set of attributes within Tableau Prep. These attributes include file name, creation time, last modified time, last access time and size.