Predictive model improvements
Tableau 2020.4 lets you run regularised linear and gaussian predictive models in a calculation, and it's far simpler than wrangling R.
- MODEL_QUANTILE and MODEL_PERCENTILE let you run predictive analysis in a Tableau calculated field, with linear regression as the default model
- Use a quantile of 0.5 to predict the median, so roughly half the values fall above and half below the line
- Fix the 'cannot mix aggregate and non-aggregate arguments' error by wrapping the date predictor in ATTR() so everything is aggregated
- Specify a model explicitly with the model = '...' syntax, using 'rl' for regularised linear regression and 'gp' for the gaussian process model
- Tableau's documentation gives clear guidance on when to pick each model, and these calculations are far simpler to set up than the equivalent in R
0:00Hey, it's Tim here and in 2020.4 Tableau
0:03have added the ability to now work with
0:05different
0:05predictive modeling functions inside of
0:08Tableau Desktop and the browser as well. To
0:11show you
0:11this I'm actually going to start off in
0:12Tableau Desktop. We're going to connect to
0:13Superstore
0:14Sales and we're going to build a very basic
0:16line chart. If you haven't seen my other
0:19video
0:19on extending the date access range that is
0:21a really useful video to watch before you
0:23watch this one because I will be doing that
0:25a little bit here and it's just good
0:27context
0:27for what's going on in this video so I won
0:29't be covering it so go watch that video and
0:32I'll see you back in a bit. I'm going to
0:34connect to Superstore Sales here and we're
0:36going to
0:36hop right in essentially. We're going to
0:38wait for this to load and we'll have our
0:40data set
0:40pretty much ready to go. Now the key thing
0:42I want to do here is just build out a
0:44simple
0:44line chart and I'm not going to try and do
0:47too much education of predictive models
0:49here.
0:50I'm just going to try and show you the
0:52predictive models and show you what's new.
0:54So the first
0:54thing I'm going to do is drag sales onto
0:56rows and then I'm going to drag order date
0:59onto
0:59columns. That by default brings in a
1:01discrete year if that makes sense. I'll
1:03make it a continuous
1:04month and that gives us this nice timeline
1:07we've got here. Now in 2020.3 they added
1:11the
1:11ability to do model quantile and model
1:14percentile. And essentially these are sort
1:17of methods
1:17of doing predictive analysis. They're sort
1:19of two functions as you would call it in
1:21Tableau
1:22terms. And so what we're going to do is we
1:23're actually just going to use one of them.
1:25We're
1:25going to use the model quantile one because
1:28that is actually quite a straightforward
1:30one
1:30to show on the screen and explain. The
1:32percentile one is more of a range based
1:35sort of predictive
1:36model. So with the quantile one what we can
1:39do is we can choose the median and then we
1:41can see how that actually behaves because
1:43the median is supposed to reflect
1:45essentially
1:46the middle of the data. So 50% of the data
1:48should fall above it and roughly 50% should
1:51fall below it according to the model. So
1:53let's go ahead and create a calculation and
1:55we're
1:56just going to bring this here into the
1:57middle and we have our calculation window
1:59ready to
2:00go. Now one of the things you can do is if
2:02you don't know how to write these functions
2:03you can just type in the name model in this
2:05case and you'll see model percentile and
2:08model
2:08quantile. We're going to use quantile here
2:11in this particular case and it's going to
2:13actually give us some hint as to how the
2:15model should be written if that makes sense
2:17. So
2:18what I'm going to do is I'm just going to
2:20write it as if I'm not putting any
2:21variables
2:22into this. I'm just using the standard
2:24function which in this case is a linear
2:26regression.
2:27So you can see this sort of nomenclature
2:30here. First thing it wants is the quantile.
2:32So if
2:32I just type in 0.5 for the median that's
2:35going to work quite nicely and then the
2:37target expression
2:38is the sum of sales in this case. In the
2:40other video you'll see that I suggested
2:42that we
2:43shouldn't be actually typing this out. I'm
2:45quite comfortable with Tableau so I know
2:46how
2:46to do this myself but if you want to make
2:48sure you don't get it wrong then what you
2:51want to do is drag them from the menu bar.
2:52In fact that's what I'm going to do for
2:54this
2:54next one because the predictor expression
2:56in this case is the month date and I need
2:58to make sure it's the correct one. So let
3:00me just click and drag here. I'm holding
3:02command
3:02on the Mac while I do that. On a Windows it
3:04's control and we pretty much have
3:06everything
3:07we need here. The last thing is obviously
3:09this date truncation is causing us a
3:11problem.
3:12You'll see here we get the error cannot
3:13makes aggregate and non aggregate arguments
3:15with
3:16this function. We've got sum of sales here
3:18which is an aggregate. The model is
3:21basically
3:21encapsulating all of this so the thing we
3:23actually need to fix here is the date trunc
3:25ation.
3:26So the way to sort of make this an
3:27aggregation is to throw an attribute at it
3:29essentially.
3:30So ATTR that's a good old trick. If you've
3:32used Tableau for a while you know that
3:34trick
3:34basically. If you need to aggregate text or
3:37something like that attribute is an
3:39absolute
3:39winner. In this case it makes a lot of
3:41sense because we're working with dates and
3:43the dates
3:43alike. I think of them as buckets so we
3:45need to actually just attribute it to the
3:47whole
3:48month not just the particular day if that
3:50makes sense. I mean I probably murdered
3:52that
3:52description so let's not go into that in
3:54this video maybe another video on that but
3:56long
3:56story short we need to make sure that it's
3:59all an aggregate. So here we have our model
4:01and I'll just call this example one and I'm
4:05actually going to call this linear
4:06regression.
4:07I'm just going to call this linear R
4:09because I know it is. I'll show you how I
4:12know that
4:13it is in a second so hit apply and what we
4:15'll do is we'll actually put this onto the
4:17canvas
4:17and I won't close the calculation window so
4:20let me just make this a little bit smaller
4:23and I'll bring this down here just off
4:24screen and we'll actually drag the example
4:27one right
4:27next to sales and this will actually draw
4:30another chart. You can see the two lines
4:33there
4:33and what we can do is we can actually dual
4:35access these so that they lie on top of
4:37each
4:37other. The last thing to do of course is to
4:39synchronize the axis so it's really really
4:41clear and now we have it. So the whole
4:44point of this linear regression and the
4:46quantile
4:46that I've chosen is that essentially this
4:48model predicts that half the values are
4:51going
4:51to fall above and below so it's the exact
4:54middle of the data set and it generally
4:56looks
4:56at the data set in a very simple way. Now
4:59the reason I know this is a linear
5:00regression
5:01is if I actually go to the Tableau
5:02documentation on this where they have
5:04advice on choosing
5:05a predictive model this is actually new in
5:072020.4 they actually have this page and the
5:10default function that came with model quant
5:13ile and model percentile was actually the
5:15linear
5:16regression so you don't actually need to
5:18specify it because it is the actual default
5:20. So if
5:21I go back to my calculation here and I just
5:23click put my mouse in here you'll see that
5:26if you look at this really really long
5:28expression here that the model has a
5:30different way of
5:31describing the model if you want to. So you
5:33can either do it the simple way which is on
5:35the left quantile target expression
5:38predictor expression or you can do model
5:41and then model
5:42quantile target expression and predictor
5:46expression. So model quantile target
5:49expression and predictor
5:51expression so this one on the right
5:53actually has the specificity of the model
5:55you want
5:56to run so we actually need to do that in
5:57this case to make things work a little bit
6:00better.
6:00Now in order to do that I'm actually going
6:02to split this out onto multiple lines it's
6:03a little bit easier to see. Okay so let's
6:05clean up this calculation I'm just going to
6:07go up to this 0.5 I'm actually going to go
6:10onto a separate line like so then I'm going
6:12to go to the sum and I'm going to open up a
6:14separate line then I'm going to go to the
6:16attribute I'm going to open up a separate
6:19line again and I'm going to go to the final
6:22I think that's pretty much it actually I
6:24don't need to do any more of that what I
6:26can do
6:26is I can put these brackets in new lines so
6:28that it's easy to know where things start
6:31and where they end. If you look at this
6:34date truncation actually what we can do is
6:36this
6:36should actually go back one here because
6:39this is all just one function and actually
6:41that's
6:42the last bracket here that actually we
6:44started over there. One thing you could do
6:46if you've
6:46got a bit of OCD is do that so that the
6:48brackets are in line then you can of course
6:51highlight
6:51everything and just tab them in one so that
6:54everything's sort of nice and neat. So this
6:57is great one of the new features in 2020.4
7:00is the ability to do block comments so the
7:02way you do that is you could do something
7:05like this forward slash star say something
7:07and then say more here and more and then
7:11you can close out on a new line with that
7:14and
7:15that actually does a block comment it's a
7:16really cool feature check out my video on
7:18that but here what I now want to do is add
7:21in the new functionality if I click in here
7:24you'll see that it gives me this ability to
7:26add a model so what I'm now going to do is
7:28I'm now going to specify a model. Okay in
7:31order to specify a model we just need to
7:32basically
7:32create a new line and I'm just going to do
7:35a double space there I'm just going to type
7:37in model and I'm going to do equals and I'm
7:40just going to do speech marks because that
7:42's
7:43typically what you do to enter text and I'm
7:45going to do a comma just to make sure that
7:47's
7:47working now the calculation still thinks it
7:50's got an error I think what I've done here
7:52is
7:52I've actually put the speech marks in the
7:58wrong place so if I do that there and then
8:04if I just type in linear in here let's just
8:07make sure this is correct sometimes I
8:09forget
8:09the notation if I get it wrong we can go to
8:12the documentation and search there we go I
8:14think I've got it right so what it didn't
8:16like is it didn't like the space that I had
8:18there before and then previously I didn't
8:20put the speech marks in the right place so
8:22it's very it's very fussy it's only going
8:24to sort of be happy once you've put
8:25everything
8:26in exactly the correct place now you can
8:28see here I've rewritten this in the new
8:31format
8:31I'm just going to move it here to the top
8:33right so we can see the chart more clearly
8:34here at the bottom and if I just hit apply
8:36you'll see that nothing changes because
8:39this
8:39is the same model we were using before it's
8:41actually the default model okay so now we
8:44can actually start playing with some of the
8:46new models in 2020.4 so the new one one of
8:48the new ones is a regular regular I can't
8:51even say this a regularized oh man this is
8:54a real tongue twister let's see how many
8:56times it takes for me to get this right
8:58regular
8:58regularized linear function there we go I
9:02got it so this one is actually quite easy
9:05to type thankfully you don't have to type
9:07that sort of mouthful you can just type in
9:08rl if I was to say in better terms this
9:11model will find a line that generally fits
9:14the trend
9:15it's not as fussy as the linear regression
9:17in terms of following the trend if that
9:19makes
9:19sense so let's just type in rl and hit
9:22apply and you'll see that it's ever so
9:25slight change
9:26there if I just do if I just copy this out
9:28here what I can actually do is I can be
9:31really
9:31smart I can actually comment out that line
9:34and then what I can do on a new line is
9:36copy
9:37this and what we can do is actually just
9:39comment in and then comment out pieces of
9:43the of the
9:43calculation so let's just paste this here
9:46and we can say that this is linear okay and
9:49now if I hit apply because this has been
9:51commented out it won't work so if I do it
9:53apply you'll
9:54see the line shifts ever so slightly it's
9:56the most subtle of changes and then if I
9:58cross
9:58that out and go down to a new line double
10:00forward that it's commented out you see the
10:03line changes back so it's probably not a
10:05good example here this probably will work
10:08better
10:08in a slightly different sort of time frame
10:11but essentially that model is is now
10:13available
10:14for you to try and then the last one we're
10:16going to try is a gaussian model so let's
10:19just go in here and hit enter go in again I
10:21'm going to use a double space I'm going
10:24to comment out the first one go forward
10:26slash forward slash and then comment out
10:29the last
10:29one dash dash and what I want to do here is
10:33the terminology for this I forget this so
10:36I'm just looking up on my second screen
10:39here is gp so gp is the actual expression
10:42so let's
10:42go ahead there and hit apply and you can
10:45see that this is a very different model it
10:47actually
10:48pretty much follows our trend in a very
10:51interesting way and so there we have it we
10:53have the two
10:54new models that have been added the regular
10:58ized I still can't say regularized linear
11:01model
11:01and the gaussian model that have been added
11:03alongside the linear model which was the
11:05previous
11:05default and you can of course use this for
11:07the model quantile and model percentile
11:10functions
11:10that are now available as predictive
11:13functions inside of the calculation window
11:15or tableau
11:16desktop or even in the browser so that's
11:17pretty much it that's the feature in a
11:19nutshell I
11:20highly encourage you to actually go to
11:22tableau's documentation I know I'm a little
11:24bit of a
11:25shill for tableau's documentation I'm
11:27probably a shill for tableau but there's
11:30great great
11:30content here in the documentation I can't
11:33emphasize enough how often people come to
11:35me for help and I just go to the
11:37documentation I feel really bad when I send
11:39people the document
11:41as if to suggest that they they hadn't
11:43thought to read it but it is an easy
11:45mistake to make
11:46you wouldn't expect a software company to
11:48go into this much detail in terms of
11:50describing
11:51features that you know are actually
11:53generally not exclusive to the piece of
11:55software these
11:56models are are not necessarily exclusive to
11:59tableau they can be found in any
12:00statistical
12:01tool pretty much but as you can see here
12:03tableau have done a nice job of really
12:05summarizing
12:06when you can use this model so linear
12:08regression is the default as I said to you
12:10earlier on
12:10use when you have only one predictor and
12:12the predictor has a linear relationship
12:13with your
12:14target use the regularized linear
12:16regression I mean they've got to come up
12:18with a better
12:19name for that come on use when you have
12:21multiple predictors especially when the
12:23predictors
12:23have linear relationship to the target
12:25metric and those predictors are likely
12:27affected by
12:28similar underlying relationships or trend I
12:30mean I just zoned out after the first
12:32sentence
12:32there but if you're into this stuff and you
12:34're being asked are you being pressed to
12:37start
12:37to use these functions or use these
12:39capabilities in some of the way you do your
12:41work then it's
12:42nice to be able to have them available to
12:44you and of course you can then start to you
12:46know put them out there and see how people
12:48challenge the models or choose which model
12:50is the best one and so on and so forth but
12:52it's just great to have these features
12:54there
12:54and you can see that you know it's
12:56reasonably easy to add them to tableau to
12:58be brutally
12:59honest it's not a ton of work and not like
13:01R or something else where you have to do a
13:03lot of data framing and sort of description
13:06of what you're about to do and a tableau
13:08calculation
13:09is super super simple so anyway I'm r
13:11ambling on now let's end this video and if
13:13you've
13:13enjoyed the video you know what to do if
13:15not drop a comment below let me know what
13:18you'd
13:18like to see instead or send it to someone
13:20you don't like they might appreciate it
13:21more
13:22thank you I'll catch you in the next one.
Alongside linear regressions, Tableau has added regularised linear regression and gaussian process regression to the model types you can run in the Model Quantil and Model Percentile functions in the calculation window.
Tableau Release Notes
In 2020.3, we introduced the MODEL_QUANTILE and MODEL_PERCENTILE table calculation functions. By default, these functions use linear regression to generate predictions and explore relationships within your data. Now in 2020.4, you’ll be able to leverage two more models: Gaussian process regression and regularized linear regression. With more models supported, you have greater flexibility and can choose the model that best fits your use case.Documentation. https://help.tableau.com/current/pro/desktop/en-us/predictions\_choosing\_model.htm