S3 E2: Byte: Data Ethics with Gwilym Lockwood
Data ethics isn't just about the YouTube algorithm; it's about the guy with a dashboard who forgot to account for toilet breaks.
- Data ethics applies to ordinary dashboards, not just headline-grabbing algorithms: badly designed metrics (like Amazon warehouse time or cable technician job counts) can ignore real human realities such as toilet breaks and unintentionally amplify bias.
- Identical data can carry very different ethical weight depending on context: Gwilym's data school exercise reuses the same dataset as product launches and then as hospital operations, exposing that there's a technical right answer but no single ethical one.
- AI and automation are only as fair as their training data and underlying logic, so they tend to reproduce existing human prejudice rather than remove it.
- With clients, frame ethics as better metrics and unintended consequences rather than preaching ethics directly; rewarding sales revenue-per-minute beat raw call time and happened to be the more ethical choice too.
- 'Competence ethics' matters: small technical errors like joining on names instead of IDs can drop people from data and cause real harm, so build checkpoints throughout your pipeline.
- Introducing our first guest 0:01
- Security training and a credentials anecdote 1:08
- What is data ethics? 5:22
- Mundane dashboards with ethical consequences 7:50
- The product-versus-patient model exercise 13:03
- Bias, AI and automation 17:31
- Raising ethics with clients 19:30
- What to do next 24:26
- Competence ethics and technical mistakes 29:05
- Wrap up 32:55
0:00Hello and welcome to Datum.
0:02This is season three and episode number two.
0:04I hope you enjoyed the last episode.
0:06Uh Ravi, how are you doing?
0:07I'm good, thank you.
0:08Yeah, that last episode was good fun.
0:09I think it was good.
0:10A nice recap of 2019,
0:11because we actually got through a lot last year.
0:13Yeah, we did, we did, we did.
0:15But this episode, we're joined by our very first guest.
0:18Hi, how's it going?
0:19Good.
0:19How you doing?
0:20I'm well, I'm well, I'm well.
0:21Yeah.
0:22Whereabouts in the world are you?
0:23I'm currently in, I was about to say sunny San Diego,
0:26in dark morning San Diego.
0:30I'm out here to do some consultancy at the moment, so I've got up at six-ish to speak with you guys.
0:36So it's real dedication.
0:37I love it, I love it.
0:38I think this reminds me of when it was snow beginning in the UK.
0:41Do you remember when we did a couple of podcasts while I was over in New Jersey for a couple of weeks?
0:45And again, I think I did a similar six or seven a.m.
0:48start to get a podcast
0:50So thank you, absolutely.
0:52Getting a body for us.
0:53But it's true dedication for our first podcast guest.
0:56I guess for context, we actually work with Gwilym, so we all work at The Information Lab.
1:00But Gwilym's abroad and we decided to use this opportunity
1:04to get him on board to talk about our topic today, which is data ethics.
1:08Ravi, what do you think about it? Yeah, data ethics is a fun one, right?
1:11Because it's almost like,
1:13it's one of those things where, when you're onboarded at a company, it's one of those videos you have to watch during the week-long induction, the sort of stuff about data security, "data ethics starts with you", and the leaflet that you get has a mirror on it.
1:27And it's sort of to remind you that it's everyone's responsibility.
1:31But I think, and we're going to discuss this a lot more in this episode, it's something that
1:36is often ignored or passed off to someone else, like, "oh, I'm not that important for this."
1:43I mean, Gwilym actually teaches this at the Data School as a module that we do.
1:49And I think
1:50for everyone that's gone through that process, it always encourages different conversations.
1:55I think you'll agree with that, Gwilym.
1:57Yeah, yeah, for sure.
1:58So
1:58I had like a three-hour session with the DS where I talk about data security and data ethics.
2:04And I like how you were talking about the training videos at
2:09different companies, because it's almost like they've designed them to be as boring or as ridiculous as possible,
2:18which has the counterintuitive effect of making people care less about it.
2:23So you sit through these really long videos about how you should never give anybody your password and you think, well,
2:31obviously.
2:33But then the way it's hammered home makes you less likely to really take it seriously.
2:39Yeah, exactly.
2:40But funnily enough, that one has actually happened to me before.
2:45I was once working with someone on a transatlantic project.
2:48I was in the UK, they were in the States,
2:51working for an American client, and I was their Tableau guy based in London.
2:55It was this long project with lots of deadlines and it was quite stressful.
3:01And the lady who was on point
3:04for the project was messaging me to say, "Hey, I need access to this dashboard, this work-in-progress thing you've got. Where is it?"
3:12So I sent her a link to the Tableau Server.
3:14And she said, "Great, what credentials can I use?"
3:19"Well, if you're not registered on Tableau Server, what you're going to have to do is talk to the IT guy to get set up on site."
3:26Right.
3:26And then she says, "Oh, can I borrow your credentials?
3:29I'll log out after download."
3:33After download.
3:34And this was all over
3:38Skype for Business as well, right?
3:39So I have no idea if this is even her.
3:41I don't know if this is someone sitting behind a laptop and typing away.
3:44So, you know, my reply was,
3:46Sorry, I don't know if this is a test or for real, but sharing credentials is a huge security risk.
3:51Delaying a client project by a few hours is not worth that risk.
3:54Because, you know, I was a good boy.
3:56I've sat through all the videos.
3:58And she comes back like, "No, I still really need a sign-in.
4:01I'm gonna CC your boss and your boss's boss.
4:03I will take the risk."
4:04It's my password; it's not your risk to take.
4:06But yeah, so we chat about all this kind of stuff at the Data School as well.
4:10Because it's one thing to sit through the videos and not take them seriously.
4:15It's another thing to chat about real times where risks do occur.
4:20Yeah.
4:20Yeah, it's always interesting with the passwords thing, because everyone has their own weird "secure", in inverted commas,
4:26I'm doing the quote marks here, "secure" way of doing things.
4:28I've seen people have the draft email they keep with all their logins and credentials that they can copy and paste from.
4:34Or an insecure notepad.
4:36And a while SpiderBooks.
4:41Right, exactly.
4:41And it's stuff like this that makes you wonder, you know, you do all these data ethics and data security classes when you're inducted, or
4:50all of these times when you're not really paying attention, but I wonder how many companies have actually implemented something that does do a test. Imagine if that was a test
4:59with the person you were working with.
5:00Or even just a recap, right?
5:02As in a refresher everyone has to go through, and you're regularly checked on these things.
5:07I think what we're gonna talk about a lot here is why it's important in data, tech and analytics, in the sphere we work in, and also a bit about the origin, I guess.
5:16Right?
5:16Yeah.
5:17Exactly, exactly.
5:18So why don't we dive into that?
5:18Sort of sticking to the old "what, so what, now what" format.
5:22So yeah, what is data ethics?
5:24I guess I'll open up that discussion first
5:26to maybe Gwilym or Ravi.
5:28Go for it.
5:29So it's one of these things where I think it's quite difficult to define in a nice snappy sentence.
5:35The way I see it, data ethics is: okay, so you're doing stuff technically
5:41with the data and you're doing your job, but can you step back and think about it for a moment, please? That's what I think of as ethics. Okay, I can see what you're doing from a business standpoint, but what are you actually doing?
5:51You know, what's the effect?
5:55What could potentially go wrong?
5:56What are you doing without realizing?
6:00Who are you affecting?
6:02Stop looking at the numbers; what are the processes, who are the people?
6:06And I'm kind of paraphrasing Giorgia Lupi here.
6:09Yeah.
6:09Okay, yeah, yeah.
6:10It's sort of a lot about the value judgments, right?
6:12The softer side, the softer implications
6:16that are derived, or maybe even implied, from the way you go about handling data, right?
6:21If I was to try and compress it into a phrase that doesn't cover half of it, but yeah.
6:26That's about as close as I can get.
6:27I think there are a lot of blurred lines here, right?
6:33When we had the data and privacy episode, we talked a bit more about
6:37whether it's right to have the GDPR regulations and PII data and all these different things.
6:43But then it's like, well, if you're trying to use data to improve lives, do you not need some level of,
6:49you know, demographic data? And that's where we start getting to that ethical point, right?
6:53At what point do you stop looking at a row as a row of data and start thinking
6:59of the person behind it, and what is the intent?
7:02I think the intention is really important here.
7:05Yeah, exactly.
7:06It's almost as if there are multiple strands.
7:09So it says data ethics, and you'd think it's
7:11only focusing on ethics, but actually there's an underlying level of law or legislation, whatever you want to call it, that says, you know, this is how you handle data, GDPR being a good example.
7:21Then there's a practice.
7:24So,
7:25you know, people like you and me, we all have agreed ways of handling data and understanding what's the best, most efficient way of doing that,
7:30and then applying that to the first rule, which is law.
7:33And then there's a third level, which is: okay, you're doing the first two fine, but you might still not actually be considering ethics in itself, right?
7:43And there's sort of an implied relationship between those three
7:47things.
7:47Yeah, for sure.
7:49So cool.
7:50How do you relate this back to the business analytics side, or the day-to-day usage? We talked a bit about how it's everyone's responsibility.
7:58How does that factor in when you're speaking about data analytics
8:01ethics?
8:01So I think that this is one of the most crucial points about data ethics, and I see it as a bit like climate change in the way that people approach it.
8:10So everyone kind of
8:13appreciates that climate change is a massive problem.
8:15But on the individual level, people see it as, well, I try, and I cycle to work, so I don't have a car, and that's the best I'm doing, but I know that it's also
8:26a drop in the ocean.
8:27What change can I really make?
8:28I'm not the head of BP, I'm not the head of Shell.
8:31You know, I can take some actions,
8:34but is it going to do anything?
8:36And data ethics is kind of similar, in that when you read about data ethics or when you see it in the news, you see stuff like how,
8:44you know, the YouTube algorithm may or may not radicalize people towards the far right, or how the YouTube algorithm may or may not direct...
8:55There was a fascinating but terrifying story in the New York Times about six months ago.
9:01What was the headline?
9:02It was something like "YouTube's digital playground is
9:05an open gate for pedophiles", something like that.
9:07And it was about how the algorithm would identify people who watched
9:14certain kinds of content, and it would realize that people who watch this kind of content also watch content with young girls or young children in it, and how the algorithm kind of facilitated that and would direct
9:27pedophiles to innocent videos that kids had uploaded of themselves. It was horrific. People see stuff like that as data ethics, which it is, but people see the big stuff and think,
9:39I'm not someone who develops the YouTube algorithm.
9:42I'm a guy with a dashboard.
9:44My dashboard is this week's sales. Where's the data ethics in that?
9:51But
9:53something I make the point of is that when you hear about workers in Amazon warehouses pissing in bottles because, in the middle of the warehouse, they're like a mile away from the nearest toilet and
10:06you can't go all the way to the loo and then all the way back to where you are without spending ages doing that and losing time, etc.
10:14Sure, that's unpleasant, but somewhere in
10:17Amazon there is someone like us who has a dashboard of employees' time.
10:24And so someone with a dashboard
10:26is probably just reporting time on work and time off work for individual employees.
10:34And the way that they've set up that dashboard probably just doesn't account
10:36for stuff like going to the loo on a shift when you're a long way from a toilet.
10:40So that's the mundane side, but that's how something that feels relatively mundane can actually lead to ethical and unethical situations.
10:49Exactly.
10:50And even that example maybe has some implied bias in it, because if you're a woman who's going through a pregnancy, you might need the loo more often.
11:00So a dashboard like that would also not take that side of the conversation into account.
11:07And it can lead to
11:08biases being amplified further than they otherwise would be.
11:13So that reminds me of one of the best articles I read last
11:16year. It was in the Huffington Post, and it was called something about "I was a cable guy". It's by this woman called Lauren Hough, who's a writer. She wrote something fantastic
11:28about how she was a cable technician in the States, and her job was to drive around
11:37various different places and fix people's cable.
11:41And one of the things that comes through this essay is how she was one of the only female employees.
11:48And when you're driving around suburban America or rural America, if she needed a toilet, she would have to drive way out of her way to get to the nearest convenience store.
12:00And even then, you don't necessarily
12:02want to go to the convenience store bathrooms, because they're not safe.
12:06So there's this whole section in there about how she's arguing with her boss for a pay rise
12:12because she's paid less than her male colleagues, and then her boss turns around and says, well, it's because of your numbers: you complete fewer jobs per day than your male colleagues, and it's performance based. And there's this whole section where
12:23she's like, well, yes, that's true, but my male colleagues have massive empty Gatorade bottles that they fill with urine while driving.
12:33And I'm physically incapable of that.
12:37Well, yeah.
12:38Right.
12:38And so that, to me, is data ethics.
12:40And that, to me, is somewhere that we work as well,
12:44in that someone has a dashboard with performance targets and the number of jobs completed by cable workers, and whoever
12:54maintains that dashboard or that data source probably doesn't have some kind of calculation that offsets toilet breaks.
13:00Right.
13:00Yeah.
13:01Or neutralizes.
13:02for it.
13:02Yeah.
13:02Yeah.
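The missing "calculation that offsets toilet breaks" could look something like the sketch below. It is only an illustration: the function name, the 480-minute shift, the made-up job counts, and the idea of logging unavoidable detour minutes are all assumptions, not anything described in the article or on the show.

```python
def jobs_per_hour(jobs_completed, shift_minutes, detour_minutes=0):
    """Jobs per hour of time actually available for work.

    detour_minutes is a hypothetical field: time a worker cannot avoid
    spending to reach a safe toilet. Leaving it at 0 gives the raw metric.
    """
    available = shift_minutes - detour_minutes
    if available <= 0:
        raise ValueError("no working time left after detours")
    return jobs_completed / (available / 60)


# Made-up numbers for two technicians on the same 8-hour shift.
raw_a = jobs_per_hour(8, 480)
raw_b = jobs_per_hour(7, 480)
adjusted_b = jobs_per_hour(7, 480, detour_minutes=60)

print(f"A raw: {raw_a:.3f}, B raw: {raw_b:.3f}, B adjusted: {adjusted_b:.3f}")
```

On the raw metric technician B looks slower; once the hour of unavoidable detours is taken out of the denominator, the two rates match. Whether and how to record such time is itself a judgment call that a dashboard owner would need to make with the people being measured.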
13:03I think one of the most interesting things that I've seen or heard you do with that class you run is swapping, was it medical data with sales data?
13:13It's the exact same numbers across a small table you created.
13:16Could you talk a bit more about that?
13:17Because I think that's a really interesting test that you do, and it must encourage...
13:22Yeah, that's a great conversation.
13:24I really enjoy
13:25doing that one.
13:25We use Alteryx to build a predictive model, and this is relatively early in the DS training.
13:33What I've got is a load of sales data.
13:35So I have this setup where I say that we're a manufacturer and we've developed 30 new products.
13:40We have enough budget to launch five products.
13:43So we take the historical data, and there are various columns in there about cost of development, and binary fields like "suitable for children, yes/no", that kind of thing.
13:53What we want to do is create a model to predict product success for the new data.
13:58So we look at the historical products that have worked well and haven't worked well, and we take the new products and use the same kind of columns to run something like a logistic regression
14:07to see whether each one is going to succeed or not.
14:12Based on this data, which five products should we pick to launch?
14:15So people happily open
14:18their predictive models and find the five new products that are most likely to be successful, based on their development.
14:29All right, so then the next question is:
14:31we're a hospital.
14:33There's a list of patients who are waiting for an expensive operation which has no guarantee of working.
14:37We have 30 patients on the list and we only have enough budget for five operations.
14:43Take the historical patient data, and this time it's fields like patient's age and patient's income, because
14:52income is one of the best correlates for health outcomes there is, and a binary field for smoker, yes or no, that kind of thing.
14:59And then the question is:
15:01which five patients have the greatest chance of success?
15:05Which five patients should we operate on,
15:07because they're the ones for whom it's most likely
15:09to work?
15:10And, you know, some people are outraged, some people dive straight in, and then a little later people realize, hang on, this is the exact same data set.
15:18You've just changed the column headers.
15:19Yeah.
15:20And yes, that's the exact thing, right?
15:23The data is identical.
15:26The only thing that's changed is the context of what you're doing with it.
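The exercise described above can be sketched in a few lines. The episode says the class builds the model in Alteryx; the version below swaps in a small hand-rolled logistic regression in plain Python so it runs anywhere, and every number, column name and helper function in it is invented for illustration. It is also scaled down to picking 3 of 7 candidates rather than 5 of 30.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, steps=2000, lr=0.1):
    """Fit logistic-regression weights (intercept first) by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x))) - y
            grads[0] += error
            for i, v in enumerate(x):
                grads[i + 1] += error * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


def pick_top(history, candidates, features, outcome, k):
    """Return the indices of the k candidates with the highest predicted success."""
    weights = fit_logistic(
        [[row[f] for f in features] for row in history],
        [row[outcome] for row in history],
    )
    scored = [
        (sigmoid(weights[0] + sum(w * cand[f] for w, f in zip(weights[1:], features))), i)
        for i, cand in enumerate(candidates)
    ]
    return [i for _, i in sorted(scored, reverse=True)[:k]]


def with_headers(rows, headers):
    """Attach column headers to rows of bare numbers."""
    return [dict(zip(headers, row)) for row in rows]


# One set of invented numbers: a 0-1 scaled column, a yes/no column, an outcome.
HISTORY = [(0.2, 1, 1), (0.9, 0, 0), (0.4, 1, 1), (0.8, 1, 0),
           (0.3, 0, 1), (0.7, 0, 0), (0.1, 0, 1), (0.6, 1, 1)]
NEW = [(0.15, 1), (0.85, 0), (0.5, 1), (0.35, 0), (0.95, 1), (0.25, 0), (0.65, 0)]

# The same numbers under two hypothetical sets of column headers.
product_cols = ["development_cost", "suitable_for_children", "succeeded"]
patient_cols = ["age", "smoker", "operation_worked"]

product_picks = pick_top(with_headers(HISTORY, product_cols),
                         with_headers(NEW, product_cols[:2]),
                         product_cols[:2], product_cols[2], k=3)
patient_picks = pick_top(with_headers(HISTORY, patient_cols),
                         with_headers(NEW, patient_cols[:2]),
                         patient_cols[:2], patient_cols[2], k=3)

print(product_picks == patient_picks)
```

The model never sees the headers as anything but dictionary keys, so both framings select the same rows. Nothing in the code can tell you whether selecting those rows is acceptable; that question lives entirely in the context.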
15:30Mm-hmm.
15:30And one has more gravitas than the other, right?
15:33The hospital one hits home a little bit harder.
15:36Yeah.
15:36Because you start asking yourself actual ethical questions, like, well,
15:40what ethical framework am I actually going to use to make this decision? Should it be the same one you'd use for, you know,
15:48objects, or should you actually use a more sound set of principles to choose who you treat and who you don't treat, ones that are not based on values that are in the data,
15:59right?
16:00Yeah, exactly, and this is also where it gets very difficult and fuzzy, because there just isn't a right answer to that.
16:06The fun side, but also the difficult side,
16:09is exploring with people how to approach these questions when there just isn't a right answer.
16:14There is a right answer from a technical perspective, in how to do the regression, but there is not a right answer in terms
16:20of the ethical perspective and the outcomes.
16:27Going a bit deeper into that, this is where you get into, for example, big banks and the security and privacy
16:34regulations and security settings they have.
16:36For example, you can't go down to such a level of granularity.
16:40So the responsibility then falls on the individual to be aware not only of the context, but also of
16:45the context of the data they're using.
16:47So for example, I'm in Champsford right now, but you could probably find out where I am exactly by tweaking certain filters if you're working for, I don't know,
16:58my internet service provider.
16:59You can find out where I am and who I am based on your innate knowledge, right?
17:03But this is where it's the responsibility of the person creating the data to
17:06have those security settings in place,
17:10so you can't go down to a single row.
17:11You know, once you get to a hundred rows of data, everything gets "obscurated".
17:19Hang on, let me take a second: obfuscated.
17:21Everything gets obfuscated.
17:25That, or an MD5 hash or something like that.
17:28And that's really important.
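A minimal sketch of the two safeguards just mentioned, under stated assumptions: the field names, the salt and the data are invented, the 100-row threshold is simply the figure quoted in the conversation, and a salted SHA-256 digest is swapped in where the speaker says MD5. Hashing an identifier this way is pseudonymisation rather than anonymisation, so treat it as one layer and not a guarantee.

```python
import hashlib
from collections import defaultdict

MIN_GROUP_SIZE = 100  # the "hundred rows" figure from the conversation


def pseudonymise(customer_id, salt):
    """Replace an identifier with a salted SHA-256 digest (shortened for display)."""
    return hashlib.sha256((salt + customer_id).encode("utf-8")).hexdigest()[:16]


def grouped_totals(rows, group_key, value_key, min_size=MIN_GROUP_SIZE):
    """Sum value_key per group, returning None for groups smaller than min_size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        key: (sum(values) if len(values) >= min_size else None)
        for key, values in groups.items()
    }


# Invented data: one large region and one tiny, easily identifiable one.
rows = [{"customer": pseudonymise(f"cust-{i}", "demo-salt"), "region": "North", "usage": 2}
        for i in range(150)]
rows += [{"customer": pseudonymise(f"cust-{i}", "demo-salt"), "region": "Village", "usage": 5}
         for i in range(150, 153)]

totals = grouped_totals(rows, "region", "usage")
print(totals)
```

The large group reports its total, while the three-row group is suppressed so that nobody can filter their way down to an individual.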
17:31And alongside that, because I think your example, Gwilym, of the predictive model really ties into what's happening and the future trends
17:39people are expecting.
17:40We've talked about Moore's Law on this podcast before, where computing is getting cheaper and automation is effectively coming.
17:47But through automation, you also get bias, like the inherent bias we have.
17:51So for example, if we ever get to a point where the legal system is run by some level of AI,
17:58it will be inherently biased, because there is a tarnished history in the legal system of biases that people have within themselves that then
18:06come out during trials and all these high-pressure situations, because humans make mistakes.
18:10And trying to do something at that level and scale and volume
18:15is tricky.
18:16So where AI starts to have bias because it's not trained in a certain way also comes into this, right?
18:22Yeah, and people will often say, with some reason,
18:27that this isn't biased.
18:29This is done by a machine.
18:30The machine is blind to human prejudice.
18:32No, the machine is only as good as the input data that you give it.
18:36And the training data for a model is always going to come from human decisions and human behaviors and human processes, and, well, we are generally awful to each other.
18:45So the machine is just going to learn that and learn how to recreate that quite effectively.
18:51And even if you take that down to a really granular level, like computer logic, if statements and nested logic, that's got a very limited set of frameworks, right?
19:01There's only so far you can take that
19:02framework as a programmer and build an algorithm, because it's still using the fundamental building blocks of logic gates to do the yes and no stuff.
19:11So even that in itself is like an ethical limitation.
19:14AI can only function as
19:16well as the technology that sits underneath it, and that unfortunately is still quite basic at its core.
19:22I think we're definitely into the "so what" now; we're really into the discussion about data ethics
19:26and data security.
19:27I think so.
19:29One of the things I struggle with, being a consultant, is this friction around the topic of ethics.
19:36Because actually, what you're trying to do as a consultant is help your client
19:40do the job better,
19:41do whatever they've got you in to do in a better way, in a more reliable way.
19:45And sometimes you come up against this friction where the client wants X, but actually you think Y is slightly better for them.
19:51And adding data ethics to that mix is something that I've never really understood.
19:54What's the best way to convince your client that data ethics is something they should consider?
19:59Because often you're there because they haven't actually thought about some of these analytical concepts in detail,
20:06never mind ethical concepts.
20:09And so I wonder, how do you guys think that discussion or friction point is best broached, as it were?
20:17So people will just want you to do the job that you are hired to do.
20:22And if you start talking about something else, people are going to react generally negatively, in the way that,
20:28if I have something wrong with my flat and I call out the electrician or the plumber, I want them to sort out the electricity and I want them to sort out the plumbing.
20:36I'm not necessarily interested in them providing a completely
20:40different service to me, like "here's how you should change your bank account", or telling me that I should invest in a blockchain.
20:46Yeah, and you know, maybe they'd be totally right about changing my bank account.
20:50But the point is that that wasn't the context where
20:52I wanted to hear it, so it's going to fall on deaf ears.
20:55So any time that I'm with a client and I think that something's a bit of a data ethics issue, I try and frame it more as,
21:04let's think about the way that this is working, but also, have you considered that doing this in this way will have these unintended consequences for the data, or for your profit line,
21:15or whatever?
21:15One situation where I had that was working with a very simple sales performance dashboard. They wanted to assess their salespeople, and the metric that they were using
21:29was basically just the number of minutes spent on the phone per day.
21:33And just from a business perspective, that's like, what are you gonna do?
21:38Chain your,
21:39just chain your sales rep to a desk and a phone?
21:42That's not actually getting the job done very well.
21:44So I was able to frame it
21:48as a question of, okay, can we link particular salespeople to revenue and then do some kind of comparison, basically a ratio of revenue earned per minute spent on the phone?
22:01And I was able to persuade them that that was a better metric, just in terms of, this is going to reward your salespeople that are more effective.
22:07But from a data ethics standpoint,
22:10I didn't even mention data ethics to them, but I'm sitting there thinking, your poor salespeople.
22:15All you're doing is just making them do call after call after call after call.
22:20Because what you're rewarding is persistence, not flexibility.
22:25And from the ethics perspective, the way that you're monitoring them and measuring them just makes their job atrocious.
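To make the difference between the two metrics concrete, here is a small sketch comparing a ranking by raw phone minutes with a ranking by revenue per minute. The reps, figures and function name are hypothetical; the only thing taken from the conversation is the ratio itself.

```python
def revenue_per_minute(revenue, minutes_on_phone):
    """Revenue earned per minute on the phone; None when there was no phone time."""
    if minutes_on_phone <= 0:
        return None
    return revenue / minutes_on_phone


# Invented figures for three hypothetical reps (all with some phone time).
reps = [
    {"name": "Asha", "minutes": 400, "revenue": 2000.0},
    {"name": "Ben", "minutes": 250, "revenue": 3000.0},
    {"name": "Cleo", "minutes": 320, "revenue": 2400.0},
]

# Ranking under the original metric: most minutes on the phone first.
by_minutes = [r["name"] for r in sorted(reps, key=lambda r: r["minutes"], reverse=True)]

# Ranking under the proposed metric: most revenue per minute first.
by_ratio = [
    r["name"]
    for r in sorted(
        reps,
        key=lambda r: revenue_per_minute(r["revenue"], r["minutes"]),
        reverse=True,
    )
]

print(by_minutes, by_ratio)
```

With these made-up numbers the ordering flips completely: the rep with the least phone time earns the most per minute, which is the behaviour the raw-minutes metric would have penalised.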
22:33Right.
22:34Right.
22:34And it encourages bad behavior.
22:36Yeah, yeah.
22:37And that was a nice one where the ethics side, the responsible metric side, aligned with just a generally better metric for the business.
22:46So that was nice.
22:48Yeah, for sure.
22:49I think your example of climate change really helps here, right?
22:53For example, you can tell someone they should recycle that, but you're not gonna be like...
22:57It's one of these things you'll mention because it's something you care about.
23:01I mean, there are probably recycling consultants out there that tell you all these different things you
23:06can do to be more green.
23:07But effectively it's sort of like that, right?
23:09You just do your bit, you say your piece and then leave it there.
23:13You don't really force someone to donate some money to offset their carbon emissions and things like
23:19this, right? You don't sit there saying, "No, you have to do this, you really should." It's more, "Well, okay, cool, I will, but don't make me do that", that sort of thing. So I think your advice is definitely how you deal with that sort of thing, because
23:30you really have to play these things by ear.
23:32I don't think there's a hard and fast way of forcing an implementation of data ethics, but just having the conversation, talking
23:39about it, is a start.
23:40Yeah, so the Open Data Institute have a really nice data ethics framework for developing new projects, and again, I think that's a great starting point, but it also depends on
23:50the amount of buy-in that people have.
23:52So while the framework is fantastic for people who are engaged and willing to look at developing a new project with particular ethical guidelines, or
24:04at investigating different ethical approaches to the project they're doing,
24:10if you try to do the same thing with people who are already disengaged, I don't know, you'd have to be a much more convincing person than
24:17I am to be able to make that a success.
24:20Yeah.
24:21You're fighting an uphill sort of battle at that point.
24:24Yeah
24:25Yeah.
24:25Cool.
24:26I think we're in the "now what" section now.
24:28So we've talked about data ethics, what it is, we've spoken a bit more about how it's used, why it's important,
24:34and all these different things, and we're almost at, well, what should we do next? We're in a new decade, we're in 2020. How will this impact
24:44the future versus where it is today, and how can we, as a community of tech workers and data workers and people that have this access and responsibility, actively take this forward?
24:56Tim, do you want to start on this one?
24:58Yeah, I think the way I normally do this kind of thing is that I tend to look at what other people have done.
25:03So shamelessly copy, essentially.
25:05So there's a couple of data ethics frameworks, and it's interesting because
25:10data ethics means something different to every organization.
25:13So actually getting a feel for what different companies think ethics refers to is probably a really good start,
25:19because you might have a different understanding of that in the
25:24context of the business that you run.
25:26For example, if you're a hospital, well, that has to be paramount.
25:28Whereas, I don't know, if you're a stock trader, just to take an extreme example, well, your ethical framework
25:35might need to be a bit more robust.
25:37But my point is that the things you have to worry about immediately in front of you are slightly different.
25:40And so I think that's the first thing I'd normally do.
25:42I don't know what you guys think.
25:44What do you think are some of the key steps people
25:45should take?
25:47What I like doing is just talking about data ethics with a sense of enthusiasm.
25:54It is a thing that people talk about, but they haven't necessarily read things about it or really dived into it, because
26:03the way that data ethics is presented can sometimes feel quite prescriptive and forbidding, like a list of rules.
26:09But what's really nice is that
26:10just talking enthusiastically about books like Invisible Women or Weapons of Math Destruction, people suddenly get interested and think, okay, well, it's not just a set of commandments
26:21that you have to follow.
26:22It's a constantly evolving, living thing that you can think about and identify, and it affects everything more than you think.
26:28And just talking about it with enthusiasm, I find
26:31it helps encourage people to get interested, because it's a whole new area of conceptually interesting things to discover and read about and learn more about.
26:41So yeah, I find that that helps a lot.
26:44One of the things that I found, and I echo that completely, because one of the things I definitely struggled with was
26:50the sort of everyday sexism side of things, and it was through people talking to me about it and sort of showing the biases and showing the things I should be thinking about when I'm speaking, and
27:00the things you do unconsciously just because society and the culture and things around you have shaped you that way, whether it's right or wrong.
27:07I think data ethics sort of sits
27:10in a similar sort of vein, where you want to be talking about something that's quite important and making people think about it, and doing it actively but not annoyingly.
27:20So talking about it in a way, as you say, Gwilym, with enthusiasm, showing interesting examples, and sort of elevating the role data security, data ethics and data privacy play in our everyday life as there's more and more data being
27:34used by companies. But we also have a responsibility for our own data and for the data we work in, especially being workers in this industry.
27:42I think that will definitely start with just bringing it into the conversation more, right?
27:45Just
27:46bringing it into the discourse and the things we talk about and think about will, by virtue of that, improve or try to improve or at least shape the actions we take.
27:56Yeah, exactly.
27:57I think, actually, Gwilym's point about enthusiasm is probably the most pertinent thing there.
28:03I think we have this habit, especially, of bringing up
28:06topics in consulting in a very dry way.
28:09And it really helps if you approach a topic with enthusiasm.
28:13But also, from an individual perspective, just thinking, well, how do you do that in an enthusiastic
28:19way? Because every client is different, but also the context is different.
28:22Your audience is different. You know, factoring it into your champions, when you go talking to your champions
28:28and people who you're going to bring on as stewards for your data literacy projects, how do you factor that into the way they approach people?
28:35It's about cultivating a culture, as it were, where that topic, data ethics, is more
28:40frequently discussed and considered, because only by doing that do you then have everyone in the organization thinking about it.
28:46If one person thinks about it, it's probably not going to be very ethical in
28:49itself, right? It's also going to be very biased to that person's perspective on the world. Right, yeah. Or lack perspectives, if it's not from a broad group of people. So yeah, it's a very important
29:02thing to consider.
29:03Okay.
29:04One final thing that I think doesn't get raised as data ethics that often is going back to the technical side of things.
29:12So I completely mangle a join on an almost daily basis.
29:16I join on the wrong fields, or I do a left join instead of an inner join where it should be the other way.
29:22But the point is that I normally catch these mistakes and rectify them before any kind of negative effects in the later data pipeline happen.
29:33But the thing is that in my head this is mostly just "oops, messed up a join again, better correct it", but just small technical things
29:46can have a huge effect as well.
29:47And I think that the outcomes from small technical effects could be classified as data ethics.
29:52So for example, if you're doing a join of hundreds of thousands of personal records,
29:59and let's say you're joining two tables of people's names, and you're joining on the name instead of someone's consistent personal ID.
30:10Spelling mistakes happen.
30:11People's names will be spelt differently in different tables.
30:14If you are joining on people's names, and because of spelling mistakes, some people fall out of that join.
30:20What business process are you doing which means that people are now missing from your data but shouldn't be?
30:25So for example,
30:28if you're looking up, say, people's addresses and whether or not they've paid their TV license, but you've joined on someone's name, which has a spelling mistake in it
30:36or is spelled inconsistently, or someone's changed their name since they paid their TV license, that kind of thing.
30:43If you're taking the left output of that join and saying, okay, the people who are in the addresses table and are not in the
30:50"have made a payment" table, we're going to go after them,
30:53you get the situation where people are being hassled for payment, which they've already made.
30:59And this is affecting people's lives in a very negative way.
31:02People feel hassled, people feel stressed, and all it is is because someone's joined on a name instead of a person ID.
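The TV license example above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names (`person_id`, `name`, `address`) and the helper `unpaid` are all made up here, not taken from any real system. It shows how an anti-join on names flags someone who has already paid, while the same anti-join on a stable ID does not.

```python
# Hypothetical records, invented purely for illustration.
addresses = [
    {"person_id": 1, "name": "Catherine Jones", "address": "1 High Street"},
    {"person_id": 2, "name": "Sam Patel", "address": "2 Low Road"},
    {"person_id": 3, "name": "Alex Smith", "address": "3 Mill Lane"},
]
payments = [
    {"person_id": 1, "name": "Katherine Jones"},  # same person, spelled differently
    {"person_id": 2, "name": "Sam Patel"},
]


def unpaid(addresses, payments, key):
    """Left anti-join: address rows with no matching payment row on `key`."""
    paid_keys = {row[key] for row in payments}
    return [row for row in addresses if row[key] not in paid_keys]


# Who would be chased for payment under each choice of join key?
chase_by_name = [row["name"] for row in unpaid(addresses, payments, "name")]
chase_by_id = [row["name"] for row in unpaid(addresses, payments, "person_id")]

# People flagged only because of the name mismatch, although they have paid.
wrongly_chased = sorted(set(chase_by_name) - set(chase_by_id))

print(chase_by_name)   # ['Catherine Jones', 'Alex Smith']
print(chase_by_id)     # ['Alex Smith']
print(wrongly_chased)  # ['Catherine Jones']
```

In this toy data, Alex Smith genuinely has no payment on record, but Catherine Jones ends up on the chase list only because her name is spelled differently in the payments table.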
31:10So I think, in terms of making small steps towards data ethics,
31:15one of the best things you can do is set up as many checkpoints as possible throughout your data pipeline
31:22to try and predict where you're going to go wrong and try and rectify any mistakes that you might make.
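One possible shape for such a checkpoint, as a rough sketch rather than a recommended implementation: a small check run before a join that refuses to continue if keys are duplicated or if rows would silently drop out. The function name `check_join_keys`, the `max_unmatched` tolerance and the sample records are all assumptions made up for this example.

```python
def check_join_keys(left, right, key, max_unmatched=0):
    """Checkpoint to run before joining `right` onto `left` on `key`.

    Raises ValueError if a key is duplicated on either side, or if more than
    `max_unmatched` right-hand rows have no partner on the left.
    Returns the list of unmatched right-hand key values otherwise.
    """
    left_keys = [row[key] for row in left]
    right_keys = [row[key] for row in right]

    # Duplicate keys would silently multiply rows in the join.
    for side, keys in (("left", left_keys), ("right", right_keys)):
        if len(keys) != len(set(keys)):
            raise ValueError(f"duplicate {key!r} values in {side} table")

    # Right-hand rows without a partner would fall out of the join.
    unmatched = sorted(set(right_keys) - set(left_keys))
    if len(unmatched) > max_unmatched:
        raise ValueError(
            f"{len(unmatched)} right rows have no match on {key!r}: {unmatched}"
        )
    return unmatched


# Hypothetical records: every payment should belong to a known address.
addresses = [
    {"person_id": 1, "name": "Catherine Jones"},
    {"person_id": 2, "name": "Sam Patel"},
]
payments = [
    {"person_id": 1, "name": "Katherine Jones"},  # spelling differs from addresses
    {"person_id": 2, "name": "Sam Patel"},
]

id_unmatched = check_join_keys(addresses, payments, "person_id")  # passes

try:
    check_join_keys(addresses, payments, "name")
    name_check_failed = False
except ValueError as err:
    name_check_failed = True
    print(f"Checkpoint stopped the pipeline: {err}")
```

The design choice here is to fail loudly: the pipeline stops at the join instead of producing a chase list with a paying customer on it.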
31:27And I think of this as competence ethics, in that one of the most ethical things you can do, I think, is be good at your job.
31:34Yeah.
31:35Just to minimize as many mistakes as possible, and the horrific effects of unintended mistakes.
31:40Yeah, yeah, just do your job basically.
31:46or making people feel bad, because that's not the effect I want.
31:52It's just about admitting that you're human, and even the best technical architects are going to make
32:00mistakes in their data pipeline at some point.
32:03And just to spend your day with the idea in the back of your mind that, hey, at some point I might mess up.
32:08What would the effect of that be?
32:10And how can I make sure I don't mess up?
32:12Yeah.
32:13And this comes back to things like systems, right?
32:15And having the checklist of stuff you need to do before you make something productionized, etc.,
32:19etc.
32:19And this is where process and systems and protocols that people put in place come in.
32:23If you design those with data ethics in mind alongside everything else you have to consider, then yeah, absolutely you start minimizing those mistakes, right?
32:30Okay.
32:31Yeah, it's a very valid point.
32:32I actually think people should take solace from that, because I think we all do pride ourselves on the quality of the work we do.
32:36We all do.
32:37So at the very least, if you keep on aspiring to do your job better, then inherently you'll eventually get there, if that makes sense.
32:45You'll eventually get to the point where
32:46ethics is something that is considered, but there has to be that concerted effort to just think about it a little bit more as well, to take it into your thought process.
32:54Okay, I think that's it.
32:55That's been a great episode, Gwilym.
32:57Thank you so much.
32:59Thank you for being our first guest.
33:01You didn't tell me I was going to be the first guest.
33:03Dare I say it's very splant, but yeah.
33:07Absolutely.
33:08No, it's been an absolute pleasure.
33:10I think it's been great.
33:11Ravi and I know nothing
33:12about data ethics.
33:13So when we thought of this topic, we just thought, well, there's one guy, and that was you. So I think we got it right. If there's any takeaway, there are very, very few experts.
33:23I've read a bit, and there's always so much more to learn,
33:26so much more to explore.
33:27I'd never want to call myself an expert on data ethics.
33:29I just really like reading about it and learning and trying to incorporate it into my job.
33:35Exactly.
33:36Cool.
33:36If you want to listen to old episodes of the podcast, head to datumpodcast.com.
33:42You can find all the links to subscribe in your favorite podcast reader.
33:45If you want to join the conversation, the Twitter handle's exactly the same: datumpodcast.
33:50Give us a tweet.
33:51We're going to be talking about the episode a lot, I'm sure, in the future as data ethics becomes more pertinent.
33:55And join us for the next episode, which will be in a couple of weeks.
33:59Farewell.
34:00Cool.
34:00Thanks for listening, everyone.
34:01Cheers.
In this episode, we speak to friend and colleague Gwilym Lockwood about Data Ethics, what it means and how to start thinking about it.
Show Notes
• Data Ethics Framework: GOV.UK: Link (https://www.gov.uk/government/publications/data-ethics-framework/data-ethics-framework)
• Online harms white paper: Link (https://www.gov.uk/government/consultations/online-harms-white-paper)
• What is data ethics: Alan Turing Institute: Link (https://www.turing.ac.uk/research/publications/what-data-ethics)
• I was a cable guy by Lauren Hough: Huffington Post: Link (https://www.huffpost.com/entry/cable-tech-dick-cheney-sex-dungeon_n_5c0ea571e4b06484c9fd4c21?guccounter=1)
Feedback welcome on Twitter to Ravi at @scribblr_42 or Tim at @tableautim - or e-mail us, at datumpodcast@gmail.com (mailto:datumpodcast@gmail.com)