- Host: Steve Stedman
- Recording Date: December 11, 2024
- Topic: JOIN types
Stedman SQL Podcast Episode 12 – JOINs and our last episode of Season 1!
Stedman SQL Podcast Ep. 12 JOINs
Steve Stedman discusses SQL Server joins in the final episode of Season One of his podcast. He explains the importance of understanding table relationships, particularly foreign keys, and covers various join types, including inner, outer, left, right, full, cross, and non-EQUI JOINs. Stedman demonstrates these joins using a sample database with tables for people, favorite colors, and favorite foods, highlighting data inconsistencies and the results of different join types. He also introduces the concept of the grandfather, father, son (GFS) backup schedule for efficient storage of backups. The episode concludes with a mention of a free joins poster and an invitation to Season Two.
Podcast Transcript
Steve Stedman 00:16 Hey everyone, welcome to this week’s episode of The Stedman solution SQL podcast. This is episode 12. This will be the last episode of 2024 or last episode of season one before we go into 2025 with Season Two. We were doing just live streaming earlier in the year. We decided a couple months ago to turn it into a formal podcast, and we’ve been doing it about every week now, and we’ll plan to continue that next, next year. If you want to find us, find the podcast, to watch it and listen. You can get it at stedman.us/podcast, and there’ll be links there to YouTube as well as Spotify, where you can watch and listen.
This week’s episode is on SQL Server JOINs. And this is something that I have presented on, gosh, probably close to 15 years that was one of my first SQL Saturday presentations that I ever did, and I probably presented it 10 different times at 10 different SQL Saturdays. And I also presented it at the SQL pass Summit, a big event used to happen. Well, prior to COVID, it was more popular than it is. Now, got a class that I teach on it, as well as several online presentations so one of the things that joining does is it connects tables, and you need to understand the relationships between those tables. And you can kind of see the background of this screen I’m sharing right now is a bunch of tables drawn out with lines between them, showing the relationships, and those relationships define what you can JOIN on between different tables. And when I first created this class, I was in a position where I was working with a lot of developers and a lot of people who didn’t really understand the database. They knew data, they knew you had data in table one and table two, but how you actually connected the data between those two tables was not really well known, and typically you have a foreign key relationship, which is where one ID is being referenced from another table. And we’ll take a look at that. How to use those and how to use those to do some pretty cool things.
Here’s the short links for the podcast, stedman.us/podcast, YouTube/podcast, Spotify/podcast, are ways you’ll be able to watch this and other episodes. And I’ve presented, I think the last time I presented this class was in 2021 and I did a live stream, and since then, there’s been a lot of different questions that I’ve seen people ask on the website, and I want to hit on some of them. Some of them are basics, like, what is a JOIN in SQL Server? Well, that’s simply connecting two tables and pulling data back between them. What is a JOIN type of inner Well, we’re going to explore that. What’s the difference there between inner and other types of JOINs? What are some types of JOINs in SQL? Well, there’s Inner JOINs, there’s outer JOINs, there’s left, Outer JOINs right, outer JOINs, FULL OUTER JOINs, cross JOINs, things like that. And then what is an inner JOIN versus an outer JOIN. By the time we’re done with this podcast episode, you will understand that. And what is the RIGHT OUTER JOIN versus a left outer JOIN? We’ll take a look at that and make sure you understand the difference between those and how many types of JOINs are there. Well, if you’ve seen my JOIN types poster, which we’ll show in a minute. There’s a whole lot of them, and it really boils down to different combinations of how we’re joining, pulling data from different tables, what is an inner JOIN versus a left JOIN? What is an outer JOIN with exclusion? That’s a really useful one. Once you learn that, you’ll use it a lot, hopefully. What is an anti JOIN? That’s one of those that I haven’t covered in the previous lessons, but we’re going to cover that here. What is a SQL exclude JOIN? Well, that’s kind of the same thing as a JOIN with exclusion on this. What is a SQL JOIN to exclude batches, similar thing. What is a left excluding inner JOIN? Or LEFT JOIN excluding inner JOIN? Well, that’s a term that somebody’s used when they’re going to ask about it. We’ll kind of cover that on the same thing as exclude JOINs and an anti left JOIN, similar thing and an outer. JOIN versus a cross JOIN. That’s interesting. We’ll take a look at what the difference is with those and a full JOIN versus a cross JOIN, or an EQUI JOIN versus a non EQUI JOIN. This is one of those that the EQUI JOIN can apply to different JOIN types or the non EQUI JOIN. Is there a JOINs cheat sheet?
So, what I’m going to do is step ahead in the slide deck here to my cheat sheet. If you go to stedman.us/poster can download for free a PDF version of my cheat sheet. I made this about, I don’t know, a dozen years ago, maybe closer to 15 years ago, and as a way to represent all these different JOIN types. And it uses Venn diagrams as a way to see how the different JOINs work. Okay, so now that we’ve kind of hit the questions and we’ve mentioned the cheat sheet, let’s jump into a demo on how to use the JOINs. So to start with, we’re going to use a query training database that I have here, and in that database we’re going to have three really simple tables. One table that has people in it, but I’ve just called it table one because it’s going to be references table one, table two and table three, as we go through here, and all it has in it is an ID, a name, and then a foreign key to another table, so foreign key to the favorite color. So this ID in the table two represents favorite colors, matches this foreign key here, and then a foreign key to favorite food, which is this over here. And from there, we can look up people’s favorite foods. And then we’re going to insert into that table a handful of people, not too many, because we want to be able to visualize this. If we had 10,000 people in this table, it’d be really hard to see exactly how all these relationships work. So we’re just going to start with a small set of people here, a small set of favorite colors and a small set of favorite foods.
Now, one of the things you want to notice is that with the favorite colors, we have some data inconsistencies. We have two of them with an ID of one red and indigo, that’s going to that’s there as a duplicate to be able to show some things in the different queries as to exactly what happens when we have multiple rows. We’re doing the JOINs. If every row only matched up with one other row, the JOINs wouldn’t be quite as interesting. So what I’m gonna do at this point is I’m going to create the database, create the tables and insert those rows. And one of the things I always get asked after this demo, because I’m doing it a lot, is, how do I get rid of the messages window here quickly, I’m using Control R to show and hide the messages window on Management Studio Control R. So then let’s take a look at what’s in these tables. Actually, all three of the tables, we take a look at table one, we have people, Steve, Aaron, Mary, Fred and Beth, Johnny and Karen. And those people have some favorite colors, but not all of them have favorite colors. Johnny and Karen do not have favorite colors, as you can see by the nulls on the two bottom rows there and then all the or some of the people have favorite foods, but not all of them have favorite foods. You can only see the three on the bottom have favorite foods, but everyone else does not have that. So if I was to ask you, well, what is Beth’s favorite food, you could look and say, well, Beth has a favorite food of one, and look down here and type one is pizza. So Beth’s favorite food is pizza. Johnny’s favorite food is burger, or no is pizza also because it’s one, and Karen’s favorite food is burger, because it has an idea of two. If you were to look here and say, What is Mary’s favorite color, we could look and see favorite color of two maps to green, and we can say Mary’s favorite color is green. But if we said, what is Fred’s favorite color one? Well, there’s two of them here that have an idea of one. So Fred’s favorite color is actually red and indigo because of a data inconsistency that we have in the database there. Okay, that’s the data we’re going to use for this. And this process of looking at one table and then looking at another table like we just did to look those values up is really inefficient, because then you have to look at the database twice. So what we’re going to do is cover how do we connect those tables in a way that we pull the data back and JOIN those tables in a way we can use them?
So to start with, let’s go into inner JOINs, and back to the questions we looked at at the beginning. What is a JOIN in SQL? Well, it’s simply connecting two tables based off of some relationship that’s been defined between those tables. Then the next question is, what is a JOIN type of inner and that is the standard, the default JOIN that we’re going to take a look at here, that just says, take everybody in table A and match them up with Table B or. Table one and table two, if there is something that identifies a match between those tables and what are some other types inner leftover and more. Well, we’re going to take a look at Inner JOINs in this script window, and then we’ll take a look at LEFT OUTER JOINs. So what I’ve done here is I’ve said select everything out of table one, and then throw in some dashes just to kind of separate them. And then select everything out of table two. That’s what the t1 dot star and the t2 dot star are. And we’re going to say from table one, inner joining table two. Here’s where the relation comes in, or the JOIN predicate, as it’s called. We’re going to say, connect table one to table two based off of the favorite color in table one map to table two’s IDs. Now in the real world, somebody would have done a better job, hopefully naming the tables. And table two would have been named colors, and table one would have been named people, and you could do the mapping that way. So let’s highlight this and run it. And basically we’re saying, give me everything from table one matched with table two. If you remember, some of the people did not have favorite colors, we run the query, and we can see that this first half over here, the first half of the grid that I’ve highlighted in blue, is what came from table one, and the right half to the right of the JOIN is what came from table two, and it’s showing everyone from table one who had a favorite color and what that favorite color is mapped to. So you can see, because the data inconsistency on favorite color one, that Steve here has two favorite colors and so does Fred, because the idea of one is doubled up, but you can see that Beth has a favorite color of yellow. Anne’s favorite color is purple, and so on, based off of how we JOIN the data between those now, there were a couple of other people in the list that we saw originally who did not get included here because they did not have favorite colors. The thing with an inner JOIN is it will only show you those rows that match. If we just go back. Well, we’ll skip. We looked at it before, but there were a couple other people who didn’t have favorite colors there.
Okay, so now we take the same query and we’re going to change it here. We had inner JOIN of table two, and down here it’s just phrased as JOIN table two. What’s that mean? And how is that different? Well, it’s the same thing. The word inner on JOIN is optional, as is outer in a number of ways, and we’ll take a look at that in a minute. But you can either use the full syntax of any inner JOIN, which is explicitly states what it is you’re looking for, or you can just use the word JOIN, which means the same thing as an inner JOIN, they both do exactly the same thing. They grab all the data out of table one, match it up with the data in table two, based off of some relationship that sometimes referred to as the on clause, or, more technically speaking, the JOIN predicate, and then we have an order by in the end, just as sorted. Okay, what if we take a look at a different order of the tables here, I want to say select t1, dot star, and t2 dot star. That didn’t change, but I changed around the order. Table two is defined here, and table one is defined there. When we run this with an inner JOIN, the order of the tables of what we’re joining from, or what we’re in, our joining to, doesn’t matter. Now, when we start looking at left and right JOINs in a minute, it will matter. But from an inner JOIN, whether we have table one or table two listed first or second or table three listed third, it really doesn’t matter, because we’re looking at the results of everything that matches between the both of those tables.
Okay, so then what about our JOIN predicate? Here we have the on clause, table one favorite color equals table two ID versus table two ID equals table one favorite color. Well, that means the exact same thing. When we run it, we still get the same results. But what was different here was we said, Give me everything from table to first and then everything from table one. So it just changed around. The order of the columns that we’re seeing in the results set down here. Does the inner JOIN include everyone? Well, we talked about that a minute ago, and the answer is no, it doesn’t. But let’s take a look and prove it. If we make this bigger, we can see we have Steve doubled up, Aaron, Mary, a couple of friends and Beth, whereas down below, we have all those same people, but we also have Johnny and Karen who do not have a favorite color, so the inner JOIN actually excludes those people who don’t have a favorite color. Now imagine if this was some employee database, and you were trying to say, give me all of the employees. These and show their favorite color if they have one. Well, using an inner JOIN would mean that Johnny and Karen would be excluded in this specific example. Let’s JOIN to a different table and take a look. And this is joining the table three, which is favorite foods instead of favorite colors. And when we run this, we can see that only Beth, Johnny and Karen had favorite clues, and everyone else in the list got skipped on this inner JOIN. So the Inner JOINs will show you everything that matches between the two tables, and then effectively in the results that throws away those things that don’t match.
So you might be thinking, well, what if I want to see all of the employees, all the people, and show me their favorite color if they have one, but if they don’t have one, don’t I just still list them, but don’t show their color, favorite color. So let’s take a look at that. We can achieve that with the left outer JOIN. So left outer JOIN, we’ll start by using the right database, we get all the rows from the first table or the left table, and if there is a match to table two or the right table, then we show the matched rows. Otherwise, if it’s not matched, then show nulls. So let’s take a look at what that really means. So here, this is similar to our inner JOIN. But instead of saying inner JOIN, we change it to left outer JOIN. Here it’s going to say, give me everything from the left table. So if we look at these, table one, in this case, is the left table, and table two is the right table, so the right of the JOIN, we’ll click that character turn back in there for readability. So give me everything from table one. And if they have a favorite color, let’s match it up with that favorite color in table two. Otherwise, let’s replace it with null. So we’ll take a look at how that looks when we run it. So you can see the first eight rows that we got back here are exactly the same as what we got with our inner JOIN, but when we look at this because it was a Left Outer JOIN, it says, include everyone from table one, which means Johnny and Karen in this example, but fill it in with nulls if they don’t have something from table two. And in this case, table two is the favorite color, so it gave null as being their favorite color.
All right, so what happens if we reverse the order of the tables. Now, when we did the inner JOIN and we reverse the order of the tables like this, it didn’t make any difference. But keep in mind it’s saying, Give me everything from the left table and match it up with the right table if it matches otherwise, display nulls. So here we got all of the favorite colors and where they matched. Over here, we displayed who they matched with. Otherwise, we left them as null. So we can see this from this that the favorite colors of pink, mauve and orange, or four, six and seven haven’t been chosen by anyone in the list. Okay, what about this? If we just do an outer JOIN from table one to table two. Okay, that is not an invalid syntax because you haven’t specified here whether it’s a Full Outer JOIN or a Left Outer JOIN or right outer JOIN. So you have, you can drop the word outer like here, we can just change this to a left JOIN and run that. But you can’t drop the words left or right or full when you’re running outer JOINs. And then there’s some other examples, very similar. We’ll skip over those, but just joining the sub queries and things. But the point is, the LEFT OUTER JOIN is giving you everything from the first table and matching it up to the second table and showing nulls if the second table is not match.
Okay, now we’ll take a look at the right outer JOIN. And what is the right outer JOIN? Well, you might have guessed it. It says, Give me all the rows from table two or the right table, and if there’s a match in table one or the left table, show the matched rows, otherwise, fill it in with nulls. So let’s take a look at this. This is the same query that we looked at with our left outer JOIN, but we’ve just replaced the word left with right. And when we run this, we can see we’ve got all of the favorite colors matched up with people, if they exist, otherwise, filled in with nulls. Okay. So then reverse the order of the tables. From table one with the RIGHT OUTER JOIN, sorry, table two with the RIGHT OUTER JOIN THE TABLE one, we run that you may recognize this, and it looks like the LEFT OUTER JOIN from table one to table two. And in fact, they’re the same. By switching the table order around and switching this from right to left, they behave the same as if it was a left outer JOIN with the tables in the opposite direction. Okay, more questions. So how many JOIN types are there? Well, we’ve got a. JOIN, which we talked about inner JOIN, which we’ve talked about left JOIN and left outer JOIN, that we’ve talked about right JOIN and right outer JOIN, and we’ve talked about but there’s also FULL OUTER, JOIN full JOIN cross, JOIN non EQUI, JOIN echo, JOIN semi, JOIN anti left, JOIN anti semi, JOIN anti right, JOIN and more.
So what we’re going to do is take a look at now the difference between the inner JOIN versus the LEFT OUTER JOINs. So here we have the inner JOIN that we looked at originally, and it just matches up everything from table one to table two, where there’s a match and throws away those that don’t match. And then I’m going to run that included with the left outer JOIN, which says, give me everything from table one and leave it with nulls if there’s nothing in table two. And when we run that, this one should actually down here be right outer JOIN. So when we run all three of these, the inner JOIN, the LEFT OUTER JOIN and the right outer JOIN with all the tables and columns in the same order, we can see three different sets of results here. Let’s refresh that so first one, the inner JOIN matches everything, and only shows those that are matched. THE LEFT OUTER JOIN matches everything, but also gives us everything from table one and skips it with nulls on table two and the RIGHT OUTER JOIN says, give us everything from table two and fill it in with nulls if it doesn’t match on table one. Okay, that’s inner JOIN, left outer JOIN and RIGHT OUTER JOIN, keeping in mind that the left table is the one that is before the JOIN, and the right table is the one that is after the JOIN.
All right, moving down the list, then on to the left and right outer JOIN with exclusion. So there’s terminology here that we’ve got to think about. First is, what is a SQL exclude JOIN that is just referring to a left or a right outer JOIN with exclusion. What is a SQL JOIN to exclude matches? That’s the same thing. What is a left JOIN excluding an inner JOIN? Well, that’s kind of a weird way to say it. But what is a left JOIN with exclusion? And what is an anti left JOIN, or even an anti right JOIN in SQL, well, it’s a way to say, do the JOIN, but then throw away the stuff that would have been in the inner JOIN, or throw away the stuff that actually matched. So what I want to look at here is a left outer JOIN with exclusion, also known as an exclude JOIN, or a anti left JOIN. And if I leave off the where clause here first, it’s just a plain left JOIN, left outer JOIN, like we’ve been looking at where it shows everything from table one and leaving it a nulls if it doesn’t match table two. But then we say where table two ideas null. And when we run that, it basically says, give me everything from the left JOIN, throwing out those people who are matched, or throwing out everyone who would have been an inner JOIN. So this would be a way from the list of people in favorite colors to find everyone who does not have a favorite color, for instance, or everyone who doesn’t match up in table two.
Okay, so now the RIGHT OUTER JOIN with exclusion. We’re doing a similar thing. We’re saying where RIGHT OUTER JOIN where table one’s favorite color is null. And when we run that, it says, okay, just give me all of the colors that aren’t matched up to people’s favorites. So if this was we’re going to do, go do company T shirts, and we want to put everyone’s t shirt with their favorite color. We can safely not order any pink mob or orange T shirts based off of this query results, showing us that none of the people in our list like those colors. Another way you could achieve the same thing would be selecting from one table where the value is not in another table. And what we found is that doing the right outer JOIN with exclusion, or the right anti JOIN, or the left outer JOIN with exclusion, or the left anti JOIN, can be a more efficient way, in some cases, to do that for kind of a query, versus doing a not in type query, like we always say, actual mileage will vary, so you want to test it and compare it, but it’s a great way to give you say, match these tables and show me the things that are missing or that aren’t matched up. All right, moving down the list now we have a full outer JOIN or a cross JOIN, and the question we’ve been asked a few times on this is, what is an outer JOIN versus a cross JOIN, and what is a full JOIN versus a cross JOIN? Well, let’s get some terminology straight here. First, an outer JOIN or a full JOIN are referring to the same thing, and it’s effectively a Full Outer JOIN which. Right? It’s like saying, let’s do a left JOIN and do a right JOIN, meaning, give me everything from table two and give me everything from table one or vice versa. And if they match, show the match. Otherwise show nulls. So it would be like if we did the LEFT JOIN and the right JOIN together and unions the results. So when we run that query, we can see the first set of rows down to here. The first eight rows are what we would have got with the inner JOIN. Looking at that with the seven and eight, that’s what we would have got with the left outer JOIN. And then these additional three down here are what we would have got beyond the inner JOIN when we did the right outer JOIN. So it’s a way to say, match everyone up and then find the people who don’t have favorite colors, and find the favorite colors that don’t have people all right. Now, what’s different with that is it’s saying there’s it’s matching, and there is something that’s attempting to be matched. But with the outer JOIN, it’s filling in with nulls, and if it doesn’t match. So what a cross JOIN does, this is where it gets interesting. This is the effectively doing the Cartesian product, meaning take everything from table one and match it up with everything from table two and every possible combo. And you notice there’s no ON clause or JOIN predicate here on the cross JOIN, it simply says, Take everything and match it up. So what we’ve done is it’s taken it’s taken each person and combined them with every single color, and then so it’s taken Aaron and combined Aaron with every single color, Mary, and combined Mary with every single favorite color, Fred and so on. So it’s a way to just say, here’s all the possible combinations between these and cross JOIN. Is one of those that you don’t use very often, but there’s some really cool examples of test cases that I’ve done, like when we’re trying to build out a table and you have a list of here’s a list of here’s a bunch of first names, here’s a bunch of middle names, and here’s a bunch of last names. If I want to fill in a table with a whole lot of test data, I just do SELECT FROM first name, select from CROSS JOIN middle names, cross JOIN last names, and it generates a really big list quickly out of all of those tables there. So a cross JOIN is the Cartesian product, or basically matching every possible combo between the two tables and the FULL OUTER JOIN, is like doing the inner JOIN plus the LEFT JOIN plus the right JOIN, and including all the results there.
Okay, so moving down the list on to EQUI and non EQUI JOINs. So in all of the JOINs that we’ve done so far. Everything I’ve shown you so far has been an EQUI JOIN, meaning in the JOIN predicate or the on clause. I’d used an equals. So if we run this, this is our standard inner JOIN to say, give me everything from table one joining table two, where they match. Well, a non EQUI JOIN is kind of weird, where you’re doing something that’s not just equal so you can say it’s not equal to, or it’s greater than or equal to, or it’s less than, or anything that’s not an exact match. And if we do this and non EQUI JOIN greater than or equal to, means give me everything the regular INNER JOIN did, but also match up all of those IDs that were greater. What is it? We’re saying the favorite color is greater than the ID. So here you can see Aaron’s favorite color, the ID of one. The only foreign key is or the only value is one. But here where the color is where the ID is greater than or equal to So that’s saying where this favorite color is greater than or equal to this ID. You can see that we then get multiple colors selected for that, doing it as a left JOIN instead of instead of an inner JOIN, similar thing where it’s saying, give me the JOIN based off of something being greater than or equal to or less than, and then the same thing as a Right Outer JOIN. Now, if you’re not quite catching non EQUI JOINs, that’s okay. I’ve been working with SQL Server for over 30 years now, and other than coming up with some wacky test case where I was trying to come up with a bunch of test data. I don’t think I’ve ever used a non EQUI JOIN for anything useful in any type of real coding scenario. It’s something you should know about, something you can use, but it’s something that’s not very frequently used, all right.
And moving down the list, that is the end of the demo for today. Download the poster. The poster shows examples of all of these. It also has the sample schema and the table data, and it’s a great way to be able to visualize these. You can go to stedman.us/poster and download it for free. We also have a class available, and I have dozens and dozens. Of blog posts over the last 15 years about these different JOIN types on my on Stevestedman.com so lots of information there. Alright. Well, that wraps up the JOIN types presentation.
Now on to the ask Steve SQL questions. So this one was submitted, and the question came in is, what is the grandfather, father, son, backup schedule, or GFS, and this is one of those that’s really useful to understand when it comes to backups. Now, when people think of backups, a lot of the times, people, until we’ve had a discussion, don’t have backups that meet their expectations. And a couple weeks ago we talked about recovery point objectives, recovery time objectives, things like that. And oftentimes, I mean, none of us have unlimited disk space. So in a perfect world, you could have every backup of your database from every point in time since it was ever created, but we’re all constrained by disk space. So if you wanted to have daily backups for an entire year, you would effectively have to have about 365, times the number of or the size of your backup set that amount of disc space. But what the grandfather, father, son backup schedule does is it gives you a way to have longer term retention without chewing up as much disk space. So what you get is you get daily backups for the last seven days of backups. What that means is you have daily granularity, and you probably have some log backups in there as well for seven days, which is pretty good. You’ve got good coverage for seven days. But what happens if something was lost or destroyed in your data that you didn’t realize for beyond that seven days. Well, with that, we then have weekly backups that would be a full backup that keep four weeks of those weekly backups. So then we’re covered for the first month. We’ve got great granularity for the first seven days. We’ve got weekly granularity for the next three weeks after that, and then we throw in what’s called the grandfather level, or monthly backups, where we retain 12 months or more of monthly backups. So instead of with this, we end up with a whole year of full backups. But we don’t have the same granularity of one per day with 365 of them. Instead, we have 12 plus four plus seven or 23 sets of full backups, to have a whole year’s worth of backups there, rather than 365, and it’s a really useful way to manage longer term backup retention without chewing up as much disk space as you might think. So the grandfather, father, son backup retention method is a great way to extend your backup coverage without having to use as much disk space. All right, so if you like what we’ve talked about here, on JOINs and you’re thinking about, what do I that JOINs poster might have been great, but you don’t know if it’s going to do it for you. You can get that JOINs poster printed on socks. This would be great for that next SQL Server interview where somebody asks you to tell you about the difference between a left JOIN and a right JOIN. You just pull your pull your sock up and show them an example there. It would certainly get you noticed or remembered at that job interview. And hopefully that would be a good thing.
So that’s it. JOIN us next year. Season Two of the podcast starts in January. This is the last episode of season one of the Stedman solutions SQL podcast. Thanks for watching, I’m Steve, and I hope you enjoyed this. Please click the thumbs up if you liked it. And if you want more information, more videos like this, click the subscribe button and hit the bell icon so that you can get notified of future videos that we create.