In the last couple of tutorials, we’ve been dealing with SQL queries that handle numeric data. Now we’re going to learn how to group together that data in a column and organize it.

🆕 🌟
📌GROUP BY: group results by a specified column
📌HAVING: narrowing down/specifying the location further

#Table of Contents

Query 1. #GROUP BY
SELECT store_id, count(active) FROM customer WHERE active=1 GROUP BY store_id;

Query 2. #GROUP BY #HAVING
SELECT rating, COUNT(rating) AS 'number' FROM film GROUP BY rating HAVING number>180 ORDER BY number DESC;

#GROUP BY

Q1. Find how many customers are active and organize them by the originating store

GROUP BY (column name you want to group by)

Let’s check out the customer table first. Run this query:

SELECT * FROM `sakila`.`customer`;

Here we can see the customer ID, store ID, the customer’s full name, email, and address ID. Let’s focus on the store ID and active columns for now. If we want to know how many customers are currently active at our imaginary film store, we can try running this query:

SELECT COUNT(active) FROM customer;

But this doesn’t give us very clear information. We still don’t know how many are active, since this query only shows how many customers have information entered in the ‘active’ column. It doesn’t specify whether they are active (specified with a 1) or not active (specified with a 0). So let’s filter this information by adding WHERE active=1!

SELECT COUNT(active) FROM customer WHERE active=1;

Okay, now we know we have 584 active customers. But there is one more factor we don’t know yet… which store has how many active customers? We need to plug in the store ID into our query.. but how?

By using the GROUP BY query, we can get a clear and organized look at this information. First, we need to add store_id to our SELECT query in order to view that column. Next, we want to specify with  GROUP BY  how we want the results to be organized. Here’s how to plug it in:

SELECT store_id, COUNT(active) FROM customer WHERE active=1 GROUP BY store_id;

Now we can clearly see that store #1 has 318 active customers, and store #2 has 266 active customers!

#HAVING

Q2. Find the quantity of movies for every rating and rename the new column to ‘number’. Sort the results in descending order and filter out anything less than 180.

GROUP BY (column name you want to group by) having (specified)

Let’s check some movie ratings back in the film table.

To view the ratings column, we need to specify it under the SELECT query. We also want to count how many ratings there are, so we'll add the COUNT query. Let’s also use what he just learned and throw in the GROUP BY query :

SELECT rating, COUNT(rating) FROM film GROUP BY rating;

Here are the 5 distinct movie ratings and how many we have of each. The title of the count column isn’t as pretty and neat as rating, so let’s fix that with the AS query. Just add it after the COUNT query and you’ll get this:

SELECT rating, COUNT(rating) AS'number' FROM film GROUP BY rating;

Let’s say we want to only see movies that we have MORE than 180 copies of — how can we filter that information? We’ve tried a similar query before with the WHERE query, but this time we’re going to use the HAVING query.

SELECT rating, COUNT(rating) AS'number' FROM film GROUP BY rating HAVING number>180 ORDER BY number DESC;

🔑 Why do we use HAVING instead of WHERE?

In SQL,  WHERE  is processed before  GROUP BY. That means the information wouldn’t be divided by rating when  WHERE  executes. But  HAVING  gets processed after GROUP BY, so it can be used to constrain results by the group we specified.

Just remember, the WHERE query applies to all rows in the result set.
The HAVING query is applied to the groups created by a GROUP BY query.

🔑 Comparing DISTINCT and GROUP BY

Try running both these queries in QueryPie. What’s the result?

SELECT DISTINCT replacement_cost FROM film;

SELECT replacement_cost FROM film GROUP BY replacement_cost;

As you can see, the two queries are very similar. They will give you the same results in this case. The only difference is the order.

⭐️But for future use, know that GROUP BY allows you to use aggregate functions ( AVG, MAX, MIN, SUM, and COUNT) whereas DISTINCT is used to remove duplicates.

#Practice Time

Your challenge today is to find the number of movies in the film table. You must group them by price, and rename the column as ‘number’.

Here’s the answer:

SELECT rental_rate, COUNT(rental_rate) AS number FROM film GROUP BY rental_rate;