3. SQL for Data Analysis: Filtering Data with the WHERE Clause

Welcome to the third lesson in our SQL for data analysis series! In our previous lessons, we learned how to see all our data and how to pick specific columns to keep our reports clean. However, the true power of using SQL for data analysis lies in your ability to ask specific questions. You don’t usually want to see all customers; you want to see the ones who haven’t ordered in six months, or those living in a specific city.

Today, we are diving into the WHERE clause, the fundamental tool for filtering datasets and turning raw tables into answers to specific business questions.

The Role of Filtering in Data Analytics

Think of the WHERE clause as a high-powered filter. In a spreadsheet, you might click a tiny arrow on a column header and check boxes to filter your data. When you use SQL for data analysis, you write these rules directly into your code.

This makes your analysis reproducible. Instead of relying on memory for which boxes you checked in Excel, you have a SQL script that records the exact logic used to define your subset of data, so you or a colleague can rerun it later and get the same subset. That is essential for professional data analysis.

The Syntax of the WHERE Clause

The WHERE clause always comes after the FROM clause and before any GROUP BY, ORDER BY, or LIMIT clause. It decides which rows make it into your result: each row is tested against the condition, and only rows for which the condition is true are kept.
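To see where WHERE sits relative to the other clauses, here is a minimal, self-contained sketch you can paste into a scratch database. The products rows are made-up sample data, and LIMIT is the form used by MySQL, PostgreSQL, and SQLite (SQL Server uses TOP instead).

```sql
-- Hypothetical sample table for illustration only
CREATE TABLE products (
    product_name VARCHAR(50),
    retail_price DECIMAL(10, 2)
);

INSERT INTO products (product_name, retail_price) VALUES
    ('Canvas Tote', 18.00),
    ('Leather Wallet', 65.00),
    ('Wool Scarf', 42.50),
    ('Desk Lamp', 89.99);

-- Clause order: SELECT -> FROM -> WHERE -> LIMIT
SELECT product_name, retail_price
FROM products
WHERE retail_price > 40   -- first keep only the rows that pass this test (3 of the 4)
LIMIT 2;                  -- then cap how many of those rows come back
                          -- (without ORDER BY, which 2 you get is not guaranteed)
```

The filter is applied before LIMIT, so LIMIT only ever chooses among rows that already passed the WHERE condition.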

Basic Numerical Filtering

If you want to find products at Mastery Retail that cost more than £50, your query would look like this:

SELECT product_name, retail_price 
FROM products
WHERE retail_price > 50;

Filtering Text (Strings)

When filtering text, you must wrap the text value in single quotes (' '). Numbers, like the 50 in the previous query, are written without quotes.

SELECT first_name, last_name, city
FROM customers
WHERE city = 'London';

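Note for Analysts: In MySQL, whether text comparisons are case-sensitive depends on the column's collation, and the default collations are case-insensitive, so 'london' would usually match 'London'. PostgreSQL and SQLite compare text case-sensitively by default, so it is best practice to match the exact casing stored in your data; that way your scripts behave the same on every database.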
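If you are not sure how a database treats casing, one common, portable workaround is to lower-case the column before comparing. The sketch below uses made-up customers with deliberately inconsistent casing; LOWER() is available in MySQL, PostgreSQL, SQLite, and SQL Server. One trade-off to keep in mind: wrapping a column in a function can stop the database from using a plain index on that column, so on large tables it is usually better to clean up the casing in the data itself.

```sql
-- Hypothetical sample data; the casing in the city column is inconsistent on purpose
CREATE TABLE customers (
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    city       VARCHAR(50)
);

INSERT INTO customers (first_name, last_name, city) VALUES
    ('Amara', 'Okafor', 'London'),
    ('Tom',   'Hughes', 'london'),
    ('Priya', 'Shah',   'Leeds');

-- Exact match: case-sensitive in PostgreSQL and SQLite (1 row),
-- usually case-insensitive in MySQL (depends on the column's collation)
SELECT first_name, last_name, city
FROM customers
WHERE city = 'London';

-- Lower-casing the column gives the same 2 rows on all of these databases
SELECT first_name, last_name, city
FROM customers
WHERE LOWER(city) = 'london';
```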

Comparison Operators You Need to Know

To master SQL for data analysis, you need to be comfortable with these common comparison operators:

  • =: Equal to
  • != or <>: Not equal to (<> is the SQL-standard spelling; MySQL, PostgreSQL, SQLite, and SQL Server accept both)
  • >: Greater than
  • <: Less than
  • >=: Greater than or equal to
  • <=: Less than or equal to

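The difference between strict and inclusive operators matters most at the boundary. In this sketch, using a hypothetical products table, one item costs exactly £50.00, so > and >= give different answers, while != and <> behave identically.

```sql
-- Hypothetical sample table for illustration only
CREATE TABLE products (
    product_name VARCHAR(50),
    retail_price DECIMAL(10, 2)
);

INSERT INTO products (product_name, retail_price) VALUES
    ('Canvas Tote', 18.00),
    ('Leather Belt', 50.00),
    ('Desk Lamp', 89.99);

-- > is strict: the £50.00 belt is NOT returned (only Desk Lamp)
SELECT product_name FROM products WHERE retail_price > 50;

-- >= includes the boundary: Leather Belt and Desk Lamp
SELECT product_name FROM products WHERE retail_price >= 50;

-- <> and != mean the same thing: both return Canvas Tote and Desk Lamp
SELECT product_name FROM products WHERE retail_price <> 50;
SELECT product_name FROM products WHERE retail_price != 50;
```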
Using “Not Equal To” for Data Cleaning

Filtering is often the first step in data cleaning. For example, if developers created test orders (marked with a status of 'Test') that are skewing your sales averages, you can exclude them with the not-equal operator:

SELECT * FROM orders
WHERE order_status != 'Test';
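One caveat when cleaning data this way: != never matches rows where the column is NULL (no value recorded), because comparing NULL with anything gives "unknown" rather than true, and WHERE only keeps rows where the condition is true. The sketch below uses made-up orders with the order_status column name from this lesson's practice exercise; adding OR ... IS NULL keeps rows with a missing status if you want them.

```sql
-- Hypothetical orders; order 3 has no status recorded (NULL)
CREATE TABLE orders (
    order_id     INTEGER,
    order_total  DECIMAL(10, 2),
    order_status VARCHAR(20)
);

INSERT INTO orders (order_id, order_total, order_status) VALUES
    (1, 120.00, 'shipped'),
    (2, 0.01,   'Test'),
    (3, 45.50,  NULL);

-- Returns only order 1: for order 3, NULL != 'Test' is unknown, so it is dropped too
SELECT * FROM orders
WHERE order_status != 'Test';

-- Returns orders 1 and 3 by asking for missing statuses explicitly
SELECT * FROM orders
WHERE order_status != 'Test' OR order_status IS NULL;
```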

Combining Multiple Conditions: AND & OR

Rarely does a business question involve only one filter. This is where the logical operators AND and OR come in: they let you combine several conditions in a single WHERE clause.

The AND Operator (The “Strict” Filter)

Both conditions must be true. Use this when you want to narrow your results significantly.

SELECT product_name, retail_price, stock_count
FROM products
WHERE retail_price > 100 AND stock_count < 10;

Result: Expensive items (priced over £100) that are running low on stock (fewer than 10 units).

The OR Operator (The “Inclusive” Filter)

At least one of the conditions must be true (rows that meet both are included too). Use this when you want to broaden your search.

SELECT first_name, last_name, city
FROM customers
WHERE city = 'London' OR city = 'Manchester';

Result: All customers living in either London or Manchester.

Why Analysts Prefer WHERE over Excel Filters

When you use SQL for data analysis, you are building a logical pipeline. The WHERE clause allows for complex combinations that are difficult to manage in spreadsheets.

Imagine trying to find: “Customers from London who spent over £500 OR customers from Paris who spent over £1000.” In SQL, this fits in a single, readable WHERE clause. In Excel, it requires an “Advanced Filter” with a separate criteria range, or several manual filtering steps that are prone to human error; a saved SQL query applies exactly the same logic every time it runs.
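Here is one way to write that question, assuming a hypothetical total_spent column on the customers table (the column and rows are made up for illustration). The parentheses pair each city with its own spending threshold. SQL evaluates AND before OR, so this particular query would return the same rows without them, but the parentheses make the intent obvious to the next reader and protect the logic when conditions are edited later.

```sql
-- Hypothetical table: total_spent is an assumed column for illustration
CREATE TABLE customers (
    first_name  VARCHAR(50),
    city        VARCHAR(50),
    total_spent DECIMAL(10, 2)
);

INSERT INTO customers (first_name, city, total_spent) VALUES
    ('Amara',  'London', 650.00),   -- London, over £500: kept
    ('Tom',    'London', 300.00),   -- London, under £500: dropped
    ('Claire', 'Paris',  800.00),   -- Paris, under £1000: dropped
    ('Hugo',   'Paris',  1500.00);  -- Paris, over £1000: kept

SELECT first_name, city, total_spent
FROM customers
WHERE (city = 'London' AND total_spent > 500)
   OR (city = 'Paris'  AND total_spent > 1000);
```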

Practice Exercise: Test Your Knowledge

The Mastery Retail orders table contains the following columns: order_id, customer_id, order_total, and order_status.

Your Task: Write a query that selects the order_id and order_total for all orders where the order_total is less than 25 AND the order_status is ‘shipped’.

Solution:

SELECT order_id, order_total
FROM orders
WHERE order_total < 25 AND order_status = 'shipped';
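To check your answer, you can run the solution against a few made-up rows in a scratch database. The data below is hypothetical and chosen so that each row tests one part of the condition.

```sql
-- Hypothetical sample rows for checking your answer
CREATE TABLE orders (
    order_id     INTEGER,
    customer_id  INTEGER,
    order_total  DECIMAL(10, 2),
    order_status VARCHAR(20)
);

INSERT INTO orders (order_id, customer_id, order_total, order_status) VALUES
    (101, 1, 12.99, 'shipped'),   -- meets both conditions
    (102, 2, 25.00, 'shipped'),   -- not less than 25: excluded
    (103, 1, 8.50,  'pending'),   -- wrong status: excluded
    (104, 3, 60.00, 'shipped');   -- too expensive: excluded

SELECT order_id, order_total
FROM orders
WHERE order_total < 25 AND order_status = 'shipped';
-- Only order 101 comes back
```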

💡 Pro Tip: Filter First, Analyze Later

In SQL, the order of operations matters. The database applies your WHERE filter before it calculates any sums or averages (which we will learn later), so those calculations only see the rows you kept. Filtering out unneeded rows also means later steps have less data to process, and on large tables an index on the filtered column can let the database find matching rows without scanning the whole table.
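As a small preview, this sketch borrows AVG() from a later lesson and uses a hypothetical orders table. The 'Test' order is removed by WHERE before the average is calculated, so it cannot distort the result.

```sql
-- Hypothetical orders table for illustration only
CREATE TABLE orders (
    order_id     INTEGER,
    order_total  DECIMAL(10, 2),
    order_status VARCHAR(20)
);

INSERT INTO orders (order_id, order_total, order_status) VALUES
    (1, 40.00,  'shipped'),
    (2, 60.00,  'shipped'),
    (3, 999.00, 'Test');

-- WHERE runs first, so AVG only sees the two real orders: (40 + 60) / 2 = 50
SELECT AVG(order_total) AS avg_order_value
FROM orders
WHERE order_status <> 'Test';
```

Without the WHERE clause, the £999 test order would pull the average up to roughly £366.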

Further Learning and Resources

Next Lesson: We will explore Advanced Filtering using the IN, BETWEEN, and LIKE operators to handle ranges and partial text matches!
