MySQL

How I Taught the MySQL MAX Function to Make Better Queries

Structured Query LanguageI work for BitLoft as a Data and Business Analyst. Every day, I work with clients to answer complicated questions that can’t be answered with simple business mojo. What really makes my job interesting is writing in SQL. SQL stands for Structured Query Language. It’s a standardized language that allows a user to interact with a database. MySQL is a relational database management system.

I’m going to take you through a problem that I ran into as I was learning some of the nuances of MySQL, as well as the workaround I came up with to solve the problem. Of course I also realize that most query problems can have multiple solutions, so hey, if you use MySQL and want to pitch me a better solution, I’d love to hear from you.

The problem that I stumbled on was for a specific user who was enrolled in multiple online training courses. What I wanted to know was this:

If X user is assigned to Course 1, Course 2, and Course 3, what is his most recent activity date across all of his three enrollments?

To begin approaching this, I gravitated pretty quickly to the MySQL MIN and MAX functions. These functions allow you to review a set of data in a column and identify the maximum or minimum value.

Say I was looking at a table that had users in it, and that table joined to a second table where order events were recorded. In this table, I might have multiple orders per person. In this case, a useful MAX() function would look like this: MAX(orderdate). This would show me only the user record with the joined table information for his most recent order.

So I started with the most basic building blocks. I looked at the field last_activity_date and took the maximum (most recent) instance of it, like so:

MAX(last_activity_date)

I factored this into my approach. In my course enrollment example, however, I ran into a huge problem. Some of the individuals had no activity in one or more of their enrollments, which confused the MAX function because it can’t handle anything other than numbers. Instead of a number, the MAX function was dealing with a database NULL value. (NULL means the absence of data. NULL is a database’s way of saying that a slot is assigned where a piece of data could go, but the slot is empty. For instance, if you have a customer in a database table that expects to see First Name, Last Name, and Email, but the customer has no email address on file, the value for that email address would be NULL.)

MAX functions do not play nice with “no data.” The situation would be like calling four of your friends to find out who had the most bananas, and getting these results:

  • Friend A has 3 bananas.
  • Friend B has 4 bananas.
  • Friend C has 6 bananas.
  • Friend D doesn’t pickup the phone.

Which friend has the most bananas? Well, you don’t know because Friend D could potentially have THOUSANDS of bananas.

For my purpose, I knew that if an individual had no activity, then that activity had no potential to be the the “most recent” activity, but MySQL just couldn’t process it. So I had to teach it how. My solution was to give the query a date that was long before the course even started. I used the epoch date — January 1, 1970. I made an IF statement that says, “Hey database, if you run across a NULL value, treat it as a the epoch date.” To do that, I used the MySQL IFNULL function wrapped around a function that told the database to treat a string of numbers as a date.

IFNULL(last_activity_at, DATE(19901231))

To take the maximum of that, I used my MAX function around all of that!

MAX(IFNULL(last_activity_at, DATE(19901231))

There remained a serious problem: What if I were to run into an edge case where a user had never been active in any of their enrollments? (An edge case is basically a highly unlikely scenario that may or may not be worth factoring into your efforts.) My extended function at this point could potentially see three courses with no activity in any of them. The function would assign 1970 as the last_activity_date in all of those cases and take the maximum of it. It would tell me that the most recent user activity was January 01, 1970, which would not be correct, so I had to teach it about this potential edge case. I decided to tell it that after it sorts out how to handle the NULL values, if it finds a MAX(last_activity_date) that does equal January 01, 1970, then MySQL should call it “no activity.” To do that, I used the IF() function below:

IF(max(IFNULL(last_activity_at, date(19901231))) = date(19901231), ‘No Activity’

But it continues to nest in complexity. The IFNULL function requests only that I give the query ONE value. It asks: “If this value is null, then how should I treat it?”

The IF() function requires two pieces of information:

  • If something equals the specified value, how should the IF() function treat it?
  • If something is NOT that value, how should the IF() function treat it?

I needed to spell out the following:

IF(something = 1, if the value DOES = 1 ‘make it this’, if value does NOT = 1 ‘make it that’)

My earlier IF statement was incomplete because if I were to read it out in human language, it said,

“Find me the maximum last activity date, but if that last activity date is null then treat it as a Jan 01, 1970, and but if the maximum value does turn out to be equal Jan 01, 1970, then say that the value = ‘No Activity.’”

MySQL wanted to know what I wanted to do if the value were NOT 1970. Here, fortunately, it didn’t increase in complexity because I already knew what I wanted. I just stuffed my earlier function into the “if not” side of the IF() function:

IF(max(IFNULL(last_activity_at, date(19901231))) = date(19901231), ‘No Activity’, max(IFNULL(last_activity_at, date(19901231))))

Code Sample, ACarlsen

Now it reads as such:

“Find me the maximum last activity date. If that last activity date is null then treat it as a Jan 01, 1970. If the value DOES equal Jan 01, 1970, then say that the value = ‘No Activity.’ However, if the maximum value DOES NOT equal January 1970, just give me the maximum last activity date where you have considered that null values are treated as January 01, 1970.”

And that is how I found a user’s most recent activity across multiple online training enrollments.

 

Adam “Carl” Carlsen is a Data Analyst at BitLoft. To find our more about how BitLoft can help you with analytics, reach out to us from the BitLoft Contact page.

Share this: