Practicing SQL in leetcode – Game Play Analysis

I’m pleased that I’m off probation at the State of Vermont, but I know from my evaluation that one of my learning objectives over the next year is to improve my SQL skills. I’ve been doing some tutorials but it’s not sticking as well as I’d like, so I’m turning to leetcode since I’ve heard mixed results about it. Might as well see how it goes!

Game Play Analysis I

The first result for ‘leetcode sql’ in DuckDuckGo was a curated list of leetcode exercises. I opened the list in leetcode and the first exercise (Game Play Analysis I) … was pretty much the same business case that I had described in my previous post about subqueries and order of operations in a query.

My first submission used a subquery in much the same way, and I noticed that it was quite slow compared to the other submissions. I compared a faster submission and determined that it was possible in MySQL to use min() and group by to get a more performant result, but I’m unsure if this more performant query is more resilient to poor data normalization. I’ll have to look more into it so that I can learn more.

Game Play Analysis II

The prompt is now not just to find the most recent date for a record but to find the most recent device ID. Interestingly, my subquery method was much easier to modify for Game Play Analysis II and there performed better that 63% of other responses. Another submission I looked at that did substantially better than I used constraints on the where clause and I suspect that the performance gain from not using a CTE is the only optimization possible here.

# Write your MySQL query statement below

 select player_id, device_id  
 from Activity 
 where (player_id, event_date) in (select player_id, min(event_date) as event_date from Activity
groupby player_id);

Game Play Analysis III

My subquery method is back to being low performance when the prompt is amended to report the total number of games a player has participated in for each of the days they have logged in. This other solution doesn’t select unnecessary data and, since sum() isn’t really being used as a window function here, doesn’t need to be resolved out of order with a CTE.

sum(games_played) over (partitionby player_id orderby event_date) as games_played_so_far
from activity

Removing the extra subquery and the unnecessary data places the performance of my query back up in that top 40%.

# Write your MySQL query statement below
    SUM(games_played) OVER (partitionby player_id orderby event_date asc) AS games_played_so_far

from activity
groupby player_id, event_date

The most optimized solutions specify which rows to add, which highlights to me that something I can focus on learning is about row selection. In this case, since the prompt cares about total games played over the life of the account on any given day that the user has been logged in, I suspect the optimization here is preventing rows from future dates being considered for summation.

# Write your MySQL query statement below
SUM(games_played) OVER (
    PARTITIONBY player_id
FROM activity
GROUPBY event_date, player_id

Game Play Analysis IV

This one is pretty tough for me right now, so I am going to do exercises from other series and come back to this one when I’m more prepared.

Further Learning Questions:

  • Game Play Analysis I – does using a subquery like I did make a query more resilient to poorly normalized data, or is it just inefficient? Are there any benefits to using a subquery like I did over simply constraining the where clause?
  • Game Play Analysis II – what kind of optimization does choosing to use constraints on where over a CTE provide?
  • Game Play Analysis III – what kind of optimization does choosing to use analytical functions on only certain rows provide?
  • Game Play Analysis IV – Yet to come!

Thank you for reading. Please let me know if there are any clarifications I can make or further questions I can answer either down in the comments, on LinkedIn, or hit me up on Mastodon.

Related Posts