One of our favorite power-features that’s built into Google Sheets is the QUERY formula. The QUERY formula isn’t available in Excel workbooks, and historically it’s been used mostly in powerful database tools. It requires a bit more effort to use than some other formulas, but it provides outsized rewards. If you explore our Tiller budgeting and personal finance templates, you’ll find we use QUERY formulas throughout to provide concise reports based on the transaction and balance data that Tiller is automatically feeding into your spreadsheet each day.
If you want to write your own QUERY in your Google Sheet, we’ll share some tips here that might help whether you’ve never seen QUERY before or you’re a hard-core stats nerd SQL guru.
Speaking of SQL, QUERY has its origins there. Google Sheets’ implementation doesn’t exactly use SQL (they use the Google Visualization API Query Language), but QUERY will have a familiar syntax and functionality to anyone who has had some exposure to SQL.
So that’s the background, but what can we DO with it?
Let’s run a quick query against our spending data to see how much we spend every month on medical expenses. I could filter and SUM up my 2016 medical category and divide by 12. However, since I have a yearly deductible I have to meet before insurance coverage kicks in, my spending is front loaded at the beginning of the year. That simple filter and SUM approach really wouldn’t give me a good picture of how to budget. Indeed, if I used that approach, we’d probably end up over my budget right away in the early part of the year, before the deductible is met - not very encouraging when you’re trying to stick to a budget! So I really want to use QUERY to look month-by-month.
Querying medical expenses by month
We can do that by aggregating (in this case, SUM) by month. Two very simple but extremely powerful aggregation functions are SUM and AVG, and I use them quite often with both my transaction and balance data. In this blog post, let’s look at what we can do with QUERY, SUM, and GROUP BY.
Here’s what it all looks like in my Google Sheets spreadsheet, which was originally generated using the Tiller Standard Template. I’ve been with Tiller for quite a while, and I have several years worth of transaction data on my Transactions sheet, where I have also already categorized my spending (including medical - of which I have quite a bit*). I added a new blank sheet to my spreadsheet, and created my QUERY in cell A2 of that new blank sheet:
Here's the query
=QUERY(Transactions!A:Z, " SELECT K, -1 * SUM(D) WHERE C LIKE 'Medical' AND A > DATE '2015-12-31' AND A < DATE '2017-01-01' GROUP BY K LABEL -1 * SUM(D) 'Monthly Medical' FORMAT K 'yyyy-MMM', -1 * SUM(D) '$#,##0.00' ")
Dissecting this QUERY formula
Let’s dissect what I did with this formula.
=QUERY(Transactions!A:Z, " … ")
This opening line starts the QUERY formula, and it specifies the data I want to query. The last closing line completes the formula. In this case, I am pulling the data from columns A-Z of my Transactions sheet. Note the trailing quote on the first line and the leading quote on the last line, which are important to designate the query portion of the formula.
Choosing columns with the SELECT clause
SELECT K, -1 * SUM(D)
The SELECT clause indicates which columns to output, and in what order. I’m referencing the column identifier ‘K’ to indicate the first column in my output should correspond to values from column K directly from our input data (the Transactions sheet). Column K is the Month column, in my instance of the Tiller Standard template. Simple enough.
The second column, -1 * SUM(D) is a bit more complicated and is where about half of the ‘magic’ is. Column D in my transactions sheet (input) is the Amount column. Selecting this column using the QUERY means that I want the second column in my output to be the negative of the SUM of the Amount values for some group of rows. We’ll specify how to group further down. The reason I want it to be negative is that the values in my Amount column on my Transactions sheet are debits and are therefore represented in the original data as negative numbers; I’d like to see them in the results represented as positive numbers.
Filtering data with the WHERE clause
WHERE C LIKE 'Medical' AND A > DATE '2015-12-31' AND A < DATE '2017-01-01'
The WHERE clause filters my input data. In this first line, I’m saying I only want to grab data where the category (column C in my input data) is Medical. The second two lines filter based on dates. I only want to grab data after the last day of 2015 and the first day of 2017. Note the order of the date format -- dates are represented in queries as string literals in the format yyyy-MM-dd.
Grouping results with GROUP BY
GROUP BY K
The GROUP BY clause is where the other half of the ‘magic’ is, and it indicates how we want the aggregation functions in the SELECT clause to be applied. In this case, we’re saying we want the SUM to apply to all of the rows for each distinct value in column K (i.e. the Month) in the input data. In other words, we want to SUM by Month, and output each Month as its own row.
Beautifying results with LABEL and FORMAT
LABEL -1 * SUM(D) 'Monthly Medical'
This LABEL clause is primarily for aesthetics, and it indicates how I want the header for the second column to look -- ‘Monthly Medical’ is a lot more meaningful (and readable) than the default header that would otherwise be generated for a column containing aggregated data -- trust me!
Similarly, I want the values formatted in a human-readable way, so I used FORMAT to specify how I want the values in my two columns to look:
FORMAT K 'yyyy-MMM', -1 * SUM(D) '$#,##0.00'
The first column, corresponding to column K from the input, formats my month values to look similar to ‘2016-Jan’. The second column is formatted as US dollar currency, with two decimal places, thousand comma separators, and a dollar sign.
Reviewing and visualizing the QUERY’s data
At this point, I now have a nice-looking table listing out my monthly medical expenses for 2016, and I can see that my spending was indeed higher in the earlier months. I could easily apply the same technique to visualize the seasonal effect on my Utilities, or with a little creativity in the WHERE clause (see especially AND, OR, NOT, LIKE and MATCHES), I could see how much I spend at a particular merchant or for a particular set of transaction descriptions.
I also personally really appreciate the perspective I get from charts, so by selecting my new table and then selecting ‘Chart’ from the ‘Insert’ menu, I can easily chart my freshly aggregated data:
QUERY vs Pivot Tables
Experienced spreadsheet users may note that this type of aggregation, and many other straightforward aggregations that can be done with QUERY, can also be done just as easily with pivot tables. I use both extensively. I’ll often use a pivot table temporarily where I just want to look at something quickly, and use a QUERY where I might want to ‘bake something in’ to my spreadsheet more permanently, or where I’m wanting to do something more complex than what can be constructed with a single pivot table. I will also often start out with a pivot table to get a good idea of what I’m really trying to accomplish, then transition to a QUERY as soon as I start to think I’ll get better results if I could use AND, OR, NOT, LIKE or MATCHES in my WHERE clause.
Enjoy playing around with the very powerful QUERY formula on your own Google Sheet connected to Tiller. There is a lot in particular that you can do with your Transactions and Balance History data. Once you’ve mastered queries like the one above, you may find inspiration to aggregate multiple columns, for comparison -- to compare monthly discretionary spending to charitable giving, for instance. Or you could even look at how much you spend on Saturday versus other days of the week. The power of QUERY is primarily limited by available data and our own creativity. Have fun!
* These are my (Tim J) family’s actual out-of-pocket medical expenses for 2016. They may seem higher than expected, in part due to the fact that one of my children has type 1 diabetes. I advocate extensively for type 1 diabetes awareness on my personal blog and elsewhere, but here I just wanted to point out that this data (as visualized by this query) is personal, relevant and meaningful; no contrived example here!