How to reckon up running totals?

by Eugene Krasovskiy (05-02-2011)

There are often cases when for every row of ordered table you should count the sum of digit column values consist of rows, which are above or/and below with current in some order. So this is the task of reckon up running totals. For example, the table ordered by 'time' column:

time   	var  
1	15
2	10
4	13
7	11

we shouls get the next table:

time	var	total_sum 
1	15	15
2	10	25
4	13	38
7	11	49

Let's state the concrete task for the 'Painting' DB for the better understanding.

Task 1

For every moment when the square q_id = 10 was painted we should get the sum of all spended paint at the moment.

Table 'utB' includes 'b_datetime' and 'b_vol' columns. We should count the sum of ' b_vol' in all rows for every value 'X' from the ' b_datetime' where the current time is b_datetime ≤ X.

Let's resolve the task with the two most popular ways.

1) Subquery in the SELECT clause

This method supposed to do the running totals using the subquery. Being in a hurry you can do the next:


SELECT b_datetime,
	(
		SELECT SUM(T1.b_vol)
		FROM utB T1
		WHERE T1.b_datetime <= T2.b_datetime
		AND T1.b_q_id = 10
	) total_vol
FROM utB T2
WHERE b_q_id = 10;

However it's a wrong query! The cause is that the square #10 can be painted with the different spray cans so in such case we get duplicated rows:

b_datetime		total_vol 
2003-01-01 01:12:31.000 255
2003-01-01 01:12:31.000 255
2003-01-01 01:12:33.000	265
2003-01-01 01:12:34.000	275
2003-01-01 01:12:35.000	520
2003-01-01 01:12:36.000	765

This mistake can be fixed with adding DISTINCT to the first SELECT, but this time subquery will run for every row which have same 'b_datetime' values and only after that duplicates will be fixed. So we should fix duplicates before, for example:


SELECT b_datetime,
	(
		SELECT SUM(T1.b_vol)
		FROM utB T1
		WHERE T1.b_datetime <= T2.b_datetime
		AND T1.b_q_id = 10
	) total_vol
FROM
(
	SELECT DISTINCT b_datetime --fixing duplicates
	FROM utB
	WHERE b_q_id = 10
) T2;

2) Cartesian product

The sence of method is that the table interflow itself with the condition ' X >= b_datetime'. Here 'X' should not be repeated. Otherwise duplicate rows will be counted multiple times in calculating the total sum. Then the sum of ' b_vol' is counted, ordered by ' b_datetime'. See the example below:


SELECT T2.b_datetime, SUM(T1.b_vol) total_vol
FROM utB T1
	INNER JOIN
	(
		SELECT DISTINCT b_datetime	--fixing duplicates
		FROM utB
		WHERE b_q_id = 10 --we contemplate only the square for which b_q_id = 10
	) T2
	ON T1.b_datetime <= T2.b_datetime
WHERE T1.b_q_id = 10
GROUP BY T2.b_datetime;

Below is incorrect result of the query where there is no DISTINCT in the table 'T1'.

b_datetime		total_vol 
2003-01-01 01:12:31.000	510
2003-01-01 01:12:33.000	265
2003-01-01 01:12:34.000	275
2003-01-01 01:12:35.000	520
2003-01-01 01:12:36.000	765

This example was chosen specially to exceed the limits of theme because often you have to keep an eye on different nuances besides of counting running totals. Done and done.

Addition

Both methods demand multiple reading from table. It can be avoided by using numerical sequence generation. Let's reformulate the task to discourse forward:

Task 2

We have to match the paint number in ascending order of 'b_datetime' to every moment the square 'q_id = 10' was painted. Also we should get the sum of all used paint for the square at the time it was painted for every numbers of paint.

Now, if we would use the first method we matched the tables with condition by using numder, but not time. It happens often. So the realisation of the first metod should be looked like this:

SELECT T2.rn, SUM(T1.b_vol) total_vol
FROM
(
	SELECT ROW_NUMBER() OVER(ORDER BY b_datetime) rn,
		SUM(b_vol)b_vol ----It's enough to count this only for one table
	FROM utB WHERE b_q_id = 10
	GROUP BY b_datetime --fixing duplicates
)T1
INNER JOIN
(
	SELECT ROW_NUMBER() OVER(ORDER BY b_datetime) rn
	FROM utB WHERE b_q_id = 10
	GROUP BY b_datetime
)T2
ON T1.rn <= T2.rn
GROUP BY T2.rn;

Here the ROW_NUMBER() function of Transact-SQL is used for row numeration.

Notice that 'T2' table is just sequence of positive integers so it is not necessary to read 'utB' for creating it! It's enough to generate the sequence of positive integers. But we don't know how many numbers do we need to generate, otherwise we know that 'b_vol' is integer greater than zero and the quantity of paint for the one square can't be more than 765. So it will be enough to generate 765 numbers. You can get the quantity of numbers by the subquery, some times it's helpful and consist of the task. At the end we get such query:


SELECT T2.rn, SUM(T1.b_vol) total_vol
FROM
(
SELECT rn, b_vol, COUNT(*)OVER()cnt_rec --counting amount of rows
FROM
	(
		SELECT ROW_NUMBER() OVER(ORDER BY b_datetime) rn,
			SUM(b_vol)b_vol
		FROM utB WHERE b_q_id = 10
		GROUP BY b_datetime
	)T
)T1
INNER JOIN
(
	SELECT a + 10*b + 100*c rn
	FROM
		(SELECT 1 a UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
		UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8
		UNION SELECT 9 UNION SELECT 10)AA,

		(SELECT 0 b UNION SELECT 1 UNION SELECT 2 UNION SELECT 3
		UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7
		UNION SELECT 8 UNION SELECT 9)BB,

		(SELECT 0 c UNION SELECT 1 UNION SELECT 2 UNION SELECT 3
		UNION SELECT 4 UNION SELECT 5 UNION SELECT 6
		UNION SELECT 7)CC

)T2 -- sequence (1..800)
ON T1.rn <= T2.rn
AND T2.rn <= T1.cnt_rec -- limiting the 'T2' rows amount using 'T1' rows amount
GROUP BY T2.rn;

Of course using such hook is not justified this time, but if we would have the table created by the complex resource-consuming query instead of 'T1' table, the opportunity to avoid its self-matching (double recalculation) will improve the performance. And this example was used just for simple explaining.

Previous | Index | Next

Home

SELECT exercises (rating stages)

DML exercises

Developers