Saturday, September 21, 2013

Count user activity in every hour

This post will be about how to do a simple user activity count in a MySQL database with hourly resolution for the last 24 hours.

Suppose we have the following table:

idactivityTimestamp
102013-09-21 23:45:08
92013-09-21 23:30:44
82013-09-21 16:11:51
72013-09-21 15:14:20
62013-09-21 14:25:11
52013-09-21 12:23:22
42013-09-21 09:09:32
32013-09-21 08:44:49
22013-09-21 08:44:38
12013-09-21 06:38:09

We want to have a report, where each previous 24 hours, the activity count is shown. For this, we need a pivot table, because MySQL does not support auto-generated sequences on-the-fly:

idint_valuekey
1 0 TIME_S
2 1 TIME_S
3 2 TIME_S
4 3 TIME_S
5 4 TIME_S
6 5 TIME_S
7 6 TIME_S
8 7 TIME_S
9 8 TIME_S
10 9 TIME_S
11 10 TIME_S
12 11 TIME_S
13 12 TIME_S
14 13 TIME_S
15 14 TIME_S
16 15 TIME_S
17 16 TIME_S
18 17 TIME_S
19 18 TIME_S
20 19 TIME_S
21 20 TIME_S
22 21 TIME_S
23 22 TIME_S
24 23 TIME_S

The first task is to select each 24 hours :

The SQL query to do that is the following:

SELECT NOW() - INTERVAL int_value HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt
FROM `pivot` WHERE NOW() - INTERVAL int_value HOUR > NOW() - INTERVAL 1 DAY
and `key` = 'TIME_S' order by int_value asc

What it basically does, is the query select every whole hours backwards, and displays the timestamp of every previous whole hour.
2013-09-21 23:00:00
2013-09-21 22:00:00
2013-09-21 21:00:00
2013-09-21 20:00:00
2013-09-21 19:00:00
2013-09-21 18:00:00
2013-09-21 17:00:00
2013-09-21 16:00:00
2013-09-21 15:00:00
2013-09-21 14:00:00
2013-09-21 13:00:00
2013-09-21 12:00:00
2013-09-21 11:00:00
2013-09-21 10:00:00
2013-09-21 09:00:00
2013-09-21 08:00:00
2013-09-21 07:00:00
2013-09-21 06:00:00
2013-09-21 05:00:00
2013-09-21 04:00:00
2013-09-21 03:00:00
2013-09-21 02:00:00
2013-09-21 01:00:00
2013-09-21 00:00:00
2013-09-20 23:00:00

the result is the following:


The next step is to do this selection again, but increase the hour difference with 1, so we will have an interwall based on the two selections:

SELECT NOW() - INTERVAL int_value + 1 HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt
FROM `pivot` WHERE NOW() - INTERVAL int_value + 1 HOUR > NOW() - INTERVAL 1 DAY and `key` = 'TIME_S' order by int_value asc

This selection is the same as the previous one, only the values are shifted with 1 hour.

Now, it is time, to do the selection, where the activities are in the range provided by the two subqueries:

select DATE_FORMAT(from_time.dt,'%Y-%m-%d %H') as from_ ,
DATE_FORMAT(to_time.dt,'%Y-%m-%d %H') as to_, sum(coalesce(activities.num, 0)) as acts from
(SELECT NOW() - INTERVAL int_value HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt
FROM `pivot` WHERE NOW() - INTERVAL int_value HOUR > NOW() - INTERVAL 1 DAY
and `key` = 'TIME_S' order by int_value asc) as to_time,
(SELECT NOW() - INTERVAL int_value + 1 HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt
FROM `pivot` WHERE NOW() - INTERVAL int_value + 1 HOUR > NOW() - INTERVAL 1 DAY
and `key` = 'TIME_S' order by int_value asc) as from_time
left outer join (select count(distinct ua.activityid) as num, ua.timestamp as timestmp
from `useractivity` ua where  ua.timestamp > (NOW() - INTERVAL 1 DAY) group by 2) as
activities on (DATE_FORMAT(from_time.dt,'%Y-%m-%d %H') = DATE_FORMAT(activities.timestmp,'%Y-%m-%d %H'))
WHERE from_time.dt = (to_time.dt - INTERVAL 1 HOUR)
group by 1, 2 limit 24
The coalesce is needet, to include the hours, where no activity was registered in the database.

The result is the following:

fromtoactivitynumber
2013-09-20 23 2013-09-21 00 0
2013-09-21 00 2013-09-21 01 0
2013-09-21 01 2013-09-21 02 0
2013-09-21 02 2013-09-21 03 0
2013-09-21 03 2013-09-21 04 0
2013-09-21 04 2013-09-21 05 0
2013-09-21 05 2013-09-21 06 0
2013-09-21 06 2013-09-21 07 1
2013-09-21 07 2013-09-21 08 0
2013-09-21 08 2013-09-21 09 2
2013-09-21 09 2013-09-21 10 1
2013-09-21 10 2013-09-21 11 0
2013-09-21 11 2013-09-21 12 0
2013-09-21 12 2013-09-21 13 1
2013-09-21 13 2013-09-21 14 0
2013-09-21 14 2013-09-21 15 1
2013-09-21 15 2013-09-21 16 1
2013-09-21 16 2013-09-21 17 1
2013-09-21 17 2013-09-21 18 0
2013-09-21 18 2013-09-21 19 0
2013-09-21 19 2013-09-21 20 0
2013-09-21 20 2013-09-21 21 0
2013-09-21 21 2013-09-21 22 0
2013-09-21 22 2013-09-21 23 0


Sunday, September 1, 2013

Insert into postgres from existing table with new sequences

There was a database upgrade lately, and we needed to include some auto-generated data based on an other table. The scenario was that one table listed all the available places for rent in the city, and we had to assign a default lease time for it. The lease_interval table had an id based on a sequence, so autogeneration was not an option, because the sequence was for the entire database.


place_pkplacename
123place1
125place2
127place3
129place4
131place5
133plcae6

The main idea was to do the generation of the new table in a for cycle. First step is to determine what variables are needed for the operation.
we needed the current status of the sequence counter, and we have named it p_count. Then we needed the ids of the place on which we iterated, so we named it placeid. All these two were of numeric types.

DECLARE
  p_count  numeric;
  placeid numeric;
The second step was to create a for loop:

FOR placeid IN SELECT place_pk FROM place
 LOOP
  n_count=n_count+1;
  INSERT INTO lease_interval (lease_interval_id, leasetime, place)
  SELECT p_count, NULL, placeid;
 END LOOP;

Great, now we need a criteria for the last_login for each day. The previous query generates timestamps with 00:00:00.000, so we need an interval with the current day as the beginning, and the next day as the end of the interval we are looking for.
Withe the help of the query

select (i::date + '1 day'::interval) from 
 generate_series('2012-12-30', '2013-01-05', '1 day'::interval) i;
Here, the sequence is incremented in every iteration, so we could keep track of the incrementing of the sequence.
then using the SELECT place_pk FROM place statement, we could select all the values of place_pk, and iterate through them.

Then the insert statement is just the usual, with the exception of using the variables declared in the function declaration section.

After the loop is finished, the sequence is updated with the new sequence count.

The function creation and the transaction handling is as usual, as you can see the whole script:


begin;

CREATE OR REPLACE FUNCTION add_default_lease_time()
  RETURNS void AS $$ 
DECLARE
  p_count  numeric;
  placeid numeric;
BEGIN
  select seq_count from sequence INTO p_count;
  FOR placeid IN SELECT place_pk FROM place
 LOOP
  n_count=n_count+1;
  INSERT INTO lease_interval (lease_interval_id, leasetime, place)
  SELECT p_count, NULL, placeid;
 END LOOP;
 update sequence set seq_count= p_count;

END; 
$$ LANGUAGE plpgsql; 

select * from add_default_lease_time();

commit;

And the results:

lease_interval_idleasetimeplace
221null123
222null125
223null127
224null129
225null131
226null133