Developing in Java: September 2013

This post will be about how to do a simple user activity count in a MySQL database with hourly resolution for the last 24 hours.

Suppose we have the following table:

id	activityTimestamp
10	2013-09-21 23:45:08
9	2013-09-21 23:30:44
8	2013-09-21 16:11:51
7	2013-09-21 15:14:20
6	2013-09-21 14:25:11
5	2013-09-21 12:23:22
4	2013-09-21 09:09:32
3	2013-09-21 08:44:49
2	2013-09-21 08:44:38
1	2013-09-21 06:38:09

We want to have a report, where each previous 24 hours, the activity count is shown. For this, we need a pivot table, because MySQL does not support auto-generated sequences on-the-fly:

id	int_value	key
1	0	TIME_S
2	1	TIME_S
3	2	TIME_S
4	3	TIME_S
5	4	TIME_S
6	5	TIME_S
7	6	TIME_S
8	7	TIME_S
9	8	TIME_S
10	9	TIME_S
11	10	TIME_S
12	11	TIME_S
13	12	TIME_S
14	13	TIME_S
15	14	TIME_S
16	15	TIME_S
17	16	TIME_S
18	17	TIME_S
19	18	TIME_S
20	19	TIME_S
21	20	TIME_S
22	21	TIME_S
23	22	TIME_S
24	23	TIME_S

The first task is to select each 24 hours :

The SQL query to do that is the following:


SELECT NOW() - INTERVAL int_value HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt

FROM `pivot` WHERE NOW() - INTERVAL int_value HOUR > NOW() - INTERVAL 1 DAY

and `key` = 'TIME_S' order by int_value asc

What it basically does, is the query select every whole hours backwards, and displays the timestamp of every previous whole hour.

2013-09-21 23:00:00

2013-09-21 22:00:00

2013-09-21 21:00:00

2013-09-21 20:00:00

2013-09-21 19:00:00

2013-09-21 18:00:00

2013-09-21 17:00:00

2013-09-21 16:00:00

2013-09-21 15:00:00

2013-09-21 14:00:00

2013-09-21 13:00:00

2013-09-21 12:00:00

2013-09-21 11:00:00

2013-09-21 10:00:00

2013-09-21 09:00:00

2013-09-21 08:00:00

2013-09-21 07:00:00

2013-09-21 06:00:00

2013-09-21 05:00:00

2013-09-21 04:00:00

2013-09-21 03:00:00

2013-09-21 02:00:00

2013-09-21 01:00:00

2013-09-21 00:00:00

2013-09-20 23:00:00

the result is the following:

The next step is to do this selection again, but increase the hour difference with 1, so we will have an interwall based on the two selections:


SELECT NOW() - INTERVAL int_value + 1 HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt

FROM `pivot` WHERE NOW() - INTERVAL int_value + 1 HOUR > NOW() - INTERVAL 1 DAY and `key` = 'TIME_S' order by int_value asc

This selection is the same as the previous one, only the values are shifted with 1 hour.

Now, it is time, to do the selection, where the activities are in the range provided by the two subqueries:


select DATE_FORMAT(from_time.dt,'%Y-%m-%d %H') as from_ ,

DATE_FORMAT(to_time.dt,'%Y-%m-%d %H') as to_, sum(coalesce(activities.num, 0)) as acts from

(SELECT NOW() - INTERVAL int_value HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt

FROM `pivot` WHERE NOW() - INTERVAL int_value HOUR > NOW() - INTERVAL 1 DAY

and `key` = 'TIME_S' order by int_value asc) as to_time,

(SELECT NOW() - INTERVAL int_value + 1 HOUR-INTERVAL MINUTE(NOW()) MINUTE-INTERVAL SECOND(NOW()) SECOND dt

FROM `pivot` WHERE NOW() - INTERVAL int_value + 1 HOUR > NOW() - INTERVAL 1 DAY

and `key` = 'TIME_S' order by int_value asc) as from_time

left outer join (select count(distinct ua.activityid) as num, ua.timestamp as timestmp

from `useractivity` ua where  ua.timestamp > (NOW() - INTERVAL 1 DAY) group by 2) as

activities on (DATE_FORMAT(from_time.dt,'%Y-%m-%d %H') = DATE_FORMAT(activities.timestmp,'%Y-%m-%d %H'))

WHERE from_time.dt = (to_time.dt - INTERVAL 1 HOUR)

group by 1, 2 limit 24

The coalesce is needet, to include the hours, where no activity was registered in the database.

The result is the following:

from	to	activitynumber
2013-09-20 23	2013-09-21 00	0
2013-09-21 00	2013-09-21 01	0
2013-09-21 01	2013-09-21 02	0
2013-09-21 02	2013-09-21 03	0
2013-09-21 03	2013-09-21 04	0
2013-09-21 04	2013-09-21 05	0
2013-09-21 05	2013-09-21 06	0
2013-09-21 06	2013-09-21 07	1
2013-09-21 07	2013-09-21 08	0
2013-09-21 08	2013-09-21 09	2
2013-09-21 09	2013-09-21 10	1
2013-09-21 10	2013-09-21 11	0
2013-09-21 11	2013-09-21 12	0
2013-09-21 12	2013-09-21 13	1
2013-09-21 13	2013-09-21 14	0
2013-09-21 14	2013-09-21 15	1
2013-09-21 15	2013-09-21 16	1
2013-09-21 16	2013-09-21 17	1
2013-09-21 17	2013-09-21 18	0
2013-09-21 18	2013-09-21 19	0
2013-09-21 19	2013-09-21 20	0
2013-09-21 20	2013-09-21 21	0
2013-09-21 21	2013-09-21 22	0
2013-09-21 22	2013-09-21 23	0

There was a database upgrade lately, and we needed to include some auto-generated data based on an other table. The scenario was that one table listed all the available places for rent in the city, and we had to assign a default lease time for it. The lease_interval table had an id based on a sequence, so autogeneration was not an option, because the sequence was for the entire database.

place_pk	placename
123	place1
125	place2
127	place3
129	place4
131	place5
133	plcae6

The main idea was to do the generation of the new table in a for cycle. First step is to determine what variables are needed for the operation.
we needed the current status of the sequence counter, and we have named it p_count. Then we needed the ids of the place on which we iterated, so we named it placeid. All these two were of numeric types.

DECLARE
  p_count  numeric;
  placeid numeric;

The second step was to create a for loop:

FOR placeid IN SELECT place_pk FROM place
 LOOP
  n_count=n_count+1;
  INSERT INTO lease_interval (lease_interval_id, leasetime, place)
  SELECT p_count, NULL, placeid;
 END LOOP;

Great, now we need a criteria for the last_login for each day. The previous query generates timestamps with 00:00:00.000, so we need an interval with the current day as the beginning, and the next day as the end of the interval we are looking for.
Withe the help of the query

select (i::date + '1 day'::interval) from 
 generate_series('2012-12-30', '2013-01-05', '1 day'::interval) i;

Here, the sequence is incremented in every iteration, so we could keep track of the incrementing of the sequence.
then using the SELECT place_pk FROM place statement, we could select all the values of place_pk, and iterate through them.

Then the insert statement is just the usual, with the exception of using the variables declared in the function declaration section.

After the loop is finished, the sequence is updated with the new sequence count.

The function creation and the transaction handling is as usual, as you can see the whole script:

begin;

CREATE OR REPLACE FUNCTION add_default_lease_time()
  RETURNS void AS $$ 
DECLARE
  p_count  numeric;
  placeid numeric;
BEGIN
  select seq_count from sequence INTO p_count;
  FOR placeid IN SELECT place_pk FROM place
 LOOP
  n_count=n_count+1;
  INSERT INTO lease_interval (lease_interval_id, leasetime, place)
  SELECT p_count, NULL, placeid;
 END LOOP;
 update sequence set seq_count= p_count;

END; 
$$ LANGUAGE plpgsql; 

select * from add_default_lease_time();

commit;

And the results:

lease_interval_id	leasetime	place
221	null	123
222	null	125
223	null	127
224	null	129
225	null	131
226	null	133

Developing in Java

Saturday, September 21, 2013

Count user activity in every hour

Sunday, September 1, 2013

Insert into postgres from existing table with new sequences