Developing in Java: Postgres random in subquery

There was a use-case in a project, where huge number of test data was required, but it had to meet a constraint for one property. The database schema looked like the following:

item:

item_name	price	color	store
Item 1	32	red	null
Item 2	3	blue	null
Item 3	7	yellow	null

store:

store_id	name	coordinate
1	Store 1	12°34′56″N 12°34′56″E
2	Store 2	12°34′56″N 12°34′56″W
3	Store 3	12°34′56″S 12°34′56″E
4	Store 4	12°34′56″S 12°34′56″W

The requirement was that the tables need to be populated with approximately 1 million items, and these items should be randomly associated with the stores. The stores were previously defined, and the store table has already been populated with the data. The items were partially defined, only the store association was null.

The database was Postgres 8.3.

The first SQL attempt was the following:

update item set store = (select store_id from store order by random() limit 1) where item.store is null;

The result was:

item_name	price	color	store
Item 1	32	red	2
Item 2	3	blue	2
Item 3	7	yellow	2

All items received the same store id. It seemed as if the evaluation of the random selection only happened once, and was cached for every record. This can happen because the inner selection does not have any dependencies, so it can be treated as a completely separate query.

The solution is that it has to depend on the outer query, even if the dependency is quite silly from the developer point of view:

update item t1 set t1.store  = (select t2.store_id from store where t1=t1 order by random() limit 1) item.store is null;

The dependency here is the where t1=t1 part. Using this query, the result was exactly what we wanted to achieve:

item_name	price	color	store
Item 1	32	red	2
Item 2	3	blue	4
Item 3	7	yellow	1

Developing in Java

Sunday, December 2, 2012

Postgres random in subquery

No comments:

Post a Comment