item:
item_name | price | color | store |
Item 1 | 32 | red | null |
Item 2 | 3 | blue | null |
Item 3 | 7 | yellow | null |
store:
store_id | name | coordinate |
1 | Store 1 | 12°34′56″N 12°34′56″E |
2 | Store 2 | 12°34′56″N 12°34′56″W |
3 | Store 3 | 12°34′56″S 12°34′56″E |
4 | Store 4 | 12°34′56″S 12°34′56″W |
The requirement was to populate the item table with approximately one million items, each randomly associated with one of the stores. The stores had been defined beforehand, and the store table was already populated. The items were partially defined; only the store association was null.
The database was PostgreSQL 8.3.
The first SQL attempt was the following:
update item set store = (select store_id from store order by random() limit 1) where item.store is null;
The result was:
item_name | price | color | store |
Item 1 | 32 | red | 2 |
Item 2 | 3 | blue | 2 |
Item 3 | 7 | yellow | 2 |
All items received the same store id. It seemed as though the random selection was evaluated only once and the result cached for every record. This can happen because the inner select does not reference the outer query in any way, so the planner can treat it as a completely separate query and execute it a single time.
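The same effect can be observed with a plain select (a sketch against the tables above; on this Postgres version an uncorrelated scalar subquery is typically planned as an init plan and executed once):

```sql
-- The scalar subquery never references the outer row, so it is planned
-- as an init plan: executed once, its single value reused for every row.
select item_name,
       (select store_id from store order by random() limit 1) as store
from item;
```

Every row of the result shows the same store_id, mirroring the behavior of the update.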
The solution is to make the subquery depend on the outer query, even if the dependency looks quite silly from the developer's point of view:
update item t1 set store = (select t2.store_id from store t2 where t1 = t1 order by random() limit 1) where t1.store is null;
The dependency here is the where t1=t1 part. Using this query, the result was exactly what we wanted to achieve:
item_name | price | color | store |
Item 1 | 32 | red | 2 |
Item 2 | 3 | blue | 4 |
Item 3 | 7 | yellow | 1 |
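For a million rows, re-sorting the store table per item is expensive. A hypothetical alternative that avoids the artificial correlation is to pick a random id arithmetically; this sketch assumes store_id values are contiguous and start at 1, as in the sample data:

```sql
-- random() is volatile, so it is re-evaluated for each updated row;
-- the count(*) subquery is uncorrelated and runs only once, which is
-- exactly what we want here.
update item
set store = trunc(random() * (select count(*) from store))::int + 1
where store is null;
```

If store ids have gaps, this variant would need a lookup table mapping a dense sequence to the real ids instead of the direct arithmetic.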