This one’s a short blog post, and by no means new, but it may help you in the future and maybe save some head scratching. Commonly in Data Warehousing projects you will be processing data through tables whilst staging and cleaning source data. These tables may commonly be heaps (no Clustered Index), that have no persistent data; rather they are a transient set of tables that get deleted from when your Warehouse loads each day.
What you may not know is that when a heap is deleted from (as opposed to truncated) the space that was used by that data is not automatically de-allocated from the table, and when new data is added the table space is increased, even though the table may be “empty”.
So what is actually going on? When deleting from a heap the database engine either locks the page or the row, as opposed to the whole table when deleting on a clustered index. During the delete on the heap the page is emptied, but the engine does not de-allocate the page. This behaviour only occurs on a heap, and can cause very large tables with no data in them. For my recent project, it caused the stage and clean processes to slow down gradually over time because the database engine was scanning huge tables with no data in them.
So, what can we do to stop this behaviour? There are three options (as documented by Microsoft here (see Locking Behaviour)):
1. When deleting from the table obtain an exclusive lock on the whole table, by using a TABLOCK hint as follows:
DELETE FROM [TABLE] WITH (TABLOCK) WHERE [COLUMN] = ‘CRITERIA’
2. If possible, truncate the whole table, this will de-allocate all pages.
3. Finally, my least favourite, add a clustered index to the table.