Recently I worked on the package that loops many text files and insert the data into a staging table and then from the staging into original table which has five columns only defined as VARCHAR(50) in the database. Pretty simple , not? Text files have many data and the destination table has many columns (90+) and very wide data tape client defines (VARCHAR(3000)). Actually he needs to get only few columns but who cares, right? We have observed that the memory during the package execution grow rapidly and very quickly MSSQL Server service was become unavailabe. You can imagine that during this spike nobody can work and a local DBA started getting complains from the end users. I would like to say that the server is very strong (4CPU+ 28RAM)...So what's problem?
At the source component , the estimated sized of a row is determined by the maximum columns sizes of all columns returned by the query (remember we have 90+ columns defined as VARCHAR(3000)). This is where the performance problem resides.
After sometime investigations we dropped the wide stage table and created a stage table with exactly same structure as original table has.
That means columns have VARCHAR(50) and we also changed the job schedule to run it frequently to deal with small data sets. So performance was drastically improved. We have not seen any more memory growing and not reducing, there is no more compliance from the end user.
Please read this article if you deal with packages especially if it works with large data sets.
Wednesday, May 14, 2014
Wednesday, January 1, 2014
We know that heaps are the tables without clustered index on. It may have many non clustered index (NCI) but still it is considered a heap. So consider a huge table (heap) that is fragmented. How would you fix it? Technically, you cannot defrag a heap table. But remember we have ALTER table REBUILD command which works very well on the heaps. But there is one big But. If you have many NCI on the table this command will rebuild all of them at once. What does it mean? It can produce transaction-log bloat which hits the over all performance. Think about the process the scan the log or perhaps Log Shipping job that needs to move the log file to the remote server. All this affects performance.
So, to answer the question ,we need to create a clustered index on that table..That is simple answer to the question.
Happy New Year to all!