Consider a simple table with one column defined as the REAL datatype:
CREATE TABLE #t (c REAL)
INSERT INTO #t VALUES (0)
SELECT COUNT(*) FROM #t WHERE c=''
In the statement above you would expect 0 rows to be returned, since we filter for non-empty values. But it returns 1, and the answer can be found by looking at the execution plan.
SQL Server implicitly converts '' to the REAL datatype, which yields 0, so the filter matches our row and 1 row is returned.
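You can see the conversion in isolation with a couple of one-liners (a minimal sketch; the explicit CASTs simply mirror what the engine does implicitly):
-- REAL has higher datatype precedence than VARCHAR, so '' is converted to REAL
SELECT CAST('' AS REAL) -- returns 0
-- Comparing as strings instead returns the expected 0 rows, because REAL 0 converts to '0', not ''
SELECT COUNT(*) FROM #t WHERE CAST(c AS VARCHAR(30)) = ''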
Wednesday, December 7, 2011
Sunday, November 6, 2011
Filter out characters---conversion error?
I just helped a colleague write a query where we needed to filter out all rows containing non-numeric characters; because the column is defined as VARCHAR, we CAST it to INTEGER in order to implement a range search. See the simplified demo script below.
CREATE TABLE #t (c varchar(50))
INSERT INTO #t VALUES ('122')
INSERT INTO #t VALUES ('4545')
INSERT INTO #t VALUES ('4545/454')
INSERT INTO #t VALUES ('4899')
----Failed
SELECT * FROM
(
SELECT c FROM #t WHERE c NOT LIKE '%[/]%'
) AS d WHERE CAST(c AS INT)>10
----Succeed
SELECT * FROM #t
WHERE CASE WHEN c LIKE '%[^0-9]%' THEN 0
WHEN CAST(c AS int) BETWEEN 1 AND 1000 THEN 1
ELSE 0 END = 1
The answer to why the first attempt fails is found by looking at the execution plan.
Predicate
CONVERT(int,[tempdb].[dbo].[#t].[c],0)>(10) AND NOT [tempdb].[dbo].[#t].[c] like '%[/]%'
Thinking that we first filter out all the "bad" rows and can then CAST the rest is wrong: as the predicate above shows, both conditions are applied to the whole table, and SQL Server is free to evaluate the CONVERT before the LIKE.
In contrast, the second query uses a CASE expression (CASE...=1) to filter out the "bad" rows. There SQL Server really does filter out the "bad" rows first, and the CAST works.
Predicate
CASE WHEN [tempdb].[dbo].[#t].[c] like '%[^0-9]%' THEN (0) ELSE CASE WHEN CONVERT(int,[tempdb].[dbo].[#t].[c],0)>=(1) AND CONVERT(int,[tempdb].[dbo].[#t].[c],0)<=(1000) THEN (1) ELSE (0) END END=(1)
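As a side note, the upcoming Denali release (SQL Server 2012) introduces TRY_CONVERT, which returns NULL instead of raising an error when a conversion fails. On that version the intent can be expressed without the CASE trick (a sketch for the same demo table; not available on 2008 and earlier):
SELECT * FROM #t
WHERE TRY_CONVERT(int, c) BETWEEN 1 AND 1000 -- NULL for '4545/454', so that row simply does not qualify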
Thursday, November 3, 2011
Why are all my stored procedures saved in the master database under the System Stored Procedures folder?
I was just having a discussion with a colleague: she made some changes to the configuration, and now when she creates a simple (non-system) stored procedure in the master database, it is saved under Programmability -- Stored Procedures -- System Stored Procedures. Is that convenient? No, right? After some investigation I found that we needed to set 'allow updates' back to 0; see the script below.
EXEC sp_configure 'allow updates',0
reconfigure
Now everything is saved back in the 'right' place.
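If you want to verify the setting, running sp_configure with just the option name reports the current value (run_value should end up as 0):
EXEC sp_configure 'allow updates'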
Monday, September 5, 2011
Default Schema for Windows Group
Remember: if you connect to SQL Server using a Windows group, you cannot define a default schema for that group. The option is greyed out. This means that if you want everyone in the group to use a specific schema, you can't do it by default.
See reference
http://connect.microsoft.com/SQLServer/feedback/details/328585/default-schema-for-windows-group
I hope MS will finally address this issue in the next release of SQL Server, named Denali (CTP 4):
/*
I'm happy to report that the next CTP for SQL Server Denali will allow default schemas to be assigned to Windows groups.
I'd like to thank everyone for their votes and attention to this issue. It made a difference in being able to appropriately prioritize and make this improvement happen.
Il-Sung.
*/
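For reference, once the fix ships the assignment should look something like this (a sketch with hypothetical group and schema names):
-- Hypothetical domain group and schema, for illustration only
ALTER USER [MyDomain\MyGroup] WITH DEFAULT_SCHEMA = Sales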
Tuesday, July 12, 2011
SQL Server 2008 R2 Service Pack 1 has been released
MS has just released SQL Server 2008 R2 Service Pack 1. Both the Service Pack and Feature Pack updates are available for download on the Microsoft Download Center. SQL Server 2008 R2 SP1 contains cumulative updates for SQL Server 2008 R2 and fixes to issues that have been reported through our customer feedback platforms.
To download the Service Pack packages please visit:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26727 ---SQL Server 2008 R2 SP1
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26729 ---SQL Server 2008 R2 SP1 Express
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=26728 ---SQL Server 2008 R2 SP1 Feature Packs
To learn more about SQL Server 2008 R2 SP1, please visit the Knowledge Base article for Microsoft SQL Server 2008 R2 SP1:
http://support.microsoft.com/kb/2528583
Monday, June 13, 2011
Partitioning experience
As you know, MS introduced the partitioning feature in SQL Server 2005. I have been using it frequently with my clients, but yesterday, while answering a question about partitioning on the MSDN forum, I learned something new. Let's say we have a partitioned table and a staging table with data and a check constraint. See the script below.
create database db1
go
use db1
go
create partition function pf1(int) as range left for values (2,4)
create partition scheme ps1 as partition pf1 all to ([primary])
create table db1.dbo.t1 (id int ,descr varchar(2)) on ps1(id)
create table db1.dbo.t2 (id int,descr varchar(2))
alter table db1.dbo.t2 add check (id > 9)
create nonclustered index i1 on db1.dbo.t1(id)
create nonclustered index i2 on db1.dbo.t2(id)
insert into db1.dbo.t1 values (1,'n1')
insert into db1.dbo.t1 values(2,'n2')
insert into db1.dbo.t1 values(3,'N3')
insert into db1.dbo.t2 values (10,'n1')
insert into db1.dbo.t2 values(15,'n2')
---Add new range
alter partition function pf1() split range (9)
---Moving the data from t2 to t1 into a new added range
alter table db1.dbo.t2 switch to db1.dbo.t1 partition 4
Msg 4972, Level 16, State 1, Line 2
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table 'db1.dbo.t2' allows values that are not allowed by check constraints or partition function on target table 'db1.dbo.t1'.
I got the error, but everything should have been OK: both tables have the same datatypes, indexes, and lengths. What is going on here? I spent some time and found the explanation below in BOL.
/*
We recommend that you specify NOT NULL on the partitioning column of partitioned tables, and also nonpartitioned tables that are sources or targets for ALTER TABLE...SWITCH operations. By doing this, you make sure that any CHECK constraints on partitioning columns do not have to check for null values. Null values are ordinarily placed in the leftmost partition of a partitioned table. When you are switching any partition other than the leftmost partition and when the ANSI_NULLS database option is set to ON, the absence of a NOT NULL constraint on the source and target tables might interfere with any CHECK constraints that are also defined on the partitioning column.
*/
So extending the CHECK constraint with id IS NOT NULL solved the problem:
alter table db1.dbo.t2 add check (id > 9 AND id is not null)
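With the NOT NULL condition in place, rerunning the switch completes without the error:
---Moving the data from t2 to t1 into the new range now works
alter table db1.dbo.t2 switch to db1.dbo.t1 partition 4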
Tuesday, May 3, 2011
SQL Server 2005 Maintenance Plan won't delete backup files created in SQL Server 2008
I was recently asked to help one of my friends with some "strange behaviour", as he described it. He has a SQL Server 2008 instance where a SQL job copies .BAK files (database backups) onto his local disk drive, with the requirement to keep those files for only one month. Since he has SQL Server 2005 (SP3) installed on his machine, the first idea was to create a Maintenance Cleanup task to delete the files. But as you can imagine, that did not work. Of course, when I came to help I did not know that those files had been created by SQL Server 2008 and only copied to his local machine, so after checking all the possibilities I launched SQL Server Profiler to see what was going on. I saw the command below executed for each file to be deleted.
exec sp_executesql N'RESTORE LABELONLY FROM DISK=@P1',N'@P1 nchar(27)',N'X:\GBDB\Log\log080411PM.bak'
Well, I put that statement in a query window and got the error that led me to the root of the problem.
Msg 3241, Level 16, State 7, Line 1
The media family on device 'X:\GBDB\Log\log040411PM.bak' is incorrectly formed. SQL Server cannot process this media family.
Msg 3013, Level 16, State 1, Line 1
RESTORE LABELONLY is terminating abnormally.
So what do you think? Should SQL Server check the version of the files it is about to delete, or should it just delete files with the .BAK extension regardless of the version in which they were created?
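In the meantime, one possible workaround is to skip the Maintenance Cleanup task and delete the old files from a job step instead, for example via xp_cmdshell and forfiles (a sketch only: the path and the 30-day retention are illustrative, and xp_cmdshell must be enabled on the instance):
-- Delete *.bak files older than 30 days from the backup folder (illustrative path)
EXEC master..xp_cmdshell 'forfiles /p "X:\GBDB\Log" /m *.bak /d -30 /c "cmd /c del @path"'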
Wednesday, April 6, 2011
The names of a CTE and the table it references should be different
I just got the example below from a public forum.
CREATE TABLE Batch (Batch CHAR(1),Status INT)
INSERT INTO Batch VALUES ('A',1)
INSERT INTO Batch VALUES ('B',2)
INSERT INTO Batch VALUES ('C',3)
WITH Batch AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Batch, Status) AS RowNum
FROM Batch
)
DELETE FROM Batch
WHERE RowNum=1
As you can see, running this query makes SQL Server throw an error.
Msg 252, Level 16, State 1, Line 1
Recursive common table expression 'Batch' does not contain a top-level UNION ALL operator.
SQL Server "thinks" that the CTE referenced to itself but there is no UNION ALL clause.In the blow example CTE named EmpCTE referenced within to itself to join with Employees table.
WITH EmpCTE(empid, empname, mgrid, lvl)
AS
(
-- Anchor Member (AM)
SELECT empid, empname, mgrid, 0
FROM Employees
WHERE empid = 7
UNION ALL
-- Recursive Member (RM)
SELECT E.empid, E.empname, E.mgrid, M.lvl+1
FROM Employees AS E
JOIN EmpCTE AS M
ON E.mgrid = M.empid
)
SELECT * FROM EmpCTE
Finally, to resolve the problem, the CTE and the user table need to have different names. Something like this:
WITH Batch_cte AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Batch, Status) AS RowNum
FROM Batch
)
DELETE FROM Batch_cte
WHERE RowNum=1
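Alternatively, a two-part name inside the CTE body avoids the collision as well, because only one-part names resolve to the CTE name (a sketch, assuming the Batch table lives in the dbo schema):
WITH Batch AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Batch, Status) AS RowNum
FROM dbo.Batch -- the two-part name refers to the table, not the CTE
)
DELETE FROM Batch -- the one-part name refers to the CTE
WHERE RowNum=1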
Monday, April 4, 2011
MVP Award for 2011
I just got my MVP renewal email from MS, so this is my 5th SQL MVP award since 2007.
I would like to say thanks for all the support and good communication with MVPs around the world; hopefully it will motivate me to contribute even more to the community.
Wednesday, March 30, 2011
To anyone who specializes in SQL Server performance tuning
Conor's blog is a must-read:
http://blogs.msdn.com/b/conor_cunningham_msft/
Sunday, February 6, 2011
Alias issue in T-SQL or defensive programming
Recently I talked to one of our developers who wanted to delete the TOP (x) rows from a table. I pointed him to the article below, which contains a tip: a view with ORDER BY. http://blogs.msdn.com/b/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx
As an alternative he wanted to use a derived table, but could not understand why all the rows were deleted from the table instead of just the TOP (x). See the demo below.
create table #t (c int)
insert into #t values (1)
insert into #t values (1)
insert into #t values (2)
insert into #t values (3)
insert into #t values (3)
delete #t from (select top (2) c
from #t order by c) t
How does the DELETE...FROM extension in T-SQL work? The FROM clause right after DELETE specifies the target table; the second, optional FROM clause specifies the qualifying rows. Here the derived table aliased 't' is unrelated to the target '#t', so every row in '#t' qualifies and all rows are deleted. But if I change the 't' alias to '#t', the original name of the temporary table, it works...
delete #t from (select top (2) c
from #t order by c) #t
Now SQL Server 'sees' that the derived table has the same name as the target and thus deletes only the TOP (x) rows.
The goal of this post is not to get into a discussion about how to write a correlated subquery for such operations; I just wanted to point out that if you choose to use a derived table to perform a deletion, make sure the alias you specify for the derived table is the same as the target table.
PS: If you are testing and are not sure about the result, use BEGIN TRAN before executing the script. If the script affects too many rows, issue ROLLBACK TRAN to get back to the original data.
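A minimal sketch of that safety net (the threshold check is illustrative):
BEGIN TRAN
delete #t from (select top (2) c
from #t order by c) #t
-- @@ROWCOUNT still reflects the DELETE at this point
IF @@ROWCOUNT > 2
ROLLBACK TRAN -- too many rows affected: get back to the original data
ELSE
COMMIT TRAN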
Tuesday, January 11, 2011
Dedup on huge table
Hi friends
This time I would like to share my experience of deleting duplicates in a very large table (800 million rows).
A very common method is to use the ROW_NUMBER function, partitioning by the columns that define a duplicate, and then keep only the unique rows.
WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY <dedup columns> ORDER BY <any column>) AS rn
FROM tbl
) SELECT * FROM cte WHERE rn=1
As you can imagine, on a huge table this takes too long. To optimize the query I used batch processing (dividing the transaction into small chunks).
DECLARE @x INT
SET @x = 1
WHILE @x < 44000000 -- Set appropriately
BEGIN
;WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY <dedup columns> ORDER BY <any column>) AS rn
FROM tbl WHERE ID BETWEEN @x AND @x + 10000
)SELECT * FROM cte WHERE rn=1
SET @x = @x + 10000
END
OK, it worked much faster, since we have a clustered index on the ID column and SQL Server uses it to fetch the data for each defined range. However, there is another problem: the data we get back is not actually unique. For a specific range, say from 1 to 10000, we get the data partitioned as required and filtered on rn=1, BUT the same logical row can occur again in the next chunk (from 10001 to 20000), and we will get it back as well, because SQL Server does not recognize it as a duplicate of a row already returned from the first chunk.
A more reliable solution is to check against the entire table rather than rely on ranges.
SELECT <dedup columns>, COUNT(*) rn FROM
tbl GROUP BY <dedup columns>
HAVING COUNT(*)>1
As you can see this could take a long time, so I also tried to create a dedup table with a key created with the IGNORE_DUP_KEY option (thanks to Hugo), but the insert into that table was pretty slow as well. Peter Larsson, a fellow MVP, suggested the technique below, which worked pretty well.
CREATE TABLE #unique (id INT PRIMARY KEY CLUSTERED)
INSERT INTO #unique (id) SELECT MIN(ID) AS ID
FROM tbl
GROUP BY col1, col2, col3 -- here you decide the uniqueness
--And then fetch the data in batches
DECLARE @id INT
SET @id = 0
WHILE @id < 800000000 -- 800 million, or however many rows you have
BEGIN
SELECT t1.ID, t1.Col
FROM dbo.tbl AS t1
INNER JOIN #unique AS u ON u.id = t1.id
AND u.id BETWEEN @id AND @id + 99999
SET @id += 100000
END
It would be great if you could post your test results back here.
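For completeness, here is a sketch of the final step, under the assumption that every row whose ID is missing from #unique is a duplicate to delete (same batching; table and column names follow the script above):
-- Reusing @id from the script above; reset and walk the table again
SET @id = 0
WHILE @id < 800000000
BEGIN
DELETE t1
FROM dbo.tbl AS t1
WHERE t1.ID BETWEEN @id AND @id + 99999
AND NOT EXISTS (SELECT * FROM #unique AS u WHERE u.id = t1.ID)
SET @id += 100000
END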