sqlBlog

Sunday, April 28, 2013

sqlBlog will be up

Hello

Last few months have been very hectic due to reading,travelling & some personal commitments.Here after sql blog makes a commitment to atleast publish one post per week. Catch you there.

Friday, August 27, 2010

SQL Deployment versus File System Deployment in SSIS

This is another common question we face many times "Should I store my SSIS Packages to File System or on SQL Server?"
There are two places you can store your packages: on the file system or in the msdb database. Each storage option has its own pros and cons and which option you choose will be based on what is more important to you. We’ll cover these pros and cons in much depth in this section but to summarize, the following table gives you a high-level idea of which storage option is best based on the what functionality you’re most interested in. Just because a given storage option is not checked, does not mean it doesn’t have that functionality. The ones checked just are most optimized for the given functionality.

If security concerns you greatly, you may want to consider placing your packages in the msdb database. To secure your packages on the file system, you could have multiple layers of security by using the Windows Active Directory security on the folder on the file system where the packages are at. You could also then place a password on the packages to keep users that may have administrator rights to your machine from executing the package. This does add extra complexity to your package deployments in some cases. If you store your packages in the msdb database, you can assign package roles to each package to designate who can see or execute the package. The packages can also be encrypted in the msdb database, which strengthens your security even more.
Backup and recovery is simpler with storing your packages in the msdb database. If you were to store your packages in the msdb database, then you must only wrap the msdb database into your regular maintenance plan to backup all the packages. As packages are added, they are wrapped into the maintenance plan. The problem with this is that you can restore a single package using this mechanism. You’d have to restore all the packages to a point in time, and that would also restore the jobs and history. The other option is a file system backup, which would just use your favorite backup software to backup the folders and files. If you did this, you must rely on your Backup Operator to do this for you, which makes some uneasy. You could though at that point, restore individual packages to a point in time. In reality, you may just go ahead and redeploy the packages from Source Safe if you couldn’t retrieve a backup file. File system deployments are much simpler but less sophisticated. To deploy packages onto the file system, you must only copy them into the directory for the package store. You can create subdirectories under the parent directory to subdivide it easily. You can also copy a single package over easily as well in case you need to make a package change. To import a package into the package store using the msdb database, you must use Management Studio (or a command-line tool called dtutil.exe) and import them package by package. To do a bulk migration, you could use the deployment utility.
Along the same lines as deployment is troubleshooting. If something were to go bump in the night and you wanted to see if the packages in production were the same release as the packages you thought you had deployed, you must only copy the files down to your machine and perform a comparison using Source Safe or another similar tool. If the files were stored in the msdb database, you would have to right-click on each package in Management Studio and select Export. If the same packages were stored in the file system, you must only copy the files to your machine. Availability of your packages is always on the top of the list for DBAs. If you were to store the packages in the msdb database and the database engine were to go down, the packages are unavailable. If they were stored in the file system, then your packages would be available for execution. Of course, if the database engine is down, then probably one of your key data sources would also be down at the time. The good news is no matter what storage option you choose the performance will be the same. As you can see there are many pros and cons to each storage option and neither overwhelmingly wins. The main reason that we choose to use the file system generally is for simplicity of deployment.

Functionality	Best in File System	Best in MSDB
Security		X
Backup and Recovery	X
Deployment	X
Troubleshooting	X
Execution Speed	X	X
Availability	X

Monday, September 1, 2008

Temporary tables versus dervied tables

Most of the times programmers go for creation of temporary tables in T Sql Programming for performance gains. It is true that temporary tables provide performance gains but the question is whether it is better to use derived tables in place of temporary tables. Which one is better?

Well most of you by now should know how to create a temporary table creation and derived table creation.

Derived tables are essentially SELECT statements within SELECT statements

For Example:

Select * from
(
Select col1, col2 from table1
)
A

Creation of temporary table follows the following sequence:

1. Creation of table in tempdb
2. Select the data and insert into the table
3. Select the data from temporary table and parent table
4. Drop the temporary table

Creation of derived table uses the following steps

1. Select the data
2. Select the parent data with the derived table data.

Hence the number of steps required while creating a temporary table are more and its an over head to drop the temporary table and release the resources. But with derived table the overhead is less and rewriting the temporary tables replacing them with derived tables would also avoid recompiling the stored procedures.

Let’s see an example as to how to avoid temporary tables

There Employee and Department table. Employee table has the structure
EmployeeID,EmployeeName,DepartmentID

Department table has the structure
DepartmentID,Department Name

The objective of the query is to give a count of employees in each department along with the employee name. So the output should look something like

Employee Department Count
Emp1 Dep1 3
Emp2 Dep1 3
Emp3 Dep1 3
Emp4 Dep2 2
Emp5 Dep2 2

Create table #t1
(
DepartmentID ,Count
)

Insert into #t1 (select departmentid,count(*) from employee group by departmentid)

Select EmployeeName, department.departmentname,#t1.count
from employee join department on employee.departmentid = department.departmentid
join #t1 on department.departmentid = #t1.departmentid

drop table #t1

Using derived tables the query can be rewritten as

Select EmployeeName, department.departmentname,a.cn
from employee join department on employee.departmentid = department.departmentid
join
( select departmentid,count(*) as cn from employee group by departmentid) a
on department.departmentid = a.departmentid

Sunday, August 10, 2008

Avoid Using Cursors

Always try to avoid cursors as they are an overhead to the engine. Instead try to write the same logic by using a temporary table or a table valued variable. This case is applicable to cases where the volume of data dealt is high. If the volume of data dealt is low try to use an inline table.

If the cursor cannot be avoided then check for the following scenarios

SQL Server offers you several different types of cursors, each with its different performance characteristics. Always select the cursor with the least amount of overhead that has the features you need to accomplish your goals. The most efficient cursor you can choose is the fast forward-only cursor.
When using a server-side cursor, always try to fetch as small a result set as possible. This includes fetching only those rows and columns the client needs immediately. The smaller the cursor, no matter what type of server-side cursor it is, the fewer resources it will use, and performance will benefit.
When you are done using a cursor, don’t just CLOSE it, you must also DEALLOCATE it. Deallocation is required to free up the SQL Server resources used by the cursor. If you only CLOSE the cursor, locks are freed, but SQL Server resources are not. If you don’t DEALLOCATE your cursors, the resources used by the cursor will stay allocated, degrading the performance of your server until they are released

Data Access

Performance Tuning can be applied in various areas of sql server like minimizing the number of calls over network, encapsulating the logic in a stored procedure which helps to use the query plan created by the sql engine or writing to multiple data files depending on requirement. This topic mostly covers tuning from the query analyzer perspective i.e. when you are executing the query in the query analyzer how do u look out for options in performance tuning.

Access the Data that is needed:

Do not access all the data available in the table. Instead access the data or columns which are required in the procedure or the logic.

Never deliberately write a code with SELECT * instead always try to use the columns which are necessary. Even if the logic does not need a column like u can write a code if the rows exist for a particular field as shown below

If exists (select * from table where field =

Begin

–Logic Here

End

Instead replace select * by select and include the column name which is a key in the clustered key index or the primary key depending upon the situation. This provides a better performance.

Always use filter criteria by which you can minimize the rows fetched in a table.

Few Tips

The first post would be a few tips to the sql server users

Just wanted to get started off with something.

So some sql server performance tips

Does your SQL statement have a WHERE clause?

I know this sounds obvious, but don’t retrieve more data than you need. However, less obvious is that even if your SELECT statement retrieves the same quantity of data without a WHERE clause, it may run faster with one.

Is SELECT DISTINCT being used properly?

Again, pretty obvious, but using SELECT DISTINCT where no duplicate records are being returned is an unnecessary performance hit. If you are getting duplicate records, first double check your table joins as this is often the cause and only use the DISTINCT clause if you really need it.

Are you using UNION instead of UNION ALL?

A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it is much quicker.

Are your stored procedures prefixed with ’sp_’?

Any stored procedures prefixed with ’sp_’ are first searched for in the Master database rather than the one it is created in. This will cause a delay in the stored procedure being executed.

Are all stored procedures referred to as dbo.sprocname?

When calling a stored procedure you should include the owner name in the call, i.e. use EXEC dbo.spMyStoredProc instead of EXEC spMyStoredProc.

Prefixing the stored procedure with the owner when executing it will stop SQL Server from placing a COMPILE lock on the procedure while it determines if all objects referenced in the code have the same owners as the objects in the current cached procedure plan.

Are you using temporary tables when you don’t need to?

Although there is sometimes a benefit of using temporary tables, generally they are best eliminated from your stored procedure. Don’t assume that retrieving data multiple times is always less efficient than getting the data once and storing it in temporary table as often it isn’t. Consider using a sub-query or derived table instead of a temporary table (see examples below). If you are using a temporary table in lots of JOINS in you stored procedure and it contains loads of data, it might be beneficial to add an index to your temporary table as this may also improve performance.

An example of a derived table instead of a temporary table

SELECT COLUMN1, COLUMN2, COUNTOFCOL3
FROM A_TABLE A
INNER JOIN (SELECT COUNT(COLUMN3) AS COUNTOFCOL3, COLUMN2
FROM B_TABLE B
INNER JOIN C_TABLE C ON B.ID = C.ID) ON A.ID = B.ID

Are you using Cursors when you don’t need to?

Cursors of any kind slow down SQL Server’s performance. While in some cases they are unavoidable, often there are ways to remove them from your code.

Consider using any of these options instead of using a cursor as they are all faster:

Derived tables
Sub-queries
CASE statements
Multiple queries
Temporary tables

Are your Transactions being kept as short as possible?

If you are use SQL transactions, try to keep them as short as possible. This will help db performance by reducing the number of locks. Remove anything that doesn’t specifically need to be within the transaction like setting variables, select statements etc.

Is SET NO COUNT ON being used?

By default, every time a stored procedure is executed, a message is sent from the server to the client indicating the number of rows that were affected by the stored procedure. You can reduce network traffic between the server and the client if you don’t need this feature by adding SET NO COUNT ON at the beginning of your stored procedure.

Are you using IN or NOT IN when you should be using EXISTS or NOT EXISTS?

If you are using IN or NOT IN in a WHERE clause that contains a sub-query you should re-write it to use either EXISTS, NOT EXISTS or perform a LEFT OUTER JOIN. This is because particularly the NOT IN statement offers really poor performance. The example below probably better explains what I mean:

e.g. This SQL statement:

SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE A_TABLE.COLUMN2 NOT IN (SELECT A_TABLE2.COLUMN2
FROM A_TABLE2)

Could be re-written like this:

SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE NOT EXISTS (SELECT A_TABLE2.COLUMN2
FROM A_TABLE2
WHERE A_TABLE.COLUMN2 = A_TABLE2.COLUMN2)

Do you have a function that acts directly on a column used in a WHERE clause?

If you apply a function to a column used in the WHERE clause of your SQL statement, it is unlikely that the SQL statement will be able to make use of any indexes applied to that column.

e.g.

SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE SUBSTRING (FIRSTNAME,1,1) = ‘m’

Could be re-written:

SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE FIRSTNAME LIKE = ‘m%’

Where you have a choice of using the IN or BETWEEN clauses

Use the BETWEEN clause as it is much more efficient

e.g. This SQL statement:

SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER IN (100, 101, 102, 103)

Should be re-written like this:

SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER BETWEEN 100 AND 103

Are you doing excessive string concatenation in your stored procedure?

Where possible, avoid doing loads of string concatenation as it is not a fast process in SQL Server.

Have you checked the order of WHERE clauses when using AND?

If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written (assuming that no parenthesis have been used to change the order of execution). You may want to consider one of the following when using AND:

Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.
If both parts of an AND expression are equally likely being false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.

Have you checked that you are using the most efficient operators?

Often you don’t have much of a choice of which operator you use in your SQL statement. However, sometimes there is an alternative way to re-write your SQL statement to use a more efficient operator. Below is a list of operators in their order of performance (with the most efficient first).

=
>, >=, <, <=
LIKE
<>