Friday, November 29, 2013

SQL Tips for writing better TSQL query with performance in mind

A few months back, I had my chance to get my hands dirty with some old set of code written much before I even got into the IT space. My task was to add a bunch of new features on top the existing code base.  I had in fact spent a few hours looking at the exiting code (back-end in particular) and to my shock it had used things which will reduce query performance, increase complexity and attribute to poor maintainability. Over the time everybody ignored it until we started paying the price for the rush job done to get the product to market. The impact came mostly as customer complaints and not as audit wake up call and in either case we have to resolve it sooner.

 It’s often hard to convince the business owners the need to optimizing code for the future maintainability as from their stand point there is no return on investment in the short term. On the other hand it is much easier to educate the product maintenance team about the benefits of following some good guidelines which in most cases results in maintainable and optimized code. Here I want to articulate on the tips to improve the SQL performance just from a query perspective as in most cases the tuning is done using optimizing Indexes, reducing fragmentation, partitioning etc. The intention is to cover a few tips to follow while writing or modifying the existing stored procedures which may improve performance and reduce unwanted server resource consumption.

  1. Use the actual column names in SELECT statement instead of using ‘*' Symbol

  1. Use stored procedures or parameterised queries. (Facilitates SQL plan and data cache reuse)
  2. Suppress number of rows affected message
SET NOCOUNT ON
         4.  Keep Transact-SQL transactions as short as possible within a stored procedure.
  1. Try to use fixed length data types in Where clause.
  

  1. Offload tasks, like string manipulations, concatenations, row numbering, case conversions, and type conversions etc. to the front-end applications wherever possible.
  2. Avoid using Temporary tables and  where ever possible use table variables instead (table variables are not a good a candidate in transaction scenarios)
  3. Avoid searching using NOT equals operators (eg: NOT and <>) as they result in table and index scans instead of Seek.
  4. Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers
  5. Avoid dynamic SQL statements as much as possible. [use sp_executesql  instead of EXEC statement if dynamic sql use is unavoidable]
  6. Use outer joins sparingly or where ever really necessary; inner joins are more efficient.
  7. Use TRUNCATE instead of DELETE if you need to clear a whole table (Not possible if referential integrity constraints is present and mostly applicable for staging tables).
  8. Instead of using SELECT COUNT(*) from <table_name> try using (In OLAP)
SELECT rows FROM sysindexes  WHERE id = OBJECT_ID('<table_name>') AND indid < 2 
[Good for OLAP systems; in small OLTP Databases there is no need to trust the sysindexes]
  1. Prefer using EXISTS over OR , IN operators. [exceptions apply for smaller and limited subsets]
  2. If you have “>= and <= “operators or in case of some “IN” operations try to see if BETWEEN  can be used as its more efficient.
  3. Where possible, avoid string concatenation, as it is not a fast process.
  4. If possible, try to avoid using data conversion functions in the WHERE clause.
e.g. SELECT   * FROM Requests WHERE  CONVERT(VARCHAR(12),created_on,101)=CONVERT(VARCHAR(12),getdate()-360,101)
  1. A query with one or more OR clauses can sometimes be rewritten as a series of queries that are combined with a UNION ALL statement or using comma separated set (in SQL 2008 and above)
  2. Always use SQL-92  syntax instead of SQL-89

  1. Try to avoid using DDL (Data Definition Language) statements inside your stored procedure.
  2. Do not use HAVING clause for filtering on non-aggregate scenarios (use Where Clause)
  3. Try to avoid using SQL Server cursors, whenever possible.
  4. Use the select statements with TOP keyword or the SET ROWCOUNT statement, if you need to return only the first n rows.
  5. Try to use UNION ALL statement instead of UNION, whenever possible (union does remove duplicates hence its costly)
  6. Don't use the prefix "sp_" in the stored procedure name
  7. Use While loop instead of cursors.Also do not attempt recursive function calls as different SQL server had different default allowed recursion limits.
All the points mentioned are valid for SQL 2005 and above. There are some non-performance related SQL best practices that you can try when you are facing the SQL engine next time.

  1. Minimize the use of NULLs in the result set
Use ISNULL or COALESCE functions to deal with them
  1. All database objects should have a corresponding SQL script [Have schema scripts ready and stored in source control]
  2. Use upper case for all T-SQL key words.
  3. Give the prolog for  script/Stored Procedure with business logic
  4. Always Prefix DBO to the Stored Procedure name.(if indeed you are using default schema)
  5. Use user-defined functions to encapsulate code for reuse. (use this option carefully as it has the potential to create low performing query) 
Lastly do not forget to learn and harness the real power of Common table expressions. It is a great way to write clean and high performing code