Friday, November 29, 2013

SQL Tips for writing better TSQL query with performance in mind

A few months back, I had my chance to get my hands dirty with some old set of code written much before I even got into the IT space. My task was to add a bunch of new features on top the existing code base.  I had in fact spent a few hours looking at the exiting code (back-end in particular) and to my shock it had used things which will reduce query performance, increase complexity and attribute to poor maintainability. Over the time everybody ignored it until we started paying the price for the rush job done to get the product to market. The impact came mostly as customer complaints and not as audit wake up call and in either case we have to resolve it sooner.

 It’s often hard to convince the business owners the need to optimizing code for the future maintainability as from their stand point there is no return on investment in the short term. On the other hand it is much easier to educate the product maintenance team about the benefits of following some good guidelines which in most cases results in maintainable and optimized code. Here I want to articulate on the tips to improve the SQL performance just from a query perspective as in most cases the tuning is done using optimizing Indexes, reducing fragmentation, partitioning etc. The intention is to cover a few tips to follow while writing or modifying the existing stored procedures which may improve performance and reduce unwanted server resource consumption.

  1. Use the actual column names in SELECT statement instead of using ‘*' Symbol

  1. Use stored procedures or parameterised queries. (Facilitates SQL plan and data cache reuse)
  2. Suppress number of rows affected message
SET NOCOUNT ON
         4.  Keep Transact-SQL transactions as short as possible within a stored procedure.
  1. Try to use fixed length data types in Where clause.
  

  1. Offload tasks, like string manipulations, concatenations, row numbering, case conversions, and type conversions etc. to the front-end applications wherever possible.
  2. Avoid using Temporary tables and  where ever possible use table variables instead (table variables are not a good a candidate in transaction scenarios)
  3. Avoid searching using NOT equals operators (eg: NOT and <>) as they result in table and index scans instead of Seek.
  4. Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers
  5. Avoid dynamic SQL statements as much as possible. [use sp_executesql  instead of EXEC statement if dynamic sql use is unavoidable]
  6. Use outer joins sparingly or where ever really necessary; inner joins are more efficient.
  7. Use TRUNCATE instead of DELETE if you need to clear a whole table (Not possible if referential integrity constraints is present and mostly applicable for staging tables).
  8. Instead of using SELECT COUNT(*) from <table_name> try using (In OLAP)
SELECT rows FROM sysindexes  WHERE id = OBJECT_ID('<table_name>') AND indid < 2 
[Good for OLAP systems; in small OLTP Databases there is no need to trust the sysindexes]
  1. Prefer using EXISTS over OR , IN operators. [exceptions apply for smaller and limited subsets]
  2. If you have “>= and <= “operators or in case of some “IN” operations try to see if BETWEEN  can be used as its more efficient.
  3. Where possible, avoid string concatenation, as it is not a fast process.
  4. If possible, try to avoid using data conversion functions in the WHERE clause.
e.g. SELECT   * FROM Requests WHERE  CONVERT(VARCHAR(12),created_on,101)=CONVERT(VARCHAR(12),getdate()-360,101)
  1. A query with one or more OR clauses can sometimes be rewritten as a series of queries that are combined with a UNION ALL statement or using comma separated set (in SQL 2008 and above)
  2. Always use SQL-92  syntax instead of SQL-89

  1. Try to avoid using DDL (Data Definition Language) statements inside your stored procedure.
  2. Do not use HAVING clause for filtering on non-aggregate scenarios (use Where Clause)
  3. Try to avoid using SQL Server cursors, whenever possible.
  4. Use the select statements with TOP keyword or the SET ROWCOUNT statement, if you need to return only the first n rows.
  5. Try to use UNION ALL statement instead of UNION, whenever possible (union does remove duplicates hence its costly)
  6. Don't use the prefix "sp_" in the stored procedure name
  7. Use While loop instead of cursors.Also do not attempt recursive function calls as different SQL server had different default allowed recursion limits.
All the points mentioned are valid for SQL 2005 and above. There are some non-performance related SQL best practices that you can try when you are facing the SQL engine next time.

  1. Minimize the use of NULLs in the result set
Use ISNULL or COALESCE functions to deal with them
  1. All database objects should have a corresponding SQL script [Have schema scripts ready and stored in source control]
  2. Use upper case for all T-SQL key words.
  3. Give the prolog for  script/Stored Procedure with business logic
  4. Always Prefix DBO to the Stored Procedure name.(if indeed you are using default schema)
  5. Use user-defined functions to encapsulate code for reuse. (use this option carefully as it has the potential to create low performing query) 
Lastly do not forget to learn and harness the real power of Common table expressions. It is a great way to write clean and high performing code 

Monday, August 19, 2013

Create a custom web scrapper using html agility pack and XPath

Imagine you want to build a scrapper for a website the tools comes in handy are Html Agility pack and  a bit of knowledge on xpath.  Html Agility pack is a free to use HTML parser with very few dependencies and the main one is .Nets Xpath implementation.
The general implementation contains the following code to extract the html response from the url provided
var url= “http://en.wikipedia.org/wiki/Australia”;
var web = new HtmlWeb();
HtmlDocument responseHtmlDoc = web.Load(url);
//now start interrogating the htmlDocument using xpaths to get the data
responseHtmlDoc.DocumentNode.SelectSingleNode("//div[@class='test’]");

But now imagine you have to first post some data to retrieve the actual url to retrieve required data.  For example
You have to mimic the action of entering search criteria and button click to get set of urls  
Entering user and password and clicking on login button (that’s scary..) etc
If you are going for an asp.net website then it will have event validations and proper view state values set to perform the initial data post action. Let’s try to create a simple page scrapper which allows you to go against an asp.net forms based website and interrogate the html response to retrieve data you want.

The following steps will give an idea of how to implement custom scrapper

Step 1.    Download html agility pack from code plex [http://htmlagilitypack.codeplex.com/]. This will allow you to load the MSHTML,W3C HTML in an HtmlDocument data structure
Step 2.    Create a Custom WebClient class to make requests after manipulating cookie.
using System.Net;
internal class CookieAwareWebClient : WebClient
{
private CookieContainer cc = new CookieContainer();
private string _lastPage;

protected override WebRequest GetWebRequest(Uri address)
{
var r = base.GetWebRequest(address);
var wr = r as HttpWebRequest;
if (wr != null)
{
wr.CookieContainer = cc;
if (_lastPage != null)
{
wr.Referer = _lastPage;
}
}
_lastPage = address.ToString();
return r;
}

protected override WebResponse GetWebResponse(WebRequest request)
{
var response = base.GetWebResponse(request);
return response;
}
}

Step 3.    Inspect  the web pages viewstate value and event validation behaviors and write a viewstate value modification method
private static NameValueCollection GetViewState(string getResponse, string postcode)
{
if (string.IsNullOrEmpty(getResponse))return null;

var viewStateIndex = getResponse.IndexOf("__VIEWSTATE");
var eventValidationIndex = getResponse.IndexOf("__EVENTVALIDATION");
var collection = new NameValueCollection { { "__EVENTTARGET", "" },
{ "__EVENTARGUMENT", "" }, };
var viewState = getResponse.Substring(viewStateIndex + 37);
viewState = viewState.Substring(0, (viewState.IndexOf("/>") - 2));
collection.Add("__VIEWSTATE", viewState);
collection.Add("__VIEWSTATEENCRYPTED", "");
var eventValidation = getResponse.Substring((eventValidationIndex + 49));
eventValidation = eventValidation.Substring(0, eventValidation.IndexOf("/>") - 2);
collection.Add("__EVENTVALIDATION", eventValidation);
collection.Add("content_0$contentcolumnmain_0$txtPostcode", postcode);
collection.Add("content_0$contentcolumnmain_0$btnSearch", "Search");
return collection;
}
Step 4.    Let’s make a call to web URL and sent  the automated actions
using HtmlAgilityPack;
var webClient = new CookieAwareWebClient();
var uri = "http://someurl.aspx";
string getResponse = string.Empty;
using (StreamReader reader = new StreamReader(webClient.OpenRead(uri)))
{
getResponse = reader.ReadToEnd();
}
//call custom GetViewState method
var viewStateValues = this.GetViewState(getResponse, “mr. x”);
// Upload the NameValueCollection.
byte[] responseArray = webClient.UploadValues(uri , "POST", viewStateValues);
// Save the response string for future
var responseStringToBeStored = Encoding.ASCII.GetString(responseArray);


Now you can inspect the response “responseStringToBeStored” variable content and strip out links that can be used to scrape the page as mentioned in the beginning of the post.

Monday, June 3, 2013

Web Browser timezone specific date display in asp.net

It may sound like a simple requirement to display date and time localized to the logged in user but it gets really tricky if you have to implement it on an age old asp.net application.Applications built  with SQL Server 2005 and below with no time zone aware datatypes and data are mostly stored at the servers time zone (no UTC based storage adopted aswell). 

The problem statement was simple; though the server is situated in VICTORIA(AEST – Australian Eastern Standard Time) a user logging in from Perth (AWST – Australian Western Standard Time) should see the time-zone according to their browser time-zone. The server is located in Victoria and users around the world should be able to see the date-time according their web browser's time-zone. 

Lets see the problem with an example;
User from Sydney teleconference task scheduled at 10 am AEST(winter time). A user from Perth should see this as 8 am AWST but our good old system database will have the data stored as 10 am as server is in AEST time-zone.

There are two aspects to this problem, 
1. Getting the time zone information from the browser
2. Implement the time zone specific date and time displayed through out the system

This can be easily implemented by following the four steps
Step 1: Define a JavaScript function to create cookie to store browser time zone [TimezoneFinder.js file]


function setTimezoneCookie() {
    var timezone_cookie = "timezoneoffset";
    // if the timezone cookie not exists create one.
    if (!$.cookie(timezone_cookie)) { // check if the browser supports cookie       
        var test_cookie = 'test cookie';
        $.cookie(test_cookie, true);
        // browser supports cookie
        if ($.cookie(test_cookie)) {          
            $.cookie(test_cookie, null);  // delete the test cookie           
            $.cookie(timezone_cookie, new Date().getTimezoneOffset()); // create a new cookie
            location.reload();// re-load the page
        }
    }
    else {// if the current timezone and the one stored in cookie are different
          // then store the new timezone in the cookie and refresh the page.
        var storedOffset = parseInt($.cookie(timezone_cookie));
        var currentOffset = new Date().getTimezoneOffset();
        // user may have changed the timezone
        if (storedOffset !== currentOffset) {
            $.cookie(timezone_cookie, new Date().getTimezoneOffset());
            location.reload();
        }
    }

}

Step 2: Include script file and call the "setTimezoneCookie" function in the Master page
<script src="[path]/TimezoneFinder.js" type="text/javascript"></script>
<script type="text/javascript">
$(function () {
  setTimezoneCookie();
});

</script>

Step 3: Lets assume all the aspx codebehind files are inheriting from a common BasePage to allow pushing common UI specific logic. Now lets read the timezone cookie value and store it in a session variable.
protected override void OnPreLoad(EventArgs e)
{
  base.OnPreLoad(e);
 //Save the timezone information from cookie to a session variable
 Session["timezoneoffset"] = this.Request.Cookies.AllKeys.Contains("timezoneoffset",
s => s.ToLower())
this.Request.Cookies["timezoneoffset"].Value : null;

}

Step 4: Final step will be to create a bunch of utility methods to help with the conversion. Lets create some extension method to hide the detail and give a clean syntax for the calling code.
[TimezoneConvertExtensions class]
public static class TimezoneConvertExtensions
{
    private static int? _serverTimezoneOffset = null;
    private static int ServerTimezoneOffset
    {
        get
        {
            return _serverTimezoneOffset.HasValue
                        ? _serverTimezoneOffset.Value
                        : (_serverTimezoneOffset = GetServerTimezoneOffset()).Value;
        }
    }
    private static int GetServerTimezoneOffset()
    {
        var timeZone = TimeZone.CurrentTimeZone;
        var offset = timeZone.GetUtcOffset(DateTime.Now);

        return offset.Hours * 60 + offset.Minutes;
    }

    public static DateTime ToClientTime(this DateTime dt)
    {
        if (dt == DateTime.MinValue) return dt; 
// read the value from session         
        var timeOffSet = HttpContext.Current.Session["timezoneoffset"];
        if (timeOffSet != null)
        {
            var offset = int.Parse(timeOffSet.ToString()) + ServerTimezoneOffset;
            dt = dt.AddMinutes(-1 * offset);
            return dt;
        }// if there is no offset in session return the datetime in server timezone           
        return dt.ToLocalTime();
    }

    public static DateTime ToServerTime(this DateTime dt)
    {
        if (dt == DateTime.MinValue) return dt;
        var timeOffSet = HttpContext.Current.Session["timezoneoffset"];
        if (timeOffSet != null)
        {
            var offset = int.Parse(timeOffSet.ToString()) + ServerTimezoneOffset;
            dt = dt.AddMinutes(1 * offset);
            return dt;
        }
        return dt.ToLocalTime();
    }

}

How to Use in code:
By default all the data will be saved with respect to the server timezone (here AEST ) .In other words the created datetime of a record will be according to db server timezone. 

These set of records can be easily localised and displayed in the browsers timezone  using the following code
someDate.ToClientTime() 
e.g. test using DateTime.Now.ToClientTime()

On the other hand If the system wants to save the user entered date and time then received datetime has to be explicitly converted to server timezone for saving. The following call will convert the data to server timezone before saving. (for example a conference appointment time saved from AWST timezone has to be saved in AEST timezone in the server so that it makes sense for a browser request from AWST at later point in time)
 e.g. userEnteredDate.ToServerTime()