Building a simple dynamic site map

A while ago I worked on a project to improve a website's indexing within search engines. The project was initiated because the site showed a noticeable downward trend in usage, and the main reason was that it was not being listed within relevant searches by search engines. This was because our content didn't emphasize the relevant keywords, due to our HTML layout, URL structure and crawling errors.

One aspect of the project was to improve the site layout and make more information publicly available.  The layout improvements were to use headers to highlight search keywords and to simplify the layout to reduce HTML errors.  Providing more information publicly was intended to increase the coverage of keywords.  Another improvement was the restructuring of the site URLs, making them more human readable by converting the site search functionality to use ASP.Net routes (a sketch of registering such a route follows).
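As a rough illustration of the routing change, a Web Forms page route can be registered in Global.asax. This is only a sketch under assumed names: the route name, the {keyword} parameter and the Search.aspx page are hypothetical, not the site's actual routes.

    Imports System.Web.Routing

    Public Class Global_asax
        Inherits System.Web.HttpApplication

        Sub Application_Start(ByVal sender As Object, ByVal e As EventArgs)
            ' Map a human readable search URL, e.g. /search/widgets,
            ' onto the existing search page (hypothetical names).
            RouteTable.Routes.MapPageRoute("SiteSearch", _
                                           "search/{keyword}", _
                                           "~/Search.aspx")
        End Sub
    End Class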

A site map was introduced to help the search providers index our static and searchable content. More information about site maps, which Google, Bing and Yahoo support, can be found at the following link: http://www.sitemaps.org/.
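For reference, a site map is a simple XML document. A minimal example with a single entry looks like the following (the domain is just the placeholder used in the query below):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.mysite.co.uk/example-page/</loc>
        <lastmod>2013-01-01</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.5</priority>
      </url>
    </urlset>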

I needed to find a method of creating a site map that would only list the valid search URLs, based upon the content within our database.  The approach I took was to have SQL Server generate the XML for the site map and to have an ASP.Net page stream the XML out.  I will start with the SQL that I used to create the site map XML, which makes use of the following elements of TSQL:
  • WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9'): sets the default namespace of the XML document.
  • FOR XML PATH('url'): the FOR XML operator, which also sets the element name used for each row.
  • ROOT('urlset'): creates the root node so the output is a valid XML document.
Full Code:

    ;WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9')
    , SiteMapURLS(URL, changefreq, LastUpdated, PriorityWeight) -- Named query to get the site URL and the date when the URL was last changed
    AS
    (
    SELECT [ta].[URL] -- URL pattern like 'http://www.mysite.co.uk/' + [ta].[URLPart] + '/'
    , 'daily' -- or one of the others: always, hourly, weekly, monthly, yearly, never
    , [ta].[DateField] -- MAX(DateField) to get the last date
    , 0.5 -- Valid values range from 0.0 to 1.0
    FROM TableA AS ta
    -- GROUP BY [ta].[URL] or the URL pattern 'http://www.mysite.co.uk/' + [ta].[URLPart] + '/'
    )
    SELECT [smURL].[URL] AS 'loc' -- creates the following: <loc>
    , REPLACE(CONVERT(varchar, ISNULL([smURL].[LastUpdated], '2013.01.01'), 102), '.', '-') AS 'lastmod' -- creates the following: <lastmod> and corrects the date format
    , [smURL].[changefreq] AS 'changefreq' -- creates the following: <changefreq>
    , CAST(ROUND([smURL].[PriorityWeight], 1) AS DECIMAL(2,1)) AS 'priority' -- creates the following: <priority>
    FROM SiteMapURLS AS smURL
    FOR XML PATH('url'), ROOT('urlset');
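
The query is exposed to the website through a stored procedure. A minimal sketch of that wrapper, assuming the procedure name called by the handler below (website.csp_GetWebSiteMap) and the illustrative TableA columns from the query above:

    CREATE PROCEDURE website.csp_GetWebSiteMap
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Same FOR XML query as above, returning the site map document
        ;WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9')
        , SiteMapURLS(URL, changefreq, LastUpdated, PriorityWeight)
        AS
        (
            SELECT [ta].[URL], 'daily', [ta].[DateField], 0.5
            FROM TableA AS ta
        )
        SELECT [smURL].[URL] AS 'loc'
        , REPLACE(CONVERT(varchar, ISNULL([smURL].[LastUpdated], '2013.01.01'), 102), '.', '-') AS 'lastmod'
        , [smURL].[changefreq] AS 'changefreq'
        , CAST(ROUND([smURL].[PriorityWeight], 1) AS DECIMAL(2,1)) AS 'priority'
        FROM SiteMapURLS AS smURL
        FOR XML PATH('url'), ROOT('urlset');
    END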

Within the website I created a generic handler called sitemap.ashx. This handler streams out the content returned by the stored procedure.

While the site was being updated it was decided to improve the data access layer by using Entity Framework, so to keep data access consistent I reused the Entity Framework connection. The following code shows how I streamed the XML content out of the database:

    Public Sub ProcessRequest(context As HttpContext) Implements IHttpHandler.ProcessRequest
        context.Response.ContentType = "text/xml"

        ' Getting the connection from Entity Framework
        Using sqlConnection As SqlConnection = DirectCast(DbContext.Database.Connection, SqlConnection)
            Using sqlCommand As New SqlCommand("website.csp_GetWebSiteMap", sqlConnection)
                sqlCommand.CommandType = CommandType.StoredProcedure
                sqlCommand.Connection.Open()
                Using xmlReader As XmlReader = sqlCommand.ExecuteXmlReader()
                    Dim xdocument As XDocument = XDocument.Load(xmlReader)
                    context.Response.Write(xdocument)
                    xmlReader.Close()
                    sqlCommand.Connection.Close()
                End Using
            End Using
        End Using
    End Sub

To enable the search crawlers to find our site map I created a robots.txt file. This file also helps to restrict URLs which you don't want crawlers to access, such as URLs which require an authenticated user.
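A minimal robots.txt along these lines might look like the following, assuming the handler above is exposed at /sitemap.ashx and a hypothetical /account/ area that requires an authenticated user:

    User-agent: *
    Disallow: /account/
    Sitemap: http://www.mysite.co.uk/sitemap.ashx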

There is one caveat to this approach: the URLs created in SQL Server are not encoded, which was acceptable in my case. If your URLs do need to be encoded, you would need to create a simple SQL CLR function that calls the following method: http://msdn.microsoft.com/en-us/library/ee360286(v=vs.110).aspx.
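As a rough illustration of that idea only, a SQL CLR scalar function could wrap a .NET escaping method. The class and function names below are hypothetical, and Uri.EscapeUriString is simply one escaping method I'm assuming here; it is not necessarily the exact method linked above.

    Imports System
    Imports System.Data.SqlTypes
    Imports Microsoft.SqlServer.Server

    Public Class UrlEncodingFunctions
        ' Hypothetical SQL CLR scalar function that escapes a URL string
        ' so it can be emitted safely inside the generated site map XML.
        <SqlFunction()> _
        Public Shared Function EscapeUrl(ByVal url As SqlString) As SqlString
            If url.IsNull Then
                Return SqlString.Null
            End If
            ' Uri.EscapeUriString escapes characters that are not valid in a URI.
            Return New SqlString(Uri.EscapeUriString(url.Value))
        End Function
    End Class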
