One aspect of the project was to improve the site layout and make more information publicly available. The layout improvements were to use headers to highlight search keywords and to simplify the layout to reduce HTML errors. Providing more information helped increase keyword coverage. Another improvement was restructuring the site URLs: making them more human readable and converting the site search functionality to use ASP.Net routes.
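As an illustration of the routing change, a human-readable search route can be registered in Global.asax; this is a minimal sketch, and the route name, URL pattern and page path below are hypothetical examples rather than the site's actual routes:

```vbnet
Imports System
Imports System.Web
Imports System.Web.Routing

Public Class Global_asax
    Inherits HttpApplication

    Sub Application_Start(sender As Object, e As EventArgs)
        ' Maps a URL such as /search/widgets to the search page,
        ' so the keyword appears in the path rather than a query string.
        RouteTable.Routes.MapPageRoute("SearchRoute", "search/{keyword}", "~/Search.aspx")
    End Sub
End Class
```

The page can then read the keyword from Page.RouteData.Values("keyword") instead of Request.QueryString.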
A site map was introduced to help the search providers index our static and searchable content. More information about site maps, which Google, Bing and Yahoo all support, can be found at the following link: http://www.sitemaps.org/.
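For reference, a minimal site map containing a single URL entry looks like this (the location and date shown are placeholder values):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.co.uk/example/</loc>
    <lastmod>2013-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

This is the shape of document the SQL below produces: one url element per row, wrapped in a urlset root.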
I needed to find a method of creating a site map that would list only the valid search URLs based upon the content within our database. The approach I took was to have SQL Server generate the XML for the site map and to have an ASP.Net page stream the XML out. I would like to start with the SQL I used to create the site map XML, which relies on the following elements of T-SQL:
- WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9'): sets the default namespace of the XML document.
- FOR XML PATH('url'): the FOR XML operator, which also sets each row's element name to url.
- ROOT('urlset'): creates the root node needed for a valid XML document.
;WITH XMLNAMESPACES (DEFAULT 'http://www.sitemaps.org/schemas/sitemap/0.9')
, SiteMapURLS (URL, changefreq, LastUpdated, PriorityWeight) AS
( -- Named query to get the site URL and the date when the URL was last changed
    SELECT [ta].[URL] -- or a URL pattern like 'http://www.mysite.co.uk/' + [ta].[URLPart] + '/'
         , 'daily' -- or one of the others: always, hourly, weekly, monthly, yearly, never
         , [ta].[DateField] -- MAX(DateField) to get the last date
         , 0.5 -- valid values range from 0.0 to 1.0
    FROM TableA AS ta
    -- GROUP BY [ta].[URL], or by the URL pattern above
)
SELECT [smURL].[URL] AS 'loc' -- creates the <loc> element
     , REPLACE(CONVERT(varchar, ISNULL([smURL].[LastUpdated], '2013.01.01'), 102), '.', '-') AS 'lastmod' -- creates <lastmod> and corrects the date format
     , [smURL].[changefreq] AS 'changefreq' -- creates the <changefreq> element
     , CAST(ROUND([smURL].[PriorityWeight], 1) AS DECIMAL(2,1)) AS 'priority' -- creates the <priority> element
FROM SiteMapURLS AS smURL
FOR XML PATH('url'), ROOT('urlset');
Within the website I created a generic handler called sitemap.ashx. This handler streams out the content returned by the stored procedure.
While the site was being updated it was decided to improve the data access layer by using Entity Framework. To keep the site's data access consistent, the handler gets its connection from the Entity Framework context. The following code shows how I managed to stream the XML content out of the database:
Public Sub ProcessRequest(context As HttpContext) Implements IHttpHandler.ProcessRequest
    context.Response.ContentType = "text/xml"

    ' Getting the connection from Entity Framework
    Using sqlConnection As SqlConnection = DirectCast(DbContext.Database.Connection, SqlConnection)
        Using sqlCommand As New SqlCommand("website.csp_GetWebSiteMap", sqlConnection)
            sqlCommand.CommandType = CommandType.StoredProcedure
            sqlConnection.Open()

            Using xmlReader As XmlReader = sqlCommand.ExecuteXmlReader()
                Dim sitemap As XDocument = XDocument.Load(xmlReader)
                ' Stream the XML document out to the response
                sitemap.Save(context.Response.Output)
            End Using
        End Using
    End Using
End Sub
To enable the search crawlers to find our site map I created a robots.txt file. This file also helps to restrict URLs which you don't want crawlers to access, such as URLs which require an authenticated user.
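A minimal robots.txt along these lines might look as follows (the disallowed path and site map address are placeholder examples, not the site's actual values):

```
User-agent: *
Disallow: /account/
Sitemap: http://www.mysite.co.uk/sitemap.ashx
```

The Sitemap directive is how the crawlers discover the handler, and Disallow keeps authenticated-only areas out of the index.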
There is one caveat to this approach: the URLs created in SQL Server are not encoded, which was acceptable in my case. If your URLs do need to be encoded, you would need to create a simple SQL CLR function that calls the method described here: http://msdn.microsoft.com/en-us/library/ee360286(v=vs.110).aspx.
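As a sketch of that idea (the class and function names are hypothetical, and Uri.EscapeUriString is an assumption standing in for the method linked above), a SQL CLR scalar function could look like this:

```vbnet
Imports System
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Public Class UrlHelpers
    ' Hypothetical SQL CLR function that escapes a URL before it is
    ' written into the site map XML.
    <SqlFunction()>
    Public Shared Function EscapeUrl(url As SqlString) As SqlString
        If url.IsNull Then Return SqlString.Null
        ' Assumption: using Uri.EscapeUriString; substitute the exact
        ' escaping method the MSDN link describes.
        Return New SqlString(Uri.EscapeUriString(url.Value))
    End Function
End Class
```

Once the assembly is registered in SQL Server, the function can be applied to the URL column inside the SiteMapURLS query.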