Friday, 24 June 2011

A Complete URL Rewriting Solution for ASP.NET 2.0

This article uses regular expressions to specify rewriting rules and resolves possible difficulties with postback from pages accessed via virtual URLs.

Why use URL rewriting?

The two main reasons to incorporate URL rewriting capabilities into your ASP.NET applications are usability and maintainability.



Usability

Users of web applications generally prefer short, neat URLs to long addresses packed with hard-to-read query string parameters. Remembering and typing a concise URL is often quicker than adding the page to the browser's favorites and finding it there later. And when the favorites are unavailable, it is more convenient to type the URL directly into the address bar than to remember a few keywords and run them through a search engine to find the page.
Compare the following two addresses and decide which one you like more:
  1. http://www.somebloghost.com/Blogs/Posts.aspx?Year=2006&Month=12&Day=10
  2. http://www.somebloghost.com/Blogs/2006/12/10/
The first URL uses query string parameters to encode the date for which the blog engine should show postings. The second URL carries the same information in the address itself, giving the user a clear idea of what he or she is going to see. The second address also lets the user hack the URL to see all postings from December 2006, simply by removing the text encoding the day, '10/': http://www.somebloghost.com/Blogs/2006/12/.

Maintainability

In large web applications, it is common for developers to move pages from one directory to another. Let us suppose that support information was initially available at http://www.somebloghost.com/Info/Copyright.aspx and http://www.somebloghost.com/Support/Contacts.aspx, but at a later date the developers moved the Copyright.aspx and Contacts.aspx pages to a new folder called Help. Users who have bookmarked the old URLs need to be redirected to the new location. This issue can be resolved by adding simple dummy pages containing calls to Response.Redirect(new location). However, what if there are hundreds of moved pages all over the application directory? The web project will soon contain too many useless pages that have the sole purpose of redirecting users to a new location.
Enter URL rewriting, which allows a developer to move pages between virtual directories just by editing a configuration file. In this way, the developer can separate the physical structure of the website from the logical structure available to users via URLs.

Native URL mapping in ASP.NET 2.0

ASP.NET 2.0 provides an out-of-the-box solution for mapping static URLs within a web application. It is possible to map old URLs to new ones in web.config without writing any lines of code. To use URL mapping, just create a new urlMappings section within the system.web section of your web.config file and add the required mappings (the path ~/ points to the root directory of the web application):
<urlMappings enabled="true">
   <add url="~/Info/Copyright.aspx" mappedUrl="~/Help/Copyright.aspx" />
   <add url="~/Support/Contacts.aspx" mappedUrl="~/Help/Contacts.aspx" />
</urlMappings>
Thus, if a user types http://www.somebloghost.com/Support/Contacts.aspx, he or she sees the page located at http://www.somebloghost.com/Help/Contacts.aspx, without even knowing that the page has been moved.
This solution is fine if only a couple of pages have been moved, but it is unsuitable where there are dozens of relocated pages, or where a really neat URL needs to be created, because every URL needs its own mapping entry.
Another possible disadvantage of the native URL mapping technique is that if the page Contacts.aspx contains elements initiating postback to the server (which is most probable), then the user will be surprised that the URL http://www.somebloghost.com/Support/Contacts.aspx changes to http://www.somebloghost.com/Help/Contacts.aspx. This happens because the ASP.NET engine fills the action attribute of the form HTML tag with the actual path to a page. So the form renders like this:
<form name="formTest" method="post" action="http://www.somebloghost.com/Help/Contacts.aspx" id="formTest">
</form>
Thus, the URL mapping available in ASP.NET 2.0 is of little use in most real applications. It would be much better to be able to specify a set of similar URLs in one mapping rule. The best way to do that is with regular expressions (for an overview see Wikipedia, and for the .NET implementation see MSDN), but ASP.NET 2.0 URL mapping does not support them. We therefore need to develop an alternative to the built-in URL mapping.

The URL rewriting module 

The best way to implement a URL rewriting solution is to create reusable and easily configurable modules, so the obvious decision is to create an HTTP Module (for details on HTTP Modules see MSDN Magazine) and implement it as an individual assembly. To make this assembly as easy to use as possible, we need to implement the ability to configure the rewrite engine and specify rules in a web.config file.
During development we need to be able to turn the rewriting module on or off (for example, when tracking down a hard-to-catch bug that may be caused by incorrect rewriting rules). The module's configuration section in web.config should therefore include an option for this. A sample configuration section might look like this:
<rewriteModule>
  <rewriteOn>true</rewriteOn>
  <rewriteRules>
      <rule source="(\d+)/(\d+)/(\d+)/"
            destination="Posts.aspx?Year=$1&amp;Month=$2&amp;Day=$3"/>
      <rule source="(.*)/Default.aspx"
            destination="Default.aspx?Folder=$1"/>
  </rewriteRules>
</rewriteModule>
This means that every request such as http://localhost/Web/2006/12/10/ is internally rewritten to the page Posts.aspx with the corresponding query string parameters.
Please note that web.config must be well-formed XML, so a bare & is not allowed in attribute values. Use &amp; instead in the destination attribute of the rule element.
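To make the relationship between source and destination concrete, here is a minimal sketch of how a single rule can be applied with the .NET Regex class. The class and method names (RuleSketch, Apply) and the "/Web/" base path are assumptions made for illustration only; they are not part of the module itself. Note that inside C# code the destination uses a plain &, because &amp; is only the XML escaping needed in web.config.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RuleSketch
{
    // Applies one rewriting rule to a request path.
    // Returns null when the rule's source pattern does not match.
    public static string Apply(string rewriteBase, string source,
                               string destination, string path)
    {
        // The source pattern is relative to the application root.
        Regex re = new Regex(rewriteBase + source, RegexOptions.IgnoreCase);
        if (!re.IsMatch(path)) return null;

        // $1, $2, ... in the destination are replaced by the captured groups.
        return re.Replace(path, destination);
    }
}
```

For example, RuleSketch.Apply("/Web/", @"(\d+)/(\d+)/(\d+)/", "Posts.aspx?Year=$1&Month=$2&Day=$3", "/Web/2006/12/10/") returns "Posts.aspx?Year=2006&Month=12&Day=10", while a path such as "/Web/About.aspx" yields null because the pattern does not match.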
To use the rewriteModule section in the web.config file, you need to register a section name and a section handler for this section. To do this, add a configSections section to web.config:
  <configSections>
    <sectionGroup name="modulesSection">
      <section name="rewriteModule"
               type="RewriteModule.RewriteModuleSectionHandler, RewriteModule"/>
    </sectionGroup>
  </configSections>
This means you may use the following section below the configSections section:
  <modulesSection>
    <rewriteModule>
      <rewriteOn>true</rewriteOn>
      <rewriteRules>
        <rule source="(\d+)/(\d+)/(\d+)/"
              destination="Posts.aspx?Year=$1&amp;Month=$2&amp;Day=$3"/>
        <rule source="(.*)/Default.aspx"
              destination="Default.aspx?Folder=$1"/>
      </rewriteRules>
    </rewriteModule>
  </modulesSection>
Another thing we have to bear in mind during the development of the rewriting module is that it should be possible to use 'virtual' URLs with query string parameters, as shown in the following: http://www.somebloghost.com/2006/12/10/?Sort=Desc&SortBy=Date. Thus we have to develop a solution that can detect parameters passed via query string and also via virtual URL in our web application.

So, let’s start by building a new Class Library. We need to add a reference to the System.Web assembly, as we want this library to be used within an ASP.NET application and we also want to implement some web-specific functions at the same time. If we want our module to be able to read web.config, we need to add a reference to the System.Configuration assembly.

Handling the configuration section

To be able to read the configuration settings specified in web.config, we have to create a class that implements the IConfigurationSectionHandler interface (see MSDN for details). This can be seen below:
using System;
using System.Collections.Generic;
using System.Text;
using System.Configuration;
using System.Web;
using System.Xml;


namespace RewriteModule
{
    public class RewriteModuleSectionHandler : IConfigurationSectionHandler
    {

        private XmlNode _XmlSection;
        private string _RewriteBase;
        private bool _RewriteOn;

        public XmlNode XmlSection
        {
            get { return _XmlSection; }
        }

        public string RewriteBase
        {
            get { return _RewriteBase; }
        }

        public bool RewriteOn
        {
            get { return _RewriteOn; }
        }
        public object Create(object parent,
                            object configContext,
                            System.Xml.XmlNode section)
        {
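            // set base path for rewriting module to the application root,
            // making sure it ends with exactly one slash ("/" for a root
            // application, "/Web/" for a virtual directory named Web)
            string appPath = HttpContext.Current.Request.ApplicationPath;
            _RewriteBase = appPath.EndsWith("/") ? appPath : appPath + "/";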

            // process configuration section
            // from web.config
            try
            {
                _XmlSection = section;
                _RewriteOn = Convert.ToBoolean(
                            section.SelectSingleNode("rewriteOn").InnerText);
            }
            catch (Exception ex)
            {
                throw (new Exception(
                    "Error while processing RewriteModule configuration section.",
                    ex));
            }
            return this;
        }
    }
}
The RewriteModuleSectionHandler class is initialized by a call to its Create method, with the rewriteModule section of web.config passed in as an XmlNode. The SelectSingleNode method of the XmlNode class is then used to read the values of the module settings.
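The way SelectSingleNode and SelectNodes pull settings out of the section can be tried outside ASP.NET. The following is a small sketch under the assumption that the section is supplied as an XML string; the ConfigSketch class and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class ConfigSketch
{
    // Reads the rewriteOn flag from a rewriteModule XML fragment.
    public static bool ReadRewriteOn(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode section = doc.DocumentElement;
        // Same call the section handler uses on the node it receives.
        return Convert.ToBoolean(
            section.SelectSingleNode("rewriteOn").InnerText);
    }

    // Counts the rule elements found under rewriteRules.
    public static int CountRules(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.SelectNodes("rewriteRules/rule").Count;
    }
}
```

Passing the sample rewriteModule section shown earlier to ReadRewriteOn gives true, and CountRules gives 2. Note that SelectSingleNode returns null when the rewriteOn element is missing, which is one reason the handler wraps its parsing in a try/catch block.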

Using parameters from rewritten URL

When handling virtual URLs such as http://www.somebloghost.com/Blogs/gaidar/?Sort=Asc (that is, a virtual URL with query string parameters), it is important to clearly distinguish parameters that were passed via the query string from parameters that were passed as virtual directories. With the following rewriting rule:
<rule source="(.*)/Default.aspx" destination="Default.aspx?Folder=$1"/>
you can use the following URL:
http://www.somebloghost.com/gaidar/?Folder=Blogs
...and the result will be the same as if you used this URL:
http://www.somebloghost.com/Blogs/gaidar/
To resolve this issue, we have to create some kind of wrapper for 'virtual path parameters'. This could be a collection with a static method to access the current parameters set:
using System;
using System.Collections.Generic;
using System.Text;
using System.Collections.Specialized;
using System.Web;

namespace RewriteModule
{

    public class RewriteContext
    {
        // returns actual RewriteContext instance for
        // current request
        public static RewriteContext Current
        {
            get
            {
                // Look for RewriteContext instance in
                // current HttpContext. If there is no RewriteContextInfo
                // item then this means that rewrite module is turned off
                if (HttpContext.Current.Items.Contains("RewriteContextInfo"))
                    return (RewriteContext)
                        HttpContext.Current.Items["RewriteContextInfo"];
                else
                    return new RewriteContext();
            }
        }

        public RewriteContext()
        {
            _Params = new NameValueCollection();
            _InitialUrl = String.Empty;
        }

        public RewriteContext(NameValueCollection param, string url)
        {
            _InitialUrl = url;
            _Params = new NameValueCollection(param);
           
        }

        private NameValueCollection _Params;

        public NameValueCollection Params
        {
            get { return _Params; }
            set { _Params = value; }
        }

        private string _InitialUrl;

        public string InitialUrl
        {
            get { return _InitialUrl; }
            set { _InitialUrl = value; }
        }
    }
}
You can see from the above that 'virtual path parameters' are available through RewriteContext.Current (in its Params collection), which is intended to hold the parameters that were specified in the URL as virtual directories or page names, as opposed to ordinary query string parameters.
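If you want a collection that contains only the parameters produced by a rewriting rule, one possible approach is to parse the query part of the rule's destination before any original query string is appended to it. The sketch below is an assumption about how that could be done, not part of the module described in this article; VirtualParamsSketch and FromRewrittenPath are hypothetical names.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Hypothetical helper, for illustration only.
public static class VirtualParamsSketch
{
    // Takes a rewritten path such as "Default.aspx?Folder=Blogs" and
    // returns only the parameters found after the question mark.
    public static NameValueCollection FromRewrittenPath(string rewrittenPath)
    {
        int q = rewrittenPath.IndexOf('?');
        string query = (q == -1)
            ? String.Empty
            : rewrittenPath.Substring(q + 1);
        // ParseQueryString returns an empty collection for an empty string.
        return HttpUtility.ParseQueryString(query);
    }
}
```

With this helper, FromRewrittenPath("Default.aspx?Folder=Blogs")["Folder"] is "Blogs", and a later ?Sort=Asc supplied by the user never enters the collection because it is not part of the rule's destination.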

Rewriting URLs

Now let's try some rewriting. First, we need to read rewriting rules from the web.config file. Secondly, we need to check the actual URL against the rules and, if necessary, do some rewriting so that the appropriate page is executed.
We create an HttpModule:
class RewriteModule : IHttpModule
{
    public void Dispose() { }
    public void Init(HttpApplication context) { }
}
Next we add the RewriteModule_BeginRequest method, which checks the requested URL against the rules. It must also check whether the URL has query string parameters, and then call HttpContext.Current.RewritePath to hand control over to the appropriate ASP.NET page.
using System;
using System.Collections.Generic;
using System.Text;
using System.Web;
using System.Configuration;
using System.Xml;
using System.Text.RegularExpressions;
using System.Web.UI;
using System.IO;
using System.Collections.Specialized;

namespace RewriteModule
{
    class RewriteModule : IHttpModule
    {

        public void Dispose() { }

        public void Init(HttpApplication context)
        {
            // check every incoming request against the rewriting rules
            context.BeginRequest += new EventHandler(
                 RewriteModule_BeginRequest);
        }

        void RewriteModule_BeginRequest(object sender, EventArgs e)
        {

            RewriteModuleSectionHandler cfg =
                (RewriteModuleSectionHandler)ConfigurationManager.GetSection(
                    "modulesSection/rewriteModule");

            // module is turned off in web.config
            if (!cfg.RewriteOn) return;

            string path = HttpContext.Current.Request.Path;

            // there is nothing to process
            if (path.Length == 0) return;

            // load rewriting rules from web.config
            // and loop through rules collection until first match
            XmlNode rules = cfg.XmlSection.SelectSingleNode("rewriteRules");
            foreach (XmlNode xml in rules.SelectNodes("rule"))
            {
                try
                {
                    Regex re = new Regex(
                     cfg.RewriteBase + xml.Attributes["source"].InnerText,
                     RegexOptions.IgnoreCase);
                    Match match = re.Match(path);
                    if (match.Success)
                    {
                        path = re.Replace(
                             path,
                             xml.Attributes["destination"].InnerText);
                        if (path.Length != 0)
                        {
                            // check for query string parameters
                            if (HttpContext.Current.Request.QueryString.Count != 0)
                            {
                                // if there are query string parameters,
                                // append them to the current path
                                string sign = (path.IndexOf('?') == -1) ? "?" : "&";
                                path = path + sign +
                                    HttpContext.Current.Request.QueryString.ToString();
                            }
                            // new path to rewrite to
                            string rew = cfg.RewriteBase + path;
                            // save original URL to HttpContext for further use
                            HttpContext.Current.Items.Add(
                                "OriginalUrl",
                                HttpContext.Current.Request.RawUrl);
                            // rewrite
                            HttpContext.Current.RewritePath(rew);
                        }
                        return;
                    }
                }
                catch (Exception ex)
                {
                    throw (new Exception("Incorrect rule.", ex));
                }
            }
            return;
        }

    }
}

We must then register this method:
public void Init(HttpApplication context)
{
 context.BeginRequest += new EventHandler(RewriteModule_BeginRequest);
}
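The query string handling inside RewriteModule_BeginRequest is easier to follow in isolation. The helper below is a minimal sketch of the same idea; QuerySketch and AppendQuery are hypothetical names used only here, and the query string is passed in as plain text rather than read from HttpContext.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class QuerySketch
{
    // Appends the original query string to a rewritten path, using "?"
    // when the path has no query part yet and "&" when it already has one.
    public static string AppendQuery(string path, string query)
    {
        if (String.IsNullOrEmpty(query)) return path;
        string sign = (path.IndexOf('?') == -1) ? "?" : "&";
        return path + sign + query;
    }
}
```

For the virtual URL /2006/12/10/?Sort=Desc&SortBy=Date, the rewritten path Posts.aspx?Year=2006&Month=12&Day=10 becomes Posts.aspx?Year=2006&Month=12&Day=10&Sort=Desc&SortBy=Date, so the target page receives both sets of parameters.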
But this is just half of the road we need to go down, because the rewriting module should handle a web form's postbacks and populate a collection of 'virtual path parameters'. In the given code you will not find a part that does this task. Let's put 'virtual path parameters' aside for a moment. The main thing here is to handle postbacks correctly.
If we run the code above and look at the action attribute of the form tag in the HTML source of the ASP.NET page, we find that, even for a virtual URL, the action attribute contains the path to the actual ASP.NET page. For example, if we are using the page ~/Posts.aspx to handle requests like:

http://www.somebloghost.com/Blogs/2006/12/10/Default.aspx

...we find action="/Posts.aspx". This means that on postback the user will be using not the virtual URL but the actual one: http://www.somebloghost.com/Posts.aspx. This is not what we want! So, a few more lines of code are required to achieve the desired result.
First, we must register and implement one more method in our HttpModule:
        public void Init(HttpApplication context)
        {
            // check every incoming request against the rewriting rules
            context.BeginRequest += new EventHandler(
                 RewriteModule_BeginRequest);
            // hook into the request just before the handler (page) runs
            context.PreRequestHandlerExecute += new EventHandler(
                 RewriteModule_PreRequestHandlerExecute);
        }

        void RewriteModule_PreRequestHandlerExecute(object sender, EventArgs e)
        {
            HttpApplication app = (HttpApplication)sender;
            // "as" yields null for non-page handlers and for a null handler
            Page pg = app.Context.CurrentHandler as Page;
            if (pg != null)
            {
                pg.PreInit += new EventHandler(Page_PreInit);
            }
        }
This method checks if the user requested a normal ASP.NET page and adds a handler for the PreInit event of the page lifecycle. This is where RewriteContext will be populated with actual parameters and a second URL rewriting will be performed. The second rewriting is necessary to make ASP.NET believe it wants to use a virtual path in the action attribute of an HTML form.
        void Page_PreInit(object sender, EventArgs e)
        {
            // restore internal path to original
            // this is required to handle postbacks
            if (HttpContext.Current.Items.Contains("OriginalUrl"))
            {
                string path = (string)HttpContext.Current.Items["OriginalUrl"];

                // save query string parameters to context
                RewriteContext con = new RewriteContext(
                    HttpContext.Current.Request.QueryString, path);

                HttpContext.Current.Items["RewriteContextInfo"] = con;

                // make sure the path carries a query string part, even an
                // empty one, before rewriting back to the original URL
                if (path.IndexOf("?") == -1)
                    path += "?";
                HttpContext.Current.RewritePath(path);
            }
        }
Finally, our RewriteModule assembly contains three classes: RewriteModuleSectionHandler, RewriteContext and RewriteModule.

Registering RewriteModule in web.config

To use RewriteModule in a web application, you should add a reference to the rewrite module assembly and register HttpModule in the web application web.config file. To register HttpModule, open the web.config file and add the following code into the system.web section:
<httpModules>
<add name="RewriteModule" type="RewriteModule.RewriteModule, RewriteModule"/>
</httpModules>

Using RewriteModule

There are a few things you should bear in mind when using RewriteModule:
  • web.config is an XML document, so characters that have a special meaning in XML cannot be used as they are. Use the corresponding XML entities instead: for example, &amp; instead of &.
  • To use relative paths in your ASPX pages, call the ResolveUrl method inside HTML tags: <img src="<%=ResolveUrl("~/Images/Test.jpg")%>" />. Note that ~/ points to the root directory of the web application.
  • Bear in mind the greediness of regular expressions, and list the rewriting rules in web.config from the most specific to the most general, for instance:
<rule source="Directory/(.*)/(.*)/(.*)/(.*).aspx"
      destination="Directory/Item.aspx?Source=$1&amp;Year=$2&amp;ValidTill=$3&amp;Sales=$4"/>
<rule source="Directory/(.*)/(.*)/(.*).aspx"
      destination="Directory/Items.aspx?Source=$1&amp;Year=$2&amp;ValidTill=$3"/>
<rule source="Directory/(.*)/(.*).aspx"
      destination="Directory/SourceYear.aspx?Source=$1&amp;Year=$2"/>
<rule source="Directory/(.*).aspx"
      destination="Directory/Source.aspx?Source=$1"/>
  • If you would like to use RewriteModule with pages other than .aspx, you should configure IIS to map requests to pages with the desired extensions to ASP.NET runtime as described in the next section.
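The effect of rule order can be checked with a short sketch that mimics the module's "first match wins" loop. RuleOrderSketch and FirstMatch are hypothetical names, the rules are passed in as source/destination pairs instead of being read from web.config, and the base path is left out for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RuleOrderSketch
{
    // rules[i][0] is the source pattern, rules[i][1] the destination.
    // Returns the path produced by the first matching rule,
    // or null when no rule matches.
    public static string FirstMatch(string[][] rules, string path)
    {
        foreach (string[] rule in rules)
        {
            Regex re = new Regex(rule[0], RegexOptions.IgnoreCase);
            if (re.IsMatch(path))
                return re.Replace(path, rule[1]);
        }
        return null;
    }
}
```

For the path "Directory/Books/2006.aspx", listing "Directory/(.*)/(.*).aspx" before "Directory/(.*).aspx" yields "Directory/SourceYear.aspx?Source=Books&Year=2006". With the two rules swapped, the more general pattern matches first, its greedy (.*) swallows "Books/2006", and the result is "Directory/Source.aspx?Source=Books/2006", which is probably not what was intended.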

IIS Configuration: using RewriteModule with extensions other than .aspx

To use a rewriting module with extensions other than .aspx (for example, .html or .xml), you must configure IIS so that these file extensions are mapped to the ASP.NET engine (ASP.NET ISAPI extension). Note that to do so, you have to be logged in as an Administrator.
Open the IIS Administration console and select the virtual directory or website for which you want to configure mappings.
[Screenshots: Windows XP (IIS 5), virtual directory "RW"; Windows 2003 Server (IIS 6), Default Web Site]

Then click the Configuration… button on the Virtual Directory tab (or the Home Directory tab if you are configuring mappings for the website).
[Screenshots: Windows XP (IIS 5); Windows 2003 Server (IIS 6)]

Next, click on the Add button and type in an extension. You also need to specify a path to an ASP.NET ISAPI Extension. Don't forget to uncheck the option Check that file exists.

If you would like to map all extensions to ASP.NET, then for IIS 5 on Windows XP you only have to map the .* extension to the ASP.NET ISAPI extension. For IIS 6 on Windows 2003 the procedure is slightly different: click the Insert… button instead of the Add… button, and specify the path to the ASP.NET ISAPI extension.

Conclusions

We have now built a simple but very powerful rewriting module for ASP.NET that supports URLs based on regular expressions, and page postbacks. The solution is easy to implement and lets users work with short, neat URLs free of bulky query string parameters. To start using the module, you simply add a reference to the RewriteModule assembly in your web application and add a few lines to the web.config file, whereupon you have all the power of regular expressions at your disposal to rewrite URLs. The rewrite module is easy to maintain, because to change any 'virtual' URL you only need to edit the web.config file. If you need to test your application without the module, you can turn it off in web.config without modifying any code.
To gain a deeper insight into the rewriting module, take a look through the source code and example attached to this article. I believe you'll find using the rewriting module a far more pleasant experience than using the native URL mapping in ASP.NET 2.0.
