Friday, March 8, 2013

How can you map URLs to lower case in htaccess ONLY?


Recently I had to debug a mod_rewrite snippet that tried to convert URLs to lower case for a site on a shared host with no access to .conf files. The normal mod_rewrite solution requires the RewriteMap directive, which can't be placed in .htaccess.

The problematic mod_rewrite snippet was definitely on the right track so I just needed to stare at mod_rewrite logs a bit to make it work. The result was something like this:

RewriteRule ![A-Z] - [S=26]
RewriteRule (.*)A(.*) $1a$2 [N,DPI,E=lc:yes]
RewriteRule (.*)B(.*) $1b$2 [N,DPI,E=lc:yes]
RewriteRule (.*)C(.*) $1c$2 [N,DPI,E=lc:yes]
RewriteRule (.*)D(.*) $1d$2 [N,DPI,E=lc:yes]
RewriteRule (.*)E(.*) $1e$2 [N,DPI,E=lc:yes]
RewriteRule (.*)F(.*) $1f$2 [N,DPI,E=lc:yes]
RewriteRule (.*)G(.*) $1g$2 [N,DPI,E=lc:yes]
RewriteRule (.*)H(.*) $1h$2 [N,DPI,E=lc:yes]
RewriteRule (.*)I(.*) $1i$2 [N,DPI,E=lc:yes]
RewriteRule (.*)J(.*) $1j$2 [N,DPI,E=lc:yes]
RewriteRule (.*)K(.*) $1k$2 [N,DPI,E=lc:yes]
RewriteRule (.*)L(.*) $1l$2 [N,DPI,E=lc:yes]
RewriteRule (.*)M(.*) $1m$2 [N,DPI,E=lc:yes]
RewriteRule (.*)N(.*) $1n$2 [N,DPI,E=lc:yes]
RewriteRule (.*)O(.*) $1o$2 [N,DPI,E=lc:yes]
RewriteRule (.*)P(.*) $1p$2 [N,DPI,E=lc:yes]
RewriteRule (.*)Q(.*) $1q$2 [N,DPI,E=lc:yes]
RewriteRule (.*)R(.*) $1r$2 [N,DPI,E=lc:yes]
RewriteRule (.*)S(.*) $1s$2 [N,DPI,E=lc:yes]
RewriteRule (.*)T(.*) $1t$2 [N,DPI,E=lc:yes]
RewriteRule (.*)U(.*) $1u$2 [N,DPI,E=lc:yes]
RewriteRule (.*)V(.*) $1v$2 [N,DPI,E=lc:yes]
RewriteRule (.*)W(.*) $1w$2 [N,DPI,E=lc:yes]
RewriteRule (.*)X(.*) $1x$2 [N,DPI,E=lc:yes]
RewriteRule (.*)Y(.*) $1y$2 [N,DPI,E=lc:yes]
RewriteRule (.*)Z(.*) $1z$2 [N,DPI,E=lc:yes]
RewriteCond %{ENV:lc} ^yes$
RewriteRule (.*) http://testsite.com/$1 [R=301,L]

(For this configuration we needed to return a permanent redirect with the rewritten URL rather than simply proceeding to request processing with a lower-case URL.)

That works in .htaccess, but it's an unwieldy loop that replaces one letter at a time until no upper-case letters remain. The canonical solution, which uses tolower, should be a lot faster. How much faster?
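
To get a feel for why the loop is expensive, here is a rough Python model of it. This is only a sketch of the rule ordering, not how mod_rewrite is actually implemented; the function name and the pass/check counters are my own invention for illustration. Each pass stands for one trip through the ruleset, and the N flag is modeled as starting over from the top.

```python
import re
import string

def lowercase_by_rules(path):
    """Simplified model of the letter-at-a-time loop (hypothetical helper).

    Returns the rewritten path, the number of passes over the ruleset,
    and the number of rule patterns that were evaluated along the way.
    """
    passes = 0
    rule_checks = 0
    while True:
        passes += 1
        rule_checks += 1  # the leading "![A-Z]" rule that skips the 26 letter rules
        if not re.search("[A-Z]", path):
            return path, passes, rule_checks
        for letter in string.ascii_uppercase:
            rule_checks += 1
            # Greedy (.*) means the last occurrence of the letter is replaced.
            match = re.match("(.*)" + letter + "(.*)", path)
            if match:
                path = match.group(1) + letter.lower() + match.group(2)
                break  # models the N flag: start over from the first rule

result, passes, rule_checks = lowercase_by_rules("SomeMixedCase/Path/To/Resource.html")
print(result, passes, rule_checks)
```

In this model every upper-case letter costs a full restart, plus however many letter rules precede the one that finally matches, which is the kind of overhead the measurements below try to quantify.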

Standard disclaimer: I won't claim this is a good test procedure, yet I still want you to believe the results.

I decided to get a very rough idea of CPU usage for three scenarios:

  • mod_rewrite redirects mixed-case URL to lower-case equivalent using the ugly loop above
  • mod_rewrite redirects mixed-case URL to lower-case equivalent using tolower
  • baseline: mod_rewrite redirects to a fixed URL

The baseline scenario should give an idea of how much CPU is used for basic handling of the redirect, independent of any logic to compute the lower-case form of the URL.

Here's the .conf snippet which sets up the test. Notice the RewriteMap directive in the .conf, referred to by the .htaccess file for that virtual host. The mod_rewrite rules add one more nuance: Requests for certain file types aren't redirected to lower case.

Listen 10001
<VirtualHost *:10001>

  RewriteMap lc int:tolower

  DocumentRoot /home/trawick/testsite.com/tolower_public_html

  <Directory /home/trawick/testsite.com/tolower_public_html>
    AllowOverride All
    Order Deny,Allow
    Allow from All
  </Directory>

</VirtualHost>

Listen 10002
<VirtualHost *:10002>
  DocumentRoot /home/trawick/testsite.com/eachletter_public_html

  <Directory /home/trawick/testsite.com/eachletter_public_html>
    AllowOverride All
    Order Deny,Allow
    Allow from All
  </Directory>

</VirtualHost>

Listen 10003
<VirtualHost *:10003>
  DocumentRoot /home/trawick/testsite.com/nocalc_public_html

  <Directory /home/trawick/testsite.com/nocalc_public_html>
    AllowOverride All
    Order Deny,Allow
    Allow from All
  </Directory>

</VirtualHost>

Here is the .htaccess file that uses the lc map which is based on tolower:

RewriteEngine On 

RewriteRule \.(js|css)$ - [NC,L]
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*)$ http://testsite.com/${lc:$1} [R=301,L]
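
As a cross-check on what these three rules are meant to do, the decision logic can be sketched in Python. This is only a model under simplifying assumptions: the function name is hypothetical, it works on the path alone, and it ignores query strings and any other server behavior.

```python
import re

def expected_redirect(request_path):
    """Model of the tolower .htaccess rules (hypothetical helper).

    Returns the redirect target, or None when no redirect is expected.
    """
    # RewriteRule \.(js|css)$ - [NC,L]: leave these file types alone, in any case.
    if re.search(r"\.(js|css)$", request_path, re.IGNORECASE):
        return None
    # RewriteCond %{REQUEST_URI} [A-Z]: only act when an upper-case letter is present.
    if not re.search("[A-Z]", request_path):
        return None
    # RewriteRule ^(.*)$ http://testsite.com/${lc:$1} [R=301,L]
    return "http://testsite.com/" + request_path.lstrip("/").lower()

print(expected_redirect("/SomeMixedCase/Path/To/Resource.html"))
print(expected_redirect("/Scripts/App.JS"))
print(expected_redirect("/index.html"))
```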

Here is the .htaccess file that uses the ugly character-by-character processing:

RewriteEngine On 

RewriteRule \.(js|css)$ - [NC,L]

RewriteRule ![A-Z] - [S=26]
RewriteRule (.*)A(.*) $1a$2 [N,DPI,E=lc:yes]
RewriteRule (.*)B(.*) $1b$2 [N,DPI,E=lc:yes]
RewriteRule (.*)C(.*) $1c$2 [N,DPI,E=lc:yes]
RewriteRule (.*)D(.*) $1d$2 [N,DPI,E=lc:yes]
RewriteRule (.*)E(.*) $1e$2 [N,DPI,E=lc:yes]
RewriteRule (.*)F(.*) $1f$2 [N,DPI,E=lc:yes]
RewriteRule (.*)G(.*) $1g$2 [N,DPI,E=lc:yes]
RewriteRule (.*)H(.*) $1h$2 [N,DPI,E=lc:yes]
RewriteRule (.*)I(.*) $1i$2 [N,DPI,E=lc:yes]
RewriteRule (.*)J(.*) $1j$2 [N,DPI,E=lc:yes]
RewriteRule (.*)K(.*) $1k$2 [N,DPI,E=lc:yes]
RewriteRule (.*)L(.*) $1l$2 [N,DPI,E=lc:yes]
RewriteRule (.*)M(.*) $1m$2 [N,DPI,E=lc:yes]
RewriteRule (.*)N(.*) $1n$2 [N,DPI,E=lc:yes]
RewriteRule (.*)O(.*) $1o$2 [N,DPI,E=lc:yes]
RewriteRule (.*)P(.*) $1p$2 [N,DPI,E=lc:yes]
RewriteRule (.*)Q(.*) $1q$2 [N,DPI,E=lc:yes]
RewriteRule (.*)R(.*) $1r$2 [N,DPI,E=lc:yes]
RewriteRule (.*)S(.*) $1s$2 [N,DPI,E=lc:yes]
RewriteRule (.*)T(.*) $1t$2 [N,DPI,E=lc:yes]
RewriteRule (.*)U(.*) $1u$2 [N,DPI,E=lc:yes]
RewriteRule (.*)V(.*) $1v$2 [N,DPI,E=lc:yes]
RewriteRule (.*)W(.*) $1w$2 [N,DPI,E=lc:yes]
RewriteRule (.*)X(.*) $1x$2 [N,DPI,E=lc:yes]
RewriteRule (.*)Y(.*) $1y$2 [N,DPI,E=lc:yes]
RewriteRule (.*)Z(.*) $1z$2 [N,DPI,E=lc:yes]
RewriteCond %{ENV:lc} ^yes$
RewriteRule (.*) http://testsite.com/$1 [R=301,L]

Here is the .htaccess file that returns a fixed URL string:

RewriteEngine On 

RewriteRule \.(js|css)$ - [NC,L]

RewriteRule (.*) http://testsite.com/somemixedcase/path/to/resource.html [R=301,L]

For the actual test, I restarted Apache httpd with a configuration which used a single worker child process, fired off 25,000 requests using the script below, and then checked the CPU seconds used by the worker child process.

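#!/usr/bin/env python3

import http.client

def get_redirect(host, port, uri):
    # Send a HEAD request and return the Location header of a 301 response.
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("HEAD", uri)
        res = conn.getresponse()
        if res.status == 301:
            return res.getheader('Location')
        # Anything other than a permanent redirect is unexpected; report it.
        print(res.status, res.reason)
        return None
    finally:
        conn.close()

request_uri = "/SomeMixedCase/Path/To/Resource.html"

for i in range(25000):
    # Port 10001 is the tolower virtual host; change the port to test the others.
    r = get_redirect("127.0.0.1", 10001, request_uri)
    if r != "http://testsite.com" + request_uri.lower():
        print(r)
        assert False
    # Progress indicator every 1,000 requests.
    if (i + 1) % 1000 == 0:
        print(i + 1)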
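
There are several ways to check the CPU seconds used by a process. One possibility, sketched here on the assumption of a Linux system with /proc mounted, is to read the process's stat file before and after the run and take the difference. The function name is hypothetical, and you would need to look up the pid of the worker child yourself.

```python
import os

def cpu_seconds(pid):
    """Return user + system CPU seconds used so far by process `pid` (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The command name (field 2) is parenthesized and may contain spaces,
    # so only split the text that follows the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime_ticks = int(fields[11])
    stime_ticks = int(fields[12])
    return (utime_ticks + stime_ticks) / os.sysconf("SC_CLK_TCK")

# Demonstration on the current process; for the test above you would pass
# the pid of the httpd worker child instead.
print(cpu_seconds(os.getpid()))
```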

The httpd process used about seven CPU seconds for each of the baseline and tolower configurations. The ugly mod_rewrite solution for when tolower isn't available used about twenty-seven CPU seconds. So on my Core 2 Duo laptop, roughly 0.8 milliseconds of CPU was wasted per request with the ugly solution, for a request with only six upper-case characters to convert, whereas with the tolower solution the extra CPU was not noticeable. Your Apache httpd administrator should be happy to add RewriteMap lc int:tolower to the .conf file so that you can use a more efficient mechanism in your .htaccess.
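
The per-request figure is just arithmetic on the rough numbers above. Spelled out in Python, with variable and function names that are mine and inputs that are the approximate CPU seconds I observed:

```python
def extra_cpu_ms_per_request(total_cpu_s, baseline_cpu_s, requests):
    """CPU milliseconds per request beyond the baseline (hypothetical helper)."""
    return (total_cpu_s - baseline_cpu_s) / requests * 1000.0

requests = 25000
baseline_cpu_s = 7.0   # baseline and tolower runs, approximately
loop_cpu_s = 27.0      # letter-at-a-time loop, approximately

extra_ms = extra_cpu_ms_per_request(loop_cpu_s, baseline_cpu_s, requests)
print("extra CPU per request: %.2f ms" % extra_ms)
print("extra CPU per upper-case letter: %.3f ms" % (extra_ms / 6))
```

Since the inputs are only good to about a second, the result should be read as "somewhat under a millisecond" rather than as a precise measurement.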
