  Apache Module: mod_rewrite

Module mod_rewrite Tutorial (Part 2): Rule Conditions
by: Dirk Brockhausen


In this tutorial's last instalment we started off with a discussion of the basics of Module
mod_rewrite. In the example reviewed there we made use of a rule which, put in full words, states:

"If access to file .htaccess is attempted, return an error message stating that access is denied."

This rule is valid globally, i.e. everyone will receive the specified error message.

We can, however, restrict a rule by what is termed "rule conditions" - in this case, the rule will only
be executed if the condition set has actually been met.

Syntax: The condition must precede the rule!

Let us explain this procedure with an example. The lines below are entries in file ".htaccess".

RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
RewriteRule ^.*$ - [F]

The first three lines were covered in detail in Part 1 of this tutorial. Their function is to initialize the
rewriting engine.
The last two lines refuse access to any spider whose UserAgent begins with "EmailSiphon".
This particular spider is an email harvester that collects addresses from web pages.

Our line:

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon

is made up of the following three parts:

Directive: RewriteCond
TestString: %{HTTP_USER_AGENT}
CondPattern: ^EmailSiphon

The TestString is a server variable, written in the general form "%{NAME_OF_VARIABLE}".

In our example, NAME_OF_VARIABLE is "HTTP_USER_AGENT", i.e. the UserAgent string sent by the
requesting client.

CondPattern is a regular expression. Before we continue with its specifics, let us take a
look at regular expressions and their function in general.

Regular expression
Regular expressions are a means of describing text patterns. They are used to check if a text pattern is
present in any given text. Once determined, this pattern can then be manipulated.

Regular expressions are similar to a small, compact programming language in its own right.

E.g. in Perl or sed, the command "s/abc/xyz/g" uses the regular expression "abc" to globally replace
the string "abc" in a text by "xyz".

Here is an overview of the most important elements with some examples:

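. (dot) - wildcard (any single character)
| - alternation (i.e. /abc|def/ matches "abc" or "def")
* - quantifier (the preceding element may occur any number of times, including zero)
^ $ - anchors (beginning and end of the line or string)
s - substitution operator in Perl and sed (string1 to be replaced by string2)
g - modifier for "s" (every occurrence is replaced, not only the first)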

Regular expressions are constructed from these elements and alphanumeric characters.

Regular expressions are not used in isolation; instead, they are built into other
tools, e.g. languages like Perl or text editors such as Emacs.

In connection with Module mod_rewrite they are used in the directives RewriteRule and RewriteCond.
Only the pattern itself is written there; the "s" operator and the "g" modifier are not used.

"^" represents the beginning of a string. It follows that the UserAgent must begin with the string
"EmailSiphon". A UserAgent such as "NewEmailSiphon" would not match, and the condition would not be met.

But as this particular regular expression doesn't contain the character "$" (end of line anchor), the
UserAgent may continue after "EmailSiphon"; "EmailSiphon2", for example, would match as well.
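
To make the effect of the anchors concrete, here is a small sketch of three variants of the condition.
The UserAgent strings in the comments are the ones used in the text above. The variants are alternatives:
only one of them would be used in a real ".htaccess" file.

```apache
# Variant 1: anchored at the beginning only (as used in this tutorial).
# Matches "EmailSiphon" and "EmailSiphon2", but not "NewEmailSiphon".
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon

# Variant 2: anchored at both ends.
# Matches only the exact string "EmailSiphon".
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon$

# Variant 3: no anchors at all.
# Matches wherever the text occurs, so "NewEmailSiphon" is matched too.
RewriteCond %{HTTP_USER_AGENT} EmailSiphon
```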


The last script line:

RewriteRule ^.*$ - [F]

defines what will happen when the spider requests access.

The regular expression "^.*$" matches any requested path, the substitution "-" means that the URL is
left unchanged, and the flag [F] returns the error message "forbidden" (HTTP status 403). In full words:

If access to any file is requested, the error message "forbidden" will be displayed.

The dot "." in the regular expression is a meta symbol (wildcard) and signifies any single character.

"*" signifies that the preceding character may occur any number of times, including not at all. In this
case, regardless of which specific page is called, an error message will be displayed.
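
The pattern of the RewriteRule does not have to match everything. As a sketch, assume you only wanted to
keep the harvester away from HTML pages. The pattern "\.html$" is our own illustration and not part of
the original example:

```apache
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
# "\." is a literal dot and "$" anchors the match at the end of the requested path,
# so only requests ending in ".html" are answered with "forbidden".
RewriteRule \.html$ - [F]
```

Requests for other files, e.g. images, would pass this rule untouched.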


EmailSiphon is, of course, not the only email harvester. Another famous member of this family is
"ExtractorPro".

So let's say we want to fend off this spider as well. In this case we will require another condition to be
met.

This gives us the following entries to file ".htaccess":

RewriteEngine on
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro
RewriteRule ^.*$ - [F]

The third argument ([OR]) in line:

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]

is termed a "flag". For conditions, the two flags of interest here are:

NC (no case)
OR (or next condition)

Flag "NC" permits case insensitive testing of the condition pattern.


Example

RewriteCond %{HTTP_USER_AGENT} ^emailsiphon [NC]

This line specifies that "emailsiphon", "EmailSiphon" and any other capitalization shall be recognized.

If you wish to use multiple flags, you may delimit them by commas.

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro

There is no restriction on the number of conditions. Thus, you could block 10, 100, 1000 or more
known email harvesters. How many conditions you define is merely a question of server performance
and of keeping ".htaccess" readable.
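
One way to keep a longer list readable is to fold several names into a single condition with the
alternation element "|" from the overview above. A minimal sketch, using only the two harvesters named
in this tutorial:

```apache
RewriteEngine on
Options +FollowSymlinks
RewriteBase /
# The parentheses group the alternatives so that "^" applies to both names.
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|ExtractorPro) [NC]
RewriteRule ^.*$ - [F]
```

This should behave like the two conditions joined by [OR], apart from the added [NC] flag, which makes
the test case insensitive for both names.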

In the above example, the string "HTTP_USER_AGENT" is being used.

Further server variables are:

REMOTE_HOST (host name of the requesting client)
REMOTE_ADDR (IP address of the requesting client)

For example, if you want to block the spider coming from www.cyveillance.com, you will use variable
"REMOTE_HOST". Thus:

RewriteCond %{REMOTE_HOST} ^www\.cyveillance\.com$
RewriteRule ^.*$ - [F]

The dot "." in the domain name must be escaped by "\" (backslash); otherwise it would be treated as
a meta character matching any character.
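
The difference can be sketched by placing both spellings side by side. The host name
"www-cyveillance.com" in the comment is a made-up example, chosen only to show what the unescaped
pattern would let through:

```apache
# Unescaped: each "." matches any character, so a host name such as
# "www-cyveillance.com" would be matched as well.
RewriteCond %{REMOTE_HOST} ^www.cyveillance.com$

# Escaped: each "\." matches only a literal dot.
RewriteCond %{REMOTE_HOST} ^www\.cyveillance\.com$
```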

If you want to block any given IP, the condition will read:

RewriteCond %{REMOTE_ADDR} ^216\.32\.64\.10$
RewriteRule ^.*$ - [F]

In the regular expression, enter the IP in its entirety, delimited by the line anchors.

You may even exclude a whole IP range from access:

RewriteCond %{REMOTE_ADDR} ^216\.32\.64\.
RewriteRule ^.*$ - [F]

This example will cover all individual IPs from "216.32.64.0" through "216.32.64.255".
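
Conditions on different variables can also be combined. Consecutive RewriteCond lines without the [OR]
flag are implicitly joined by AND, so in the following sketch the rule is executed only if both
conditions are met. Pairing this IP range with this UserAgent is purely illustrative:

```apache
# No [OR] flag: both conditions must be met (implicit AND).
RewriteCond %{REMOTE_ADDR} ^216\.32\.64\.
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC]
RewriteRule ^.*$ - [F]
```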

Here's a little teaser quiz for you to check out your skills. The solution will be featured in the next
part of our tutorial. Enjoy!

RewriteCond %{REMOTE_ADDR} ^216\.32\.64
RewriteRule ^.*$ - [F]

Quiz Question
If we don't write "^216\.32\.64\." for a regular expression in the configuration above, but
"^216\.32\.64" instead, will we get the identical effect, i.e. will this exclude the same IPs?

Up until now we have used a simple RewriteRule which will generate an error message. In the
3rd part of our tutorial we will analyze how RewriteRule may be used to redirect visitors to specific files.




This text may freely be republished or distributed provided the following resource box is included intact
either at the beginning or the end of the article and a complimentary copy or notice (link) is sent to the
author at the address specified below:

Dirk Brockhausen is the co-founder and principal of fantomaster.com Ltd. (UK) and fantomaster.com GmbH
(Belgium), a company specializing in webmasters software development, industrial-strength cloaking and
search engine positioning services. He holds a doctorate in physics and has worked as an SAP
consultant and software developer since 1994. He is also Technical Editor of fantomNews, a free newsletter
focusing on search engine optimization, available at http://fantomaster.com/fantomnews-sub.html

You can contact him at fntecheditor@fantomaster.com
(c) copyright 2000 by fantomaster.com


