I had an interesting conversation/debate over on reddit today on the topic of input handling and I thought it was worth posting up.
Essentially there are two approaches handling data in an web applications.
1. Carry out input validation as the data enters your application. This can either be white-list (only allow "known good" data types), or black-list (try to block "known-bad" data types)
2. Carry out output normalization on the data as it leaves your application. Here you look to understand the special or "meta" characters for the type of system or data format that you are outputting to and ensure that the data is encoded or rendered in such a way that it can't have a negative effect on that output system.
So which of these approaches is better? Well my view on that after my discussion is that they both have pros and cons, which need to be considered before making a choice.
Pros: The advantage of white list input validation is that, in a lot of cases, you can relatively easily cut down the number of attacks that will be effective against an application with minimal effort. For example if you're not taking in mark-up (eg, HTML) in any part of your application then stripping or blocking the < and > characters from your input will drastically reduce your exposure to Cross Site Scripting. This is the approach that .NET request validation takes.
Cons:The problem with input validation is, it can never take account of all possibilities. It's impossible to know when you take input into the application, how that data will be used and exactly where it will be processed in future, so there's always a risk that it will miss some class of character which turns out to be important to a given format
Pros: So essentially the opposite applies. The advantage of output normalization is that it can take into consideration the exact nature of the system that the data is being passed to and can ensure that the data it's passing will not have a negative effect. This kind of approach can be seen in HTML encoding function like h() in Ruby on Rails.
Cons: Essentially for me the major downside here is a practical one. A security control that needs to be implemented many times to be effective is one that is likely to be forgotten and because under ordinary conditions the application is likely to behave perfectly even though the control isn't in place, a developer may well not notice the problem until it's too late. You can see this kind of effect in a lot of web applications. I've seen many cases where the obvious areas of the application (form fields) have been covered for things like Cross Site Scripting, but more obscure areas (drop-downs, cookies, HTTP headers) get missed out, either because the developer forgets, or because they don't realise that those areas of the application are susceptible to attack
So which of these approaches would I recommend..... Well I'm a security person, so I'll say Both for defence in depth!
Beyond that I'd say that Input validation is a great first step and will cut down the practical attacks greatly, but if you're looking for a "perfect" approach then you'll need to add output normalization to the mix...