
PHP Data Filtering

I don’t have access to PHP 5, so I can’t use any of the new FILTER functions. However, I want to filter the data coming into my forms to prevent attacks. Is there a reference of pre-written scripts I can use or would I be better off writing my own functions that clean the data?

PHP

18 Comments

@Mindzai (Mar 06, 2010): How you clean the data depends on what you are going to do with it. If you are going to insert user data into a database, you need to at least escape the strings, for example with mysql_real_escape_string. My advice, however, would be to use something like PDO with prepared statements, which takes care of escaping for you. Displaying data back to the user is another potential security issue, so there you would want to at least run strings through htmlspecialchars or htmlentities. If you are sending email, you need to filter for header injection; for this I tend to use a regex. It really all depends on what you want to use the data for and what validation rules you want to apply to it. Most of the PHP 5 filter functions are easily replicated with regex; they just won't be as fast as the native C functions.
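
As a rough sketch of the display and email cases: the helper names escape_for_html and has_header_injection are made up for illustration, and the regex is only one possible check for line breaks in a header value, not a complete mail validator.

```php
<?php
// Hypothetical helper: escape a string before echoing it into an HTML page.
function escape_for_html($text) {
    // ENT_QUOTES also converts single quotes, which matters inside attributes.
    return htmlspecialchars($text, ENT_QUOTES);
}

// Hypothetical helper: report whether a value destined for a mail header
// contains a line break (raw or URL-encoded), the usual sign of header injection.
function has_header_injection($value) {
    return preg_match('/[\r\n]|%0a|%0d/i', $value) === 1;
}

echo escape_for_html('<a href="x">'), "\n";
var_dump(has_header_injection("me@example.com\r\nBcc: other@example.com"));
```

A value that fails the header check is probably best rejected outright rather than cleaned up.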

However, if you are really concerned about security, the biggest thing you can do is upgrade to PHP 5. Any host that doesn't offer it would make me immediately suspicious of their dedication to security in the first place.
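
For reference, a minimal sketch of what PDO with prepared statements looks like once PHP 5.1 or later is available. It assumes the pdo_sqlite driver is installed and uses an in-memory SQLite database purely so the example is self-contained; the table and values are invented, and a MySQL connection would use a "mysql:" DSN instead.

```php
<?php
// Assumption: the pdo_sqlite driver is available. The in-memory database
// stands in for a real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (author TEXT, body TEXT)');

// Hypothetical form input, including quotes that would break a naive query.
$author = "O'Reilly";
$body   = "Robert'); DROP TABLE comments;--";

// The placeholders keep the data separate from the SQL text,
// so no manual escaping is needed.
$insert = $pdo->prepare('INSERT INTO comments (author, body) VALUES (:author, :body)');
$insert->execute(array(':author' => $author, ':body' => $body));

// Read the row back with another prepared statement.
$select = $pdo->prepare('SELECT body FROM comments WHERE author = :author');
$select->execute(array(':author' => $author));
$stored = $select->fetchColumn();
```

The stored value comes back exactly as submitted, so it still needs htmlspecialchars when it is displayed.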

@Jarrod1937 (Mar 07, 2010): If you can, also filter out any characters you know for a fact you won't need, such as % < > " ' - and ; and so on.

It depends on whether you need to include these characters, but it's best to remove the ones you won't need entirely from any user input. XSS and SQL injection attacks can use encoded characters and other tricks to get around basic string escaping. It's also good to develop and run a sanitize function over the $_GET and $_POST arrays, adding exceptions and custom filtering when needed. This way you won't have to remember to filter every user input individually; it will be done automatically, even for hidden form inputs.
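
A small sketch of the character-stripping idea, using exactly the characters listed above. The function name strip_unwanted_chars is hypothetical.

```php
<?php
// Hypothetical helper: remove the characters listed in the post from a string.
function strip_unwanted_chars($text) {
    $unwanted = array('%', '<', '>', '"', "'", '-', ';');
    return str_replace($unwanted, '', $text);
}

echo strip_unwanted_chars("<script>alert('x');</script>"), "\n";
```

Whether a list like this is too aggressive depends on the field; names and free text often legitimately contain hyphens and apostrophes.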

@bp_travis (author, Mar 08, 2010): Thanks for the tips! I'm not a pro at PHP, so I don't exactly know what PDO is. When you say to have a function that runs through the $_POST variables, how exactly would you implement that?

Thanks

@Jarrod1937 (Mar 08, 2010):
[code=php]
//-------- POST Func ---------------//
function sanitizepost(&$input) {
    foreach ($input as $key => $value) {
        // Placeholder: apply your own filter function/code to $value here.
        $filtervar = somefilterfunction($value);
        // ===== Filtered input reassign ==== //
        $input[$key] = $filteredvar;
    }
}
[/code]


The $_POST and $_GET variables are really just superglobal arrays. Because of this, you can create a function like the one above that cycles through the array elements and applies whatever sanitizing/filtering code you wish. You can then add exceptions or custom filters based on the $key value, which identifies the field being filtered (an email address, for example, generally needs less restrictive filtering).

The cool thing is that this allows automatic filtering of most user input pathways.
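
One way the per-key idea could look, as a sketch only. The function names and the 'email' field name are assumptions, and preg_replace is used in place of a placeholder filter.

```php
<?php
// Generic filter: keep only letters, digits and whitespace.
function filter_generic($value) {
    return preg_replace('/[^A-Za-z0-9\s]/', '', $value);
}

// Looser filter for e-mail fields: also keep @ . _ + and -
function filter_email_chars($value) {
    return preg_replace('/[^A-Za-z0-9@._+\-]/', '', $value);
}

// Walk the array by reference and pick a filter based on the field name.
// The key name 'email' is a hypothetical form field.
function sanitize_by_key(&$input) {
    foreach ($input as $key => $value) {
        if (!is_string($value)) {
            continue; // leave arrays (e.g. checkbox groups) untouched here
        }
        if ($key === 'email') {
            $input[$key] = filter_email_chars($value);
        } else {
            $input[$key] = filter_generic($value);
        }
    }
}

$data = array('name' => 'Bob <b>', 'email' => 'bob+x@example.com;');
sanitize_by_key($data);
print_r($data);
```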

@bp_travis (author, Mar 30, 2010): I got around to trying out your code, and it produced an "Array to string conversion" error when I ran my $_POST variables through it.

@Jarrod1937 (Mar 30, 2010): Can I see your implementation? I'm using that very code now, so I know it works.

Edit: I do have a typo in my code above, though. The first variable should be $filteredvar, not $filtervar:

$filteredvar=somefilterfunction/code applied to $value
// ===== Filtered input reassign ==== //
$input[$key]=$filteredvar;

But that still wouldn't produce an array to string error. Your filter function may not be handling the $value var correctly.

@bp_travis (author, Mar 30, 2010): After I fixed the variable misspelling, there are no more errors, but all the post variables come out as the word "Array" rather than their true value.

Here is my implementation of your function.
[code=php]
function sanitizepost(&$input) {
    foreach ($input as $key => $value) {
        $filteredvar=stripslashes(stripslashes(ereg_replace("[^A-Za-z0-9[:space:]]","",$input)));
        // ===== Filtered input reassign ==== //
        $input[$key]=$filteredvar;
    }
}
[/code]

All my $_POST variables are either text or a value from a checkbox. Actually, some of the checkbox names are arrays themselves: checkbox[].

@Jarrod1937 (Mar 30, 2010): You're feeding ereg_replace $input, which is the original array. You want to feed it $value, which is the actual value of each $_POST variable:

function sanitizepost(&$input) {
    foreach ($input as $key => $value) {
        $filteredvar=stripslashes(stripslashes(ereg_replace("[^A-Za-z0-9[:space:]]","",$value)));
        // ===== Filtered input reassign ==== //
        $input[$key]=$filteredvar;
    }
}
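
A possible variant for newer PHP versions, offered as a sketch rather than a drop-in replacement. ereg_replace was deprecated in PHP 5.3 and removed in PHP 7, so this uses preg_replace with the same character class. It also recurses into nested arrays, because a field named item[] arrives as $_POST['item'], which is itself an array. The function name sanitize_recursive is hypothetical, and the stripslashes calls are left out since the pattern already removes backslashes.

```php
<?php
// Recursive variant: strings are filtered, nested arrays (such as the values
// of a field named item[]) are walked element by element.
function sanitize_recursive(&$input) {
    foreach ($input as $key => $value) {
        if (is_array($value)) {
            sanitize_recursive($input[$key]);
        } else {
            // Keep letters, digits and whitespace, as in the posted pattern.
            $input[$key] = preg_replace('/[^A-Za-z0-9[:space:]]/', '', (string) $value);
        }
    }
}

$post = array('name' => "O'Neil; DROP", 'item' => array('a<1>', 'b-2'));
sanitize_recursive($post);
print_r($post);
```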

@bp_travis (author, Mar 30, 2010): Worked perfectly! How can I make the script ignore certain $_POST variables? For example, I don't need it to sanitize the checkboxes that are in an array (they are named item[]). I loop through them later, but they don't need to be sanitized.

@Mindzai (Mar 30, 2010): This is the issue with one-size-fits-all approaches such as this. It is too inflexible. You are usually better off applying appropriate rules to each data item individually.

@Jarrod1937 (Mar 30, 2010): [QUOTE]This is the issue with one-size-fits-all approaches such as this. It is too inflexible. You are usually better off applying appropriate rules to each data item individually.[/QUOTE]
That's the beauty of the function. You can identify each item individually if you want to, just by referencing the $key (which is the name you use when you type $_POST['somevalue']). Thus you can automatically apply a generic filtering algorithm and then refine it for different data items if you wish.

Travis, if you want, you can check the $key value and exclude the checkboxes when the $key matches their name. However, I'd be careful about assuming you don't need to sanitize them. Remember that anyone can save your page and alter the values to anything they wish, which is the same reason hidden form input is not safe.
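
Since checkbox values can be altered by the client, one option is to accept only values the form actually offers. This is a sketch under assumptions: the allowed list and the function name keep_allowed are invented for illustration.

```php
<?php
// Hypothetical list of values the form actually offers for item[].
$allowed_items = array('apple', 'pear', 'plum');

// Keep only submitted values that appear in the allowed list.
function keep_allowed($submitted, $allowed) {
    if (!is_array($submitted)) {
        return array(); // a tampered request may send a plain string instead
    }
    $clean = array();
    foreach ($submitted as $value) {
        // Strict comparison avoids loose type matches between values.
        if (is_string($value) && in_array($value, $allowed, true)) {
            $clean[] = $value;
        }
    }
    return $clean;
}

print_r(keep_allowed(array('pear', '<script>', 'plum'), $allowed_items));
```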

@Mindzai (Mar 30, 2010): [QUOTE]You can identify each item individually if you want to, just by referencing the $key...[/QUOTE]

Kind of defeats the purpose of a function if it can only be used with one set of data. I favour the approach of having a selection of functions which each perform one type of filtering, then calling them as necessary. A function such as the one posted here is going to be far too extreme in a lot of cases.

@Jarrod1937 (Mar 30, 2010): [QUOTE]Kind of defeats the purpose of a function if it can only be used with one set of data. I favour the approach of having a selection of functions which each perform one type of filtering, then calling them as necessary. A function such as the one posted here is going to be far too extreme in a lot of cases.[/QUOTE]
Keep in mind that the function can easily switch its behavior based on many factors, so you're not limited to using it with one data set. I have my current function working drastically differently on 4 different data sets.

Even if you wanted perfect granular control, this method is still better than the per-situation approach, because you can lump many different fields into a list and have them all treated at once as a single data set. You can do this for as many data sets as you wish, all filtered from a single main function.

Also, are you sure that [B]every[/B] type of input (checkboxes, radio buttons, hidden inputs, $_GET vars, POST vars sent manually with specialized tools, and so on) is filtered properly? With this function, you don't have to worry that you forgot to filter something. The whole point is that it automatically sanitizes any vars you may have missed, while still allowing you the granular control you need.

@Mindzai (Mar 30, 2010): [QUOTE]I have my current function working drastically differently on 4 different data sets.[/QUOTE]

I'd venture that a function should concentrate on doing one thing only, and doing it well. If it were me, I'd have 4 specialist functions rather than one multi-purpose one. I suppose it's largely a matter of taste. My point really is to advise against blindly running any and all input through a heavy-handed function and assuming you're good to go.
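
A sketch of what the specialist-function approach might look like. All function names, field names and patterns here are illustrative assumptions, and the email check is deliberately loose.

```php
<?php
// One small function per kind of data; each does a single job.
function clean_alnum($value) {
    return preg_replace('/[^A-Za-z0-9[:space:]]/', '', $value);
}

function clean_int($value) {
    // Returns an integer, or null when the input is not a plain whole number.
    return preg_match('/^-?[0-9]+\z/', $value) ? (int) $value : null;
}

function is_plausible_email($value) {
    // Deliberately loose check: something@something.tld, no whitespace.
    return preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+\z/', $value) === 1;
}

// Hypothetical form fields, each handled by the rule that fits it.
$form  = array('name' => 'Ann <Smith>', 'age' => '42', 'email' => 'ann@example.com');
$name  = clean_alnum($form['name']);
$age   = clean_int($form['age']);
$email = is_plausible_email($form['email']) ? $form['email'] : '';
```

Each field is handled explicitly, so nothing is filtered more harshly than it needs to be, at the cost of having to remember every field.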

@Jarrod1937 (Mar 30, 2010): [QUOTE]I'd venture that a function should concentrate on doing one thing only, and doing it well. If it were me, I'd have 4 specialist functions rather than one multi-purpose one. I suppose it's largely a matter of taste. My point really is to advise against blindly running any and all input through a heavy-handed function and assuming you're good to go.[/QUOTE]
Yes, I see your point, but you're missing something. Who's to say you can't use the main filter function for automatic sanitizing while adding exceptions for the data sets of your choice? For those individual data sets you can then apply your specialized functions. That way you get the benefit of generic automatic sanitizing, along with the granular control of specialized functions for those specialized data sets.

@bp_travis (author, Mar 31, 2010): I see the point in both your arguments. I guess it just depends on preference and the situation. Regardless of which one I choose, I attempted to get the script not to filter the checkboxes, but could not get it to work:
[code=php]
function sanitizepost(&$input) {
    foreach ($input as $key => $value) {
        if ($key != "Checkbox1" || $key != "Checkbox2"){
            $filteredvar=stripslashes(stripslashes(ereg_replace("[^A-Za-z0-9[:space:]]","",$value)));
            // ===== Filtered input reassign ==== //
            $input[$key]=$filteredvar;
        }
    }
}
[/code]

@Jarrod1937 (Mar 31, 2010): Edit: Never mind, I see you edited your post after you figured that out.

Edit 2: For your new problem, swap the || for &&. You want to skip the checkboxes, so the filter should run only if the key != this AND the key != that; with ||, the condition is always true, because no key can equal both names at once. If your exception list grows too big in the future, you can create an "exception array", which is just an array that you add keys to as you wish. Then, during filtering, you check that the current $key does not exist in the array. You can also use this method to segregate your data sets and apply your own specialized sanitize functions instead of the general one.
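
The exception array could be sketched roughly like this. The function name sanitize_except is hypothetical, the field names come from the code in the previous post, and preg_replace stands in for ereg_replace.

```php
<?php
// Sanitizer with a skip list: keys named in $exceptions are left untouched.
function sanitize_except(&$input, $exceptions) {
    foreach ($input as $key => $value) {
        if (in_array($key, $exceptions, true)) {
            continue; // handled separately, e.g. by a specialized function
        }
        if (is_string($value)) {
            $input[$key] = preg_replace('/[^A-Za-z0-9[:space:]]/', '', $value);
        }
    }
}

// 'Checkbox1' and 'Checkbox2' are the field names from the post above.
$post = array('name' => 'Bob;', 'Checkbox1' => array('a-1'), 'Checkbox2' => 'x<y');
sanitize_except($post, array('Checkbox1', 'Checkbox2'));
print_r($post);
```

Adding another exception then means adding one string to the array instead of extending the if condition.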

@bp_travis (author, Mar 31, 2010): Worked! I needed to filter out certain checkboxes because their names had [] in them, which is what lets me loop through them later. If I ran them through the sanitize function, I was no longer able to loop through them.

Thanks for all your code help.