
How to disable UTF-8 control characters with regex

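If you allow UTF-8 control characters in user-generated content, people can do odd things to your webpage. This means Unicode control and formatting characters, such as the right-to-left override or zero-width spaces.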

Your page's whole layout can be corrupted. A wrongdoer can also escape moderation by using a name cloaked with such characters (in the worst case, an innocent user with a similar-looking name gets the blame).

Here is how to prevent this with a few simple regular expressions:

Prevent people from corrupting your page

The following regex checks that every character is either a word character or whitespace, and it also allows the empty string. Make sure that ^ matches the beginning of the string rather than the beginning of a line, and that $ likewise matches the end of the string (i.e. multiline mode is off):

^[\w\s]*$

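As a rough illustration, here is the regex ^[\w\s]*$ applied in Python's built-in re module. This is a minimal sketch, not a definitive implementation. It uses re.fullmatch, which anchors the pattern to the whole string, so ^, $ and the multiline question drop out. The helper name is_safe is hypothetical. In Python 3, \w and \s are Unicode-aware for str patterns, so format characters such as U+202E (RIGHT-TO-LEFT OVERRIDE) are rejected. Unicode spaces such as U+2003 (EM SPACE), however, still count as \s.

```python
import re

# ^[\w\s]*$ from the text; fullmatch anchors it to the whole string,
# so the ^ and $ anchors (and multiline mode) are not needed here.
SAFE_TEXT = re.compile(r"[\w\s]*")

def is_safe(text: str) -> bool:
    """Hypothetical helper: True if text is only word characters and whitespace."""
    return SAFE_TEXT.fullmatch(text) is not None

print(is_safe("hello world"))    # True
print(is_safe(""))               # True: the empty string is allowed
print(is_safe("evil\u202etxt"))  # False: U+202E is neither \w nor \s
print(is_safe("a\u2003b"))       # True: EM SPACE still matches \s
```

The last line is why the next step restricts whitespace further.
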
Prevent people from using fancy space tricks

If you also want to stop people from using invisible spaces or tabs, replace the \s with a literal space character. That way, the ordinary space is the only whitespace character allowed:

^[\w ]*$
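
A minimal Python sketch of ^[\w ]*$, again assuming the stdlib re module and a hypothetical helper name (is_plain). It also shows why fullmatch is used instead of re.match with a bare $. In Python, $ also matches just before a trailing newline, so "name\n" would slip through re.match even though \n is not in the character class.

```python
import re

# ^[\w ]*$ from the text: word characters plus the plain space U+0020 only.
SPACE_ONLY = re.compile(r"[\w ]*")

def is_plain(text: str) -> bool:
    """Hypothetical helper: True if the only whitespace used is U+0020."""
    return SPACE_ONLY.fullmatch(text) is not None

print(is_plain("hello world"))   # True
print(is_plain("hello\tworld"))  # False: tab rejected
print(is_plain("a\u00a0b"))      # False: no-break space rejected
print(is_plain("  padded  "))    # True: leading/trailing spaces still pass
# A bare $ with re.match accepts a trailing newline; fullmatch does not.
print(re.match(r"^[\w ]*$", "name\n") is not None)  # True
print(is_plain("name\n"))        # False
```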

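However, ^[\w ]*$ still allows spaces at the start or end of the string, which effectively act as invisible spaces. The following regex only accepts an empty string, a single word character, or a string that starts and ends with a word character: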

^(\w[\w ]*\w|\w?)$
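
The same pattern in Python, as a sketch under the same assumptions (stdlib re, hypothetical helper name is_trimmed). The capturing group from the text is written as a non-capturing (?:...) group here, since nothing needs to be extracted. fullmatch still backtracks into the second alternative when the first one cannot reach the end of the string.

```python
import re

# ^(\w[\w ]*\w|\w?)$ from the text. The first alternative needs a word
# character at both ends; the second accepts one word character or "".
TRIMMED = re.compile(r"(?:\w[\w ]*\w|\w?)")

def is_trimmed(text: str) -> bool:
    """Hypothetical helper: True if text has no leading or trailing spaces."""
    return TRIMMED.fullmatch(text) is not None

print(is_trimmed("hello world"))  # True
print(is_trimmed(" hello"))       # False: leading space
print(is_trimmed("hello "))       # False: trailing space
print(is_trimmed("a"))            # True
print(is_trimmed(""))             # True
print(is_trimmed("a  b"))         # True: runs of inner spaces still pass
```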

Hopefully these regex hints were helpful.

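Note: this approach has a disadvantage. Characters such as <, >, /, ( and ) and other punctuation or symbols are no longer allowed. To permit them, you have to extend the character classes, for example with Unicode property classes such as \p{P} (punctuation) and \p{S} (symbols, which includes < and >) in regex engines that support them.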
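
Python's built-in re module does not support \p{...} property classes (the third-party regex module, PCRE, and JavaScript with the u flag do). So this sketch takes a different route: it extends the character classes of ^(\w[\w ]*\w|\w?)$ with an explicit allow-list. The contents of EXTRA, and the names EDGE, INNER and is_allowed, are hypothetical choices for illustration. re.escape keeps characters that are special inside a class, such as "-" or "]", from breaking the pattern.

```python
import re

# Hypothetical allow-list: the characters the note mentions; extend as needed.
EXTRA = re.escape("<>/()")
EDGE = rf"[\w{EXTRA}]"    # allowed at the start and at the end
INNER = rf"[\w{EXTRA} ]"  # allowed in the middle, plus the plain space
ALLOWED = re.compile(rf"(?:{EDGE}{INNER}*{EDGE}|{EDGE}?)")

def is_allowed(text: str) -> bool:
    """Hypothetical helper: trimmed text of word chars, spaces and EXTRA."""
    return ALLOWED.fullmatch(text) is not None

print(is_allowed("f(x) <b>"))  # True
print(is_allowed("a/b"))       # True
print(is_allowed(" (x)"))      # False: leading space
print(is_allowed("a;b"))       # False: ";" is not on the list
print(is_allowed("x\u202ey"))  # False: control characters stay blocked
```

An explicit allow-list is stricter than \p{P}\p{S}, which would admit hundreds of Unicode punctuation and symbol characters. Which of the two fits depends on how much you trust the input.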


