Hi people,
I need to construct a URL that will contain text provided by users. Since users can enter any text, I need to clean it so that it can be used inside a URL without making the URL's syntax invalid.
This means simple things like converting whitespace to hyphens, but I also need to remove punctuation marks and so on.
Does anyone know of a sure way of cleaning text like that?
How can I know whether the “clean” text is safe to be used in a URL?
Here is an example:
The user provided the following text: I’m Not here!
The clean version would be im-not-here
I am lowercasing the text on purpose.
I am not sure that the ! symbol can appear in a URL.
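To make the example concrete, here is a rough sketch of the cleaning I have in mind, written in Python purely for illustration. The function name `slugify` and the exact rules (drop apostrophes, turn every other run of non-letters and non-digits into one hyphen) are my own assumptions, not an established standard:

```python
import re

def slugify(text: str) -> str:
    """Hypothetical helper: lowercase the text and hyphenate it."""
    text = text.lower()
    # Drop straight and curly apostrophes so "I'm" becomes "im", not "i-m".
    text = re.sub(r"['\u2019]", "", text)
    # Collapse every run of non-letter, non-digit characters
    # (whitespace, punctuation, underscores) into a single hyphen.
    text = re.sub(r"[\W_]+", "-", text)
    # Remove hyphens left over at either end.
    return text.strip("-")

print(slugify("I\u2019m Not here!"))  # im-not-here
```

Note that this sketch keeps non-English letters as they are, so the result is not necessarily plain ASCII.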
Note that the text might be non-English. In that case, any non-English character must be encoded in a way similar to how Wikipedia does it.
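As far as I can tell, Wikipedia URLs represent such characters as percent-encoded UTF-8 bytes. A minimal sketch of that step, again in Python only for illustration, could use the standard library's `urllib.parse.quote`. The helper name `encode_segment` is hypothetical, and I am assuming the text ends up in a single path segment:

```python
from urllib.parse import quote

def encode_segment(segment: str) -> str:
    """Hypothetical helper: percent-encode one URL path segment as UTF-8."""
    # safe="" means "/" is escaped too, so user text cannot add path levels.
    return quote(segment, safe="", encoding="utf-8")

print(encode_segment("caf\u00e9"))     # caf%C3%A9
print(encode_segment("im-not-here"))   # im-not-here
```

ASCII letters, digits, and hyphens pass through unchanged, so an already clean English string is not altered by this step.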
Any tips would be appreciated.
Thanks