
PHP/MySQL question: When is the best time to apply htmlentities() to user's input?

stndn

Golden Member
I'm developing a PHP front-end for a review site. Data submitted by the user will be processed by PHP and stored in a MySQL database.
To prevent unexpected entries, we plan to apply htmlentities() to the user input before calling mysql_real_escape_string().

This is to say, for example, if the user enters:
<strong>This & that - it's all "your choice"</strong>

We want the text to be encoded to (pardon the spaces between & and the entity):
& lt; strong & gt; This & amp; that - it's all & quot; your choice & quot; & lt;/strong & gt;

That way, when viewed in a browser, the comment is not displayed in bold (due to <strong></strong>) but exactly as typed. That is, shown literally as <strong>This & that - it's all "your choice"</strong>
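
For reference, a minimal sketch of the encoding described above. The ENT_COMPAT flag and the UTF-8 charset are assumptions (the post does not say which are used); ENT_COMPAT leaves single quotes alone, which matches the expected output shown above.

```php
<?php
// The sample input from the post above.
$input = '<strong>This & that - it\'s all "your choice"</strong>';

// Assumption: ENT_COMPAT encodes double quotes but not single quotes.
$encoded = htmlentities($input, ENT_COMPAT, 'UTF-8');

echo $encoded, "\n";
// prints: &lt;strong&gt;This &amp; that - it's all &quot;your choice&quot;&lt;/strong&gt;
```

A browser renders that encoded string as the literal text the user typed, tags included.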

Now the question is, when is the better time to apply htmlentities()?
1. When the user enters the text, the PHP script runs htmlentities() and stores the encoded text in the database. With this, the text doesn't need to be encoded every time we display it.

2. When the user enters the text, the entry is stored in the database as is. Only when the text is displayed do we encode it with htmlentities().


Of course there are advantages and disadvantages to both approaches. Say that someday we change the site so that users' reviews are displayed as full HTML (that is, the above text is shown in bold). We would then have to run html_entity_decode() on all the stored text IF we chose method #1.
However, if the data is stored using method #2, we just read the text from the database and display it as is.


So, what's your opinion on this?
Should the data be encoded before submitting to database, or when it's displayed?
Are there any performance or security concerns with either method?
Or is there a better way to store the data that still leaves room for future changes?


thanks.
-stndn.

edit:
modified the second line to show the &-encoded texts
 
Store something that is as close as possible to what the user entered, and make sure your display logic does the right thing (escapes the HTML entities or displays the rendered HTML, as appropriate).
 
The rationale for changing it only upon display is that the changes are not 100% reversible. If you later need what the user typed in, you cannot reliably recover that information.
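
A rough sketch of that "store raw, encode on display" approach. The helper name e() is hypothetical, and ENT_QUOTES with UTF-8 is an assumption rather than anything stated in this thread.

```php
<?php
// Hypothetical helper: encode a stored value only at the moment it is written into HTML.
function e(string $raw): string
{
    return htmlentities($raw, ENT_QUOTES, 'UTF-8');
}

// Pretend this came back from the database exactly as the user typed it.
$stored = '<strong>This & that - it\'s all "your choice"</strong>';

// Plain-text display: the markup is shown literally instead of being rendered.
$html = '<p>' . e($stored) . '</p>';
echo $html, "\n";

// The raw value is untouched, so a later change in display rules needs no decoding step.
echo $stored, "\n";
```

If the site later decides to render reviews as HTML, only the display code changes; nothing in the database has to be converted back.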
 
Hmmm... I thought I read somewhere that it's a safer approach to change them before storing to database..
Although I'm really unsure now that I've read more on this topic.

My initial thought was that if I have to change them every time I display them, it will add a little more processing to the scripts. I understand that the extra processing is almost negligible, though... I mean, 20+ htmlentities() calls won't really slow down the script too much, right? Maybe a few extra milliseconds?

But yah, changes are not 100% reversible, and that's the important point...

Anyways, it's settled then - store data as is, and change when displayed.


thanks.
-stndn.
 
The danger that you need to avoid is SQL injection, but that's handled by escaping the quotes (mysql_real_escape_string(), as you're already planning), not by htmlentities().
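
To illustrate keeping the two concerns separate, here is a sketch that swaps in a different technique from the one discussed above: a PDO prepared statement instead of mysql_real_escape_string() (the old mysql extension was removed in PHP 7.0). Assumptions: an in-memory SQLite database stands in for MySQL so the example runs on its own, and the reviews table and body column are made-up names.

```php
<?php
// Assumption: SQLite in memory stands in for MySQL; requires the pdo_sqlite extension.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)'); // hypothetical table

$input = '<strong>This & that - it\'s all "your choice"</strong>';

// The placeholder keeps the value out of the SQL text, so its quotes cannot alter the query.
$insert = $pdo->prepare('INSERT INTO reviews (body) VALUES (:body)');
$insert->execute([':body' => $input]);

// Read it back: the stored value is what the user typed, with no HTML encoding applied.
$stored = $pdo->query('SELECT body FROM reviews WHERE id = 1')->fetchColumn();

// HTML encoding happens only when writing the page.
echo htmlentities($stored, ENT_QUOTES, 'UTF-8'), "\n";
```

The SQL layer protects the query and the display layer protects the page; neither one does the other's job.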
 