Addslashes(): don't call it a comebackstill number one for escaping SQL statementsI've seen a lot of people talking about mysql_real_escape_string() vs addslashes() vs addcslashes(). There seems to be a lot of real confusion about what these functions do (even with the php.net manual around), especially when it comes to character sets. I feel that some people are being scared into using some escaping methods with which they are not very familiar. So, I've decided to lay it all out in a few charts so there is no confusion about what each function does and how each can help protect against SQL injection attacks. AS = addslashes() MRES = mysql_real_escape_string() ACS = addcslashes() //called with "\\\000\n\r'\"\032%_"
*1 MRES can escape with single quotes instead of backslashes if you set NO_ESCAPE_BACKSLASH during your SQL session. Since this is strictly a MySQL function, its use is dubious. *2 Although MRES can be character set aware, it requires that you explicitly set the character set of the string which needs to be escaped, a feature that was not available until php 5.2.3. *3 Although MRES stops the intended SQL injection, it can result in database errors, which, if unexpected could leak more information to the attacker. It also requires you to properly set your character sets, a feature that was not available until php 5.2.3. As you can see, none of these functions fully protect against "multi-byte" injection attacks. I qualify this with "fully" because if you have mismatched character sets, MRES can present your attacker with an unexpected error message, or fail to work at all. As far as speed goes, you can see that MRES can be up to twice as slow as AS or ACS. This is because the data to be escaped is sent to the libmysqlclient library. While this has some benefits when dealing with outdated character sets, it presents a number of problems. When performing unit tests (you write unit tests, don't you?), if you are using MRES to escape your data, you are saddled with being required to have a Mysql server running. Hopefully the same version as the live server. too. In the eyes of a unit testing 'purist', requiring a running database might disqualify your test from being 'true' unit tests. I say this half-jokingly. But, it does raise the question of how isolated you can make your code and could stand in the way of making mock objects for your database layer. Multi-byte character sets is where the entire SQL escaping debate really takes off. There is so much confusion about how to properly escape text for exotic (or you could call them "outdated") character sets. In the above comparison there is a line that reads "character-set aware", and only MRES has a "yes" for that column. If we're assuming that "character-set aware" is a Good Thing, this begs the question, "When do you need to be aware of your character set?" Answer, all the time. But, for SQL escaping, it turns out that you need to be character set aware only some of the time. In order to execute a multi-byte SQL injection the database tables must be created with a special character set and/or collation during table creation time, or the mysql client library must be instructed to use a special character set during the connection. Having a different table collation/character set other than the current character set of the SQL client (libmysqlclient for PHP) will result in a collation error even if the SQL is escaped properly by MRES. The client character set was set with mysql_set_charset()
As you can see from this table, if your database is constructed only to deal with BGK (or SJIS, or other vulnerable charset) and you only deal with that character set at connection time, then you need MRES to properly escape your data. Here is where PHP specific issues come to light. A lot of people have read that they should send the SQL commands "SET NAMES" or "SET CHARACTER SET" to MySQL in order to change the running connection's character sets. While this is true, it doesn't tell the whole story for PHP. PHP uses the libmysqlclient library to pass commands to mysql either over a network or through a Unix socket. This MySQL library has some utilities for clients wanting to use MySQL, one of these features is the function "mysql_real_escape_string". Unfortunately, for libmysqlclient, it cannot parse commands sent to MySQL, so it is blissfully unaware of any "SET NAMES" commands you might have sent to the server. As you can see in the graph above, any MySQL commands pass right through libmysqlclient, but the PHP function mysql_real_escape_string() is actually part of the libmysqlclient C code and does not need to round trip to the server. (It does need to round trip "out" of the Zend memory space, which is why you see the slowness on large binary objects.) The "new-ish" PHP function mysql_set_charset() is exactly what you need to instruct libmysqlclient to be aware of any desired character set. This should be used in conjunction with any "SET NAMES" commands so that the library and the server are in sync. AS is the most basic function. It has its limitations, but if you are aware of the character sets flying around you it should not pose any significant danger. Since ACS or str_replace must be used for LIKE-type queries, ACS should be sufficient over MRES for all but the most unusual cases. If ACS could take an optional parameter to use a single quote or a backslash it would be the best SQL escaping tool, hands down, for any backend database. The moral of this story is, go ahead and keep using addslashes(). It's fast, and fast... and it works with UTF-8. If you have a SQL library, which knows when you're adding LIKE or GRANT clauses and parameters, use addcslashes() instead. What else do you need to know? People who say that you "should" be doing something this way or that (parameterized queries) should understand that *knowing* your data and being confident about a solution is more important than just reaching for the most comfy looking security blanket without understanding what's going on. Remember, all data looks like binary data to a computer, it's only how you interpret it with charsets, plugins, protocols, etc. which makes data really data. |