Do Hackers Care About Your Meta CharSet?

á is part of the meta charset UTF-8

The other day, I got this question about the meta charset tag in the mail:

“Would you be so kind as to explain to me, in logical terms, how in 7 hecks does the presence of:
<meta charset="utf-8">
is supposed to affect a ‘hacker attack’ in accordance to your book?
I literally cannot imagine hackers caring much about which charset you display on the UI side when attacking.”

This is a great question. The book he’s referencing is Sams Teach Yourself Bootstrap in 24 Hours, in which I emphasize the importance of using the meta charset tag and placing it first inside your <head> element. I explain there that leaving it out can leave a page vulnerable to hacks, but I don’t explain why. I will now.

Cross-Site Scripting (XSS) Attacks

According to Wikipedia:

“[a cross-site scripting or XSS attack] injects client-side scripts into web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same-origin policy.”

In other words, on a website that has an XSS vulnerability, an attacker can get their own script to run in your visitors’ browsers.

The UTF-7 Attack and How to Protect Your Site

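There is an exploit that hackers can use called the UTF-7 XSS attack. UTF-7 is a real but rarely used encoding that can write characters such as < and > using only plain ASCII sequences like “+ADw-” and “+AD4-”. Attackers submit malicious code encoded this way through web forms and other inputs. Because the text contains no angle brackets, it can slip past filters that look for them. If the page does not declare its character set, a browser that guesses the encoding (historically, older versions of Internet Explorer) may decide the page is UTF-7, decode the sequences back into a <script> tag, and run it.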
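
To make this concrete, here is a minimal Python sketch of the idea. The payload and the `looks_safe` filter are made up for illustration; real filters and real payloads vary. It shows the same bytes read two ways: harmless text under UTF-8, a script tag under UTF-7.

```python
# Hypothetical payload an attacker might submit through a form field.
# It is plain ASCII and contains no angle brackets at all.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def looks_safe(data: bytes) -> bool:
    """Naive filter (assumed for illustration): rejects only raw < and >."""
    return b"<" not in data and b">" not in data


# Read as UTF-8, the payload is just odd-looking text.
as_utf8 = payload.decode("utf-8")

# Read as UTF-7, the +ADw- and +AD4- runs decode to < and >.
as_utf7 = payload.decode("utf-7")

print("passes naive filter:", looks_safe(payload))
print("decoded as UTF-8:  ", as_utf8)
print("decoded as UTF-7:  ", as_utf7)
```

Declaring the character set tells the browser to take the first reading, so it never has a reason to guess the second.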

Web pages that do not define their character set can be attacked with this exploit. That doesn’t mean every web page without a defined character set is vulnerable, because an XSS attack also needs a way in, such as a form or a script that echoes user-supplied content back onto the page. But if a web developer adds those elements to the page in the future, that page could be hacked.

Why Define the Character Set?

While it’s true that not every page has a cross-site scripting (XSS) vulnerability directly, by getting in the habit of always defining the character set of your web pages, you reduce the risk that you’ll forget to do it on a page that could be attacked.
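
If you want to check the habit rather than trust it, a small script can help. The sketch below is one possible approach, not a complete HTML parser: the function name `declared_charset` and the regular expression are my own, it only recognizes the short <meta charset=...> form, and it looks only at the first 1024 bytes, because the HTML standard requires the encoding declaration to fit within that range.

```python
import re

# The HTML standard requires the encoding declaration to appear
# within the first 1024 bytes of the document.
PRESCAN_LIMIT = 1024

# Simplified pattern: matches <meta charset=...> with or without quotes.
# It does not handle the older http-equiv="Content-Type" form.
META_CHARSET = re.compile(
    rb"<meta\s+charset\s*=\s*[\"']?\s*([A-Za-z0-9_-]+)", re.IGNORECASE
)


def declared_charset(page: bytes):
    """Return the charset declared early in the page, or None if none is found."""
    match = META_CHARSET.search(page[:PRESCAN_LIMIT])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


if __name__ == "__main__":
    good = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>x</title>'
    bad = b"<!DOCTYPE html><html><head><title>x</title>"
    print(declared_charset(good))
    print(declared_charset(bad))
```

You could run a function like this over every HTML file in a project and list the ones that return None.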

It’s also true that many web servers define the character set automatically on all pages by sending it in the HTTP Content-Type response header. But it’s not easy for a web designer to check that. Plus, if the server changes or the pages are moved, the new server might not be configured the same way. By adding the <meta charset="utf-8"> tag to your <head> element, you ensure that the page has one level of protection without having to do anything else.

Note: you don’t have to use the meta tag to define the character set. You can set it in the HTTP Content-Type header through your server configuration, or send that header from PHP or another scripting language. It doesn’t really matter how you define the character set, as long as it’s set. It’s just easiest for most web developers to do it with the meta tag, because that’s what we have direct access to.
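
For readers who do control the server side, here is a rough sketch of setting the character set in the HTTP header, using Python’s built-in http.server module. The page content, the `response_parts` helper, and the `CharsetHandler` class are all invented for this example; a production site would more likely set the header in its web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Example page (made up for this sketch). The meta tag comes first in <head>.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Charset demo</title>
</head>
<body><p>ñ and ∫ display correctly.</p></body>
</html>
"""


def response_parts(page: str = PAGE):
    """Return the headers and body bytes for a UTF-8 encoded HTML page."""
    body = page.encode("utf-8")
    headers = {
        # The charset parameter tells the browser how to decode the body.
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body


class CharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        headers, body = response_parts()
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serves on localhost only; stop with Ctrl+C.
    HTTPServer(("127.0.0.1", 8000), CharsetHandler).serve_forever()
```

This page declares its encoding twice, once in the header and once in the meta tag, so it stays protected even if it is later moved to a server that does not send the header.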

Do I Have to Use UTF-8 in the Meta Charset?

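For protection against this attack, it doesn’t matter what character set you use, as long as you use a valid one. I like UTF-8 because it includes a huge number of characters so I can write things like ñ and ∫ right in my HTML without having to use character codes. It is also the encoding that the current HTML standard requires for new documents. The original HTTP 1.1 specification (RFC 2616) made ISO-8859-1 the default character set for text content, but later revisions of the specification removed that default.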
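
A quick Python illustration of the difference between the two character sets, using the same two characters. The variable names are mine; the encodings are the standard ones that ship with Python.

```python
# UTF-8 can represent both characters.
n_tilde_utf8 = "ñ".encode("utf-8")
integral_utf8 = "∫".encode("utf-8")
print(n_tilde_utf8, integral_utf8)

# ISO-8859-1 has a single byte for ñ, but no way to write ∫.
n_tilde_latin1 = "ñ".encode("iso-8859-1")
print(n_tilde_latin1)

try:
    "∫".encode("iso-8859-1")
    integral_fits_latin1 = True
except UnicodeEncodeError:
    integral_fits_latin1 = False
print("∫ fits in ISO-8859-1:", integral_fits_latin1)

# If UTF-8 bytes are read as ISO-8859-1, the text comes out garbled.
garbled = n_tilde_utf8.decode("iso-8859-1")
print(garbled)
```

The last line is what visitors see when a page is saved in one character set and displayed in another, which is one more reason to declare the character set you actually use.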

This is Not the Final Word on XSS Protection

Please do not expect that adding the meta charset tag will protect your site and scripts from all XSS attacks. This is not true. There are many things you need to do to protect your scripts from XSS vulnerabilities. But adding the meta charset tag is a start.
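
One of those other things is escaping user-supplied text before you put it into a page. The sketch below uses Python’s standard html.escape function; the `render_comment` helper is hypothetical and stands in for whatever templating your site uses. It also shows why escaping and the charset declaration work together: escaping neutralizes ordinary angle brackets, but a UTF-7 payload has none to escape.

```python
import html


def render_comment(user_text: str) -> str:
    """Hypothetical helper: wrap user-supplied text in a paragraph, escaped for HTML."""
    return "<p>" + html.escape(user_text) + "</p>"


# An ordinary script injection is turned into harmless text.
plain_attack = render_comment("<script>alert(1)</script>")
print(plain_attack)

# A UTF-7 encoded payload passes through unchanged, because it contains
# nothing that html.escape knows to escape. It stays harmless only as long
# as the browser does not decode the page as UTF-7.
utf7_attack = render_comment("+ADw-script+AD4-alert(1)+ADw-/script+AD4-")
print(utf7_attack)
```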

XSS is not the only way your website can be hacked. The Shellshock exploit that came out a few years ago is one example. Don’t think that just because you’re writing HTML, your web pages cannot be hacked. Always take security seriously, even if, like the questioner, you cannot imagine a way that hackers can attack your pages. Just because you can’t imagine it doesn’t mean that the hackers can’t. And they are imagining new attacks and exploits every day.
