Punctuation Problem in WordPress

Did you recently move your website to a new host? Perhaps you weren’t happy with the loading speed, uptime, or storage of your old web hosting service, and have now upgraded to a host that has fast loading speeds, high uptime, and plenty of storage.

Fast page loading speeds and high uptime are quite important. Not only does a slow website frustrate visitors, but it can also hurt your site’s ranking in the search engine results pages (SERPs). Downtime, on the other hand, can cost you precious sales, and if it drags on, search engine crawlers may end up dropping your pages from their index.

While moving your website to a new host comes with a lot of perks, the move itself can introduce bugs. One of the most common is a punctuation problem. If you’re using WordPress, you may be familiar with the issue: all of a sudden, your apostrophes (’) show up as â€™ instead. This is due to a UTF-8 encoding mismatch, which sometimes occurs during website transfers.

Don’t worry, though! This issue is quite easy to fix, and we’ll show you just how to do that.

What Does â€™ Mean?

The sequence â€™ replaces apostrophes when your WordPress website runs into a character encoding issue. Usually, the problem has to do with UTF-8 encoding. This punctuation problem isn’t limited to WordPress; it sometimes happens with emails, too. So in order to understand how to stop â€™ from replacing your apostrophes, we need to understand…

What Is UTF-8?

UTF-8 stands for Unicode Transformation Format, 8-bit. It’s one of the encodings defined by the Unicode Standard, which is maintained by the Unicode Consortium. But what does this… mean?

Encoding: Converting Computer Language to Human Language

So, all things digital – including software, desktop and phone apps, as well as websites, pages, and posts – are stored as bytes. It follows that text, whether it’s code or content written for the end user, is also stored in bytes. Each character in a word is represented in the computer by a string of bits.

In the 1960s, i.e. the early days of digital technology, characters were converted to binary (the language read by computers) using ASCII, the American Standard Code for Information Interchange. ASCII is one of the earliest standardized text encodings: it maps each character we use to a number, which the computer stores as a binary string.

However, ASCII was limited to English text and the English alphabet, so a new solution was needed to convert characters from other alphabets and systems of symbols, one that would be universally applicable across the globe.
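To make the idea of "characters as binary" concrete, here is a minimal Python sketch. The helper name `fits_in_ascii` is made up for this illustration. It prints the ASCII code and 8-bit pattern for a few characters, then shows that an accented letter has no ASCII code at all.

```python
# Each ASCII character maps to a number from 0 to 127, which fits in one byte.
for ch in "Hi!":
    code = ord(ch)                         # the character's numeric code
    print(ch, code, format(code, "08b"))   # the same code as an 8-bit binary string


def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("cafe"))        # plain English letters only
print(fits_in_ascii("caf\u00e9"))   # "café": the é has no ASCII code
```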

Introducing Unicode

This was solved with the introduction of Unicode. Like ASCII, Unicode assigns a unique number (a code point) to each character so that it can be converted from human language to computer language. Unlike ASCII, however, Unicode aims to cover every character in every language, with room for over a million code points – even emojis are included.

The Unicode Standard is developed by the Unicode Consortium. It’s meant to replace older, mutually incompatible character sets with a single universal one, which is stored using a Unicode Transformation Format (UTF). Today, Unicode is supported in all browsers, most operating systems, email, and languages and formats like HTML, XML, Java, JavaScript, PHP, and so on. The two most commonly used Unicode encodings are UTF-8 and UTF-16.
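As a rough illustration of code points, the Python sketch below prints the code point of a few characters in the usual U+XXXX notation. The helper name `code_point` is an assumption made for this example. The last line shows the size of the whole code point range, most of which is still unassigned.

```python
import sys


def code_point(ch: str) -> str:
    """Format a single character's Unicode code point in U+XXXX style."""
    return f"U+{ord(ch):04X}"


# A letter, the curly apostrophe, the euro sign, and an emoji.
for ch in ["A", "\u2019", "\u20ac", "\U0001F600"]:
    print(ch, code_point(ch))

# The highest code point Python (and Unicode) allows, plus one for code point 0.
print(sys.maxunicode + 1)
```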

Unicode Transformation Format 8 (UTF-8)

UTF-8 is by far the most commonly used Unicode encoding. Not only is it the default for HTML5, but it’s also used by 95.6% of all websites, as it’s the preferred encoding for websites and emails. UTF-8 translates Unicode characters to their corresponding byte sequences, and vice versa. The main reason UTF-8 is preferred to other encodings is its efficient use of memory, i.e. storage space.

UTF-8 represents each character in one to four bytes, i.e. in units of 8 bits – that’s where the name comes from! So why is this more space-efficient? Because UTF-8 assigns a single byte to the most commonly used characters (the ASCII range, which includes the basic Latin alphabet), and longer sequences of up to four bytes to less commonly used characters, while still leaving room for more than a million code points.
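The variable length is easy to see in practice. This short Python sketch encodes four characters as UTF-8 and prints how many bytes each one takes; `utf8_length` is a hypothetical helper name used only here.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a character occupies once encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# A basic Latin letter, an accented letter, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(ch, utf8_length(ch), encoded.hex(" "))
```

Plain English text therefore costs one byte per character, exactly as it did in ASCII, while rarer symbols pay for the extra bytes only where they appear.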

What Causes Encoding Problems In WordPress?

So if UTF-8 is that great, how come it can still cause an issue in WordPress? Unfortunately, computers, servers, and different types of software can still disagree about which encoding a piece of text uses. When text saved in one encoding is read as if it were in another, you get punctuation issues in WordPress, such as â€™ appearing instead of an apostrophe on your website.
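Here is a minimal Python sketch of that kind of misreading, assuming the common case where UTF-8 bytes are decoded as Windows-1252. The function name `misread_as_cp1252` is made up for this illustration; it is not something WordPress provides.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows-1252 by mistake."""
    return text.encode("utf-8").decode("cp1252")


# The curly apostrophe is three bytes in UTF-8, so it comes back as three characters.
print(misread_as_cp1252("Don\u2019t"))
# The same thing happens to a curly opening quote and to accented letters.
print(misread_as_cp1252("\u201cquoted"))
print(misread_as_cp1252("caf\u00e9"))
```

Plain letters survive because they are encoded identically in both encodings; only characters outside the ASCII range get scrambled.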

For instance, some of the causes of the encoding problem are:

  • Older systems. Change takes time – especially when there are so many devices in the world, including servers and computers, which are still using older encodings.
  • Microsoft Windows uses different encodings. Windows has historically used its own character sets, like Windows-1252 (a close relative of the ISO-8859-1 standard). Many Windows applications also do not use UTF-8 by default, especially when a file contains no characters that fall outside the scope of the Windows-1252 character set. This can cause issues when other operating systems – like Linux – try to read or open a Windows-made file.
  • New character fonts. When designers come up with a new font, they have to draw every character they want the font to support. Plus, the Unicode Standard occasionally adds new characters and symbols to the set, which leaves existing fonts incomplete. What this means, in the end, is that not all alphabets can use all fonts.
  • Byte Order Mark (BOM). The BOM is a short sequence of non-printable bytes that may be placed at the start of Unicode text to help applications interpret it. Although it’s not mandatory, the BOM makes it easier for an application to detect which Unicode encoding is in use. However, this can cause incompatibility problems as not all applications are equipped to read the BOM. For instance, an application that doesn’t recognize the BOM may treat it as ordinary text and interpret the file as Windows-1252. In this case, you may see a sequence of stray characters (ï»¿) at the beginning of the file.
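The BOM case can be reproduced in a few lines of Python. This is only a sketch: `strip_utf8_bom` is a hypothetical helper, and it assumes the file is UTF-8 with an optional BOM in front.

```python
import codecs

# Python's "utf-8-sig" codec writes the three BOM bytes before the text.
data = "hello".encode("utf-8-sig")
print(data[:3] == codecs.BOM_UTF8)

# An application that assumes Windows-1252 shows the BOM as visible characters.
print(data.decode("cp1252"))

# A BOM-aware decoder recognises the mark and drops it.
print(data.decode("utf-8-sig"))


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return the bytes unchanged."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```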

Now that we know what may cause character encoding issues in WordPress, let’s take a look at how we can fix them.

How to Fix Character Encoding Issues in WordPress

There are two ways to prevent your WordPress website from interpreting apostrophes as â€™, or replacing any other characters you usually use with gibberish.

Both methods are quite simple and involve a little bit of tinkering with your WordPress root files.

Option #1

Let’s start with the first way to solve the encoding (punctuation) problem in WordPress. Simply put, you’ll need to comment out two lines, one of which sets UTF-8 as the database character set, in your wp-config.php file. If you’ve never edited your WordPress files, don’t worry! It’s all quite simple; just follow our step-by-step guide:

  1. Connect to your website’s server via an FTP client.

(Is this your first time connecting to your website’s root files? No problem! Simply open your file explorer, or a browser that still supports FTP, and type ftp://[FTP-server-IP-or-domain-name] into the address bar, replacing the bracketed part with your server’s IP or domain name. Now, hit Enter. Type in your username and password and click Log On to connect to your FTP server. If you don’t know your username and password, find your host’s welcome email in your inbox; it should contain this information. If it’s not there, message your host requesting it.)

  2. Go to the WordPress folder.
  3. Go to the root directory and find your wp-config.php file.
  4. Download the file and open it in any code or text editor – Notepad works just fine!
  5. Find the following two lines:
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');
  6. Comment the two lines out by adding “//” in front of them, like so:
    //define('DB_CHARSET', 'utf8');
    //define('DB_COLLATE', '');
  7. Save the file, and close it.
  8. Upload the updated wp-config.php file into your WordPress root folder, overwriting the previous one.

You’re done! Your WordPress punctuation problem should be resolved.
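If you would rather script the edit than make it by hand, the commenting-out step can be sketched in Python as below. This is an illustration only: `comment_out_db_encoding` is a hypothetical helper, the sample text stands in for a real wp-config.php, and it assumes each define() sits on its own line as in the steps above.

```python
import re

# Matches an uncommented define() of DB_CHARSET or DB_COLLATE at the start of a line.
_DB_DEFINE = re.compile(
    r"^([ \t]*)(define\(\s*['\"]DB_(?:CHARSET|COLLATE)['\"].*)$",
    re.MULTILINE,
)


def comment_out_db_encoding(config_text: str) -> str:
    """Prefix the DB_CHARSET and DB_COLLATE lines with // and leave the rest alone."""
    return _DB_DEFINE.sub(r"\1//\2", config_text)


# Made-up sample standing in for the contents of wp-config.php.
sample = (
    "<?php\n"
    "define('DB_NAME', 'example');\n"
    "define('DB_CHARSET', 'utf8');\n"
    "define('DB_COLLATE', '');\n"
)
print(comment_out_db_encoding(sample))
```

Lines that are already commented out do not match the pattern, so running the function twice gives the same result. Keep a copy of the original file before overwriting anything on the server.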

Option #2

The second way is to follow the same steps as in Option #1 up to the point where you’ve found the two lines in wp-config.php. Then, instead of commenting them out:

  1. Delete the utf8 string in the first line, so this is what you’re left with:
    define('DB_CHARSET', '');
    define('DB_COLLATE', '');

That’s it.

This punctuation problem, i.e. character encoding issue, can also happen when your database gets upgraded, so keep in mind that you may need to do this again.
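To check whether a piece of text still shows the problem after a change, a simple heuristic can help. The Python sketch below, with the made-up name `looks_like_mojibake`, assumes the specific case discussed in this article (UTF-8 read as Windows-1252) and treats the whole string as either affected or not. It is not a general-purpose detector.

```python
def looks_like_mojibake(text: str) -> bool:
    """Heuristic: True if text reads as UTF-8 that was wrongly decoded as Windows-1252."""
    try:
        # Undo the suspected mistake: back to bytes, then decode as UTF-8.
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip is impossible, so the text was not produced by that mistake.
        return False
    return repaired != text


print(looks_like_mojibake("Don\u00e2\u20ac\u2122t"))  # the garbled form of "Don’t"
print(looks_like_mojibake("Don\u2019t"))              # a correct curly apostrophe
print(looks_like_mojibake("plain text"))
```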

A Few Words Before You Go…

Hopefully, you’ve managed to solve the character encoding issue in WordPress, and now your apostrophes are apostrophes and not â€™. While UTF-8 is a widely used, sophisticated encoding that allows character sets including alphabets from all around the world to be used across different types of software, it can cause bugs every now and then. Luckily, the solution is easy!
