The lack of Unicode support in the wiki is something that has bugged me for a couple of years. M$ Word is the usual source of these non-standard characters, such as em and en dashes and curly (smart) quotes. If these non-ASCII (non-keyboard) characters were pasted into a topic, they would be displayed as garbled nonsense in the viewer and editor.
This project was written in Delphi 2007, which does not natively support Unicode. I have XE, but I have no desire to go through the hair-pulling, forehead-slapping ordeal of updating all of the components in this project to yet another version of Delphi.
Since it did not support Unicode, my most likely suspect was the standard TRichEdit component. So, back in April I purchased the TMS Unicode component package. Well, that didn’t get very far because there was a major bug in the search function. Really, what good is a wiki without a search function? So, I emailed support and they put it on their to-do list. Eventually it got fixed, and I updated that component this morning. Unfortunately, that didn’t fix anything. So began the dogged pursuit of a fix.
The next suspicious component on the list was the HTML viewer. So, I found an update to the 5-year-old open-source component on code.google. After removing the old version and recompiling with the new one, there was no change.
Still believing it was a component problem, I looked into the database driver as the culprit. I pulled up SQL manager and did a dump of the raw data. It was fine; all the proper Unicode characters were there.

Then I began to believe that it had to be a problem with the variable declaration. All the experts on Stack Overflow had an opinion on the best string type to use for Unicode, and they were all different. So, I systematically changed the variable holding the text to every oddball string type available in Delphi: Utf8String, WideString, AnsiString, and RawByteString. It didn’t matter; the output was still garbled.
I finally came across the open-source Fundamentals code library on SourceForge. Included in that library were various decoder routines for converting character sets and encodings to and from Unicode. A function within it called UTF8StringToUnicodeString did the trick. Finally, the wiki displays Unicode text properly.
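For anyone curious what was probably going on under the hood, here is a rough sketch. It is written in Python, not Delphi, and it is not the code from this project. It assumes the database was handing back UTF-8 bytes and that something along the way was reading them as a single-byte Windows code page (Windows-1252 is my guess). The function names are made up for illustration.

```python
# Illustration only: a Python stand-in, NOT the Delphi code from this project.
# Assumption: the stored text is UTF-8 and was being misread as Windows-1252.

def misread_as_ansi(data: bytes) -> str:
    """Interpret UTF-8 bytes one byte per character, which garbles them."""
    # errors="replace" because a few byte values are undefined in cp1252.
    return data.decode("cp1252", errors="replace")


def decode_utf8(data: bytes) -> str:
    """Decode the bytes as UTF-8, the kind of step a UTF-8 decoder performs."""
    return data.decode("utf-8")


# Text of the sort Word produces: curly quotes, an en dash, an em dash.
original = "\u201csmart quotes\u201d, 1\u20132, and a dash \u2014 like this"

stored = original.encode("utf-8")   # what (presumably) sits in the database
garbled = misread_as_ansi(stored)   # what the viewer and editor showed
fixed = decode_utf8(stored)         # what they should have shown

print(garbled)
print(fixed)
```

Each curly quote or dash takes three bytes in UTF-8, so reading them one byte at a time turns a single character into three junk characters. That matches the garbled nonsense described above, and it is why swapping the string type alone never helped: the bytes had to be decoded, not just stored in a different kind of variable.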
I hope all you copy-and-pasters appreciate how much fricking work this was. And don’t get me started on what a gawd-awful-piece-of-shit-poor-excuse-for-a-word-processor M$ Word is.

















