Basics
Applications in CODESYS can process a wide variety of characters, for example, to output an error message in various languages or to display visualizations in a language selected by the user while accepting user input in a wide variety of languages, characters, or symbols. If a comprehensive character set is not necessary, or if a project should not be changed, then strings encoded in Latin-1 format can still be used.
Character set | Code page number | Description | Character encoding |
---|---|---|---|
ASCII | 20127 | | 7-bit encoded character |
DOS-Latin-1 | 819, 850 | | 8-bit encoded character |
Latin-1 | 28591 | | 8-bit encoded character |
Windows 1252 Encoding | 1252 | | 8-bit encoded character |
Unicode | | | |
Unicode 14.0 | | 144,697 characters | |
UTF-16 | 1200 | | 16-bit encoded characters. The characters are encoded either in 2 bytes or 4 bytes. |
UTF-8 | 65001 | | Tuple of 8-bit words per character. The characters are encoded in different length from 1 to 4 bytes. |
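The byte counts in the table can be checked with a small sketch. It is written in Python purely for illustration and is not CODESYS application code; the helper names `encoded_length` and `length_table` are hypothetical, and Python's built-in codecs (`ascii`, `latin-1`, `cp1252`, `utf-16-le`, `utf-8`) are assumed to correspond to the character sets listed above.

```python
from typing import Dict, Optional

# Sample characters: ASCII letter, Latin-1 umlaut, CJK ideograph, emoji (U+1F600).
SAMPLES = ["A", "ä", "你", "\U0001F600"]
ENCODINGS = ("ascii", "latin-1", "cp1252", "utf-16-le", "utf-8")


def encoded_length(char: str, encoding: str) -> Optional[int]:
    """Return the number of bytes needed to store char, or None if the
    character set cannot represent it."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


def length_table(chars=SAMPLES) -> Dict[str, Dict[str, Optional[int]]]:
    """Map each sample character to its byte length per encoding."""
    return {c: {enc: encoded_length(c, enc) for enc in ENCODINGS} for c in chars}


if __name__ == "__main__":
    for char, row in length_table().items():
        print(repr(char), row)
```

In this sketch, the 8-bit character sets return `None` for characters outside their range, while UTF-8 needs between 1 and 4 bytes and UTF-16 needs 2 or 4 bytes per character.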
UTF-8 in CODESYS
UTF-8 encoding is the encoding with the most comprehensive character set. Therefore, it is recommended that you enable UTF-8 encoding for new projects as well as for existing projects to be used in a new context.
Data type | Compile option: UTF8 Encoding for STRING | Which encoding is used project-wide? |
---|---|---|
STRING | Enabled | UTF-8 |
STRING | Disabled | Windows 1252 encoding (default Windows encoding), Latin-1 |
WSTRING | Enabled | UTF-16 |
WSTRING | Disabled | UTF-16 |
In CODESYS, the “STRING” data type can be encoded in Latin-1 or UTF-8 formats. The “WSTRING” data type always encodes its characters as Unicode in UTF-16.
Encoding a single string literal in UTF-8 format
Even if the project-wide encoding format is set to Latin-1, you can encode a single literal in UTF-8 format. To do this, add the “UTF8#” type prefix to the literal.
{attribute 'monitoring_encoding' := 'UTF-8'}
strVarUtf8 : STRING := UTF8#'你好,世界!ÜüÄäÖö';
For more information, see:
Constant: UTF8# String
Pragma Attribute: monitoring_encoding
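To see why such a literal needs UTF-8, the following Python sketch (illustration only, not CODESYS code) checks whether a text can be stored in Latin-1 and how many bytes it occupies in UTF-8. The names `LITERAL`, `utf8_size`, and `fits_latin1` are hypothetical. The note on buffer sizing assumes that the declared length of a “STRING” counts bytes rather than characters; verify this against the data type documentation before relying on it.

```python
# Text of the UTF8# literal used in the declaration above.
LITERAL = "你好,世界!ÜüÄäÖö"


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_latin1(text: str) -> bool:
    """True if every character of the text exists in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print("characters:", len(LITERAL))
    print("UTF-8 bytes:", utf8_size(LITERAL))
    print("fits Latin-1:", fits_latin1(LITERAL))
```

The umlauts alone would fit in Latin-1, but the Chinese characters do not, and the UTF-8 byte count is larger than the character count. Under the assumption stated above, a UTF-8 string variable therefore has to be sized by bytes.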
String conversion for UTF-8 encoding
If you have enabled UTF-8 encoding project-wide, then you can use the string conversion functions as usual.
String manipulation
Use library functions to manipulate your strings.
When “STRING” variables are manipulated, index access to a variable that contains only ASCII characters often leads to the desired result. Nevertheless, avoid this construct. It is poor programming style, and with UTF-8 encoding, where a single character can occupy several bytes, index access leads to unwanted string manipulation.
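The effect can be reproduced outside CODESYS. The Python sketch below is an illustration only: the helper names `overwrite_byte` and `is_valid_utf8` are hypothetical, and a `bytearray` stands in for the byte buffer of a “STRING” variable. It overwrites one byte by index, which works for ASCII text but destroys a multi-byte UTF-8 character.

```python
def overwrite_byte(text: str, index: int, new_char: str) -> bytes:
    """Mimic index access: overwrite one byte of the UTF-8 encoded buffer.

    new_char must be a single ASCII character.
    """
    buffer = bytearray(text.encode("utf-8"))
    buffer[index] = ord(new_char)
    return bytes(buffer)


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte sequence can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # ASCII only: byte index and character index coincide.
    print(overwrite_byte("Tor", 1, "a"), is_valid_utf8(overwrite_byte("Tor", 1, "a")))
    # "ü" occupies two bytes: overwriting the first one leaves an orphaned byte.
    print(overwrite_byte("Tür", 1, "a"), is_valid_utf8(overwrite_byte("Tür", 1, "a")))
```

In the second call, the byte at index 1 is only the first half of “ü”, so the result is no longer valid UTF-8. Library functions that are aware of the encoding avoid this class of error.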
UTF-8 encoding only for project-wide configuration
UTF-8 encoding is used project-wide only if the compile option UTF8 Encoding for STRING is enabled. Library functions and add-ons then also follow this setting.
If you use single UTF-8 encoded strings, then you have to make sure that they are interpreted correctly wherever they are used. For example, if the compile option is not enabled, a string variable in the OPC server is converted to UTF-8 before being transferred to a client. A value such as “UTF8#'äöü'”, which is already UTF-8 encoded, would then be misinterpreted. Similar problems can arise when outputting strings in the visualization.
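The misinterpretation can be illustrated with a short Python sketch. It is not a model of the actual OPC server: `convert_latin1_to_utf8` is a hypothetical stand-in for any component that assumes Latin-1 input and converts it to UTF-8, and `client_view` is a hypothetical helper that shows what a UTF-8 client would then display.

```python
def convert_latin1_to_utf8(raw: bytes) -> bytes:
    """Hypothetical converter that assumes its input is Latin-1 encoded."""
    return raw.decode("latin-1").encode("utf-8")


def client_view(text: str, source_encoding: str) -> str:
    """Text a UTF-8 client sees after the converter has processed the bytes."""
    stored = text.encode(source_encoding)
    return convert_latin1_to_utf8(stored).decode("utf-8")


if __name__ == "__main__":
    # Stored as Latin-1: the conversion is correct.
    print(client_view("äöü", "latin-1"))
    # Stored as UTF-8 already: every byte is converted again.
    print(client_view("äöü", "utf-8"))
```

When the stored bytes are already UTF-8, each byte is treated as a separate Latin-1 character and encoded a second time, so the client receives six characters instead of three.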