Character Sets for Survey Messages

Overview

This article provides comprehensive information about character sets used in the EFS Survey, focusing on UTF-8 as the standard character set and listing other available options for various languages and regions.

Information

This guide explains the importance of UTF-8 in the EFS Survey and provides a detailed list of character sets that can be used for different language requirements.


The standard character set for EFS Survey: UTF-8

The EFS Survey admin area is encoded in UTF-8. Newly created projects likewise use UTF-8 by default, unless you have configured a different preset for your account.
UTF-8 is an encoding of Unicode, the character set maintained by the Unicode Consortium.
Using UTF-8 makes foreign-language and multilingual projects considerably easier to implement. In particular:

UTF-8 can represent every character defined in Unicode, so virtually all written languages can be reproduced.
You can enter characters from any language directly in the admin area using the keyboard.
All entered data and settings are stored internally and uniformly in UTF-8: questionnaire texts, participant data, and internal EFS Survey data such as to-do notes or user accounts.
Participants' answers to open questions are uniformly encoded in UTF-8, so all open entries in multilingual surveys can be exported and viewed in a single record.
Survey and panel passwords can contain characters from any language.
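
The points above can be illustrated with a minimal Python sketch. It is not EFS Survey code: the sample texts and the helper `can_encode` are hypothetical and only show that UTF-8 accepts text from several scripts, while a single-region character set such as ISO 8859-1 does not.

```python
# Sample texts in several scripts; purely illustrative, not taken from EFS Survey.
samples = {
    "German": "Grüße",
    "Russian": "Привет",
    "Greek": "Γειά σου",
    "Japanese": "こんにちは",
}


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 can store every sample; ISO 8859-1 only handles the West European one.
utf8_ok = {lang: can_encode(text, "utf-8") for lang, text in samples.items()}
latin1_ok = {lang: can_encode(text, "iso-8859-1") for lang, text in samples.items()}

# One UTF-8 record can hold all answers and survives a round trip unchanged.
record = " | ".join(samples.values())
round_trip = record.encode("utf-8").decode("utf-8")
```

Here `utf8_ok` is true for every language, whereas `latin1_ok` is true only for the German sample. This is why open answers from a multilingual survey fit into one record when UTF-8 is used.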

Character sets that can be used in EFS Survey

You can set the character set of your projects yourself. Tivian recommends using UTF-8 in general, and in particular for surveys that would otherwise require several character sets.
The following table lists all available character sets.

Character set	Description
UTF-8	International character set. Standard character set of EFS Survey.
ISO 8859-1 West European	Latin 1. Covers Albanian, Danish, German, English, Faroese, Finnish, French, Galician, Irish, Icelandic, Italian, Catalan, Dutch, Norwegian, Portuguese, Swedish and Spanish. A few characters, such as the Dutch “ij”, the German low quotation mark („), and the Euro symbol, are missing.
ISO 8859-2 East European	Latin 2. Croatian, Polish, Romanian, Slovak, Slovenian, Czech and Hungarian.
ISO 8859-3 South European	Latin 3. Esperanto, Galician, Maltese and Turkish.
ISO 8859-4 Baltic	Latin 4. Estonian, Finnish, Greenlandic, Latvian and Lithuanian.
ISO 8859-5 Cyrillic	Largely covers Bulgarian, Macedonian, Russian, Serbian, Ukrainian and Belarusian.
ISO 8859-6 Arabic	Arabic. Text runs from right to left.
ISO 8859-7 Greek	Modern Greek
ISO 8859-8 Hebrew	Hebrew. Text runs from right to left.
ISO 8859-9 Turkish	Latin 5. Turkish. Based on ISO 8859-1, with Turkish characters in place of the Icelandic ones. Also used for Kurdish.
ISO 8859-13 Baltic	Latin 7. Baltic languages. Replaces Latin 4 and Latin 6.
ISO 8859-15 West European	Latin 9. Extension of ISO 8859-1 in which a few seldom-used symbols have been replaced with the Euro symbol and with additional French and Finnish characters, so that French and Finnish are covered completely.
ASCII (7-bit Charset)	ASCII character set
KOI8-R, Russian	Russian and Bulgarian
Simplified Chinese, PRC standard	Chinese simplified
GB2312, EUC encoding, Simplified Chinese	Chinese simplified
GBK, Simplified Chinese	Chinese simplified
CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese	Traditional Chinese
Big5, Traditional Chinese	Traditional Chinese. Used in Taiwan and Hong Kong.
Big5 with Hong Kong extensions, Traditional Chinese	Traditional Chinese with extensions for the Cantonese dialect
JISX 0201, 0208 and 0212, EUC encoding Japanese	Japanese
Shift-JIS, Japanese	Japanese
JIS X 0201, 0208, in ISO 2022 form, Japanese	Japanese
KS C 5601, EUC encoding, Korean	Korean
ISO 2022 KR, Korean	Korean
TIS620 Thai	Thai
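
To make the coverage differences in the table concrete, the following Python sketch checks which character sets can represent a given text. The mapping `CODECS` and the function `charsets_for` are hypothetical helpers. The codec names are Python's own and are only assumed to correspond to the table rows; they are not EFS Survey identifiers.

```python
# Python codec names assumed to correspond to a few rows of the table above.
CODECS = {
    "ISO 8859-1 West European": "iso8859_1",
    "ISO 8859-9 Turkish": "iso8859_9",
    "ISO 8859-15 West European": "iso8859_15",
    "ISO 8859-5 Cyrillic": "iso8859_5",
    "KOI8-R, Russian": "koi8_r",
    "UTF-8": "utf_8",
}


def charsets_for(text: str) -> list[str]:
    """Return the table labels of all charsets that can represent `text`."""
    usable = []
    for label, codec in CODECS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        usable.append(label)
    return usable


euro = charsets_for("€")          # missing in Latin 1, present in Latin 9
turkish = charsets_for("ğ")       # Turkish letter, present in Latin 5
russian = charsets_for("Привет")  # Cyrillic text
```

In this sketch, the Euro symbol is accepted only by ISO 8859-15 and UTF-8, the Turkish "ğ" only by ISO 8859-9 and UTF-8, and the Russian sample by ISO 8859-5, KOI8-R and UTF-8. UTF-8 is the only entry that accepts all three.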

FAQ

Why is UTF-8 recommended for the EFS Survey?

UTF-8 is recommended because it supports all characters from all languages, facilitates multilingual surveys, ensures uniform data storage, and allows for consistent coding of open-ended responses across different languages.

Can I use multiple character sets in a single survey?

While it's possible to use different character sets, it's generally recommended to use UTF-8 for all surveys, especially those that would otherwise require multiple character sets. This ensures consistency and simplifies data management.
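
As a rough illustration of why mixing character sets complicates data management, the Python sketch below decodes the same stored byte under three different ISO 8859 character sets. It is a generic example of how these encodings behave, not a description of EFS Survey internals.

```python
# One stored byte, interpreted under three different single-byte charsets.
raw = bytes([0xE4])

as_west_european = raw.decode("iso8859_1")  # Latin 1
as_cyrillic = raw.decode("iso8859_5")       # Cyrillic
as_greek = raw.decode("iso8859_7")          # Greek

# In UTF-8 the character "ä" has exactly one byte sequence, so no guessing is needed.
utf8_bytes = "ä".encode("utf-8")

# Reading UTF-8 bytes with the wrong charset yields garbled text.
garbled = utf8_bytes.decode("iso8859_1")
```

The byte `0xE4` reads as "ä", "ф" or "δ" depending on the character set assumed, so data exported from projects with different character sets cannot be interpreted without knowing each project's setting. With UTF-8 throughout, one interpretation applies to all data.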

How do I change the character set for my EFS Survey project?

You can set the character set for your projects in the EFS Survey admin area. However, it's recommended to use UTF-8 unless you have a specific reason to use a different character set.

Priyanka Bhotika