In the course of web application development, it is sometimes necessary to publish a web page or other web content in languages other than English. However, some non-European languages, such as Chinese, Japanese and Korean, have scripts that cannot be represented with single-byte codes, because they require much larger character sets. A Double-Byte Character Set (DBCS) addresses this by representing each of these characters with 2 bytes.
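A quick way to see this is to encode one CJK character in a legacy double-byte code page and count the bytes. The sketch below uses the `shift_jis`, `gbk` and `euc_kr` codecs that ship with Python's standard library; the helper name `encoded_width` is hypothetical. Note that these code pages are mixed-width: plain ASCII letters still occupy one byte, and only the CJK characters take two.

```python
# Legacy double-byte code pages available in Python's standard codecs.
SAMPLES = {
    "shift_jis": "日",  # Japanese
    "gbk": "中",        # Simplified Chinese
    "euc_kr": "한",     # Korean
}


def encoded_width(ch: str, encoding: str) -> int:
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


for enc, ch in SAMPLES.items():
    # Each CJK sample needs 2 bytes; the ASCII letter 'A' needs only 1.
    print(enc, ch, encoded_width(ch, enc), "bytes; 'A':", encoded_width("A", enc), "byte")
```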
The term character encoding refers to how characters are represented as sequences of bits. An encoding form converts each character's integer code (its code point) into a sequence of code units that can be stored in a system with fixed bit widths. A character set therefore maps each character to a definite bit pattern. Using two bytes instead of one raises the theoretical number of bit combinations from 256 (2^8) in a Single-Byte Character Set (SBCS) to 65,536 (2^16) in a DBCS.
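The arithmetic and the character-to-bit-pattern mapping can be checked directly. This minimal sketch, with a hypothetical helper `bit_pattern`, prints the number of combinations for one and two bytes and the bit pattern that Python's built-in `shift_jis` codec assigns to an ASCII letter and to a Japanese character.

```python
SBCS_CODES = 2 ** 8   # one byte: 256 combinations
DBCS_CODES = 2 ** 16  # two bytes: 65,536 combinations


def bit_pattern(ch: str, encoding: str) -> str:
    """Show the encoded bytes of one character as space-separated 8-bit groups."""
    return " ".join(f"{b:08b}" for b in ch.encode(encoding))


print(SBCS_CODES, DBCS_CODES)          # 256 65536
print(bit_pattern("A", "shift_jis"))   # 01000001
print(bit_pattern("日", "shift_jis"))  # 10010011 11111010
```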
In a pure DBCS, every character, including control characters, is encoded in 2 bytes. In the mixed DBCS code pages commonly used on Windows and the web, a character is either a single byte or a pair made up of a lead byte followed by a trail byte. A lead byte always has its most significant bit set, so its value is above 127, and no 7-bit ASCII character can be used as a lead byte. Trail bytes, however, can fall in the ASCII range in some encodings. For compatibility in web and software development environments, double-byte characters are associated with full-width (mostly graphic) characters, while single-byte characters are displayed half-width.
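Because a lead byte is always above 127, a program can walk a byte string and decide from each byte whether it starts a one- or two-byte character. The sketch below assumes Shift_JIS, whose lead bytes lie in 0x81 to 0x9F and 0xE0 to 0xFC; other DBCS code pages use different ranges. The names `is_sjis_lead_byte` and `split_sjis` are hypothetical. The last loop compares byte length with the Unicode East Asian Width property (`unicodedata.east_asian_width`) to illustrate the full-width association.

```python
import unicodedata


def is_sjis_lead_byte(b: int) -> bool:
    """True if b is a Shift_JIS lead byte (assumes standard Shift_JIS ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift_JIS bytes into per-character chunks using the lead-byte rule."""
    chars = []
    i = 0
    while i < len(data):
        width = 2 if is_sjis_lead_byte(data[i]) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


encoded = "A日本".encode("shift_jis")
print(split_sjis(encoded))  # [b'A', b'\x93\xfa', b'\x96{']
# The trail byte of 本 is 0x7B, which a naive byte scan would read as ASCII '{'.
print(b"{" in encoded)      # True

# Double-byte characters tend to be full-width (F) or wide (W);
# single-byte ones are narrow (Na) or half-width (H).
for ch in ("A", "Ａ", "ｱ", "日"):
    print(ch, len(ch.encode("shift_jis")), unicodedata.east_asian_width(ch))
```

No byte below 128 is ever classified as a lead byte, which is what lets ASCII text pass through such a scanner unchanged.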
DBCS encoding doubles the memory required for text compared with a single-byte set, which can be undesirable in constrained environments. Moving a program from 8-bit to 16-bit character processing can also require large-scale code changes, which is another drawback of DBCS. Unicode, which is synchronized with the ISO/IEC 10646 Universal Character Set and has a repertoire of more than 100,000 characters, is generally preferred; nevertheless, DBCS is still used in low-cost web development or application design where older operating systems or applications remain in place. When every character occupies exactly two bytes, DBCS is also considered a simple and safe programming option, because character positions in arrays and substrings can be computed directly, with little scope for error.
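The fixed-width advantage, and the doubled storage, can be illustrated with a string in which every character is double-byte. This is a sketch under the assumption of a pure double-byte string encoded with Python's `shift_jis` codec; `dbcs_substring` is a hypothetical helper. The final lines show that the same arithmetic breaks as soon as single-byte characters are mixed in.

```python
# Assumption: every character in the text is double-byte in Shift_JIS
# (no ASCII or half-width katakana), i.e. a "pure" DBCS string.
ENCODING = "shift_jis"


def dbcs_substring(data: bytes, start: int, end: int) -> bytes:
    """Return characters [start, end) of a pure double-byte string by arithmetic."""
    return data[2 * start:2 * end]


text = "日本語"
data = text.encode(ENCODING)
assert len(data) == 2 * len(text)  # storage is twice the character count
print(dbcs_substring(data, 1, 3).decode(ENCODING))  # 本語

# With a single-byte 'A' in front, offsets shift and the slice straddles characters:
mixed = "A日本".encode(ENCODING)
print(mixed[2:4])  # b'\xfa\x96': trail byte of 日 plus lead byte of 本
```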