156 relations: Abstraction layer, Addison-Wesley, Alt code, ANSEL, Arabic numerals, ASCII, Émile Baudot, Backward compatibility, Bacon's cipher, Baudot code, BCD (character encoding), Big5, Binary Ordered Compression for Unicode, Bitstream, Braille, Byte, Byte order mark, Byte-oriented protocol, C++, CCSID, Character (computing), Character encodings in HTML, Chinese telegraph code, CJK characters, Code, Code page, Code page 437, Code page 720, Code page 737, Code page 850, Code page 852, Code page 855, Code page 857, Code page 858, Code page 860, Code page 861, Code page 862, Code page 863, Code page 865, Code page 866, Code page 869, Code page 930, Code page 932 (Microsoft Windows), Code page 950, Code point, Code word, Comparison of Unicode encodings, Computation, Computer data storage, Computer science, Content sniffing, Control character, Cross-platform, Cygwin, Cyrillic script, Data, Diacritic, EBCDIC, EBCDIC 037, EBCDIC 1047, Endianness, Escape sequence, Extended Unix Code, Fieldata, File (command), Function (mathematics), GB 18030, GB 2312, GBK (character encoding), Glyph, Greek alphabet, Guobiao standards, Hans Schjellerup, Hong Kong Supplementary Character Set, Hypertext Transfer Protocol, IBM, IBM 1401, IBM 1620, IBM 700/7000 series, Iconv, Indian Script Code for Information Interchange, Integer, International Components for Unicode, International maritime signal flags, ISO/IEC 2022, ISO/IEC 646, ISO/IEC 6937, ISO/IEC 8859, ISO/IEC 8859-1, ISO/IEC 8859-10, ISO/IEC 8859-11, ISO/IEC 8859-13, ISO/IEC 8859-14, ISO/IEC 8859-15, ISO/IEC 8859-16, ISO/IEC 8859-2, ISO/IEC 8859-3, ISO/IEC 8859-4, ISO/IEC 8859-5, ISO/IEC 8859-6, ISO/IEC 8859-7, ISO/IEC 8859-8, ISO/IEC 8859-9, JIS X 0208, JIS X 0213, KOI-7, KOI8-R, KOI8-U, KS X 1001, Latin alphabet, Legacy system, Luit, Mac OS Roman, Microsoft Windows, MIK (character set), MIME, Mojibake, Mojikyo, Morse code, Mozilla, Number, Octet (computing), Plane (Unicode), Punycode, Shift JIS, SIL International, Standard Compression Scheme for Unicode, String (computer science), Tamil Script Code for Information Interchange, Telegraph key, Telegraphy, Transcoding, TRON (encoding), Typographic ligature, Unicode, Universal Character Set characters, Universal Coded Character Set, Unix-like, UTF-16, UTF-32, UTF-8, Variable-width encoding, VSCII, Web browser, Windows code page, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258, Writing system, XML.
In computing, an abstraction layer or abstraction level is a way of hiding the implementation details of a particular set of functionality, allowing the separation of concerns to facilitate interoperability and platform independence.
Addison-Wesley is a publisher of textbooks and computer literature.
On IBM-compatible personal computers, many characters not directly associated with a key can be entered using the Alt Numpad input method or Alt code: pressing and holding the Alt key while typing the number identifying the character on the keyboard's numeric keypad.
ANSEL, the American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use, was a character set used in text encoding.
Arabic numerals, also called Hindu–Arabic numerals, are the ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, based on the Hindu–Arabic numeral system, the most common system for the symbolic representation of numbers in the world today.
ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication.
Jean-Maurice-Émile Baudot (11 September 1845 – 28 March 1903), French telegraph engineer and inventor of the Baudot code, the first means of digital communication, was one of the pioneers of telecommunications.
Backward compatibility is a property of a system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system, especially in telecommunications and computing.
Bacon's cipher or the Baconian cipher is a method of steganography (a method of hiding a secret message as opposed to just a cipher) devised by Francis Bacon in 1605.
The Baudot code, invented by Émile Baudot, is a character set predating EBCDIC and ASCII.
BCD ("Binary-Coded Decimal"), also called alphanumeric BCD, alphameric BCD, BCD Interchange Code, or BCDIC, is a family of representations of numerals, uppercase Latin letters, and some special and control characters as six-bit character codes.
Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for Traditional Chinese characters.
Binary Ordered Compression for Unicode (BOCU) is a MIME compatible Unicode compression scheme.
A bitstream (or bit stream), also known as binary sequence, is a sequence of bits.
Braille is a tactile writing system used by people who are visually impaired.
The byte is a unit of digital information that most commonly consists of eight bits, representing a binary number.
The byte order mark (BOM) is a Unicode character, U+FEFF, whose appearance as a magic number at the start of a text stream can signal several things to a program consuming the text.
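As a rough illustration of how a consumer might use the BOM as a magic number, the sketch below checks the first bytes of a stream against the BOM constants from Python's standard `codecs` module. The helper name `sniff_bom` is hypothetical, and a real detector would need a fallback for streams that carry no BOM at all.

```python
import codecs

# Candidate BOMs, UTF-32 first: the UTF-32 LE BOM (FF FE 00 00) starts
# with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


encoding, skip = sniff_bom(codecs.BOM_UTF8 + b"hello")
text = (codecs.BOM_UTF8 + b"hello")[skip:].decode(encoding)
```

Checking the four-byte marks before the two-byte ones is the one design choice that matters here; with the order reversed, UTF-32 little-endian input would be misread as UTF-16.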
A byte-oriented framing protocol is a communications protocol in which full bytes are used as control codes.
C++ ("see plus plus") is a general-purpose programming language.
CCSID is an abbreviation used by IBM to mean "Coded Character Set Identifier".
In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.
HTML (Hypertext Markup Language) has been in use since 1991, but HTML 4.0 (December 1997) was the first standardized version where international characters were given reasonably complete treatment.
The Chinese telegraph code, Chinese telegraphic code, or Chinese commercial code is a four-digit decimal code (character encoding) for electrically telegraphing messages written with Chinese characters.
In internationalization, CJK is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives (collectively, CJK characters) in their writing systems.
In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form or representation, sometimes shortened or secret, for communication through a communication channel or storage in a storage medium.
In computing, a code page is a table of values that describes the character set used for encoding a particular set of characters, usually combined with a number of control characters.
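To make the "table of values" idea concrete, here is a small sketch that reads the same byte through three of the tables shipped with Python's standard codecs. The choice of byte 0x82 and of these three code pages is only for illustration.

```python
# One byte value, interpreted through three different code pages.
byte = bytes([0x82])

readings = {name: byte.decode(name) for name in ("cp437", "cp866", "latin-1")}

for name, char in readings.items():
    # Show the resulting code point so control characters remain visible.
    print(f"{name}: U+{ord(char):04X}")
```

Under code page 437 the byte is the letter é, under code page 866 it is the Cyrillic letter В, and under ISO/IEC 8859-1 it is a control character, which is why text must be stored together with knowledge of its code page.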
Code page 437 is the character set of the original IBM PC (personal computer), or DOS.
Code page 720 (also known as CP 720, IBM 00720, OEM 720) is a code page used under DOS to write Arabic.
Code page 737 (also known as CP 737, IBM 00737, OEM 737, MS-DOS Greek) is a code page used under DOS to write the Greek language.
Code page 850 (also known as CP 850, IBM 00850, OEM 850, DOS Latin 1) is a code page used under DOS and Psion’s EPOC16 operating systems in Western Europe.
Code page 852 (also known as CP 852, IBM 00852, OEM 852 (Latin II), MS-DOS Latin 2) is a code page used under DOS to write Central European languages that use Latin script (such as Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak or Slovene).
Code page 855 (also known as CP 855, IBM 00855, OEM 855, MS-DOS Cyrillic) is a code page used under DOS to write Cyrillic script.
Code page 857 (also known as CP 857, IBM 00857, OEM 857, MS-DOS Turkish) is a code page used under DOS to write Turkish.
Code page 858 (also known as CP 858, IBM 00858, OEM 858) is a code page used under DOS to write Western European languages.
Code page 860 (also known as CP 860, IBM 00860, OEM 860, DOS Portuguese) is a code page used under DOS to write Portuguese; it is also suitable for writing Spanish and Italian.
Code page 861 (also known as CP 861, IBM 00861, OEM 861, DOS Icelandic) is a code page used under DOS to write the Icelandic language (as well as other Nordic languages).
Code page 862 (also known as CP 862, IBM 00862, OEM 862 (Hebrew), MS-DOS Hebrew) is a code page used under DOS for Hebrew.
Code page 863 (also known as CP 863, IBM 00863, OEM 863, MS-DOS French Canada) is a code page used under DOS to write the French language (mainly in Quebec), although it lacks the letters Æ, æ, Œ, œ, Ÿ and ÿ.
Code page 865 (also known as CP 865, IBM 00865, OEM 865, DOS Nordic) is a code page used under DOS to write Nordic languages (except Icelandic, for which code page 861 is used).
Code page 866 (CP 866; Альтернативная кодировка) is a code page used under DOS and OS/2 to write Cyrillic script.
Code page 869 (CP 869, IBM 869, OEM 869) is a code page used under DOS to write the Greek language.
CCSID 930 (sometimes known as CP930 or codepage 930) is one of several Japanese EBCDIC code pages created by IBM for representation of Japanese text.
Microsoft Windows code page 932 (abbreviated MS932, Windows-932 or ambiguously CP932), also called Windows-31J amongst other names, is the Microsoft Windows code page for the Japanese language, which is an extended variant of the Shift JIS Japanese character encoding.
Code page 950 is Microsoft's implementation of the de facto standard Big5.
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space.
In communication, a code word is an element of a standardized code or protocol.
This article compares Unicode encodings.
Computation is any type of calculation that includes both arithmetical and non-arithmetical steps and follows a well-defined model, for example an algorithm.
Computer data storage, often called storage or memory, is a technology consisting of computer components and recording media that are used to retain digital data.
Computer science deals with the theoretical foundations of information and computation, together with practical techniques for the implementation and application of these foundations.
Content sniffing, also known as media type sniffing or MIME sniffing, is the practice of inspecting the content of a byte stream to attempt to deduce the file format of the data within it.
In computing and telecommunication, a control character or non-printing character is a code point (a number) in a character set, that does not represent a written symbol.
In computing, cross-platform software (also multi-platform software or platform-independent software) is computer software that is implemented on multiple computing platforms.
Cygwin is a Unix-like environment and command-line interface for Microsoft Windows.
The Cyrillic script is a writing system used for various alphabets across Eurasia (particularly in Eastern Europe, the Caucasus, Central Asia, and North Asia).
Data is a set of values of qualitative or quantitative variables.
A diacritic – also diacritical mark, diacritical point, diacritical sign, or an accent – is a glyph added to a letter, or basic glyph.
Extended Binary Coded Decimal Interchange Code (EBCDIC) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems.
IBM code page 37 is an EBCDIC code page with the full Latin-1 character set used in IBM mainframes.
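For a quick comparison with ASCII, the sketch below encodes the same three characters with Python's `cp037` codec, which implements IBM code page 37. The sample string is an arbitrary choice.

```python
sample = "A0 "  # a letter, a digit, and a space

ebcdic = sample.encode("cp037")  # IBM code page 37 (EBCDIC)
ascii_ = sample.encode("ascii")

print(ebcdic.hex(" "))  # c1 f0 40
print(ascii_.hex(" "))  # 41 30 20

# Decoding with the matching code page recovers the original text.
round_trip = ebcdic.decode("cp037")
```

The byte values differ completely between the two encodings, so EBCDIC data has to be transcoded before ASCII-based software can read it.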
Code page 1047 is an EBCDIC code page with the full Latin-1 character set.
Endianness refers to the sequential order in which bytes are arranged into larger numerical values when stored in memory or when transmitted over digital links.
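A minimal sketch of the two byte orders, using the 16-bit value 0x20AC (the code point of the euro sign) purely as an example:

```python
value = 0x20AC  # code point of the euro sign, used here as a 16-bit value

big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first

print(big.hex(" "))     # 20 ac
print(little.hex(" "))  # ac 20

# The same ordering decides how UTF-16 code units are serialized.
utf16_be = "\u20ac".encode("utf-16-be")
utf16_le = "\u20ac".encode("utf-16-le")
```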
An escape sequence is a series of characters used to change the state of computers and their attached peripheral devices, rather than to be displayed or printed as regular data bytes would be.
Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.
FIELDATA (also written as Fieldata) was a pioneering computer project run by the US Army Signal Corps in the late 1950s that intended to create a single standard (as defined in MIL-STD-188A/B/C) for collecting and distributing battlefield information.
file is a standard Unix program for recognizing the type of data contained in a computer file.
In mathematics, a function was originally the idealization of how a varying quantity depends on another quantity.
GB 18030 is a Chinese government standard, titled Information technology — Chinese coded character set, that defines the language and character support required for software in China.
GB2312 is the registered internet name for a key official character set of the People's Republic of China, used for simplified Chinese characters.
GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.
In typography, a glyph is an elemental symbol within an agreed set of symbols, intended to represent a readable character for the purposes of writing.
The Greek alphabet has been used to write the Greek language since the late 9th or early 8th century BC.
GB standards are the Chinese national standards issued by the Standardization Administration of China (SAC), the Chinese National Committee of the ISO and IEC.
Hans Carl Frederik Christian Schjellerup (February 8, 1827 – November 13, 1887) was a Danish astronomer.
The Hong Kong Supplementary Character Set (commonly abbreviated to HKSCS) is a set of Chinese characters (4,702 in total in the initial release) used in Cantonese, as well as when writing the names of some places in Hong Kong (whether in written Cantonese or standard written Chinese sentences).
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, and hypermedia information systems.
The International Business Machines Corporation (IBM) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries.
The IBM 1401 is a variable wordlength decimal computer that was announced by IBM on October 5, 1959.
The IBM 1620 was announced by IBM on October 21, 1959, and marketed as an inexpensive "scientific computer".
The IBM 700/7000 series is a series of large-scale (mainframe) computer systems that were made by IBM through the 1950s and early 1960s.
In Unix-like operating systems, iconv (an abbreviation of internationalization conversion) is a command-line program and a standardized application programming interface (API) used to convert between different character encodings.
Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India.
An integer (from the Latin integer, meaning "whole") is a number that can be written without a fractional component. The word's first literal meaning in Latin is "untouched", from in ("not") plus tangere ("to touch").
International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization.
International maritime signal flags refers to various flags used to communicate with ships.
ISO/IEC 2022, Information technology — Character code structure and extension techniques, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying a character code structure and techniques for extending it.
ISO/IEC 646 is the name of a set of ISO standards, described as Information technology — ISO 7-bit coded character set for information interchange and developed in cooperation with ASCII at least since 1964.
ISO/IEC 6937:2001, Information technology — Coded graphic character set for text communication — Latin alphabet, is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV.
ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings.
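ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-10:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.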
ISO/IEC 8859-11:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001.
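ISO/IEC 8859-13:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 13: Latin alphabet No. 7, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-14:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 14: Latin alphabet No. 8, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-15:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 15: Latin alphabet No. 9, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-16:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 16: Latin alphabet No. 10, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-3:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 3: Latin alphabet No. 3, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-4:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 4: Latin alphabet No. 4, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.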
ISO/IEC 8859-5:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988.
ISO/IEC 8859-6:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 6: Latin/Arabic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987.
ISO/IEC 8859-7:2003, Information technology — 8-bit single-byte coded graphic character sets — Part 7: Latin/Greek alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987.
ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
ISO/IEC 8859-9:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 9: Latin alphabet No. 5, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language.
JIS X 0213 is a Japanese Industrial Standard defining coded character sets for encoding the characters used in Japan.
KOI-7 (КОИ-7) is a 7-bit character encoding, designed to cover Russian, which uses the Cyrillic alphabet.
KOI8-R (RFC 1489) is an 8-bit character encoding, designed to cover Russian, which uses a Cyrillic alphabet.
KOI8-U (RFC 2319) is an 8-bit character encoding, designed to cover Ukrainian, which uses a Cyrillic alphabet.
KS X 1001 (Korean Graphic Character Set for Information Interchange), formerly called KS C 5601, is a South Korean coded character set standard to represent hangul and hanja characters on a computer.
The Latin alphabet or the Roman alphabet is a writing system originally used by the ancient Romans to write the Latin language.
In computing, a legacy system is an old method, technology, computer system, or application program, "of, relating to, or being a previous or outdated computer system." Often a pejorative term, referencing a system as "legacy" means that it paved the way for the standards that would follow it.
luit is a utility program used to translate the character set of a computer program so that its output can be displayed correctly on a terminal emulator that uses a different character set.
Mac OS Roman is a character encoding primarily used by the classic Mac OS to represent text.
Microsoft Windows is a group of several graphical operating system families, all of which are developed, marketed, and sold by Microsoft.
MIK (МИК) is an 8-bit Cyrillic code page used with DOS.
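Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email to support text in character sets other than ASCII, non-text attachments, message bodies with multiple parts, and header information in non-ASCII character sets.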
Mojibake (文字化け) is the garbled text that is the result of text being decoded using an unintended character encoding.
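The effect is easy to reproduce. The sketch below encodes a word as UTF-8 and then decodes the bytes as Windows-1252; the sample word and the choice of the two encodings are assumptions made for the example.

```python
original = "caf\u00e9"                # "café"

raw = original.encode("utf-8")        # the é becomes two bytes: C3 A9
garbled = raw.decode("cp1252")        # each byte is read as its own character

print(garbled)  # cafÃ©

# While no byte has been lost, reversing the wrong step repairs the text.
repaired = garbled.encode("cp1252").decode("utf-8")
```

The repair only works while the garbled text still maps back to the original bytes; once characters have been replaced or dropped along the way, the original cannot be reconstructed.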
Mojikyo is a set of computer software and fonts for enhanced logogram word-processing.
Morse code is a method of transmitting text information as a series of on-off tones, lights, or clicks that can be directly understood by a skilled listener or observer without special equipment.
Mozilla (stylized as moz://a) is a free software community founded in 1998 by members of Netscape.
A number is a mathematical object used to count, measure and also label.
The octet is a unit of digital information in computing and telecommunications that consists of eight bits.
In the Unicode standard, a plane is a continuous group of 65,536 (2^16) code points.
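Because each plane holds 65,536 code points, the plane number of a character is simply its code point divided by 65,536, discarding the remainder. A short sketch, with `plane_of` as a hypothetical helper name:

```python
def plane_of(char: str) -> int:
    """Return the Unicode plane number (0 to 16) of a single character."""
    return ord(char) >> 16  # integer division by 65,536


samples = {
    "A": plane_of("A"),                    # U+0041
    "U+1F600": plane_of("\U0001F600"),     # an emoji
    "U+20000": plane_of("\U00020000"),     # a CJK ideograph
}
```

Here "A" falls in plane 0, U+1F600 in plane 1, and U+20000 in plane 2.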
Punycode is a representation of Unicode with the limited ASCII character subset used for Internet host names.
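Python ships a `punycode` codec and an `idna` codec that adds the `xn--` prefix used in host names, so a round trip can be sketched as follows. The sample label is an arbitrary choice.

```python
label = "b\u00fccher"  # "bücher"

puny = label.encode("punycode")  # bare Punycode form
host = label.encode("idna")      # form used in host names, with the xn-- prefix

print(puny)  # b'bcher-kva'
print(host)  # b'xn--bcher-kva'

# Decoding restores the original Unicode label.
restored = puny.decode("punycode")
```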
Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1.
SIL International (formerly known as the Summer Institute of Linguistics) is a U.S.-based, worldwide, Christian non-profit organization, whose main purpose is to study, develop and document languages, especially those that are lesser-known, in order to expand linguistic knowledge, promote literacy, translate the Christian Bible into local languages, and aid minority language development.
The Standard Compression Scheme for Unicode (SCSU) is a Unicode Technical Standard for reducing the number of bytes needed to represent Unicode text, especially if that text uses mostly characters from one or a small number of per-language character blocks.
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.
Tamil Script Code for Information Interchange (TSCII) is a coding scheme for representing the Tamil script.
A telegraph key is a switching device used primarily to send Morse code.
Telegraphy (from Greek: τῆλε têle, "at a distance" and γράφειν gráphein, "to write") is the long-distance transmission of textual or symbolic (as opposed to verbal or audio) messages without the physical exchange of an object bearing the message.
Transcoding is the direct digital-to-digital conversion of one encoding to another, such as for movie data files (e.g., PAL, SECAM, NTSC), audio files (e.g., MP3, WAV), or character encoding (e.g., UTF-8, ISO/IEC 8859).
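For character data, transcoding usually means decoding the source bytes to abstract characters and re-encoding them in the target encoding. A minimal sketch, where the function name `transcode` and the KOI8-R sample are illustrative assumptions:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character encoding into another."""
    return data.decode(source).encode(target)


koi8 = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("koi8-r")  # "Привет"
utf8 = transcode(koi8, "koi8-r", "utf-8")

print(len(koi8), len(utf8))  # 6 12
```

The text is unchanged, but its size doubles: KOI8-R stores each Cyrillic letter in one byte, while UTF-8 needs two.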
TRON Code is a multi-byte character encoding used in the TRON project.
In writing and typography, a ligature occurs where two or more graphemes or letters are joined as a single glyph.
Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
The Universal Coded Character Set (UCS) is a standard set of characters defined by the International Standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard), which is the basis of many character encodings.
A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, while not necessarily conforming to or being certified to any version of the Single UNIX Specification.
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode.
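Code points above U+FFFF do not fit in one 16-bit unit, so UTF-16 represents them as a pair of surrogate code units. The sketch below works through that arithmetic; `to_surrogates` is a hypothetical helper, checked here against Python's built-in encoder.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# The 2,048 surrogate values are not valid code points on their own, which
# leaves 0x110000 - 2048 = 1,112,064 encodable code points.
valid_code_points = 0x110000 - 2048
```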
UTF-32 stands for Unicode Transformation Format in 32 bits.
UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes.
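The "one to four bytes" range can be observed directly. In this sketch the four sample characters are arbitrary picks, one for each encoded length:

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, and an emoji

lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex(' ')}")
```

ASCII characters keep their single-byte values, which is what makes UTF-8 backward compatible with ASCII.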
A variable-width encoding is a type of character encoding scheme in which codes of differing lengths are used to encode a character set (a repertoire of symbols) for representation in a computer.
VSCII, also known as TCVN 5712:1993, ISO-IR-180, and Vietnamese Standard Code for Information Interchange, is a set of three Vietnamese national standard character encodings for using the Vietnamese language with computers.
A web browser (commonly referred to as a browser) is a software application for accessing information on the World Wide Web.
Windows code pages are sets of characters or code pages (known as character encodings in other operating systems) used in Microsoft Windows from the 1980s and 1990s.
Windows-1250 is a code page used under Microsoft Windows to represent texts in Central European and Eastern European languages that use Latin script, such as Polish, Czech, Slovak, Hungarian, Slovene, Bosnian, Croatian, Serbian (Latin script), Romanian (before 1993 spelling reform) and Albanian.
Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Bulgarian, Serbian Cyrillic and other languages.
Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages (other languages use different default encodings).
Windows-1253 is a Windows code page used to write modern Greek.
Windows-1254 is a code page used under Microsoft Windows to write Turkish.
Windows-1255 is a code page used under Microsoft Windows to write Hebrew.
Windows-1256 is a code page used to write Arabic (and possibly some other languages that use Arabic script, like Persian and Urdu) under Microsoft Windows. This code page is not compatible with ISO 8859-6 and MacArabic encodings.
Windows-1257 (Windows Baltic) is a single byte code page used to support the Estonian, Latvian and Lithuanian languages under Microsoft Windows.
Windows-1258 is a code page used in Microsoft Windows to represent Vietnamese texts.
A writing system is any conventional method of visually representing verbal communication.
In computing, Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
CDRA, Character Data Representation Architecture, Character Set, Character code, Character coding, Character coding system, Character encoding form, Character encoding scheme, Character encoding system, Character encodings, Character repertoire, Character set, Character sets, Charset, Charsets, Code character, Code unit, Coded Character Set, Coded character, Coded character set, Codeset, Convmv, File encoding, File encodings, IBM CDRA, IBM Character Data Representation Architecture, International character set, Legacy character set, Legacy encoding, Symbol set, Text encoding, Text encodings.