Monday, May 15, 2023

05-14-2023-1945 - Hexadecimal floating point (now called HFP by IBM) ; Unicode ; Universal_Coded_Character_Set

Hexadecimal floating point (now called HFP by IBM) is a format for encoding floating-point numbers first introduced on the IBM System/360 computers, and supported on subsequent machines based on that architecture,[1][2][3] as well as machines which were intended to be application-compatible with System/360.[4][5]

In comparison to IEEE 754 floating point, the HFP format has a longer significand and a shorter exponent. All HFP formats have 7 bits of exponent with a bias of 64. The normalized range of representable numbers is from $16^{-65}$ to $16^{63}$ (approx. $5.39761 \times 10^{-79}$ to $7.237005 \times 10^{75}$).
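As a worked check of those endpoints, assume the 32-bit short format. Its fraction holds p = 6 hexadecimal digits, a value is (−1)^sign × 0.fraction × 16^(exponent − 64), and the 7-bit biased exponent runs from 0 to 127. The smallest normalized value has leading fraction digit 1 and exponent 0. The largest has every fraction digit F and exponent 127:

$$x_{\min} = 0.1_{16} \times 16^{0-64} = 16^{-1} \times 16^{-64} = 16^{-65} = 2^{-260} \approx 5.39761 \times 10^{-79}$$

$$x_{\max} = 0.\mathrm{FFFFFF}_{16} \times 16^{127-64} = (1 - 16^{-6}) \times 16^{63} \approx 16^{63} = 2^{252} \approx 7.237005 \times 10^{75}$$

Because every HFP format uses the same 7-bit exponent, the longer formats only add fraction digits and share essentially this range.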

The value of a number is given by the formula: $(-1)^{\text{sign}} \times 0.\text{significand} \times 16^{\text{exponent}-64}$
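A minimal sketch of that formula applied to the 32-bit short format. It assumes the usual layout of 1 sign bit, a 7-bit biased exponent, and a 24-bit (six hex digit) fraction. The function name `decode_hfp32` is hypothetical, and this is an illustration rather than IBM's conversion routine:

```python
def decode_hfp32(word: int) -> float:
    """Decode a 32-bit IBM HFP short-format word into a Python float.

    Assumed layout: bit 0 = sign, bits 1-7 = exponent (bias 64),
    bits 8-31 = fraction (six hexadecimal digits).
    """
    sign = (word >> 31) & 0x1
    exponent = (word >> 24) & 0x7F
    fraction = word & 0x00FFFFFF
    # "0.significand": the 24 fraction bits read as a hexadecimal fraction
    significand = fraction / float(1 << 24)
    # Scale by 16 ** (exponent - 64), then apply the sign
    value = significand * 16.0 ** (exponent - 64)
    return -value if sign else value


print(decode_hfp32(0x41100000))  # 1.0  (0.1 hex x 16^1)
print(decode_hfp32(0xC276A000))  # -118.625
```

Note that the base is 16, not 2. Normalization therefore only guarantees a nonzero leading hex digit, so up to three leading bits of the fraction can be zero.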

https://en.wikipedia.org/wiki/IBM_hexadecimal_floating-point

Unicode, formally The Unicode Standard,[note 1][note 2] is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines, as of the current version (15.0), 149,186 characters[3][4] covering 161 modern and historic scripts, as well as symbols, thousands of emoji[5] (including in color), and non-visual control and formatting codes.
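To make "encoding characters" concrete, here is a small sketch using Python's standard `unicodedata` module. It shows that each character maps to one numbered code point with a formal name. The helper `describe` is hypothetical, and the Unicode version reported depends on the Python build, which may not be 15.0:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Latin, Cyrillic, CJK, a currency symbol, and an emoji
for ch in ["A", "\u00e9", "\u0416", "\u4e2d", "\u20ac", "\U0001F600"]:
    print(describe(ch))

print(describe("\u20ac"))          # U+20AC EURO SIGN
print(unicodedata.unidata_version)  # Unicode version bundled with this Python
```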

https://en.wikipedia.org/wiki/Unicode

The Universal Coded Character Set (UCS, Unicode) is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard). It is the basis of many character encodings and grows as characters from previously unrepresented writing systems are added.

The UCS has over 1.1 million possible code points available for use/allocation, but only the first 65,536 code points, which make up the Basic Multilingual Plane (BMP), had entered into common use before 2000. This situation began changing when the People's Republic of China (PRC) ruled in 2006 that all software sold in its jurisdiction would have to support GB 18030. This required software intended for sale in the PRC to move beyond the BMP.
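The codespace runs from U+0000 to U+10FFFF (1,114,112 code points), split into 17 planes of 65,536 code points each, with plane 0 being the BMP. A short sketch of that arithmetic follows; the names `CODESPACE_SIZE`, `plane_of`, and `in_bmp` are hypothetical:

```python
CODESPACE_SIZE = 0x110000  # U+0000 .. U+10FFFF
PLANE_SIZE = 0x10000       # 65,536 code points per plane


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains code point cp."""
    if not 0 <= cp < CODESPACE_SIZE:
        raise ValueError(f"not a UCS/Unicode code point: {cp:#x}")
    return cp // PLANE_SIZE  # equivalent to cp >> 16


def in_bmp(cp: int) -> bool:
    """True for code points in plane 0, the Basic Multilingual Plane."""
    return plane_of(cp) == 0


print(CODESPACE_SIZE)    # 1114112
print(plane_of(0x4E2D))  # 0  (BMP)
print(plane_of(0x1F600)) # 1
print(plane_of(0x20000)) # 2
```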

The system deliberately leaves many code points not assigned to characters, even in the BMP. It does this to allow for future expansion or to minimise conflicts with other encoding forms.
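Python's `unicodedata` module can show these gaps. Code points with general category `Cn` are unassigned: either reserved for future use or permanently designated noncharacters. The sketch below counts them in the BMP. The helper `is_unassigned` is hypothetical, and the count varies with the Unicode version bundled with the interpreter:

```python
import unicodedata


def is_unassigned(cp: int) -> bool:
    """True if this Python's Unicode database lists cp as unassigned (category Cn)."""
    return unicodedata.category(chr(cp)) == "Cn"


# U+0378 is a reserved gap inside the Greek and Coptic block
print(is_unassigned(0x0378))  # True
print(is_unassigned(0x0041))  # False ('A' is assigned)

# Count Cn code points in the BMP (surrogates are 'Cs' and private use is 'Co',
# so neither is included here)
bmp_unassigned = sum(is_unassigned(cp) for cp in range(0x10000))
print(bmp_unassigned)
```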

The original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP. A range of code points in the S (Special) Zone of the BMP, U+D800 to U+DFFF, remains unassigned to characters. UCS-2 disallows use of code values for these code points, but UTF-16 uses them in pairs: a value from the high half (U+D800 to U+DBFF) followed by a value from the low half (U+DC00 to U+DFFF) together encode one code point outside the BMP. Unicode also adopted UTF-16; in Unicode terminology, the high-half zone elements are called "high surrogates" and the low-half zone elements "low surrogates".
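The pairing works by subtracting 0x10000 from the code point, which leaves a 20-bit value. The top 10 bits are added to 0xD800 and the bottom 10 bits to 0xDC00. Below is a minimal sketch with hypothetical helper names, cross-checked against Python's built-in `utf-16-be` codec:

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a code point outside the BMP (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points outside the BMP use a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")                   # D83D DE00
print("\U0001F600".encode("utf-16-be").hex())    # d83dde00 (built-in codec agrees)
```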

Another encoding, UTF-32 (previously named UCS-4), uses four bytes (32 bits) to encode each code point of the codespace. Every code point therefore has a single fixed-width binary representation that APIs and software applications can use directly.
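Because every code point takes exactly one 32-bit unit, a UTF-32 buffer can be indexed by code point without decoding. The sketch below encodes with Python's built-in `utf-32-be` codec, which emits no byte-order mark when the endianness is explicit, and reads the units back with `struct`. The helper name `utf32_code_units` is hypothetical:

```python
import struct
from typing import List


def utf32_code_units(text: str) -> List[int]:
    """Encode text as big-endian UTF-32 and return one 32-bit unit per code point."""
    data = text.encode("utf-32-be")  # exactly 4 bytes per code point
    return list(struct.unpack(f">{len(data) // 4}I", data))


s = "A\U0001F600"
print(utf32_code_units(s))                # [65, 128512], i.e. U+0041 and U+1F600
print(len(s.encode("utf-32-be")))         # 8 bytes: two fixed-width units
print(len(s.encode("utf-16-be")) // 2)    # 3 UTF-16 units: the emoji needs a surrogate pair
```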

https://en.wikipedia.org/wiki/Universal_Coded_Character_Set
