Computer Science/Scientific Computing

2. IEEE Standard

728x90

IEEE standard (IEEE 754)

Single precision

$\mathbb F(2, 24, -126, 127)$ , each number is stored in 32 bits (4 bytes)

💡

precision은 24인데 mantissa는 23인 이유는 hidden bit때문이다

💡

L : 1 - 2^{8}

H : 2^8 - 1

(

2^8

은 NaN이나 infinity를 판단하는 용도로 특별히 사용된다)

Double precision

$\mathbb F(2, 53, -1022, 1023)$ , each number is stored in 64 bits (8 bytes)

💡

precision은 53인데 mantissa는 52인 이유는 hidden bit때문이다

💡

L : 1 - 2^{11}

H : 2^{11} - 1

(

2^{11}

은 NaN이나 infinity를 판단하는 용도로 특별히 사용된다)

Biased exponent

추가적으로 exponent area에 들어가는 값은 biased 된 값이다. (즉 exponent are에 들어가는 값은 unsigned integer임에 유의해야 한다.)

\begin{cases} e' = e + 127 & (single) \\ e' = e + 1023 & (double)\end{cases}

💡

127은

(2^8 - 2) / 2

에서 유래한 값이고, 1023은

(2^{11} - 1) / 2

에서 유래한 값이다. 지수 중 2개는 예약되어 subnormal 형태나 NaN, infinity 형태로 특수하게 사용되기 때문이다.

Representation of numbers in IEEE standard

Floating-point Arithmetic

Addition or subtraction : Shifting mantissa to make exponents match may cause loss of some digits of smaller number
💡
Exponent가 큰 수 기준으로 exponent를 조정한다.

Multiplication : Product of two $p$ -digit mantissa contains up to $2p$ digits, so result may not be representable

Division : Quotient of two $p$ -digit mantissa may contain more than $p$ digits, such as nonterminating binary expansion of $1/10$

💡

즉 Floating-point 끼리 연산을 수행하는 과정에서 오류가 발생할 수 있다.

Example

Guard Digit

💡

즉, guard digit를 사용했을 때 relative error를 줄일 수 있다.

Cancellation error

Cancellation error (losing significant digits) occurs in the subtraction of two close numbers

💡

즉, 초반부 소수점들을 표현하는 정보를 사용하지 못함으로써 발생하는 error라고 생각해주면 된다.

Example 1

Example 2 : Approximation of $\pi$ in double precision

이렇게 되는 이유는 위의 빨간색이 빼기가 있기 때문이다.

k가 커짐에 따라서 $2^{-k}p_{k - 1}$ 이 0으로 수렴하고 결과적으로 매우 가까운 2개의 수를 빼는 것으로 수렴한다. 두 식은 수학적으로는 동일하지만, 컴퓨터로 처리하는 과정에서 큰 차이를 발생시킨다.

💡

파란색이 converge하는 이유는

u

로 수렴하기 때문이다.

Conditioning and Stability

The input data of a model (problem) is always subjected to some perturbations (like as roundoff errors)

💡

즉, 컴퓨터는 숫자를 정확히 표현을 할 수 없기 때문에 input을 입력하는 과정에서 perturbation이 들어갈 수 밖에 없다.

Therefore, the fundamental question then is “How much the solution of the perturbed problem is close to the solution of the original problem?”

💡

즉, input의 perturbation이 존재하더라도 output이 조금만 변했으면 좋겠다는 것이다.

well-conditioned : sensitivity is low (즉 input에 noise가 들어가도 output이 크게 변동되지 않는다는 의미이다.)

ill-conditioned : sensitivity is high (input에 noise가 들어가면 output이 크게 변동된다는 의미이다)

If we replace the term “problem” with “algorithm” then we say that the algorithm is either stable or unstable

Example : Hilbert matrix

💡

즉, Hilbert matrix는 ill-conditioned한 것의 대표적인 예시이다.

Condition number of a matrix

Consider the system $Ax = b$ and the perturbed system $A(x + \delta x ) = b + \delta b$ . We get $\delta x = A^{-1}(\delta b)$ (Assuming perturbation only in $b$ for simplicity)

Take the norm from both sides

\|\delta x\|\le \|A^{-1}\|\|\delta b\|

From original system $Ax = b$ we have $\|b\| \le \|A\|\|x\|,$ or

\frac{1}{\|x\|}\le \frac{\|A\|}{\|b\|}

Multiplying both sides of recent equations we get

\frac{\|\delta x\|}{\|x\|} \le \|A\|\cdot\|A^{-1}\frac{\|\delta b\|}{\|b\|} = |\delta| (\|A\|\cdot \|A^{-1}\|)

At the worst case the error in the output is amplified by $\|A\| \cdot \|A^{-1}\|$ . So the condition number of A is defined as

\text{cond}(A) := \|A\|\cdot \|A^{-1}\|

💡

Vandermonde matrix 도 ill-conditioned matrix이다.

💡

만약

A

가 non-singular해서 invertible하지 않으면 condition number는 SVD를 활용해서 정의된다. 해당하는 내용은 나중에 다룬다.

Contents

당신이 좋아할만한 콘텐츠

1. Fundamentals of Computing 2023.09.04

새소식

인기 검색어

2. IEEE Standard

IEEE standard (IEEE 754)

Single precision

Double precision

Biased exponent

Representation of numbers in IEEE standard

Floating-point Arithmetic

Example

Guard Digit

Cancellation error

Example 1

Example 2 : Approximation of $\pi$ in double precision

Conditioning and Stability

Example : Hilbert matrix

Condition number of a matrix

당신이 좋아할만한 콘텐츠

티스토리툴바

2. IEEE Standard

IEEE standard (IEEE 754)

Single precision

Double precision

Biased exponent

Representation of numbers in IEEE standard

Floating-point Arithmetic

Example

Guard Digit

Cancellation error

Example 1

Example 2 : Approximation of @import url('https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.13.2/katex.min.css')π\piπ﻿ in double precision

Conditioning and Stability

Example : Hilbert matrix

Condition number of a matrix

당신이 좋아할만한 콘텐츠

티스토리툴바

Example 2 : Approximation of $\pi$ in double precision