
1. Computer Abstractions and Technology


Definition of Computer Architecture

A set of rules and methods which describe the functionality, organization, and implementation of computer systems.

→ Design the internals of computer systems

Definition of computer systems

A complete computer including the hardware, operating system, and peripheral equipment required for performing full operations.

💡

An area that spans both hardware and software: not application-level software, but software concerned with how things are handled at the hardware level (e.g., operating systems, compilers, computer architecture).

Definition of computers

A machine that can carry out arithmetic or logic operations sequentially (automatically, via computer programming; that is, its function can be changed through programming).

Execute arithmetic or logic operations

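A program has to be translated into arithmetic or logic operations that the computer can understand (this is the compiler's role). For that to work, there must be an agreed convention for how the compiler translates a program into instructions.

→ Instructions are the only language the computer can understand.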

Execute the operations “automatically”

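→ This is one of the key points: the computer must be able to run through the operations on its own, without step-by-step intervention.

→ Logic circuit design is needed to make this possible.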

Traditional classes of computers

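There are several kinds of computers.

Q : Why do they have to be divided into several classes?

A : If this diversity is ignored, it becomes hard to deliver the desired functionality. For example, what people want from a desktop differs from what they want from a laptop. For a desktop, graphics performance takes priority over being lightweight, while for a mobile device surface temperature is an important metric. In other words, several classes of computers exist because the goals (what counts as performance) differ.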

Personal Computers
Good performance to single users at low cost

Servers (basically not used by one person but shared by many users; because they are expensive, they should be usable by as many people as possible)
Greater computing, storage, and input/output capacity

Supercomputers (targets that need even more computing power than servers)
The high-end extreme of servers

Embedded Computers → mobile/embedded computers
Run one application or one set of related applications typically integrated with the hardware.
→ Similar in character to a PC in that a single person uses it. Unlike a PC, there are additional constraints, for example the temperature must not get too high.

Our Focus: “Make Programs Fast”

Concerned about the performance of our applications.

From the computer architecture side, this means analyzing which instructions take a long time to execute.

→ Pick out the instructions that take unusually long (performance bottlenecks) and analyze what they do that makes them slow.

→ Think about how the hardware should change to reduce the time spent on those instructions.

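How performance bottlenecks have changed

1960s to 1970s: Main memory and disks were small, so not all of a program's instructions could be loaded into main memory. This was the bottleneck in that era.

→ That problem has since been solved. Processors are now parallel, and the memory hierarchy acts as the main bottleneck.

A performance bottleneck can be in the hardware or in the software.

Hardware side: the hardware is poorly designed and cannot process certain instructions efficiently.

Software side: whoever wrote the software (especially the compiler) does not make good use of the CPU's instructions. Another example is software that does not make sufficient use of the available cores or threads.

The choice of programming language can also affect performance.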

Eight Great Ideas in Computer Architecture

Moore’s Law

Integrated circuit resources double every 18-24 months.

→ The number of transistors that fit on a chip of a given size (transistor density) has doubled every 18-24 months.

→ As a result, performance has kept improving by around 20%.

Abstraction

Hide the details of the lower level and present them in a higher-level, abstracted form.

Make the Common Case Fast (Amdahl’s Law)

*Important

If functions A and B take 80% and 20% of the time, focus on function A rather than function B.

→ Optimize the part that takes the most time. (Make the common case fast)
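The 80%/20% example above can be checked numerically. The sketch below assumes, purely for illustration, that either function can be made twice as fast; the helper name `overall_speedup` is hypothetical and not from the notes.

```python
def overall_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Function A takes 80% of the time, function B takes 20% (from the notes).
# The 2x improvement factor is an assumed value for illustration.
speedup_a = overall_speedup(0.80, 2.0)  # make A twice as fast
speedup_b = overall_speedup(0.20, 2.0)  # make B twice as fast

print(f"Optimizing A: {speedup_a:.2f}x overall")
print(f"Optimizing B: {speedup_b:.2f}x overall")
```

With the same effort, working on A gives a much larger overall gain than working on B, which is the point of focusing on the common case.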

Parallelism

To make good use of multiple CPUs, the software has to be written so that this is possible.

Pipelining

*important

Laundry analogy: divide laundry into three steps

Overlap the execution of different instructions to reduce the total time taken.
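A rough way to see the benefit of overlapping is to count time for the laundry analogy. This is a minimal sketch assuming every step takes the same time and nothing ever stalls; the load count and the 30-minute step time are made-up values.

```python
def sequential_time(loads, stages, stage_time):
    """Each load finishes all stages before the next load starts."""
    return loads * stages * stage_time

def pipelined_time(loads, stages, stage_time):
    """A new load enters the first stage as soon as that stage is free."""
    return (stages + loads - 1) * stage_time

# Assumed values: 4 loads, 3 steps, 30 minutes per step.
loads, stages, stage_time = 4, 3, 30

seq = sequential_time(loads, stages, stage_time)
pipe = pipelined_time(loads, stages, stage_time)

print(f"sequential: {seq} min, pipelined: {pipe} min")
```

Each individual load still takes three steps; only the total time for all loads shrinks.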

Prediction

There are cases where guessing and proceeding is generally faster than waiting until the outcome is known for certain.

Hierarchy of Memories

Considering the path from L1 cache to L2 cache to DRAM to storage, memory has to be organized as a hierarchy so that the user notices the slow levels as little as possible.

Dependability via Redundancy

Include redundant components to increase reliability and to allow recovery from failures (think of RAID).

Below Your Program

Abstracting a Computer System

Application software

High-level software running on top of the systems software

ex) web browsers, game

System software

Low-level software for executing the application software on the hardware

Compiler: software that translates programs into CPU instructions

OS: manages how the CPU instructions get executed

ex) Operating systems, compilers

Hardware

The underlying hardware that executes the software

Operating systems & compilers

Two types of system software tightly coupled to the underlying hardware

Operating systems

Interface between user applications and the hardware

Compilers

Translates a program written in a high-level language into instructions

Compiling a User Program

Assembly language

Each CPU has its own assembly language.

Utilizes symbolic notations to abstract the machine language

Binary machine language

Only consists of bits

The language electronic hardware can understand

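→ The only language the CPU can understand directly

The Hardware/Software Interface

Instruction Set Architectures (ISA)

The interface between the hardware and the lowest-level software.

In effect, the ISA defines the binary machine language (and with it the assembly language). In addition, binary machine language and assembly code correspond one-to-one, so writing assembly is essentially the same as coding in 0s and 1s.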
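The one-to-one mapping between assembly and machine code can be illustrated with a single instruction. The notes do not name a specific ISA, so this sketch assumes the 32-bit MIPS R-format as an example; the helper `encode_add` and the small register table are hypothetical names for illustration.

```python
# Register numbers follow the MIPS convention ($t0 = 8, $s1 = 17, $s2 = 18).
REGISTERS = {"$t0": 8, "$s1": 17, "$s2": 18}

def encode_add(rd, rs, rt):
    """Encode `add rd, rs, rt` as a 32-bit MIPS R-format word (assumed ISA)."""
    opcode, shamt, funct = 0, 0, 0x20
    return (
        (opcode << 26)
        | (REGISTERS[rs] << 21)
        | (REGISTERS[rt] << 16)
        | (REGISTERS[rd] << 11)
        | (shamt << 6)
        | funct
    )

# add $t0, $s1, $s2
word = encode_add("$t0", "$s1", "$s2")
print(f"{word:032b}")
print(f"0x{word:08x}")
```

Every field of the assembly line lands in a fixed group of bits, which is why the assembly text and the binary word carry the same information.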

Application Binary Interfaces (ABI)

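The OS abstracts I/O, memory allocation, and other low-level system functions and provides them to programs. The user-level instruction set combined with these interfaces provided by the operating system is called the ABI.

💡

It is related to system calls.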

How do we store data?

Volatile memory

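Contents are lost when the power is turned off. The typical example is DRAM (Dynamic Random Access Memory).

Data access is fast (tens of nanoseconds), but capacity is small and the cost is high.

A process usually keeps its data in volatile memory temporarily and uses it when needed. This is why it is called main/primary memory.

Non-volatile memory (Secondary memory)

Typical examples are HDDs and SSDs.

Large and cheap, but slow (tens of microseconds: about 1000 times slower than volatile memory)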

Manufacturing Integrated Circuits

An integrated circuit consists of transistors

Performance

How should the performance of different CPUs be compared?

Defining Performance

Execution time (wall-clock time) (unit: sec)
The total time required for a computer to complete a task.
Useful for single-user desktops

Throughput (or bandwidth): amount of work processed per unit time (unit: instructions/sec)
The total number of instructions completed in a given time.
The number of instructions the CPU can process per unit time
A metric suited to servers

💡

Execution time is measured for a single program, whereas throughput applies regardless of how many processes are running.

Computing the performance

Let’s say fast execution time == high performance

Performance_X = \frac{1}{\text{Execution Time}_X}

So, X is faster than Y == The execution time on Y is longer than that on X.

💡

When throughput is the metric of interest

Performance_X = \text{Throughput}_X

Example

X is n times faster than Y if

\frac{Performance_X}{Performance_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n
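The ratio above is easy to try with concrete numbers. In this sketch the two execution times (10 s on X, 15 s on Y) are assumed values, and `relative_performance` is a hypothetical helper name.

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y (times in seconds)."""
    return time_y / time_x

# Assumed measurements: the same program takes 10 s on X and 15 s on Y.
n = relative_performance(10.0, 15.0)
print(f"X is {n} times faster than Y")
```

Note that the slower machine's time goes in the numerator, because performance is the reciprocal of execution time.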

CPU execution time

Problem : Multiple tasks run in parallel even on single-user PCs.

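In other words, the program of interest is not the only thing running, so other tasks affect how long it takes to finish. (This is the recognition that defining performance by execution time alone is problematic.)

Hence the concept of CPU execution time is defined.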

Abstraction

CPU execution time

User CPU time

System CPU time

CPU execution time (CPU time)

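The time the CPU spends executing only the process whose performance we want to measure.

This can be divided further.

1. User CPU time

The time spent actually executing the application code, that is, purely running the application's own code.

→ From the operating system's point of view, the CPU is running in user space.

2. System CPU time

The time the OS spends doing work. If the OS has to do something in order to run the process, that time counts toward system CPU time. (It can be thought of as the time held by the kernel.)

→ From the operating system's point of view, the CPU is running in kernel space.

Caution

Suppose we write a network program.

The OS needs time to process network packets, and packet processing generally takes considerably longer than the application code itself.

For this kind of software (where the OS has to do a lot of work to run the code), it makes sense to consider both user and system time.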
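The split between wall-clock time, user CPU time, and system CPU time can be observed from a program. This is a minimal sketch using Python's standard `os.times()` and `time.perf_counter()`; the `measure` helper and the `busy_loop` workload are made up for illustration, and the measured values will differ from machine to machine.

```python
import os
import time

def measure(work):
    """Run work() and return (wall, user_cpu, system_cpu) in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    work()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,      # time spent in user space
        cpu_after.system - cpu_before.system,  # time spent in the kernel
    )

def busy_loop():
    """Pure computation, so most of the CPU time should be user time."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total

wall, user_cpu, system_cpu = measure(busy_loop)
print(f"wall={wall:.4f}s user={user_cpu:.4f}s system={system_cpu:.4f}s")
```

A workload dominated by I/O or networking would be expected to show a larger system share than this compute-only loop.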

Re-defining performance by using clock

For digital circuits, it makes more sense to measure in clock cycles than in seconds (because digital signals operate in step with the clock).

→ This is why the clock period / clock cycle time matters.

💡

Intuitively, the clock period is the time taken for one clock cycle.

It must be distinguished from the clock rate/frequency.

(clock rate/frequency: how many clock cycles occur in one second)

💡

The clock rate is a frequency and indicates how many clock cycles occur per second (e.g., 1 GHz). It is the reciprocal of the clock period.

\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time} \\ = \text{CPU clock cycles} / \text{Clock rate}

→ This is quite intuitive: CPU execution time is the number of clock cycles consumed multiplied by the time each clock cycle takes.

💡

When a value is given as a frequency (clock rate), always taking its reciprocal first avoids confusion.
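As a quick check of the reciprocal relation, here are two worked conversions. The 1 GHz figure appears in the notes; 4 GHz is an assumed extra value for illustration.

```latex
\text{Clock cycle time} = \frac{1}{\text{Clock rate}}
\\
\frac{1}{1\,\text{GHz}} = \frac{1}{10^{9}\,\text{Hz}} = 10^{-9}\,\text{s} = 1\,\text{ns}
\\
\frac{1}{4\,\text{GHz}} = \frac{1}{4 \times 10^{9}\,\text{Hz}} = 0.25 \times 10^{-9}\,\text{s} = 250\,\text{ps}
```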

Example

Our program runs in 10 seconds on computer A having a 2GHz clock.

Q : How many clock cycles does the program take?

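A : Rearranging CPU time = CPU clock cycles / Clock rate gives

\text{CPU clock cycles} = \text{CPU time} \times \text{Clock rate} = 10\,\text{s} \times 2 \times 10^{9}\,\text{Hz} = 2 \times 10^{10}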

💡

Note: when converting, read G as $10^9$, M as $10^6$, and K as $10^3$ (used when writing frequencies).

💡

Note: ms is $10^{-3}$ s, $\mu$s is $10^{-6}$ s, ns is $10^{-9}$ s, and ps is $10^{-12}$ s (used when writing periods).

Use these conversions to always bring everything to seconds.
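The example above (a 10-second program on a 2 GHz machine) can be worked through with the prefix conversions. This is a small sketch; the names `FREQUENCY_PREFIX`, `to_hertz`, and `clock_cycles` are hypothetical helpers, not from the notes.

```python
# Prefix multipliers used when writing frequencies.
FREQUENCY_PREFIX = {"K": 10**3, "M": 10**6, "G": 10**9}

def to_hertz(value, prefix):
    """Convert e.g. (2, 'G') to a clock rate in Hz."""
    return value * FREQUENCY_PREFIX[prefix]

def clock_cycles(cpu_time_s, clock_rate_hz):
    """CPU clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz

rate_hz = to_hertz(2, "G")           # 2 GHz
cycles = clock_cycles(10, rate_hz)   # program runs for 10 seconds

print(f"clock rate: {rate_hz} Hz")
print(f"clock cycles: {cycles}")
```

Converting the prefix to a plain number of hertz before multiplying keeps the units in seconds throughout.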

Adding Instructions to Performance

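The idea is to take instructions into account when computing execution time. A program ultimately consists of n instructions, so judging how long the instructions took to execute can be a more appropriate way to evaluate performance.

Cycles Per Instruction: CPI (important)

How many cycles each instruction takes (lower is better)

$\text{CPU clock cycles}=\text{Instruction Count}\times \text{Average CPI}$

Q : Why consider the average?
A : A floating-point instruction requires more clock cycles from the hardware than an integer add instruction, so CPI differs between instruction types. That is why the average over the program's instructions is used (rather than looking at a single instruction).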
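To make the averaging concrete, the sketch below computes an average CPI from an instruction mix. The instruction classes, counts, and per-class cycle counts are all hypothetical values chosen for illustration.

```python
# Hypothetical instruction mix: class -> (instruction count, cycles per instruction).
instruction_mix = {
    "integer add": (500, 1),
    "load/store": (300, 2),
    "floating point": (200, 4),
}

total_instructions = sum(count for count, _ in instruction_mix.values())
total_cycles = sum(count * cpi for count, cpi in instruction_mix.values())

# Average CPI is total clock cycles divided by total instruction count.
average_cpi = total_cycles / total_instructions

print(f"instructions: {total_instructions}")
print(f"clock cycles: {total_cycles}")
print(f"average CPI: {average_cpi}")
```

The average is weighted by how often each instruction class occurs, so a rare but expensive instruction moves it less than a frequent one.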

The Classic Performance Equation

1st equation

\text{CPU execution time} = \text{clock cycles}\times \text{clock cycle time} \\ = \text{instruction count} \times \text{CPI}\times \text{clock cycle time}

2nd equation

\text{CPU execution time} = \frac{\text{seconds}}{\text{program}} \\ = \frac{\text{clock cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{clock cycle}} \\ =\frac{\text{instructions}}{\text{program}}\times\frac{\text{clock cycles}}{\text{instruction}}\times\frac{\text{seconds}}{\text{clock cycle}}

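First factor: related to compiler optimization (instructions/program).

Second factor: related to the architecture level (clock cycles/instruction), and in particular to the skill of whoever designs the digital circuits. Well-designed hardware reduces it.

Third factor: the clock cycle time (time per clock cycle) is the responsibility of electrical engineering.

💡

Even if changing the ISA reduces the number of instructions, it can lead to a slower clock cycle or a higher CPI. Performance therefore has to be compared in terms of time.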
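The trade-off described above can be sketched with the performance equation. All numbers here are assumptions for illustration: a baseline design, and a hypothetical new ISA that needs 20% fewer instructions but has a higher CPI and a slightly longer clock cycle.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s

# Assumed baseline: 1e9 instructions, CPI 2.0, 0.5 ns cycle.
baseline = cpu_time(1_000_000_000, 2.0, 0.5e-9)

# Assumed new ISA: 20% fewer instructions, but CPI 2.5 and a 0.55 ns cycle.
new_isa = cpu_time(800_000_000, 2.5, 0.55e-9)

print(f"baseline: {baseline:.2f} s")
print(f"new ISA:  {new_isa:.2f} s")
```

Under these assumed numbers the design with fewer instructions is actually slower, so instruction count alone says little; only the resulting time is comparable.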

The Power Wall & The Switch to Multiprocessors

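Higher power consumption means more heat. That is why the industry moved to multiprocessors.

Clock frequencies were lowered and the number of processors started to grow.

More recently, semiconductor process improvements have made it possible to keep heat under control even when the clock frequency is raised.

(There is no real need to know page 35.)

Up through the uniprocessor era, the goal was to raise the clock frequency.

However, heat issues made it clear that there is a limit to raising the clock frequency.

So designs moved toward multiprocessors.

(Because of this, performance gains have become smaller, and programmers also have to keep optimizing their code.)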

Fallacies & Pitfalls

Make the Common Case Fast

“Improving one aspect of a computer will improve its overall performance by a proportional amount!”

→ No! It’s related to Amdahl’s Law

\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}
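A worked instance shows why the gain is limited. The numbers are assumed for illustration: a program takes 100 s in total, of which 80 s is affected by the improvement and 20 s is not.

```latex
\text{Improvement by a factor of 4:}\quad
\text{Execution time after improvement} = \frac{80\,\text{s}}{4} + 20\,\text{s} = 40\,\text{s}
\quad\Rightarrow\quad \text{Speedup} = \frac{100\,\text{s}}{40\,\text{s}} = 2.5
\\
\text{Improvement by a factor of } n \to \infty:\quad
\lim_{n \to \infty} \left( \frac{80\,\text{s}}{n} + 20\,\text{s} \right) = 20\,\text{s}
\quad\Rightarrow\quad \text{Speedup} < \frac{100\,\text{s}}{20\,\text{s}} = 5
```

A 4x improvement to the affected part yields only 2.5x overall, and no improvement, however large, can reach 5x because the unaffected 20 s remains.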

How About Using MIPS?

A metric that was popular in the 1970s and 1980s: Million Instructions Per Second (MIPS)

MIPS = \frac{\text{Instruction count}}{\text{Execution Time} \times 10^6}

Faster computers have higher MIPS ratings (how many millions of instructions can be processed per second)

Three problems
1. Doesn’t consider the capabilities of the instructions. (Each instruction does a different amount of work.)
2. Varies between programs on the same computer
3. Can vary independently from performance.

Conclusion: always check carefully whether a performance metric actually reflects what you want to measure.
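The third problem can be shown with a small calculation. In this sketch, two hypothetical compiled versions of the same program run on the same assumed 1 GHz machine; the instruction counts and CPIs are made up for illustration.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

def mips(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 10**6)

CLOCK_RATE_HZ = 10**9  # assumed 1 GHz machine

# Version A: many simple instructions. Version B: fewer, more complex ones.
count_a, cpi_a = 10 * 10**9, 1.0
count_b, cpi_b = 4 * 10**9, 2.0

time_a = execution_time(count_a, cpi_a, CLOCK_RATE_HZ)
time_b = execution_time(count_b, cpi_b, CLOCK_RATE_HZ)
mips_a = mips(count_a, time_a)
mips_b = mips(count_b, time_b)

print(f"A: {time_a} s, {mips_a} MIPS")
print(f"B: {time_b} s, {mips_b} MIPS")
```

Under these assumptions version B finishes sooner yet has the lower MIPS rating, so ranking by MIPS would pick the slower program.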
