Computer Science/Database

5. Normalization

  • -
728x90
반응형

Issues with DBMS Design

  • Data integrity failure : e.g. key doesn’t exists
💡
앞 장에서 다룬 integrity constraint의 경우에는 위의 data integrity가 깨지지 않는 것을 보장하는 것이고, normalization은 data integrity가 보존되게끔 data base를 design한다는 의미라고 이해하면 된다.

Normalization

  • The process of breaking bad relations/tables smaller good relations

    → leads to a better DB design

  • Avoids updating, deleting, and inserting anomalies
  • The normalization process involves achieving different normal forms step by step
    • 1NF (First Normal Form)
    • 2NF (Second Normal Form)
    • 3NF (Third Normal Form)
💡
The more normal form we process, the better the relational database we produce
  • 2 Tools for achieving 3 different Normal Forms
    • Key : what is the key/PK of a relations?
    • Functional Dependency

Functional Dependency

  • A set of attributes X functionally determines a set of attributes Y if the values of X determine a unique value for Y
    • Simple saying, X determines Y; written as X->Y
    • Possible FD: Text → Course, [what is special about Text?]
    • Not FDs : Teacher → Course, Teacher → Text, Couse → Text
    • Does Text→ Teacher? : Not possible

    → 반드시 X가 primary key일 필요는 없다. 물론 primary key면 당연히 만족한다.

💡
쉽게 말해서 XX를 domain으로 하게 끔하는 함수를 잡을 수 있냐는 것이다. 만약 이게 가능하면 YY is functionally dependent from XX라고 한다.
💡
추가적으로 multiple column이 다른 multiple column에 functional dependency가 있는지 여부를 논의하는 것도 가능하다.

Example

B → C, C → B

1NF

  1. Composite Attributes or Attributes with different data types are not allowed in a relation
  1. A relation must have a primary key
  1. Nested relations are not allowed
  1. Multivalued Attributes is not allowed

Summary

  1. Must have a primary key
  1. No Composite attribute
  1. No Multivalued attribute
  1. No Nested relationship

Issues with DBMS Design

Redundant Information in Tuples

Composite PK = (Player_ID, Item_Type)

Delete Anomaly : what if we delete ‘gilal9’?

Update Anomaly : what if we change Plyaer_Rating of ‘jdog21’?

Insert Anomaly : what if we insert ‘tine42’ with Player_rating = Beginner?

Delete Anomaly

Delete anomalies occur when deleting a certain row inadvertently leads to the deletion of other important data. For example, if a department is shut down and all rows containing that department are deleted, the data of employees working solely in that department will also be deleted.

Example : what if we delete ‘gilal9’?

The Player_Rating of gilal9 (i.e., Beginner) will be deleted, which is
not the case for other players.

💡
gilal9는 다른 player와 달리 완전히 지워지는 문제점이 발생하는 것이다.

Update Anomaly

Update anomalies occur when the same data is repeated in multiple rows, and changes are made in some but not all instances. For example, if an employee's address changes and the update is made in one row but not in others, the database will contain inconsistent data.

Example : what if we change Player_Rating of ‘jdong21’?

→ The Player_Rating might display two different ratings

💡
업데이트 할거면 동시에 업데이트 해야되는 정보이다.

Insert Anomaly

Insert anomalies occur when certain attributes cannot be inserted into the database due to missing additional data. For example, if a new employee is not assigned a department, their data cannot be inserted into the table if the department field does not allow null values.

Example : what if we insert ‘tine42’ with Player_Rating = Beginner?

→ Not possible as part of PK (Item_Type) is missing

💡
즉, primary key가 채워지지 않은 것이 존재하는 경우 insert를 할 수 없다.

2NF

  • 1NF + Each non-key attribute must be depended on the entire primary key : avoid Partial Dependencies

Functional Dependency :

{Player_ID, Item_Type} → {Item_Quantity} : in 2NF

{Player_ID} → {Player_Rating} : not in 2NF

How to achieve 2NF?

  1. Draw the Functional Dependencies
    💡
    In this relation, the primary keys are Ssn and Pnumber
  1. Decompose it into different relations

3NF

Think about this case.

What if we UPDATE Player_Skill_Level of jdog21 from 4 to 8?

It could cause some inconsistency

💡
이러한 문제는 Transitive Dependency 때문에 발생한다. 즉 non key attribute가 다른 non key attribute에 dependency를 가지는 경우 이러한 문제가 발생하게 된다.

Solution

  • 2NF + Every non-key attribute depends on only the Key

    → Avoid Transitive Dependency

💡
즉, Key에만 의존하게끔 만드는 것이다.

How to achieve 3NF?

  • Draw Functional Dependencies and find the Transitive Dependencies
  • Decompose Transitive dependency into different relations

Conclusion

  • 1NF : All attributes depends on the key
  • 2NF : All attributes depend on the whole key
  • 3NF : All attributes depend on nothing but the key

Formal Normalization Jargons

  • Candidate key: an attribute (or multiple attributes) uniquely identifies a row
  • Prime Attributes: an attribute belongs to at least one candidate key
  • Non-prime attribute: an attribute that doesn’t belong to any candidate key

  • Candidate key : {Player_ID, Item_Type}
  • Prime Attributes : (1) Player_ID, (2) Item_Type
  • Non-prime attributes : (1) Item_Quantity, (2) Player_Rating

Formal Definition of 2NF and 3NF

  1. 2NF : A relation schema R is in second
    normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
  1. 3NF : A relation schema R is in third
    normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
반응형
Contents

포스팅 주소를 복사했습니다

이 글이 도움이 되었다면 공감 부탁드립니다.