앞 장에서 다룬 integrity constraint의 경우에는 위의 data integrity가 깨지지 않는 것을 보장하는 것이고, normalization은 data integrity가 보존되게끔 data base를 design한다는 의미라고 이해하면 된다.
Normalization
The process of breaking bad relations/tables smaller good relations
→ leads to a better DB design
Avoids updating, deleting, and inserting anomalies
The normalization process involves achieving different normal forms step by step
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
💡
The more normal form we process, the better the relational database we produce
2 Tools for achieving 3 different Normal Forms
Key : what is the key/PK of a relations?
Functional Dependency
Functional Dependency
A set of attributes X functionally determines a set of attributes Y if the values of X determine a unique value for Y
Simple saying, X determines Y; written as X->Y
Possible FD: Text → Course, [what is special about Text?]
Not FDs : Teacher → Course, Teacher → Text, Couse → Text
Does Text→ Teacher? : Not possible
→ 반드시 X가 primary key일 필요는 없다. 물론 primary key면 당연히 만족한다.
💡
쉽게 말해서 X를 domain으로 하게 끔하는 함수를 잡을 수 있냐는 것이다. 만약 이게 가능하면 Y is functionally dependent from X라고 한다.
💡
추가적으로 multiple column이 다른 multiple column에 functional dependency가 있는지 여부를 논의하는 것도 가능하다.
Example
B → C, C → B
1NF
Composite Attributes or Attributes with different data types are not allowed in a relation
A relation must have a primary key
Nested relations are not allowed
Multivalued Attributes is not allowed
Summary
Must have a primary key
No Composite attribute
No Multivalued attribute
No Nested relationship
Issues with DBMS Design
Redundant Information in Tuples
Composite PK = (Player_ID, Item_Type)
→ Delete Anomaly : what if we delete ‘gilal9’?
→ Update Anomaly : what if we change Plyaer_Rating of ‘jdog21’?
→ Insert Anomaly : what if we insert ‘tine42’ with Player_rating = Beginner?
Delete Anomaly
Delete anomalies occur when deleting a certain row inadvertently leads to the deletion of other important data. For example, if a department is shut down and all rows containing that department are deleted, the data of employees working solely in that department will also be deleted.
Example : what if we delete ‘gilal9’?
The Player_Rating of gilal9 (i.e., Beginner) will be deleted, which is not the case for other players.
💡
gilal9는 다른 player와 달리 완전히 지워지는 문제점이 발생하는 것이다.
Update Anomaly
Update anomalies occur when the same data is repeated in multiple rows, and changes are made in some but not all instances. For example, if an employee's address changes and the update is made in one row but not in others, the database will contain inconsistent data.
Example : what if we change Player_Rating of ‘jdong21’?
→ The Player_Rating might display two different ratings
💡
업데이트 할거면 동시에 업데이트 해야되는 정보이다.
Insert Anomaly
Insert anomalies occur when certain attributes cannot be inserted into the database due to missing additional data. For example, if a new employee is not assigned a department, their data cannot be inserted into the table if the department field does not allow null values.
Example : what if we insert ‘tine42’ with Player_Rating = Beginner?
→ Not possible as part of PK (Item_Type) is missing
💡
즉, primary key가 채워지지 않은 것이 존재하는 경우 insert를 할 수 없다.
2NF
1NF + Each non-key attribute must be depended on the entire primary key : avoid Partial Dependencies
Functional Dependency :
{Player_ID, Item_Type} → {Item_Quantity} : in 2NF
{Player_ID} → {Player_Rating} : not in 2NF
How to achieve 2NF?
Draw the Functional Dependencies
💡
In this relation, the primary keys are Ssn and Pnumber
Decompose it into different relations
3NF
Think about this case.
What if we UPDATE Player_Skill_Level of jdog21 from 4 to 8?
It could cause some inconsistency
💡
이러한 문제는 Transitive Dependency 때문에 발생한다. 즉 non key attribute가 다른 non key attribute에 dependency를 가지는 경우 이러한 문제가 발생하게 된다.
Solution
2NF + Every non-key attribute depends on only the Key
→ Avoid Transitive Dependency
💡
즉, Key에만 의존하게끔 만드는 것이다.
How to achieve 3NF?
Draw Functional Dependencies and find the Transitive Dependencies
Decompose Transitive dependency into different relations
Conclusion
1NF : All attributes depends on the key
2NF : All attributes depend on the whole key
3NF : All attributes depend on nothing but the key
Formal Normalization Jargons
Candidate key: an attribute (or multiple attributes) uniquely identifies a row
Prime Attributes: an attribute belongs to at least one candidate key
Non-prime attribute: an attribute that doesn’t belong to any candidate key