Index
Motivation
MySQL database, 10,000,000 people
- How long does it take to find name and phone number for person with ID 4,857,845?
- How long does it take to find the phone number for all people named Kristin Elvik?
The result of the second experiment is very slow. Why is it slow and what can we do about it?
Explanation
The slow query from the experiment performs a full table scan
each row in the table is read and checked if it meets the condition
Index
Index is a persistent data structure which can be used to quickly locate rows in a table based on the values of one or several given attributes.
Cost of an Index
- extra storage space (usually not a problem)
- Index creation might take time (but it is done only once)
- Index maintenance - index must be updated when the data in the table changes
Creating and Dropping Indexes
- Unique index doesn’t allow duplicate values
- Indexes on the primary key attributes are created automatically
→ Some RDBMS (e.g. MySQL) create indexes on the foreign key attributes automatically as well
Different SQL syntax for dropping an index in different RDBMS
Experiment revisited
How does an index work?
Two main data structures being used for indexes
- B and B+ trees : useful for conditions with =,<,≤,>,≥ and BETWEEN
- Hash tables : useful for conditions with = only
Clustered vs Non-clustered Indexes
- Non-clustered index
- Physical order of the rows is not the same as the index order
- Leaf level of the index points to where the data is stored
- Clustered index
- Physical order of the rows is the same as the index order
- Used for indexing of primary keys
- There might only be one clustered index per table
- Fast retrieval of data when multiple rows match the condition
Deciding what to Index
- Which queries and insert/update/delete statements will be executed?
- How often?
- What are the time constraints?
- How many rows are in the relevant tables?
Quiz 1
Answer : NO
Quiz 2
Answer : 3 & 4
Other methods of performance optimization
- Denormalization
- Storing derived attributes
- Storing summary statistics
- Vertical partitioning
- Horizontal partitioning
- Hardware tuning
Transaction
Motivation
Think about the following SQL
If
- The first SQL gets executed, followed by a software, hardware or network failure
- Both SQLs get executed, but the hardware failure occurs before the changes are permanently stored on a storage device.
Transaction
a series of operations that need to be executed as a single unit
of work and that transforms the database from one consistent state to another
Failure (or a user’s decision to abort the transaction) must be treated as if the transaction never happened.
- Commit : making the “effects of the transaction” permanent (successful executing the transaction)
- Rollback : aborting the transaction (because of a failure or user’s decision)
Example
Jan has a subscription from Netflix which withdraws 100 SEK each month from Jan’s checking account.
What can go wrong if this transfer happens to be executed at the same time as the transfer of money to Jan’s savings account?
What are the possible balances at Jan’s checking account after executing both transfers?
One of possible schedules:
Lost update problem
The update of a in T1 is lost, a is overwritten by T2 which read the state before updating and storing a in T1.
Concurrency
Databases are usually accessed by many clients at the same time.
To avoid problems with concurrent execution, a transaction schedule must be serializable
.
→ Execution of concurrent transaction must be equivalent to a serial execution of the transactions.
ACID
a set of required properties of transactions
- Atomicity : all statements in a transaction are executed or none
- Consistency : all database constraints are satisfied after a transaction is executed
- Isolation : the result of executing concurrent transaction is the same as if the transactions were executed serially
- Durability : once a transaction finishes, effects of the transaction are permanently stored in the database.
Transactions in SQL
- To start a transaction : START TRANSACTION or BEGIN
- To commit the transaction : COMMIT
- To roll back the transaction : ROLLBACK
Example
Isolation Levels
The ACID isolation property is often relaxed to reduce the overhead and increase the concurrency.
There are weaker isolation levels that allow some serializability violations in order to achieve higher performance.
Isolation level is set per transaction and it reflects what read phenomena might occur in the current transaction
Read phenomena :
- Dirty reads
- Non-repeatable reads
- Phantom reads
- Default level for InnoDB in MySQL is
REPEATABLE READ
Dirty reads
Non-repeatable read
phantom read
Quiz 1
Answer : 1
Quiz 2
Answer : 1, 2
Phantom read에 의해서 1, 2가 발생할 수 있다. 추가적으로 REPEATABLE READ의 경우에는 PHANTOM READ만 가능하므로 데이터가 느는 상황만 가능하다. 따라서 3은 아니다.
마지막으로 격리 수준이 트랜잭션의 중첩된 쿼리의 허용 여부에 영향을 미치지는 않는다. nested query의 사용 가능성은 데이터베이스 시스템의 구성 및 쿼리 구조에 따라 달라지는 것이다.