SQL Server Storage Internals Part 1 – Basics

While learning query tuning topics, I got a chance to understand anatomy of data page and different types and levels of Index pages.

This learning helped me to understand concepts better and to give better performance tuning solutions later.

This is my attempt to share my learning with beginners by using some simple scripts in step by step approach.

Suggest all SQL Server DBAs to spend sometime to understand anatomy of different types of Pages. This would help DBAs to take better decisions when DBCC CHECKDB throws warning messages, to provide better performance tuning solutions etc.

While writing an article in this topic, Its unavoidable to refer some Undocumented commands. I strongly recommend beginners to try all given scripts only in local or development enviornments.

Those who are new to system tables and SQL Server 2005/2008 System views,
take some time to refer “Mapping SQL Server 2000 System Tables to SQL Server 2005 System Views” topic in below url.

http://msdn.microsoft.com/en-us/library/ms187997(SQL.90).aspx

Understaning system tables (SQL 2000) or system views (SQL 2005/2008) always helps you to make your day to day work easier.

This article is to make beginners comfortable in SQL Server storage internals. I ‘ve delibrately avoided some accuracies in some places for better understanding.

Without wasting time, lets quickly jump into the topic.

To test concepts related to storage internals, lets create a database named LearningInternals

CREATE DATABASE LearningInternals

GO

USE LearningInternals
GO

Some key points on data and log files:

A database have three types of files:

  1. Primary data file (.MDF) – Every database has one primary data file and is the starting point of database. All user tables are created in PRIMARY filegroup if no other secondary files are present)
  2. Secondary data file (.NDF) – A database can have optionally zero or more Secondary data files, created for better performance and to manage tables efficiently.
  3. Transaction Log file (.LDF) – A database by default will have one log file, used to record changes made in tables. This can be backed up and used later in case of data loss.

 Here is the query to list all filegroups (primary and secondary data files) in current database

select * from sys.filegroups
go

Below query lists all types of files and its related details:

select database_id, file_id, type, type_desc, name, physical_name, state_desc, size, is_read_only, backup_lsn, differential_base_guid from sys.master_files where DB_NAME(database_id) = ‘LearningInternals’
go

Recomendation:

Its recommended to keep data and log files in different drives for better performance. And also its recommended not to keep Tempdb in drives where user database files are placed.

Before moving further into Storage internals, its important to understand some key internal components.

Pages

Is the fundamental unit of data storage in SQL Server. Once database is created, data space in primary and secondary files are logically divided 8k blocks called Pages. Page numbers are refered along with data file number. For example, first page of Primary file group is named as 1:0 (1 represents filegroup and 0 represents page number).
Page size : 8 k (8 * 1024 = 8192 bytes. 92 bytes for header. 8060 bytes for data)

Extents

Extents are collection of 8 pages. Pages in an extent can be owned by one or upto 8 tables.

There are two types of extents

  1. Uniform extent – all 8 pages are belongs to one table
  2. Mixed extent – 8 pages are shared by two or more tables

 when a table is created and a row is inserted, table gets 1 page in mixed extent. When a table grows, table is moved to uniform extent. This logic is to manage space efficiently.

 There are different types of pages.

 For our convenience We can logically divide pages into

  1. System pages – pages used by Storage engine to manage user data
  2. Data & index pages – Pages holds user created table data or index values

 System pages:

 

Slno Page Description
1 Header Page Stores detals about data files like size and type etc.
2 GAM Global Allocation Map. GAM is the 3rd page (1:2) in each data file. It’s a bitmap with appx. 64000 bits (8k page * 8 bits per byte). each bit tells whether the associated extent is uniform or not.  With one GAM, Storage engine can  identify upto 64000 uniform extents. Meaning one GAM  will cover upto 4 GB data space. What if data file size is greater than 4GB? Simple. another GAM will cover next 4 GB portion
3 SGAM Shared Global Allocation Map. SGAM is the 4th page (1:3) in each data file. It’s a bitmap with appx. 64000 bits (8k page * 8 bits per byte) each bit tells whether the associated extent is mixed extent or not.  With one SGAM, Storage engine can  identify upto 64000 mixed extents. Meaning one SGAM  will cover upto 4 GB data space. What if data file size is greater than 4GB? Simple. another SGAM will cover next 4 GB portion
4 IAM Index Allocation Map. One IAM page for each table to identify pages associated with a table (there can be many IAM for one table if table is partitioned and/or a table have row overflow pages/LOB data etc. im leaving them unexplained as of now).  It’s a bitmap with appx. 64000 bits. each bit tells whether the associated extent is allocated to a table or not.  With one IAM, Storage engine can  identify upto 64000 extents. Meaning one IAM will cover upto 4 GB data space. What if a table size is greater than 4GB? Simple. another IAM will cover next 4 GB portion.
5 PFS Page Free Space. PFS is used for storing space availability of each pages.  One byte for each page. There will be one PFS for every 8000 pages.
6 DCM Differential Change Map. DCM is a bitmap page helps differential backups to identify the extents changed since Full backup.
7 BCM Bulk change map. Used to monitor extents affected by bulk operations. Used when database in bulk logged recovery mode.

Data and Index Pages:

 

Slno Page Description
1 Data page Holds table rows. When an Insert statement is issued, PFS helps to identify the right page to hold the record.
2 Row Overflow page From SQL 2005, you have have 4 varchar(5000) columns, meaning  maximum 8000 bytes per row limitation is avoided with Row overflow page. This page is used when a record size exceeds 8000 row size.
3 LOB LOB data like text, varchar(max) pages are stored seperately.

 When a database is created, a data file will have pages like this,

 

Page Header PFS GAM SGAM Unused Unused DCM BCM …. Data Data Data Data Data Data Data
Page No.

1:0

1:1

1:2

1:3

1:4

1:5

1:6

1:7

….

1:15001

1:15002

1:15003

1:15004

1:15005

1:15006

1:15007

 Here is the query to list all tables and pages allocated to each table.

select so.name, so.object_id, sp.index_id, sp.partition_id, sp.hobt_id, sa.container_id, internals.total_pages, internals.used_pages, internals.data_pages, first_page, root_page, first_iam_page
from sys.objects so
inner join sys.partitions sp on so.object_id = sp.object_id
inner join sys.allocation_units sa on sa.container_id = sp.hobt_id
inner join sys.system_internals_allocation_units internals on internals.container_id = a.container_id
where so.name not like ‘sys%’

Summary:

When a database is created, by default one primary data file and a log file is created.
Data file is logically divided in to hundreds of 8k blocks called pages. Some system related pages like GAM, SGAM, IAM and PFS are used to manage data pages.

In my next article, I am going to cover,

  1. The relationship among sys.objects, sys.partitions and sys.allocation_units system views
  2. Anatomy of a page.

What happens when a query is submitted?

Here is high level details on life cycle of a query:

1. SQL Server Network Interface (SNI) a protocol layer on Client establish connection to Server SNI using Network protocol such as TCP/IP

2. Query is encapsulated in TDS  (Tabular Data Stream) packet

3. SNI encapsulates the TDS (Tabular Data Stream) packet inside a standard communication protocol such as TCP/IP or named pipes

4. Server side SNI receives TDP packet on port 1433, unpack the query and send the query to Command Parser

5. Parser first checks the plan in PLAN CACHE for existing query plan. If available plan will be used

6. If Query plan is not available in Plan Cache, Parser create QUERY TREE based on SQL command and send it to QUERY OPTIMIZER

7. Query Optimizers task is to find the good acceptable plan in less possible time

8. Query Optimizer has its own Rules Processor to examine the query. Query Optimizer go thru
    Multiple stages for generating a good acceptable plan

      Stage 1 (Pre Optimization Stage):

      8.1 Optimizer check possibility of TRIVIAL PLAN
            Examples:
              a. Table Scan if no index is available
              b. Index Seek if usable index is available for given search condition

     Stage 2 (Simple to Full optimization stages)

     8.2   In this stage, Optimizer will generate and evaluate many plans and will choose plan with LOWEST COST
  
     Note:

     a. Behavior of Optimizer is, this won’t spend much time to find the best plan. Instead, Optimizer choose good acceptable plan with less cost in its threshold
     b. Optimizer have COST THRESHOLD to go for parallism (default  5 seconds) if more than 1 processor is available
     c. Query processor picks an index based on multiple factors like, Uniqueness of Key, number of records, Predicates in WHERE clause. Optimizer refers Statistics  to evaluate these.
     d. Query optimizer most likely will ignore index if duplicates are huge

9. Execution plan is stored in Plan cache for reuse and passed to Query Executor

10. Query Executor checks PAGE availability in DATA CACHE. If not available PAGE is retrieved from Disk (System pages (like GAM, SGAM, IAM) and Index pages etc are used to manage, identify pages quicker)

11. Result set is formatted as relational table (as XML if FOR XML is specified) and encapsulated in TDS packet

12. Server side SNI encapsulates the TDS (Tabular Data Stream) packet inside a standard communication protocol, such as TCP/IP or named pipes and move it to Client

13.   Client SNI receives the TDS packet (Default port 1433), unpack the result set and show it in Grid

Here is some database level factors influence query performance

1. Usable Index
2. Sargable arguments
3. Statistics
4. Database configurations
5. Query Hints

Please share your comments on this post. your comments will help to improve content in coming days.