Visibility Checks in PostgreSQL

How does PostgreSQL decide whether a tuple is visible?

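Introduction

Prerequisites: MVCC and transaction isolation levels

Scenario: in heapgetpage, collecting all visible tuples in a block means checking the visibility of each tuple as it is iterated over. The function that performs this check is HeapTupleSatisfiesMVCC.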

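Every transaction has an xid. PG records in the clog whether each transaction committed, and it also maintains a global array of active transactions.

  • xmin: the smallest xid among the currently active transactions
  • xmax: one past the latest completed xid; every xid at or above it is treated as still running

Back to a transaction reading a tuple: if the xid that wrote the tuple is < xmin, that transaction had already finished, so the tuple is visible provided the transaction committed; if the xid is >= xmax, the tuple is invisible. An xid in between has to be looked up in the list of active transactions. But the tuple header in PG carries fields such as t_xmin and t_xmax. How do these fields work together to control tuple visibility?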
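The range test on xmin and xmax can be sketched in C as follows. This is a simplified illustration, not PostgreSQL's own code: the SketchSnapshot struct and the sketch_xid_in_snapshot function are hypothetical names, and the sketch assumes plain integer comparison of xids, ignoring xid wraparound and subtransactions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* illustrative stand-in for TransactionId */

/* Hypothetical, simplified snapshot; not PostgreSQL's SnapshotData. */
typedef struct SketchSnapshot
{
    Xid         xmin;           /* every xid < xmin had finished at snapshot time */
    Xid         xmax;           /* every xid >= xmax counts as still running */
    const Xid  *xip;            /* xids in [xmin, xmax) that were still running */
    size_t      xcnt;           /* number of entries in xip */
} SketchSnapshot;

/* Returns true if the snapshot must treat xid as still in progress. */
static bool
sketch_xid_in_snapshot(Xid xid, const SketchSnapshot *snap)
{
    if (xid < snap->xmin)
        return false;           /* finished before the snapshot was taken */
    if (xid >= snap->xmax)
        return true;            /* started after the snapshot was taken */

    /* In between: running only if listed in the active array. */
    for (size_t i = 0; i < snap->xcnt; i++)
    {
        if (snap->xip[i] == xid)
            return true;
    }
    return false;
}
```

A false result only says the transaction had finished when the snapshot was taken; whether it committed or aborted still has to come from the clog or the hint bits.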

CLog

PostgreSQL keeps the status of transactions in the commit log (clog). The clog is allocated in shared memory and is consulted throughout transaction processing.

#define TRANSACTION_STATUS_IN_PROGRESS 0x00
#define TRANSACTION_STATUS_COMMITTED 0x01
#define TRANSACTION_STATUS_ABORTED 0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03

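There are four transaction statuses. SUB_COMMITTED relates to subtransactions and is skipped here.

Logically, the clog is an array made up of a series of 8KB pages in shared memory and is managed page by page. The array index is the transaction ID (see the TransactionIdGetStatus function) and each element is a transaction status that occupies 2 bits. One 8KB page can therefore store 8K * 8 / 2 = 32K transaction statuses.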
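The arithmetic above maps an xid to a page, a byte within that page, and a 2-bit slot within that byte. The following is a minimal sketch of that mapping, assuming an 8KB page and 2 bits per transaction as stated above. All SKETCH_ and sketch_ names are made up for this illustration and are not PostgreSQL's own macros.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ           8192    /* page size assumed in the text */
#define SKETCH_BITS_PER_XACT    2
#define SKETCH_XACTS_PER_BYTE   4
#define SKETCH_XACTS_PER_PAGE   (SKETCH_BLCKSZ * SKETCH_XACTS_PER_BYTE)
#define SKETCH_XACT_BITMASK     ((1 << SKETCH_BITS_PER_XACT) - 1)

typedef uint32_t Xid;           /* illustrative stand-in for TransactionId */

/* Which clog page holds the status of xid. */
static uint32_t
sketch_xid_to_page(Xid xid)
{
    return xid / SKETCH_XACTS_PER_PAGE;
}

/* Byte offset of xid's status within its page. */
static uint32_t
sketch_xid_to_byte(Xid xid)
{
    return (xid % SKETCH_XACTS_PER_PAGE) / SKETCH_XACTS_PER_BYTE;
}

/* Bit shift of xid's 2-bit slot within its byte. */
static uint32_t
sketch_xid_to_shift(Xid xid)
{
    return (xid % SKETCH_XACTS_PER_BYTE) * SKETCH_BITS_PER_XACT;
}

/* Store a 2-bit status for xid in the given page image. */
static void
sketch_set_status(uint8_t *page, Xid xid, int status)
{
    uint8_t    *byteptr = &page[sketch_xid_to_byte(xid)];
    uint32_t    shift = sketch_xid_to_shift(xid);

    *byteptr = (uint8_t) ((*byteptr & ~(SKETCH_XACT_BITMASK << shift)) |
                          ((status & SKETCH_XACT_BITMASK) << shift));
}

/* Read the 2-bit status for xid from the given page image. */
static int
sketch_get_status(const uint8_t *page, Xid xid)
{
    return (page[sketch_xid_to_byte(xid)] >> sketch_xid_to_shift(xid)) &
        SKETCH_XACT_BITMASK;
}
```

With these definitions, xid 32768 is the first entry of page 1, and four neighbouring xids share one byte.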

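The number of clog buffers is Min(128, Max(4, NBuffers / 512)), and the initialization function is CLOGShmemInit. Transaction status is persisted in the pg_xact directory and read from there into memory.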
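To make the sizing formula concrete, here is a small sketch that evaluates Min(128, Max(4, NBuffers / 512)) for a given number of shared buffers. The function name sketch_clog_buffers is hypothetical, and the formula is taken as quoted above rather than from any particular PostgreSQL version.

```c
/*
 * Evaluate Min(128, Max(4, nbuffers / 512)).
 * nbuffers is the number of shared buffers, each one page in size.
 */
static int
sketch_clog_buffers(int nbuffers)
{
    int         n = nbuffers / 512;

    if (n < 4)
        n = 4;                  /* lower bound: Max(4, ...) */
    if (n > 128)
        n = 128;                /* upper bound: Min(128, ...) */
    return n;
}
```

For example, 16384 shared buffers (128MB with 8KB pages) give 32 clog buffers, which by the arithmetic above cover 32 * 32K transaction statuses.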

Hint Bits

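To avoid the performance bottleneck of hitting the clog over and over, PG keeps hint bits in the tuple header. Specifically, the tuple's t_infomask field carries the following flags: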

#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI 0x1000 /* t_xmax is a MultiXactId */

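Once these bits are set, reading the tuple header is enough to learn the status of the t_xmin and t_xmax transactions, with no further access to the clog.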
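The idea of consulting the hint bits first and falling back to the clog only when they are unset can be sketched as below. This is an illustration under stated assumptions: SketchTupleHeader, sketch_clog_status and sketch_xmin_committed are hypothetical, and the fake clog lookup simply treats even xids as committed and odd xids as aborted so that the example is self-contained. Only the two flag values come from the definitions above.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMIN_COMMITTED 0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID   0x0200  /* t_xmin invalid/aborted */

typedef enum
{
    SK_IN_PROGRESS,
    SK_COMMITTED,
    SK_ABORTED
} SketchXactStatus;

/* Hypothetical, stripped-down tuple header for illustration only. */
typedef struct SketchTupleHeader
{
    uint32_t    t_xmin;
    uint16_t    t_infomask;
} SketchTupleHeader;

/* Counts how often the (fake) clog is consulted. */
static int  sketch_clog_lookups = 0;

/* Fake clog lookup (assumption): even xids committed, odd xids aborted. */
static SketchXactStatus
sketch_clog_status(uint32_t xid)
{
    sketch_clog_lookups++;
    return (xid % 2 == 0) ? SK_COMMITTED : SK_ABORTED;
}

/* Is the inserting transaction known to have committed? */
static bool
sketch_xmin_committed(SketchTupleHeader *tup)
{
    SketchXactStatus status;

    /* Fast path: the hint bits already hold the answer. */
    if (tup->t_infomask & HEAP_XMIN_COMMITTED)
        return true;            /* also covers the frozen case (both bits set) */
    if (tup->t_infomask & HEAP_XMIN_INVALID)
        return false;

    /* Slow path: ask the clog once and cache the result in the header. */
    status = sketch_clog_status(tup->t_xmin);
    if (status == SK_COMMITTED)
    {
        tup->t_infomask |= HEAP_XMIN_COMMITTED;
        return true;
    }
    if (status == SK_ABORTED)
        tup->t_infomask |= HEAP_XMIN_INVALID;
    return false;               /* aborted, or still in progress (no hint set) */
}
```

Calling sketch_xmin_committed twice on the same tuple increments sketch_clog_lookups only once, which is the saving the hint bits are meant to provide.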

HeapTupleSatisfiesMVCC

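Visible:

  • changes made by all transactions that had committed when the snapshot was taken
  • changes made by earlier commands of the current transaction

Invisible:

  • changes made by transactions that were still active when the snapshot was taken
  • changes made by transactions that started after the snapshot was taken
  • changes made by the current command

This function is fairly involved; see the blog posts listed under References for the details. Below is a summary of its logic: it checks xmin first and then xmax, branching three ways on the status of xmin.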

/* t_xmin status = ABORTED */
Rule 1:  IF t_xmin status is 'ABORTED' THEN
             RETURN 'Invisible'
         END IF
/* t_xmin status = IN_PROGRESS */
         IF t_xmin status is 'IN_PROGRESS' THEN
             IF t_xmin = current_txid THEN
Rule 2:          IF t_xmax = INVALID THEN
                     RETURN 'Visible'
Rule 3:          ELSE  /* this tuple has been deleted or updated by the current transaction itself. */
                     RETURN 'Invisible'
                 END IF
Rule 4:      ELSE  /* t_xmin ≠ current_txid */
                 RETURN 'Invisible'
             END IF
         END IF
/* t_xmin status = COMMITTED */
         IF t_xmin status is 'COMMITTED' THEN
Rule 5:      IF t_xmin is active in the obtained transaction snapshot THEN
                 RETURN 'Invisible'
Rule 6:      ELSE IF t_xmax = INVALID OR status of t_xmax is 'ABORTED' THEN
                 RETURN 'Visible'
             ELSE IF t_xmax status is 'IN_PROGRESS' THEN
Rule 7:          IF t_xmax = current_txid THEN
                     RETURN 'Invisible'
Rule 8:          ELSE  /* t_xmax ≠ current_txid */
                     RETURN 'Visible'
                 END IF
             ELSE IF t_xmax status is 'COMMITTED' THEN
Rule 9:          IF t_xmax is active in the obtained transaction snapshot THEN
                     RETURN 'Visible'
Rule 10:         ELSE
                     RETURN 'Invisible'
                 END IF
             END IF
         END IF
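The ten rules can be written as a compact C function. The sketch below is not HeapTupleSatisfiesMVCC itself: it assumes that the facts about the tuple (transaction statuses, whether each xid is the current transaction, whether each xid is active in the snapshot) have already been resolved into a hypothetical SketchVisibilityInput struct, and it ignores hint bits, command ids, multixacts and subtransactions.

```c
#include <stdbool.h>

typedef enum
{
    SK_IN_PROGRESS,
    SK_COMMITTED,
    SK_ABORTED
} SketchXactStatus;

/* Hypothetical pre-resolved facts about one tuple, for illustration only. */
typedef struct SketchVisibilityInput
{
    SketchXactStatus xmin_status;   /* status of the inserting transaction */
    bool        xmin_is_current;    /* t_xmin = current_txid */
    bool        xmin_in_snapshot;   /* t_xmin active in the snapshot */
    bool        xmax_valid;         /* false means t_xmax = INVALID */
    SketchXactStatus xmax_status;   /* status of the deleting transaction */
    bool        xmax_is_current;    /* t_xmax = current_txid */
    bool        xmax_in_snapshot;   /* t_xmax active in the snapshot */
} SketchVisibilityInput;

/* Returns true if the tuple is visible according to Rules 1 to 10. */
static bool
sketch_tuple_visible(const SketchVisibilityInput *in)
{
    if (in->xmin_status == SK_ABORTED)
        return false;                       /* Rule 1 */

    if (in->xmin_status == SK_IN_PROGRESS)
    {
        if (in->xmin_is_current)
            return !in->xmax_valid;         /* Rule 2 (visible), Rule 3 */
        return false;                       /* Rule 4 */
    }

    /* From here on, t_xmin is committed. */
    if (in->xmin_in_snapshot)
        return false;                       /* Rule 5 */
    if (!in->xmax_valid)
        return true;                        /* Rule 6 */

    switch (in->xmax_status)
    {
        case SK_ABORTED:
            return true;                    /* Rule 6 */
        case SK_IN_PROGRESS:
            return !in->xmax_is_current;    /* Rule 7 (invisible), Rule 8 */
        case SK_COMMITTED:
            return in->xmax_in_snapshot;    /* Rule 9 (visible), Rule 10 */
    }
    return false;                           /* not reached */
}
```

Rule 9 is the least intuitive case: the deleting transaction has committed, but because it was still active when the snapshot was taken, the snapshot does not see the delete and the tuple stays visible.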

Snapshot

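How is a snapshot produced? The key function is GetTransactionSnapshot, whose main logic is:

  • use FirstSnapshotSet to tell whether this is the first snapshot being taken
  • decide which kind of snapshot to take based on the isolation level
  • build the actual snapshot through the GetSnapshotData function

FirstSnapshotSet is not strictly tied to a transaction; it is simply a global variable in snapmgr.c. When it is set and reset depends on the isolation level.

Its initial value is false, and it is set to true once a snapshot has been taken. Under READ COMMITTED, FirstSnapshotSet is set back to false when the SQL statement finishes; under REPEATABLE READ, it is set back to false only when the transaction ends. Watching this variable (by adding a data breakpoint on &FirstSnapshotSet) lets you observe when a statement finishes and when a transaction commits.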

Snapshot GetTransactionSnapshot(void)
{
    /*
     * Return historic snapshot if doing logical decoding. We'll never need a
     * non-historic transaction snapshot in this (sub-)transaction, so there's
     * no need to be careful to set one up for later calls to
     * GetTransactionSnapshot().
     */
    if (HistoricSnapshotActive())
    {
        Assert(!FirstSnapshotSet);
        return HistoricSnapshot;
    }

    /* First call in transaction? */
    if (!FirstSnapshotSet)
    {
        /*
         * Don't allow catalog snapshot to be older than xact snapshot. Must
         * do this first to allow the empty-heap Assert to succeed.
         */
        InvalidateCatalogSnapshot();

        Assert(pairingheap_is_empty(&RegisteredSnapshots));
        Assert(FirstXactSnapshot == NULL);

        if (IsInParallelMode())
            elog(ERROR,
                 "cannot take query snapshot during a parallel operation");

        /*
         * In transaction-snapshot mode, the first snapshot must live until
         * end of xact regardless of what the caller does with it, so we must
         * make a copy of it rather than returning CurrentSnapshotData
         * directly. Furthermore, if we're running in serializable mode,
         * predicate.c needs to wrap the snapshot fetch in its own processing.
         */
        if (IsolationUsesXactSnapshot())
        {
            /* First, create the snapshot in CurrentSnapshotData */
            if (IsolationIsSerializable())
                CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
            else
                CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
            /* Make a saved copy */
            CurrentSnapshot = CopySnapshot(CurrentSnapshot);
            FirstXactSnapshot = CurrentSnapshot;
            /* Mark it as "registered" in FirstXactSnapshot */
            FirstXactSnapshot->regd_count++;
            pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
        }
        else
            CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);

        FirstSnapshotSet = true;
        return CurrentSnapshot;
    }

    if (IsolationUsesXactSnapshot())
        return CurrentSnapshot;

    /* Don't allow catalog snapshot to be older than xact snapshot. */
    InvalidateCatalogSnapshot();

    CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);

    return CurrentSnapshot;
}
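The control flow of the function above boils down to "take one snapshot and reuse it" versus "take a fresh snapshot on every call". The toy model below illustrates only that difference. Everything in it is hypothetical (SketchSnapshotManager, the integer snapshot ids, the sketch_ functions); it leaves out historic snapshots, serializable handling, snapshot copying and registration.

```c
#include <stdbool.h>

typedef enum
{
    SK_READ_COMMITTED,
    SK_REPEATABLE_READ          /* stands for any transaction-snapshot mode */
} SketchIsolation;

/* Hypothetical per-backend state; snapshots are represented by integer ids. */
typedef struct SketchSnapshotManager
{
    SketchIsolation isolation;
    bool        first_snapshot_set; /* plays the role of FirstSnapshotSet */
    int         current_snapshot;   /* id of the snapshot last handed out */
    int         snapshots_taken;    /* how many snapshots were really built */
} SketchSnapshotManager;

/* Stand-in for GetSnapshotData: builds a new snapshot and returns its id. */
static int
sketch_take_snapshot(SketchSnapshotManager *mgr)
{
    return ++mgr->snapshots_taken;
}

/* Simplified model of the decision made in GetTransactionSnapshot. */
static int
sketch_get_transaction_snapshot(SketchSnapshotManager *mgr)
{
    if (!mgr->first_snapshot_set)
    {
        /* First call in the transaction: always build a snapshot. */
        mgr->current_snapshot = sketch_take_snapshot(mgr);
        mgr->first_snapshot_set = true;
        return mgr->current_snapshot;
    }

    /* Transaction-snapshot mode: keep returning the first snapshot. */
    if (mgr->isolation == SK_REPEATABLE_READ)
        return mgr->current_snapshot;

    /* READ COMMITTED: every later call gets a fresh snapshot. */
    mgr->current_snapshot = sketch_take_snapshot(mgr);
    return mgr->current_snapshot;
}

/* Reset at transaction end so the next transaction starts from scratch. */
static void
sketch_end_transaction(SketchSnapshotManager *mgr)
{
    mgr->first_snapshot_set = false;
}
```

In this model, three calls inside one READ COMMITTED transaction return snapshots 1, 2 and 3, while three calls inside one REPEATABLE READ transaction return snapshot 1 each time.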

References

https://blog.csdn.net/Hehuyi_In/article/details/127344822

https://blog.csdn.net/obvious__/article/details/120710977

https://pg-internal.vonng.com/

Author: Desirer

Published: 2024-10-04

Updated: 2024-11-15
