MIT6.S081 xv6book chapter6
第六章主要是讲并发编程,为什么要用锁、什么时候使用锁、锁范围、加锁顺序、死锁、可重入锁等知识,还介绍了xv6中自旋锁的实现。
特别要注意xv6中持有锁就不允许中断;内存屏障用于避免指令重排,这些都是锁实现的细节。
本节融合了lec13的内容,总体上属于并发编程入门,信号量、条件变量等多进程同步机制没有介绍,后续章节会涉及。
竞态条件
A race condition is a situation in which a memory location is accessed concurrently, and at least one access is a write.
锁是如何避免race condition的,这里有两个很好的描述词:序列化、原子化,
You can think of a lock as serializing concurrent critical sections so that they run one at a time, and thus preserve invariants (assuming the critical sections are correct in isolation).
You can also think of critical sections guarded by the same lock as being atomic with respect to each other, so that each sees only the complete set of changes from earlier critical sections, and never sees partially-completed updates.
锁的使用
什么时候使用锁、使用多少个锁:
A hard part about using locks is deciding how many locks to use and which data and invariants each lock should protect. There are a few basic principles. First, any time a variable can be written by one CPU at the same time that another CPU can read or write it, a lock should be used to keep the two operations from overlapping. Second, remember that locks protect invariants: if an invariant involves multiple memory locations, typically all of them need to be protected by a single lock to ensure the invariant is maintained
大内核锁:
A simple kernel can do this on a multiprocessor by having a single lock that must be acquired on entering the kernel and released on exiting the kernel (though system calls such as pipe reads or wait would pose a problem). Many uniprocessor operating systems have been converted to run on multiprocessors using this approach, sometimes called a “big kernel lock”, but the approach sacrifices parallelism: only one CPU can execute in the kernel at a time.
如果内核中只有一把大锁,我们暂时将之称为big kernel lock。基本上所有的系统调用都会被这把大锁保护而被序列化。系统调用会按照这样的流程处理:一个系统调用获取到了big kernel lock,完成自己的操作,之后释放这个big kernel lock,再返回到用户空间,之后下一个系统调用才能执行。这样的话,如果我们有一个应用程序并行的调用多个系统调用,这些系统调用会串行的执行,
死锁与锁顺序
it is important that all code paths acquire those locks in the same order.
获取锁的顺序很重要,如果所有获锁的代码都遵从相同的获锁顺序,那么是不会造成死锁的,但现实中获锁的顺序取决于代码逻辑。
对于一个系统设计者,你需要确定对于所有的锁对象的全局的顺序。例如在这里的例子中我们让d1一直在d2之前,这样我们在rename的时候,总是先获取排序靠前的目录的锁,再获取排序靠后的目录的锁。如果对于所有的锁有了一个全局的排序,这里的死锁就不会出现了。
不过在设计一个操作系统的时候,定义一个全局的锁的顺序会有些问题。如果一个模块m1中方法g调用了另一个模块m2中的方法f,那么m1中的方法g需要知道m2的方法f使用了哪些锁。因为如果m2使用了一些锁,那么m1的方法g必须集合f和g中的锁,并形成一个全局的锁的排序。这意味着在m2中的锁必须对m1可见,这样m1才能以恰当的方法调用m2。
但是这样又违背了代码抽象的原则。在完美的情况下,代码抽象要求m1完全不知道m2是如何实现的。但是不幸的是,具体实现中,m2内部的锁需要泄露给m1,这样m1才能完成全局锁排序。所以当你设计一些更大的系统时,锁使得代码的模块化更加的复杂了。
可重入锁
The idea is that if the lock is held by a process and if that process attempts to acquire the lock again, then the kernel could just allow this (since the process already has the lock), instead of calling panic, as the xv6 kernel does
But if re-entrant locks are allowed, and h happens to call g, call_once will be called twice.
If re-entrant locks aren’t allowed, then h calling g results in a deadlock, which is not great either.
可重入锁有优点有缺点,缺点就是它使得并发编程更加复杂了,优点是至少能避免一些死锁的情况。
自旋锁的实现:原子指令
锁的特性就是只有一个进程可以获取锁,在任何时间点都不能有超过一个锁的持有者。
1 | struct spinlock { |
实现锁的难点就在于看的动作和写的动作的不连续,中间可能被打断。
1 | void acquire(struct spinlock *lk) // does not work! |
上面这段代码并不能实现acquire语义,问题就出在两个进程可以同时进入到判断锁的那一行,此时locked都为0,两个进程会同时将locked设置为1。这其实是三个操作(读locked、判断locked、写locked)的原子性,如果这三个操作是合在一起的, 便能保证正确性。
解决的方法是依赖于一个特殊的硬件指令,这个特殊的硬件指令会保证一次test-and-set操作的原子性,在RISC-V上,这个特殊的指令就是amoswap(atomic memory swap)原子交换。
The acquire function wraps the swap in a loop, retrying (spinning) until it has acquired the lock. Each iteration swaps one into lk->locked and checks the previous value;
- if the previous value is zero, then we’ve acquired the lock, and the swap will have set lk->locked to one.
- If the previous value is one, then some other CPU holds the lock, and the fact that we atomically swapped one into lk->locked didn’t change its value.
在获取锁时用1去交换,然后判断获取的旧值是否为0,为0说明获得了锁,为1说明此时有其他进程获得了锁,那么交换的1没有改变着之前的值。
这其实就是保证了锁的唯一性。将1看成苹果,将0看成锁,锁放在桌子上。每个都用苹果换桌子上的东西,如果换到锁,那么说明我拿到锁了,并且这个锁不会再被别人拿到。如果换到苹果,那也只是等价交换,桌子上还是苹果。
acquire
1 | void acquire(struct spinlock *lk) { |
这里先忽略push_off的作用,可以看到在函数中有一个while循环,这就是刚刚提到的test-and-set循环。实际上C的标准库已经定义了这些原子操作,所以C标准库中已经有一个函数__sync_lock_test_and_set。
1 | while(__sync_lock_test_and_set(&lk->locked, 1) != 0) |
release
1 | void release(struct spinlock *lk) { |
释放锁的过程就是将locked字段原子更新为0的过程。为什么需要原子更新?
因为更新的操作其实有三个步骤:读地址值到寄存器、修改寄存器的值、再将寄存器的值写回到内存。
锁与中断处理程序
1 | // Per-CPU state. |
acquire calls push_off (kernel/spinlock.c:89) and release calls pop_off (kernel/spinlock.c:100) to track the nesting level of locks on the current CPU. When that count reaches zero, pop_off restores the interrupt enable state that existed at the start of the outermost critical section.
第二个细节是,在acquire函数的最开始,会先关闭中断。为什么会是这样呢?先来假设acquire在一开始并没有关闭中断。在uartputc函数中,首先会acquire锁,如果不关闭中断会发生什么呢?uartputc函数会acquire锁,UART本质上就是传输字符,当UART完成了字符传输它会做什么?是的,它会产生一个中断之后会运行uartintr函数,在uartintr函数中,会获取同一把锁,但是这把锁正在被uartputc持有。如果这里只有一个CPU的话,那这里就是死锁。
所以spinlock需要处理两类并发,一类是不同CPU之间的并发,一类是相同CPU上中断和普通程序之间的并发。
更深层次的原因是:锁会在各种各样的地方被用到,从用户程序到中断处理程序。而中断又是时时刻刻发生的,如果用户程序持有锁的同时发生中断,中断处理程序又要求获得同一把锁,就会发生死锁。
xv6解决的方法粗暴有效:when a CPU acquires any lock, xv6 always disables interrupts on that CPU. 哪个CPU持有锁就不允许那个CPU处理中断。
还要注意点的是:
- 关中断在获锁前,开中断在释放锁后;
- noff追踪了锁嵌套的层次(track the nesting level of locks on the current CPU),只有最后一个锁释放后才能开中断。
指令重排
It is natural to think of programs executing in the order in which source code statements appear. Many compilers and CPUs, however, execute code out of order to achieve higher performance.The CPU’s ordering rules are called the memory model.
简单来说,CPU或编译器为了更好的执行性能,通常会调整一些代码的执行顺序。而这会使得锁失效,因为关键区的代码可能会被移到关键区外。
避免指令重排是通过一条硬件指令,内存屏障(memory fence或者叫做synchronize指令)来确定指令的移动范围。对于synchronize指令,任何在它之前的load/store指令,都不能移动到它之后。
那么,通过两个内存屏障,加锁时一个,释放锁时一个,就能避免指令乱排带来的后果。
学生提问:有没有可能在锁acquire之前的一条指令被移到锁release之后?或者说这里会有一个界限不允许这么做?
Frans教授:在这里的例子中,acquire和release都有自己的界限(注,也就是__sync_synchronize函数的调用点)。所以发生在锁acquire之前的指令不会被移到acquire的__sync_synchronize函数调用之后,这是一个界限。在锁的release函数中有另一个界限。所以在第一个界限之前的指令会一直在这个界限之前,在两个界限之间的指令会保持在两个界限之间,在第二个界限之后的指令会保持在第二个界限之后。
学生提问:在一个处理器上运行多个线程与在多个处理器上运行多个进程是否一样?
Frans教授:差不多吧,如果你有多个线程,但是只有一个CPU,那么你还是会想要特定内核代码能够原子执行。所以你还是需要有critical section的概念。你或许不需要锁,但是你还是需要能够对特定的代码打开或者关闭中断。如果你查看一些操作系统的内核代码,通常它们都没有锁的acquire,因为它们假定自己都运行在单个处理器上,但是它们都有开关中断的操作。
Sleep locks
Holding a spinlock that long would lead to waste if another process wanted to acquire it, since the acquiring process would waste CPU for a long time while spinning.
Another drawback of spinlocks is that a process cannot yield the CPU while retaining a spinlock; we’d like to do this so that other processes can use the CPU while the process with the lock waits for the disk.
Yielding while holding a spinlock is illegal because it might lead to deadlock if a second thread then tried to acquire the spinlock; since acquire doesn’t yield the CPU, the second thread’s spinning might prevent the first thread from running and releasing the lock.
Yielding while holding a lock would also violate the requirement that interrupts must be off while a spinlock is held. Thus we’d like a type of lock that yields the CPU while waiting to acquire, and allows yields (and interrupts) while the lock is held.
自旋锁的坏处在于尝试获取锁时必须让CPU自旋重试,因此自旋锁不能用于持锁时间长的场景。那么我们会想要这么一种锁:尝试获锁能够让出CPU,而在持有锁时又允许中断,这将会在后来介绍。
MIT6.S081 xv6book chapter6
https://xyz.desirer233.fun/2024/01/13/MIT6.S081/book/chapter6/