理解 Memory barrier（內(nèi)存屏障） | Name5566

老匹夫 2014-09-12

展開全文

參考文獻(xiàn)列表：
http://en./wiki/Memory_barrier
http://en./wiki/Out-of-order_execution
https://www./doc/Documentation/memory-barriers.txt

本文例子均在 Linux（g++）下驗(yàn)證通過，CPU 為 X86-64 處理器架構(gòu)。所有羅列的 Linux 內(nèi)核代碼也均在（或只在）X86-64 下有效。

本文首先通過范例（以及內(nèi)核代碼）來解釋 Memory barrier，然后介紹一個(gè)利用 Memory barrier 實(shí)現(xiàn)的無鎖環(huán)形緩沖區(qū)。

Memory barrier 簡(jiǎn)介

程序在運(yùn)行時(shí)內(nèi)存實(shí)際的訪問順序和程序代碼編寫的訪問順序不一定一致，這就是內(nèi)存亂序訪問。內(nèi)存亂序訪問行為出現(xiàn)的理由是為了提升程序運(yùn)行時(shí)的性能。內(nèi)存亂序訪問主要發(fā)生在兩個(gè)階段：

編譯時(shí)，編譯器優(yōu)化導(dǎo)致內(nèi)存亂序訪問（指令重排）
運(yùn)行時(shí)，多 CPU 間交互引起內(nèi)存亂序訪問

Memory barrier 能夠讓 CPU 或編譯器在內(nèi)存訪問上有序。一個(gè) Memory barrier 之前的內(nèi)存訪問操作必定先于其之后的完成。Memory barrier 包括兩類：

編譯器 barrier
CPU Memory barrier

很多時(shí)候，編譯器和 CPU 引起內(nèi)存亂序訪問不會(huì)帶來什么問題，但一些特殊情況下，程序邏輯的正確性依賴于內(nèi)存訪問順序，這時(shí)候內(nèi)存亂序訪問會(huì)帶來邏輯上的錯(cuò)誤，例如：

// thread 1
while (!ok);
do(x);
 
// thread 2
x = 42;
ok = 1;

此段代碼中，ok 初始化為 0，線程 1 等待 ok 被設(shè)置為 1 后執(zhí)行 do 函數(shù)。假如說，線程 2 對(duì)內(nèi)存的寫操作亂序執(zhí)行，也就是 x 賦值后于 ok 賦值完成，那么 do 函數(shù)接受的實(shí)參就很可能出乎程序員的意料，不為 42。

編譯時(shí)內(nèi)存亂序訪問

在編譯時(shí)，編譯器對(duì)代碼做出優(yōu)化時(shí)可能改變實(shí)際執(zhí)行指令的順序（例如 gcc 下 O2 或 O3 都會(huì)改變實(shí)際執(zhí)行指令的順序）：

// test.cpp
int x, y, r;
void f()
{
    x = r;
    y = 1;
}

編譯器優(yōu)化的結(jié)果可能導(dǎo)致 y = 1 在 x = r 之前執(zhí)行完成。首先直接編譯此源文件：

g++ -S test.cpp

得到相關(guān)的匯編代碼如下：

movl    r(%rip), %eax
movl    %eax, x(%rip)
movl    $1, y(%rip)

這里我們看到，x = r 和 y = 1 并沒有亂序?，F(xiàn)使用優(yōu)化選項(xiàng) O2（或 O3）編譯上面的代碼（g++ -O2 -S test.cpp），生成匯編代碼如下：

movl    r(%rip), %eax
movl    $1, y(%rip)
movl    %eax, x(%rip)

我們可以清楚的看到經(jīng)過編譯器優(yōu)化之后 movl $1, y(%rip) 先于 movl %eax, x(%rip) 執(zhí)行。避免編譯時(shí)內(nèi)存亂序訪問的辦法就是使用編譯器 barrier（又叫優(yōu)化 barrier）。Linux 內(nèi)核提供函數(shù) barrier() 用于讓編譯器保證其之前的內(nèi)存訪問先于其之后的完成。內(nèi)核實(shí)現(xiàn) barrier() 如下（X86-64 架構(gòu)）：

#define barrier() __asm__ __volatile__("" ::: "memory")

現(xiàn)在把此編譯器 barrier 加入代碼中：

int x, y, r;
void f()
{
    x = r;
    __asm__ __volatile__("" ::: "memory");
    y = 1;
}

這樣就避免了編譯器優(yōu)化帶來的內(nèi)存亂序訪問的問題了（如果有興趣可以再看看編譯之后的匯編代碼）。本例中，我們還可以使用 volatile 這個(gè)關(guān)鍵字來避免編譯時(shí)內(nèi)存亂序訪問（而無法避免后面要說的運(yùn)行時(shí)內(nèi)存亂序訪問）。volatile 關(guān)鍵字能夠讓相關(guān)的變量之間在內(nèi)存訪問上避免亂序，這里可以修改 x 和 y 的定義來解決問題：

volatile int x, y;
int r;
void f()
{
    x = r;
    y = 1;
}

現(xiàn)加上了 volatile 關(guān)鍵字，這使得 x 相對(duì)于 y、y 相對(duì)于 x 在內(nèi)存訪問上有序。在 Linux 內(nèi)核中，提供了一個(gè)宏 ACCESS_ONCE 來避免編譯器對(duì)于連續(xù)的 ACCESS_ONCE 實(shí)例進(jìn)行指令重排。其實(shí) ACCESS_ONCE 實(shí)現(xiàn)源碼如下：

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

此代碼只是將變量 x 轉(zhuǎn)換為 volatile 的而已?，F(xiàn)在我們就有了第三個(gè)修改方案：

int x, y, r;
void f()
{
    ACCESS_ONCE(x) = r;
    ACCESS_ONCE(y) = 1;
}

到此基本上就闡述完了我們的編譯時(shí)內(nèi)存亂序訪問的問題。下面開始介紹運(yùn)行時(shí)內(nèi)存亂序訪問。

運(yùn)行時(shí)內(nèi)存亂序訪問

在運(yùn)行時(shí)，CPU 雖然會(huì)亂序執(zhí)行指令，但是在單個(gè) CPU 的上，硬件能夠保證程序執(zhí)行時(shí)所有的內(nèi)存訪問操作看起來像是按程序代碼編寫的順序執(zhí)行的，這時(shí)候 Memory barrier 沒有必要使用（不考慮編譯器優(yōu)化的情況下）。這里我們了解一下 CPU 亂序執(zhí)行的行為。在亂序執(zhí)行時(shí)，一個(gè)處理器真正執(zhí)行指令的順序由可用的輸入數(shù)據(jù)決定，而非程序員編寫的順序。
早期的處理器為有序處理器（In-order processors），有序處理器處理指令通常有以下幾步：

指令獲取
如果指令的輸入操作對(duì)象（input operands）可用（例如已經(jīng)在寄存器中了），則將此指令分發(fā)到適當(dāng)?shù)墓δ軉卧小Ｈ绻粋€(gè)或者多個(gè)操作對(duì)象不可用（通常是由于需要從內(nèi)存中獲?。瑒t處理器會(huì)等待直到它們可用
指令被適當(dāng)?shù)墓δ軉卧獔?zhí)行
功能單元將結(jié)果寫回寄存器堆（Register file，一個(gè) CPU 中的一組寄存器）

相比之下，亂序處理器（Out-of-order processors）處理指令通常有以下幾步：

指令獲取
指令被分發(fā)到指令隊(duì)列
指令在指令隊(duì)列中等待，直到輸入操作對(duì)象可用（一旦輸入操作對(duì)象可用，指令就可以離開隊(duì)列，即便更早的指令未被執(zhí)行）
指令被分配到適當(dāng)?shù)墓δ軉卧?zhí)行
執(zhí)行結(jié)果被放入隊(duì)列（而不立即寫入寄存器堆）
只有所有更早請(qǐng)求執(zhí)行的指令的執(zhí)行結(jié)果被寫入寄存器堆后，指令執(zhí)行的結(jié)果才被寫入寄存器堆（執(zhí)行結(jié)果重排序，讓執(zhí)行看起來是有序的）

從上面的執(zhí)行過程可以看出，亂序執(zhí)行相比有序執(zhí)行能夠避免等待不可用的操作對(duì)象（有序執(zhí)行的第二步）從而提高了效率?，F(xiàn)代的機(jī)器上，處理器運(yùn)行的速度比內(nèi)存快很多，有序處理器花在等待可用數(shù)據(jù)的時(shí)間里已經(jīng)可以處理大量指令了。
現(xiàn)在思考一下亂序處理器處理指令的過程，我們能得到幾個(gè)結(jié)論：

對(duì)于單個(gè) CPU 指令獲取是有序的（通過隊(duì)列實(shí)現(xiàn)）
對(duì)于單個(gè) CPU 指令執(zhí)行結(jié)果也是有序返回寄存器堆的（通過隊(duì)列實(shí)現(xiàn)）

由此可知，在單 CPU 上，不考慮編譯器優(yōu)化導(dǎo)致亂序的前提下，多線程執(zhí)行不存在內(nèi)存亂序訪問的問題。我們從內(nèi)核源碼也可以得到類似的結(jié)論（代碼不完全的摘錄）：

#ifdef CONFIG_SMP
#define smp_mb() mb()
#else
#define smp_mb() barrier()
#endif

這里可以看到，如果是 SMP 則使用 mb，mb 被定義為 CPU Memory barrier（后面會(huì)講到），而非 SMP 時(shí)，直接使用編譯器 barrier。

在多 CPU 的機(jī)器上，問題又不一樣了。每個(gè) CPU 都存在 cache（cache 主要是為了彌補(bǔ) CPU 和內(nèi)存之間較慢的訪問速度），當(dāng)一個(gè)特定數(shù)據(jù)第一次被特定一個(gè) CPU 獲取時(shí)，此數(shù)據(jù)顯然不在 CPU 的 cache 中（這就是 cache miss）。此 cache miss 意味著 CPU 需要從內(nèi)存中獲取數(shù)據(jù)（這個(gè)過程需要 CPU 等待數(shù)百個(gè)周期），此數(shù)據(jù)將被加載到 CPU 的 cache 中，這樣后續(xù)就能直接從 cache 上快速訪問。當(dāng)某個(gè) CPU 進(jìn)行寫操作時(shí)，它必須確保其他的 CPU 已經(jīng)將此數(shù)據(jù)從它們的 cache 中移除（以便保證一致性），只有在移除操作完成后此 CPU 才能安全的修改數(shù)據(jù)。顯然，存在多個(gè) cache 時(shí)，我們必須通過一個(gè) cache 一致性協(xié)議來避免數(shù)據(jù)不一致的問題，而這個(gè)通訊的過程就可能導(dǎo)致亂序訪問的出現(xiàn)，也就是這里說的運(yùn)行時(shí)內(nèi)存亂序訪問。這里不再深入討論整個(gè)細(xì)節(jié)，這是一個(gè)比較復(fù)雜的問題，有興趣可以研究 http://www./users/paulmck/scalability/paper/whymb.2010.06.07c.pdf 一文，其詳細(xì)的分析了整個(gè)過程。

現(xiàn)在通過一個(gè)例子來說明多 CPU 下內(nèi)存亂序訪問：

// test2.cpp
#include <pthread.h>
#include <assert.h>
 
// -------------------
int cpu_thread1 = 0;
int cpu_thread2 = 1;
 
volatile int x, y, r1, r2;
 
void start()
{
    x = y = r1 = r2 = 0;
}
 
void end()
{
    assert(!(r1 == 0 && r2 == 0));
}
 
void run1()
{
    x = 1;
    r1 = y;
}
 
void run2()
{
    y = 1;
    r2 = x;
}
 
// -------------------
static pthread_barrier_t barrier_start;
static pthread_barrier_t barrier_end;
 
static void* thread1(void*)
{
    while (1) {
        pthread_barrier_wait(&barrier_start);
        run1();
        pthread_barrier_wait(&barrier_end);
    }
 
    return NULL;
}
 
static void* thread2(void*)
{
    while (1) {
        pthread_barrier_wait(&barrier_start);
        run2();
        pthread_barrier_wait(&barrier_end);
    }
 
    return NULL;
}
 
int main()
{
    assert(pthread_barrier_init(&barrier_start, NULL, 3) == 0);
    assert(pthread_barrier_init(&barrier_end, NULL, 3) == 0);
 
    pthread_t t1;
    pthread_t t2;
    assert(pthread_create(&t1, NULL, thread1, NULL) == 0);
    assert(pthread_create(&t2, NULL, thread2, NULL) == 0);
 
    cpu_set_t cs;
    CPU_ZERO(&cs);
    CPU_SET(cpu_thread1, &cs);
    assert(pthread_setaffinity_np(t1, sizeof(cs), &cs) == 0);
    CPU_ZERO(&cs);
    CPU_SET(cpu_thread2, &cs);
    assert(pthread_setaffinity_np(t2, sizeof(cs), &cs) == 0);
 
    while (1) {
        start();
        pthread_barrier_wait(&barrier_start);
        pthread_barrier_wait(&barrier_end);
        end();
    }
 
    return 0;
}

這里創(chuàng)建了兩個(gè)線程來運(yùn)行測(cè)試代碼（需要測(cè)試的代碼將放置在 run 函數(shù)中）。我使用了 pthread barrier（區(qū)別于本文討論的 Memory barrier）主要為了讓兩個(gè)子線程能夠同時(shí)運(yùn)行它們的 run 函數(shù)。此段代碼不停的嘗試同時(shí)運(yùn)行兩個(gè)線程的 run 函數(shù)，以便得出我們期望的結(jié)果。在每次運(yùn)行 run 函數(shù)前會(huì)調(diào)用一次 start 函數(shù)（進(jìn)行數(shù)據(jù)初始化），run 運(yùn)行后會(huì)調(diào)用一次 end 函數(shù)（進(jìn)行結(jié)果檢查）。run1 和 run2 兩個(gè)函數(shù)運(yùn)行在哪個(gè) CPU 上則通過 cpu_thread1 和 cpu_thread2 兩個(gè)變量控制。
先編譯此程序：g++ -lpthread -o test2 test2.cpp（這里未優(yōu)化，目的是為了避免編譯器優(yōu)化的干擾）。需要注意的是，兩個(gè)線程運(yùn)行在兩個(gè)不同的 CPU 上（CPU 0 和 CPU 1）。只要內(nèi)存不出現(xiàn)亂序訪問，那么 r1 和 r2 不可能同時(shí)為 0，因此斷言失敗表示存在內(nèi)存亂序訪問。編譯之后運(yùn)行此程序，會(huì)發(fā)現(xiàn)存在一定概率導(dǎo)致斷言失敗。為了進(jìn)一步說明問題，我們把 cpu_thread2 的值改為 0，換而言之就是讓兩個(gè)線程跑在同一個(gè) CPU 下，再運(yùn)行程序發(fā)現(xiàn)斷言不再失敗。

最后，我們使用 CPU Memory barrier 來解決內(nèi)存亂序訪問的問題（X86-64 架構(gòu)下）：

int cpu_thread1 = 0;
int cpu_thread2 = 1;
 
void run1()
{
    x = 1;
    __asm__ __volatile__("mfence" ::: "memory");
    r1 = y;
}
 
void run2()
{
    y = 1;
    __asm__ __volatile__("mfence" ::: "memory");
    r2 = x;
}

準(zhǔn)備使用 Memory barrier

Memory barrier 常用場(chǎng)合包括：

實(shí)現(xiàn)同步原語（synchronization primitives）
實(shí)現(xiàn)無鎖數(shù)據(jù)結(jié)構(gòu)（lock-free data structures）
驅(qū)動(dòng)程序

實(shí)際的應(yīng)用程序開發(fā)中，開發(fā)者可能完全不知道 Memory barrier 就可以開發(fā)正確的多線程程序，這主要是因?yàn)楦鞣N同步機(jī)制中已經(jīng)隱含了 Memory barrier（但和實(shí)際的 Memory barrier 有細(xì)微差別），這就使得不直接使用 Memory barrier 不會(huì)存在任何問題。但是如果你希望編寫諸如無鎖數(shù)據(jù)結(jié)構(gòu)，那么 Memory barrier 還是很有用的。

通常來說，在單個(gè) CPU 上，存在依賴的內(nèi)存訪問有序：

Q = P;
D = *Q;

這里內(nèi)存操作有序。然而在 Alpha CPU 上，存在依賴的內(nèi)存讀取操作不一定有序，需要使用數(shù)據(jù)依賴 barrier（由于 Alpha 不常見，這里就不詳細(xì)解釋了）。

在 Linux 內(nèi)核中，除了前面說到的編譯器 barrier — barrier() 和 ACCESS_ONCE()，還有 CPU Memory barrier：

通用 barrier，保證讀寫操作有序的，mb() 和 smp_mb()
寫操作 barrier，僅保證寫操作有序的，wmb() 和 smp_wmb()
讀操作 barrier，僅保證讀操作有序的，rmb() 和 smp_rmb()

注意，所有的 CPU Memory barrier（除了數(shù)據(jù)依賴 barrier 之外）都隱含了編譯器 barrier。這里的 smp 開頭的 Memory barrier 會(huì)根據(jù)配置在單處理器上直接使用編譯器 barrier，而在 SMP 上才使用 CPU Memory barrier（也就是 mb()、wmb()、rmb()，回憶上面相關(guān)內(nèi)核代碼）。

最后需要注意一點(diǎn)的是，CPU Memory barrier 中某些類型的 Memory barrier 需要成對(duì)使用，否則會(huì)出錯(cuò)，詳細(xì)來說就是：一個(gè)寫操作 barrier 需要和讀操作（或數(shù)據(jù)依賴）barrier 一起使用（當(dāng)然，通用 barrier 也是可以的），反之依然。

Memory barrier 的范例

讀內(nèi)核代碼進(jìn)一步學(xué)習(xí) Memory barrier 的使用。
Linux 內(nèi)核實(shí)現(xiàn)的無鎖（只有一個(gè)讀線程和一個(gè)寫線程時(shí)）環(huán)形緩沖區(qū) kfifo 就使用到了 Memory barrier，實(shí)現(xiàn)源碼如下：

/*
 * A simple kernel FIFO implementation.
 *
 * Copyright (C) 2004 Stelian Pop <stelian@popies.net>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 */
 
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/kfifo.h>
#include <linux/log2.h>
 
/**
 * kfifo_init - allocates a new FIFO using a preallocated buffer
 * @buffer: the preallocated buffer to be used.
 * @size: the size of the internal buffer, this have to be a power of 2.
 * @gfp_mask: get_free_pages mask, passed to kmalloc()
 * @lock: the lock to be used to protect the fifo buffer
 *
 * Do NOT pass the kfifo to kfifo_free() after use! Simply free the
 * &struct kfifo with kfree().
 */
struct kfifo *kfifo_init(unsigned char *buffer, unsigned int size,
                         gfp_t gfp_mask, spinlock_t *lock)
{
    struct kfifo *fifo;
 
    /* size must be a power of 2 */
    BUG_ON(!is_power_of_2(size));
 
    fifo = kmalloc(sizeof(struct kfifo), gfp_mask);
    if (!fifo)
        return ERR_PTR(-ENOMEM);
 
    fifo->buffer = buffer;
    fifo->size = size;
    fifo->in = fifo->out = 0;
    fifo->lock = lock;
 
    return fifo;
}
EXPORT_SYMBOL(kfifo_init);
 
/**
 * kfifo_alloc - allocates a new FIFO and its internal buffer
 * @size: the size of the internal buffer to be allocated.
 * @gfp_mask: get_free_pages mask, passed to kmalloc()
 * @lock: the lock to be used to protect the fifo buffer
 *
 * The size will be rounded-up to a power of 2.
 */
struct kfifo *kfifo_alloc(unsigned int size, gfp_t gfp_mask, spinlock_t *lock)
{
    unsigned char *buffer;
    struct kfifo *ret;
 
    /*
     * round up to the next power of 2, since our 'let the indices
     * wrap' technique works only in this case.
     */
    if (!is_power_of_2(size)) {
        BUG_ON(size > 0x80000000);
        size = roundup_pow_of_two(size);
    }
 
    buffer = kmalloc(size, gfp_mask);
    if (!buffer)
        return ERR_PTR(-ENOMEM);
 
    ret = kfifo_init(buffer, size, gfp_mask, lock);
 
    if (IS_ERR(ret))
        kfree(buffer);
 
    return ret;
}
EXPORT_SYMBOL(kfifo_alloc);
 
/**
 * kfifo_free - frees the FIFO
 * @fifo: the fifo to be freed.
 */
void kfifo_free(struct kfifo *fifo)
{
    kfree(fifo->buffer);
    kfree(fifo);
}
EXPORT_SYMBOL(kfifo_free);
 
/**
 * __kfifo_put - puts some data into the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: the data to be added.
 * @len: the length of the data to be added.
 *
 * This function copies at most @len bytes from the @buffer into
 * the FIFO depending on the free space, and returns the number of
 * bytes copied.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_put(struct kfifo *fifo,
                         const unsigned char *buffer, unsigned int len)
{
    unsigned int l;
 
    len = min(len, fifo->size - fifo->in + fifo->out);
 
    /*
     * Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.
     */
 
    smp_mb();
 
    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);
 
    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);
 
    /*
     * Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index.
     */
 
    smp_wmb();
 
    fifo->in += len;
 
    return len;
}
EXPORT_SYMBOL(__kfifo_put);
 
/**
 * __kfifo_get - gets some data from the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: where the data must be copied.
 * @len: the size of the destination buffer.
 *
 * This function copies at most @len bytes from the FIFO into the
 * @buffer and returns the number of copied bytes.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_get(struct kfifo *fifo,
                         unsigned char *buffer, unsigned int len)
{
    unsigned int l;
 
    len = min(len, fifo->in - fifo->out);
 
    /*
     * Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.
     */
 
    smp_rmb();
 
    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);
 
    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);
 
    /*
     * Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.
     */
 
    smp_mb();
 
    fifo->out += len;
 
    return len;
}
EXPORT_SYMBOL(__kfifo_get);

為了更好的理解上面的源碼，這里順帶說一下此實(shí)現(xiàn)使用到的一些和本文主題無關(guān)的技巧：

使用與操作來求取環(huán)形緩沖區(qū)的下標(biāo)，相比取余操作來求取下標(biāo)的做法效率要高不少。使用與操作求取下標(biāo)的前提是環(huán)形緩沖區(qū)的大小必須是 2 的 N 次方，換而言之就是說環(huán)形緩沖區(qū)的大小為一個(gè)僅有一個(gè) 1 的二進(jìn)制數(shù)，那么 index & (size – 1) 則為求取的下標(biāo)（這不難理解）
使用了 in 和 out 兩個(gè)索引且 in 和 out 是一直遞增的（此做法比較巧妙），這樣能夠避免一些復(fù)雜的條件判斷（某些實(shí)現(xiàn)下，in == out 時(shí)還無法區(qū)分緩沖區(qū)是空還是滿）

這里，索引 in 和 out 被兩個(gè)線程訪問。in 和 out 指明了緩沖區(qū)中實(shí)際數(shù)據(jù)的邊界，也就是 in 和 out 同緩沖區(qū)數(shù)據(jù)存在訪問上的順序關(guān)系，由于未使用同步機(jī)制，那么保證順序關(guān)系就需要使用到 Memory barrier 了。索引 in 和 out 都分別只被一個(gè)線程修改，而被兩個(gè)線程讀取。__kfifo_put 先通過 in 和 out 來確定可以向緩沖區(qū)中寫入數(shù)據(jù)量的多少，這時(shí)，out 索引應(yīng)該先被讀取后才能真正的將用戶 buffer 中的數(shù)據(jù)寫入緩沖區(qū)，因此這里使用到了 smp_mb()，對(duì)應(yīng)的，__kfifo_get 也使用 smp_mb() 來確保修改 out 索引之前緩沖區(qū)中數(shù)據(jù)已經(jīng)被成功讀取并寫入用戶 buffer 中了。對(duì)于 in 索引，在 __kfifo_put 中，通過 smp_wmb() 保證先向緩沖區(qū)寫入數(shù)據(jù)后才修改 in 索引，由于這里只需要保證寫入操作有序，故選用寫操作 barrier，在 __kfifo_get 中，通過 smp_rmb() 保證先讀取了 in 索引（這時(shí)候 in 索引用于確定緩沖區(qū)中實(shí)際存在多少可讀數(shù)據(jù)）才開始讀取緩沖區(qū)中數(shù)據(jù)（并寫入用戶 buffer 中），由于這里只需要保證讀取操作有序，故選用讀操作 barrier。

到這里，Memory barrier 就介紹完畢了。

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購(gòu)買等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來自：老匹夫 > 《Lock》

舉報(bào)/認(rèn)領(lǐng)