Analysis of the Kernel Wait Queue Mechanism

linux kernel, 2009-04-24 10:52:58. This article is reposted from another user.
1. Wait queue data structures

A wait queue is implemented as a doubly linked list whose elements contain pointers to process descriptors. Every wait queue has a wait queue head, a data structure of type wait_queue_head_t:

    struct __wait_queue_head {
        spinlock_t lock;
        struct list_head task_list;
    };
    typedef struct __wait_queue_head wait_queue_head_t;

Here lock protects the queue against concurrent access, and task_list is the head of the list of waiting processes.
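To make the layout concrete, here is a minimal userspace model of wait_queue_head_t. It is a sketch under stated assumptions, not kernel code: a pthread mutex stands in for spinlock_t, and wq_head_model, wq_head_init, wq_head_add and wq_head_count are hypothetical names. The list helpers imitate the kernel's circular struct list_head, in which an empty head points to itself.

```c
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for struct list_head: a circular doubly linked
 * list whose empty head points to itself. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Link node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Model of wait_queue_head_t.  ASSUMPTION: a pthread mutex replaces
 * spinlock_t; the kernel takes the spinlock with interrupts disabled. */
struct wq_head_model {
    pthread_mutex_t lock;
    struct list_head task_list;   /* head of the list of waiters */
};

static void wq_head_init(struct wq_head_model *q)
{
    pthread_mutex_init(&q->lock, NULL);
    INIT_LIST_HEAD(&q->task_list);
}

/* Queue a waiter's embedded list node while holding the lock.  Appends
 * at the tail for simplicity; the kernel's __add_wait_queue adds at the head. */
static void wq_head_add(struct wq_head_model *q, struct list_head *node)
{
    pthread_mutex_lock(&q->lock);
    list_add_tail(node, &q->task_list);
    pthread_mutex_unlock(&q->lock);
}

/* Count queued waiters under the lock. */
static size_t wq_head_count(struct wq_head_model *q)
{
    size_t n = 0;
    pthread_mutex_lock(&q->lock);
    for (struct list_head *p = q->task_list.next; p != &q->task_list; p = p->next)
        n++;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

Because the list is intrusive, the queue head never allocates memory; each waiter supplies its own list node.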
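The elements of the wait queue list are of type wait_queue_t, which we can call wait queue entries:

    struct __wait_queue {
        unsigned int flags;
    #define WQ_FLAG_EXCLUSIVE 0x01
        void *private;
        wait_queue_func_t func;
        struct list_head task_list;
    };
    typedef struct __wait_queue wait_queue_t;

Each wait queue entry represents a sleeping process that is waiting for some event to occur. The address of its process descriptor is usually stored in the private field. The task_list field contains the pointers that link this element into the list of processes waiting for the same event. The flags field records whether the sleeper waits exclusively (WQ_FLAG_EXCLUSIVE set) or non-exclusively, and the func field points to the function used to wake up the sleeping process.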
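The task_list field is an intrusive link: the queue stores pointers to the embedded list node, and the enclosing entry is recovered with pointer arithmetic, which is what the kernel's list_entry (container_of) does. The userspace sketch below models wait_queue_t with the same four fields; struct wq_entry, struct fake_task, mark_woken and call_all are hypothetical names, and the callback takes fewer arguments than the real wait_queue_func_t.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members,
 * the same arithmetic as the kernel's list_entry()/container_of(). */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct wq_entry;
/* Simplified callback type; the real wait_queue_func_t also receives
 * mode, sync and key arguments. */
typedef int (*wq_func_t)(struct wq_entry *wait);

/* Model of wait_queue_t with the same four fields. */
struct wq_entry {
    unsigned int flags;          /* WQ_FLAG_EXCLUSIVE or 0 */
    void *private;               /* the sleeping task */
    wq_func_t func;              /* called by the waker */
    struct list_head task_list;  /* link into the queue */
};

/* Hypothetical task type for the sketch. */
struct fake_task {
    int woken;
};

/* Example wake-up callback: mark the task in ->private as woken. */
static int mark_woken(struct wq_entry *wait)
{
    struct fake_task *t = wait->private;
    t->woken = 1;
    return 1;
}

/* Walk the queue, map each list node back to its entry, and call its func.
 * Returns the number of callbacks that reported success. */
static int call_all(struct list_head *head)
{
    int n = 0;
    for (struct list_head *p = head->next; p != head; p = p->next) {
        struct wq_entry *e = list_entry(p, struct wq_entry, task_list);
        n += e->func(e);
    }
    return n;
}
```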
The overall structure of a wait queue is shown in the figure below:

[Figure: overall structure of a wait queue]
Next, let's look at how wait queues work.
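2. Sleeping on a wait queue

Before using a wait queue, you normally define a wait queue head, for example static wait_queue_head_t wq; (it must be initialized with init_waitqueue_head(&wq) before use, or defined and initialized in one step with DECLARE_WAIT_QUEUE_HEAD(wq)). You then call one of the wait_event_* macros, which inserts the current process, waiting for some condition, into the wait queue wq and puts it to sleep. Once other code makes condition true and wakes up wq, one or all of the processes sleeping on wq are woken.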
There is not much to say about defining the wait queue head, so let's start the analysis from the call to wait_event_*. We take the commonly used wait_event_interruptible as an example:

    /**
     * wait_event_interruptible - sleep until a condition gets true
     * @wq: the waitqueue to wait on
     * @condition: a C expression for the event to wait for
     *
     * The process is put to sleep (TASK_INTERRUPTIBLE) until the
     * @condition evaluates to true or a signal is received.
     * The @condition is checked each time the waitqueue @wq is woken up.
     *
     * wake_up() has to be called after changing any variable that could
     * change the result of the wait condition.
     *
     * The function will return -ERESTARTSYS if it was interrupted by a
     * signal and 0 if @condition evaluated to true.
     */
    #define wait_event_interruptible(wq, condition)               \
    ({                                                            \
        int __ret = 0;                                            \
        if (!(condition))                                         \
            __wait_event_interruptible(wq, condition, __ret);     \
        __ret;                                                    \
    })

This is simple: it checks whether condition already holds, and only if it does not does it invoke the __wait_event_interruptible macro.
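The fast path relies on a GNU C statement expression, ({ ... }), whose value is the last expression inside it. The sketch below, which compiles with GCC or Clang, shows that when the condition already holds, the slow path (and therefore the wait queue) is never touched. fake_wait_slow_path and model_wait_event_interruptible are hypothetical stand-ins, not kernel macros.

```c
/* Counts how often the hypothetical slow path is entered. */
static int slow_path_calls;

/* Hypothetical stand-in for __wait_event_interruptible: it does not
 * sleep, it only records the call and reports success. */
#define fake_wait_slow_path(wq, condition, ret) \
    do {                                         \
        (void)(wq);                              \
        slow_path_calls++;                       \
        (ret) = 0;                               \
    } while (0)

/* Same shape as wait_event_interruptible: a GNU C statement expression
 * whose value is __ret.  The queue is touched only if condition is false. */
#define model_wait_event_interruptible(wq, condition)  \
({                                                     \
    int __ret = 0;                                     \
    if (!(condition))                                  \
        fake_wait_slow_path(wq, condition, __ret);     \
    __ret;                                             \
})
```

As with the real macro, condition is pasted in textually and may be evaluated more than once, so it should be an expression without side effects.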
    #define __wait_event_interruptible(wq, condition, ret)            \
    do {                                                              \
        DEFINE_WAIT(__wait);                                          \
                                                                      \
        for (;;) {                                                    \
            prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);        \
            if (condition)                                            \
                break;                                                \
            if (!signal_pending(current)) {                           \
                schedule();                                           \
                continue;                                             \
            }                                                         \
            ret = -ERESTARTSYS;                                       \
            break;                                                    \
        }                                                             \
        finish_wait(&wq, &__wait);                                    \
    } while (0)
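__wait_event_interruptible first defines a wait queue entry __wait of type wait_queue_t:

    #define DEFINE_WAIT(name)                                     \
        wait_queue_t name = {                                     \
            .private   = current,                                 \
            .func      = autoremove_wake_function,                \
            .task_list = LIST_HEAD_INIT((name).task_list),        \
        }

Note that the private member of __wait (normally used to hold the process descriptor) is initialized to current, meaning that this entry corresponds to the current process. The func member, the entry's wake-up function, is initialized to the default autoremove_wake_function; it is called by whoever wakes the queue, not by the sleeping process itself. The task_list member is initialized to point to itself, so the entry starts out not linked into any queue.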
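Because DEFINE_WAIT uses designated initializers, every field that is not named (here flags) is zeroed, so the entry starts out non-exclusive, and the self-referencing LIST_HEAD_INIT makes list_empty() report that the entry is not queued yet. A userspace sketch follows, in which current, the_task, autoremove_stub and MODEL_DEFINE_WAIT are hypothetical stand-ins.

```c
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list node points at itself, as in the kernel. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical task type and `current` stand-in for this sketch. */
struct task {
    const char *comm;
};
static struct task the_task = { "reader" };
#define current (&the_task)

struct wq_entry;
typedef int (*wq_func_t)(struct wq_entry *wait, unsigned mode, int sync, void *key);

struct wq_entry {
    unsigned int flags;
    void *private;
    wq_func_t func;
    struct list_head task_list;
};

/* Placeholder for autoremove_wake_function; it does nothing here. */
static int autoremove_stub(struct wq_entry *wait, unsigned mode, int sync, void *key)
{
    (void)wait; (void)mode; (void)sync; (void)key;
    return 1;
}

/* Same shape as DEFINE_WAIT.  flags is not named, so it is zeroed
 * (non-exclusive); task_list initially points at itself. */
#define MODEL_DEFINE_WAIT(name)                          \
    struct wq_entry name = {                             \
        .private   = current,                            \
        .func      = autoremove_stub,                    \
        .task_list = LIST_HEAD_INIT((name).task_list),   \
    }
```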
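Then, inside a for (;;) loop, it calls prepare_to_wait:

    void fastcall
    prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
    {
        unsigned long flags;

        wait->flags &= ~WQ_FLAG_EXCLUSIVE;
        spin_lock_irqsave(&q->lock, flags);
        if (list_empty(&wait->task_list))
            __add_wait_queue(q, wait);
        /*
         * don't alter the task state if this is just going to
         * queue an async wait queue callback
         */
        if (is_sync_wait(wait))
            set_current_state(state);
        spin_unlock_irqrestore(&q->lock, flags);
    }

prepare_to_wait does two things: it inserts the previously defined entry __wait into the wait queue headed by wq, and it sets the current process to the TASK_INTERRUPTIBLE state. The list_empty check ensures the entry is added only once, even though the loop calls prepare_to_wait on every iteration. Immediately after prepare_to_wait returns, the macro checks condition again; if it happens to be true by now, there is no need to sleep. Otherwise, the process prepares to sleep. Setting the task state before this check is what prevents a lost wakeup: if a waker makes condition true and calls wake_up after the check but before schedule(), the task is already back in TASK_RUNNING, so schedule() does not leave it asleep.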
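The single-threaded sketch below mirrors the logic of prepare_to_wait with the locking and the is_sync_wait() check left out. It shows why the list_empty test matters: the loop may call prepare_to_wait many times, but the entry must be linked only once. current_state is a hypothetical global standing in for current->state, and the other names are likewise illustrative.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01
enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert n right after head, like the kernel's list_add(). */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

struct wq_entry { unsigned int flags; struct list_head task_list; };
struct wq_head  { struct list_head task_list; };  /* lock omitted */

/* Hypothetical stand-in for current->state. */
static int current_state = TASK_RUNNING;

/* prepare_to_wait logic without locking or is_sync_wait(): clear the
 * exclusive flag, link the entry only if it is not linked yet, set state. */
static void model_prepare_to_wait(struct wq_head *q, struct wq_entry *wait, int state)
{
    wait->flags &= ~WQ_FLAG_EXCLUSIVE;
    if (list_empty(&wait->task_list))
        list_add(&wait->task_list, &q->task_list);
    current_state = state;
}

/* Number of entries currently linked into q. */
static size_t queue_len(const struct wq_head *q)
{
    size_t n = 0;
    for (const struct list_head *p = q->task_list.next; p != &q->task_list; p = p->next)
        n++;
    return n;
}
```

Without the list_empty test, the second call would link the same node twice and corrupt the list.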
Sleeping is done by calling schedule(). Because the current process has already been set to TASK_INTERRUPTIBLE, once schedule() switches to another process the scheduler will not run this process again until it is woken up, that is, until its state is changed back to TASK_RUNNING. Before calling schedule(), the code checks whether a signal is pending. If one is, it stops waiting and sets the return value to -ERESTARTSYS; otherwise it calls schedule() and goes to sleep.
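The control flow of the loop can be exercised without a kernel by replacing schedule() with a simulated sleep/wakeup cycle. In this hypothetical model (struct sim, sim_condition, sim_signal_pending and fake_schedule are invented names), the condition and the signal each become true after a configurable number of cycles; queueing and task states are omitted. ERESTARTSYS is defined locally with the value 512 that the kernel uses internally.

```c
#include <stdbool.h>

/* Kernel-internal value from include/linux/errno.h. */
#define ERESTARTSYS 512

/* Hypothetical simulated world: each fake_schedule() call is one
 * sleep/wakeup cycle.  The condition holds once `cycles` reaches
 * `ready_after`; a signal is pending once it reaches `signal_after`. */
struct sim {
    int cycles;
    int ready_after;
    int signal_after;
};

static bool sim_condition(const struct sim *s) { return s->cycles >= s->ready_after; }
static bool sim_signal_pending(const struct sim *s) { return s->cycles >= s->signal_after; }
static void fake_schedule(struct sim *s) { s->cycles++; }

/* Same control flow as __wait_event_interruptible; queueing and task
 * state changes are left out. */
static int model_wait_interruptible(struct sim *s)
{
    int ret = 0;
    for (;;) {
        /* prepare_to_wait() would run here */
        if (sim_condition(s))
            break;
        if (!sim_signal_pending(s)) {
            fake_schedule(s);
            continue;
        }
        ret = -ERESTARTSYS;
        break;
    }
    /* finish_wait() would run here */
    return ret;
}
```

The condition is tested before the signal, so if both hold at the same time the wait succeeds and returns 0.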
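The for (;;) loop makes the process re-check condition each time it is woken up. This matters mainly because several processes on the wait queue may be woken at the same time, and another one of them may already have grabbed the resource, making it unavailable again; so it is safest to check once more. (The kernel also provides a way to wake only one or a limited number of processes, the exclusive waiters; interested readers can consult the relevant references.)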
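A small simulation of the multi-waiter case: all waiters are woken at once, but only as many can succeed as there are resources, and each one re-checks the condition as the for (;;) loop does. This is a hypothetical single-threaded model (run_after_wakeup and wake_all_and_run are invented names), not kernel code.

```c
/* Hypothetical model: the result of one woken waiter's loop iteration. */
enum waiter_result { GOT_RESOURCE, SLEPT_AGAIN };

/* One pass of the for (;;) loop after a wakeup: re-check the condition
 * (a resource is free) instead of assuming the wakeup guarantees it. */
static enum waiter_result run_after_wakeup(int *resources)
{
    if (*resources > 0) {
        (*resources)--;          /* claim the resource */
        return GOT_RESOURCE;
    }
    return SLEPT_AGAIN;          /* back to prepare_to_wait() + schedule() */
}

/* Wake `nwaiters` non-exclusive waiters at once and let them run one after
 * another.  Returns how many found the resource gone and slept again. */
static int wake_all_and_run(int nwaiters, int *resources)
{
    int slept_again = 0;
    for (int i = 0; i < nwaiters; i++)
        if (run_after_wakeup(resources) == SLEPT_AGAIN)
            slept_again++;
    return slept_again;
}
```

With one resource and three waiters, two of them wake up only to go back to sleep; exclusive waits exist to avoid exactly this wasted work.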
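Whether the loop ends because condition became true or because a signal arrived, the last step is to call finish_wait(&wq, &__wait) to clean up. finish_wait sets the process state back to TASK_RUNNING and removes the entry from the wait queue if it is still linked there (autoremove_wake_function may already have removed it during the wake-up, which is why the list is checked first):

    void fastcall finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
    {
        unsigned long flags;

        __set_current_state(TASK_RUNNING);

        if (!list_empty_careful(&wait->task_list)) {
            spin_lock_irqsave(&q->lock, flags);
            list_del_init(&wait->task_list);
            spin_unlock_irqrestore(&q->lock, flags);
        }
    }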
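The two exits from the loop leave the entry in different states, and finish_wait has to handle both. This single-threaded sketch (model_finish_wait and current_state are hypothetical) uses a plain list_empty() check; the kernel uses list_empty_careful() because it tests the list without holding q->lock while a waker on another CPU may be modifying it.

```c
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert n right after head, like the kernel's list_add(). */
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink e and make it point at itself again, like list_del_init(). */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };
static int current_state = TASK_INTERRUPTIBLE;  /* stand-in for current->state */

/* finish_wait logic without locking: always go back to TASK_RUNNING, and
 * unlink the entry only if a waker has not already unlinked it. */
static void model_finish_wait(struct list_head *q, struct list_head *wait)
{
    (void)q;  /* the kernel needs q only to take q->lock */
    current_state = TASK_RUNNING;
    if (!list_empty(wait))
        list_del_init(wait);
}
```

Because list_del_init leaves the node pointing at itself, an entry already removed by the waker is recognized as unlinked and is not removed a second time.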
After that, execution returns to the point where you called wait_event_interruptible(wq, condition) and blocked, and continues from there.
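3. Waking up a wait queue

At this point we understand how a process sleeps on a wait queue; next we analyze the wake-up process. Using a wait queue presupposes that someone will wake it up: if nobody does, every process sleeping on that queue will sleep forever.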
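For device drivers, the device's wait queue is usually woken from the interrupt handler. A driver typically provides its own read and write wait queues to implement the blocking and non-blocking (O_NONBLOCK) operations that user space expects. When a device resource becomes available and the driver finds processes sleeping on its read or write wait queue, it wakes up that queue.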
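The read path of such a driver can be sketched in a single thread. The model below is hypothetical: struct fake_dev, fake_irq_handler and fake_read are invented names, and sleeping is simulated by a callback that delivers the interrupt, where a real driver would call wait_event_interruptible on its read queue and its interrupt handler would call wake_up_interruptible. O_NONBLOCK and EAGAIN are the standard constants from <fcntl.h> and <errno.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical device state for a single-threaded sketch. */
struct fake_dev {
    bool data_ready;      /* the wait condition */
    int value;
    bool reader_waiting;
    int wakeups;          /* how many times the read queue was woken */
};

/* What the interrupt handler would do: make the condition true first,
 * then wake the read queue if somebody sleeps on it. */
static void fake_irq_handler(struct fake_dev *d, int value)
{
    d->value = value;
    d->data_ready = true;
    if (d->reader_waiting) {
        d->reader_waiting = false;   /* stands in for wake_up_interruptible() */
        d->wakeups++;
    }
}

/* Stands in for "sleep until woken"; the caller decides what happens
 * while the reader sleeps. */
typedef void (*sleep_fn)(struct fake_dev *d);

static int fake_read(struct fake_dev *d, int f_flags, sleep_fn sleep_until_woken)
{
    while (!d->data_ready) {
        if (f_flags & O_NONBLOCK)
            return -EAGAIN;          /* non-blocking: never sleep */
        d->reader_waiting = true;
        sleep_until_woken(d);        /* a real driver: wait_event_interruptible() */
    }
    d->data_ready = false;           /* consume the data */
    return d->value;
}
```

The handler updates the data before waking the queue, which is the order the wait_event_interruptible comment asks for: change the variables the condition depends on, then call wake_up().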
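Waking a wait queue is done with the wake_up_* functions. Here we analyze the corresponding wake_up_interruptible as an example. It is defined as:

    #define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)

The argument x is the head of the wait queue to wake. Only processes in the TASK_INTERRUPTIBLE state are woken, and by default the call wakes all non-exclusive waiters on the queue plus one exclusive waiter (nr_exclusive = 1).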
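The mode argument acts as a mask against the sleeper's state. The reduced model below keeps only that check from the start of try_to_wake_up(); the state values match 2.6-era <linux/sched.h>, where plain wake_up() passes TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE. struct fake_task and model_try_to_wake_up are hypothetical names, and everything to do with run queues is omitted.

```c
#include <stdbool.h>

/* State values as in 2.6-era <linux/sched.h>. */
#define TASK_RUNNING         0
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2

/* Hypothetical task with only a state field. */
struct fake_task {
    long state;
};

/* Reduced model of try_to_wake_up(): wake the task only if its state
 * matches the mode mask, and report whether anything was done. */
static bool model_try_to_wake_up(struct fake_task *p, unsigned int mode)
{
    if (!(p->state & mode))
        return false;            /* wrong kind of sleeper, or already running */
    p->state = TASK_RUNNING;
    return true;
}
```

So wake_up_interruptible() leaves TASK_UNINTERRUPTIBLE sleepers alone, while wake_up() wakes both kinds.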
__wake_up is defined as follows:

    /**
     * __wake_up - wake up threads blocked on a waitqueue.
     * @q: the waitqueue
     * @mode: which threads
     * @nr_exclusive: how many wake-one or wake-many threads to wake up
     * @key: is directly passed to the wakeup function
     */
    void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode,
                            int nr_exclusive, void *key)
    {
        unsigned long flags;

        spin_lock_irqsave(&q->lock, flags);
        __wake_up_common(q, mode, nr_exclusive, 1, key);
        spin_unlock_irqrestore(&q->lock, flags);
        preempt_check_resched_delayed();
    }

__wake_up simply takes the queue's lock and calls __wake_up_common to do the actual wake-up work. __wake_up_common is defined as follows:

    /*
     * The core wakeup function.  Non-exclusive wakeups (nr_exclusive == 0) just
     * wake everything up.  If it's an exclusive wakeup (nr_exclusive == small +ve
     * number) then we wake all the non-exclusive tasks and one exclusive task.
     *
     * There are circumstances in which we can try to wake a task which has already
     * started to run but is not in state TASK_RUNNING.  try_to_wake_up() returns
     * zero in this (rare) case, and we handle it by continuing to scan the queue.
     */
    static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                                 int nr_exclusive, int sync, void *key)
    {
        struct list_head *tmp, *next;

        list_for_each_safe(tmp, next, &q->task_list) {
            wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
            unsigned flags = curr->flags;

            if (curr->func(curr, mode, sync, key) &&
                    (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                break;
        }
    }

__wake_up_common walks through the elements of the wait queue and calls each element's wake-up function. It uses list_for_each_safe because the wake-up function may remove the current element from the list. The walk stops early once nr_exclusive exclusive waiters have been woken successfully; with nr_exclusive == 0, --nr_exclusive never reaches zero, so everything is woken. Here the wake-up function is autoremove_wake_function, installed by default when the entry was defined with DEFINE_WAIT(__wait). autoremove_wake_function calls try_to_wake_up (via default_wake_function) to set the process to TASK_RUNNING and, if that succeeds, removes the entry from the wait queue. The scheduler can then select the process again, so it resumes execution.
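The exclusive-counting rule is easiest to see by running the same loop on a toy queue. In this hypothetical userspace model (model_wake_up_common and model_autoremove are invented names), func takes only the entry, a `running` flag stands in for the task state, and a wakeup of an already-running task returns 0, as try_to_wake_up() does, so it does not consume nr_exclusive. Exclusive entries are placed at the tail, matching how the kernel queues exclusive waiters behind non-exclusive ones.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

struct list_head {
    struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

struct wq_entry {
    unsigned int flags;                 /* WQ_FLAG_EXCLUSIVE or 0 */
    int running;                        /* stands in for the task's state */
    int (*func)(struct wq_entry *w);
    struct list_head task_list;
};

/* Like autoremove_wake_function: try to wake the task and unlink the
 * entry only on success.  An already-running task reports 0, as
 * try_to_wake_up() does, and stays queued. */
static int model_autoremove(struct wq_entry *w)
{
    if (w->running)
        return 0;
    w->running = 1;
    list_del_init(&w->task_list);
    return 1;
}

/* Same loop as __wake_up_common, expanded from list_for_each_safe:
 * `next` is saved before func runs, because func may unlink `tmp`. */
static void model_wake_up_common(struct list_head *q, int nr_exclusive)
{
    struct list_head *tmp, *next;

    for (tmp = q->next, next = tmp->next; tmp != q; tmp = next, next = tmp->next) {
        struct wq_entry *curr = list_entry(tmp, struct wq_entry, task_list);
        unsigned flags = curr->flags;

        if (curr->func(curr) && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}
```

With nr_exclusive = 1, every non-exclusive entry ahead of the first exclusive one is woken and the walk stops right after that exclusive waiter; passing 0 wakes the whole queue.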
References:
1) Understanding the Linux Kernel, 3rd Edition, O'Reilly, November 2005
2) Linux 2.6.18_Pro500 (MontaVista)