

Reading and Writing Files in the Linux File System

 szhlwang 2011-10-16
------------------------------------------
This article is original to this site; reprints are welcome!
Please credit the source when reprinting: http://ericxiao./
------------------------------------------
1: Preface
Reading and writing files is the most central, and also the most complex, part of the file system, and it involves many concepts. When we analyzed other file system operations earlier, the parts related to reading and writing were skipped over. In this section we discuss how file reads and writes are actually implemented.
2: Overview of I/O Requests
As mentioned before, to make file operations efficient, file system contents are cached in memory. Whenever a read/write request is issued, the relevant page is first looked up in the page cache. If the page does not exist, a cache page is created for it in the page cache. If the current page is not up to date, the data must be read from the underlying file system. In general, the kernel provides an interface for this: it generates an I/O request. This interface hides the differing lower-level implementations from the upper layers. Through this interface, the generated I/O request is submitted to the I/O scheduler, and the I/O scheduler in turn invokes the concrete block device driver.
The whole process is shown in the figure below:
The Generic Block Layer in the figure above is the I/O interface just described.
Next we discuss the layers of the figure from bottom to top.
3: Block Device Drivers
Block devices differ from character devices in that block devices can be accessed randomly; disks are the typical example. Precisely because they can be accessed randomly, the kernel needs an efficient way to manage each block device. For disk operations, for instance, every head seek costs considerable time, so we want the device to finish the requests within the same track before moving the head to another track. For character devices no such concern exists: they are simply read from or written to sequentially.
Let us first look at the data structures involved in a block device driver.
3.1: The block_device structure:
struct block_device {
     //major/minor device number
     dev_t              bd_dev;  /* not a kdev_t - it's a search key */
     //points to the inode of this block device in the bdev filesystem
     struct inode *         bd_inode; /* will die */
     //counter: how many times the block device has been opened
     int           bd_openers;
     //semaphore protecting open and close of the block device
     struct semaphore   bd_sem;  /* open/close mutex */
     //semaphore forbidding new mounts on the block device
     struct semaphore   bd_mount_sem; /* mount mutex */
     //list of inodes of opened block device files
     struct list_head   bd_inodes;
     //current holder of the block device descriptor
     void *             bd_holder;
     //counter: number of times bd_holder has been claimed
     int           bd_holders;
     //if this block device is a partition, points to the block device
     //of the whole disk it belongs to; otherwise points to this descriptor itself
     struct block_device *  bd_contains;
     //block size
     unsigned      bd_block_size;
     //pointer to the partition descriptor
     struct hd_struct * bd_part;
     /* number of times partitions within this device have been opened. */
     //counter: how many times partitions within this device have been opened
     unsigned      bd_part_count;
     //flag set when the partition table of the block device must be re-read
     int           bd_invalidated;
     //points to the gendisk of the disk this block device belongs to
     struct gendisk *   bd_disk;
     //links this descriptor into the global list of block device descriptors
     struct list_head   bd_list;
     //points to the device-specific backing_dev_info of the block device
     struct backing_dev_info *bd_inode_backing_dev_info;
     /*
      * Private data.  You must have bd_claim'ed the block_device
      * to use this.  NOTE:  bd_claim allows an owner to claim
      * the same device multiple times, the owner must take special
      * care to not mess up bd_private for that case.
      */
      //private data of the block device
     unsigned long      bd_private;
};
通常,對于塊設(shè)備來說還涉及到一個分區(qū)問題.分區(qū)在內(nèi)核中是用hd_struct來表示的.
3.2: hd_struct結(jié)構(gòu):
struct hd_struct {
     //磁盤分區(qū)的起始扇區(qū)
     sector_t start_sect;
     //分區(qū)的長度,即扇區(qū)的數(shù)目
     sector_t nr_sects;
     //內(nèi)嵌的kobject
     struct kobject kobj;
     //分區(qū)的讀操作次數(shù),讀取扇區(qū)數(shù),寫操作次數(shù),寫扇區(qū)數(shù)
     unsigned reads, read_sectors, writes, write_sectors;
     //policy:如果分區(qū)是只讀的,置為1.否則為0
     //partno:磁盤中分區(qū)的相對索引
     int policy, partno;
}
Every block device corresponds to a disk, which is represented in the kernel by gendisk.
3.3: The gendisk structure:
struct gendisk {
     //major device number of the disk
     int major;             /* major number of driver */
     //first minor number associated with the disk
     int first_minor;
     //range of minor numbers associated with the disk
     int minors;                     /* maximum number of minors, =1 for
                                         * disks that can't be partitioned. */
     //name of the disk
     char disk_name[32];         /* name of major driver */
     //array of partition descriptors of the disk
     struct hd_struct **part;    /* [indexed by minor] */
     //block device operations
     struct block_device_operations *fops;
     //pointer to the request queue of the disk
     struct request_queue *queue;
     //private data of the block device
     void *private_data;
     //size of the disk (in sectors)
     sector_t capacity;
     //flags describing the disk type
     int flags;
     //name in the devfs filesystem
     char devfs_name[64];        /* devfs crap */
     //no longer used
     int number;            /* more of the same */
     //pointer to the device of the disk's hardware
     struct device *driverfs_dev;
     //embedded kobject
     struct kobject kobj;
     //state used to feed disk interrupt timing to the kernel entropy pool
     struct timer_rand_state *random;
     //1 if the disk is read-only, 0 otherwise
     int policy;
     //counter of sectors written to the disk
     atomic_t sync_io;      /* RAID */
     //timestamps used for disk queue usage statistics
     unsigned long stamp, stamp_idle;
     //number of I/O operations in flight
     int in_flight;
     //per-CPU disk usage statistics
#ifdef   CONFIG_SMP
     struct disk_stats *dkstats;
#else
     struct disk_stats dkstats;
#endif
};
The relationship among these three data structures is shown in the figure below:
As the figure shows:
Each partition's bd_contains points to the block device node of the whole disk, its bd_part points to its partition descriptor, and its bd_disk points to the disk it belongs to.
The figure also shows that every disk has a corresponding request_queue, through which the I/O requests from the upper layers are carried out. Its structure is as follows:
3.4: The request_queue structure:
struct request_queue
{
     /*
      * Together with queue_head for cacheline sharing
      */
      //list of pending requests
     struct list_head   queue_head;
     //points to the request in the queue most likely to be merged with next
     struct request         *last_merge;
     //pointer to the I/O scheduler (elevator)
     elevator_t         elevator;
 
     /*
      * the queue request freelist, one for reads and one for writes
      */
      //data structure used to allocate request descriptors
     struct request_list    rq;
 
     //entry point of the driver's strategy routine
     request_fn_proc        *request_fn;
     //method that checks whether a bio can be merged after the last request of the queue
     merge_request_fn   *back_merge_fn;
     //method that checks whether a bio can be merged in front of the first request of the queue
     merge_request_fn   *front_merge_fn;
     //method that tries to merge two adjacent requests
     merge_requests_fn  *merge_requests_fn;
     //method called to insert a new request into the request queue
     make_request_fn        *make_request_fn;
     //method that builds the command to send to the hardware device for a request
     prep_rq_fn         *prep_rq_fn;
     //method that unplugs the block device
     unplug_fn     *unplug_fn;
     //when a new segment is added, this method returns the number of bytes that can
     //still be inserted into an existing bio
     merge_bvec_fn      *merge_bvec_fn;
     //method called when a request is added to the request queue
     activity_fn        *activity_fn;
     //method called to flush the request queue
     issue_flush_fn         *issue_flush_fn;
 
     /*
      * Auto-unplugging state
      */
      //timer used while the device is plugged
     struct timer_list  unplug_timer;
     //if the number of pending requests exceeds this value, the device is unplugged immediately
     int           unplug_thresh;     /* After this many requests */
     //delay before the device is unplugged
     unsigned long      unplug_delay; /* After this many jiffies */
     //work queue item used to unplug the device
     struct work_struct unplug_work;
     //backing device information
     struct backing_dev_info backing_dev_info;
 
     /*
      * The queue owner gets to use this for whatever they like.
      * ll_rw_blk doesn't touch it.
      */
      //private data of the block device driver
     void          *queuedata;
     //argument passed to activity_fn()
     void          *activity_data;
 
     /*
      * queue needs bounce pages for pages above this limit
      */
      //page frames above this number use a bounce buffer
     unsigned long      bounce_pfn;
     //allocation flags for bounce buffer pages
     int           bounce_gfp;
 
     /*
      * various queue flags, see QUEUE_* below
      */
      //flags describing the request queue
     unsigned long      queue_flags;
 
     /*
      * protects queue structures from reentrancy
      */
      //pointer to the request queue lock
     spinlock_t         *queue_lock;
 
     /*
      * queue kobject
      */
      //embedded kobject
     struct kobject kobj;
 
     /*
      * queue settings
      */
      //maximum number of requests allowed in the queue
     unsigned long      nr_requests;  /* Max # of requests */
     //if the number of pending requests exceeds this value, the queue is considered congested
     unsigned int       nr_congestion_on;
     //if the number of pending requests falls below this threshold, the queue is considered uncongested
     unsigned int       nr_congestion_off;
 
     //maximum number of sectors a single request may handle (tunable)
     unsigned short         max_sectors;
     //maximum number of sectors a single request may handle (hardware constraint)
     unsigned short         max_hw_sectors;
     //maximum number of physical segments a single request may handle
     unsigned short         max_phys_segments;
     //maximum number of hardware segments a single request may handle (DMA constraint)
     unsigned short         max_hw_segments;
     //size of a sector in bytes
     unsigned short         hardsect_size;
     //maximum length of a physical segment (in bytes)
     unsigned int       max_segment_size;
     //memory boundary mask for segment merging
     unsigned long      seg_boundary_mask;
     //alignment of the start address and length of DMA buffers
     unsigned int       dma_alignment;
     //bitmap of free/busy tags, used for tagged requests
     struct blk_queue_tag   *queue_tags;
     //reference count of the request queue
     atomic_t      refcnt;
     //number of pending requests in the queue
     unsigned int       in_flight;
 
     /*
      * sg stuff
      */
      //user-defined command timeout
     unsigned int       sg_timeout;
     //not used
     unsigned int       sg_reserved_size;
};
request_queue represents a request queue; each individual request in it is represented by a request structure.
3.5: The request structure:
struct request {
     //links the request into the queue's list
     struct list_head queuelist; /* looking for ->queue? you must _not_
                        * access it directly, use
                        * blkdev_dequeue_request! */
     //flags of the request descriptor
     unsigned long flags;        /* see REQ_ bits below */
 
     /* Maintain bio traversal state for part by part I/O submission.
      * hard_* are block layer internals, no driver should touch them!
      */
     //next sector to submit
     sector_t sector;       /* next sector to submit */
     //number of sectors left to submit
     unsigned long nr_sectors;   /* no. of sectors left to submit */
     /* no. of sectors left to submit in the current segment */
     //number of sectors to submit in the current bio segment
     unsigned int current_nr_sectors;
     //next sector to complete
     sector_t hard_sector;       /* next sector to complete */
     //number of sectors left to complete in the whole request
     unsigned long hard_nr_sectors;   /* no. of sectors left to complete */
     /* no. of sectors left to complete in the current segment */
     //number of sectors to complete in the current bio segment
     unsigned int hard_cur_sectors;
 
     /* no. of segments left to submit in the current bio */
     //number of segments left to submit in the current bio
     unsigned short nr_cbio_segments;
     /* no. of sectors left to submit in the current bio */
     unsigned long nr_cbio_sectors;
 
     struct bio *cbio;      /* next bio to submit */
     //first bio of the request that has not yet completed
     struct bio *bio;       /* next unfinished bio to complete */
     //last bio of the request
     struct bio *biotail;
     //private data of the I/O scheduler
     void *elevator_private;
     //status of the request
     int rq_status;     /* should split this into a few status bits */
     //disk descriptor the request refers to
     struct gendisk *rq_disk;
     //counter of transfer failures
     int errors;
     //time the request started
     unsigned long start_time;
 
     /* Number of scatter-gather DMA addr+len pairs after
      * physical address coalescing is performed.
      */
      //number of physical segments of the request
     unsigned short nr_phys_segments;
 
     /* Number of scatter-gather addr+len pairs after
      * physical and DMA remapping hardware coalescing is performed.
      * This is the number of scatter-gather entries the driver
      * will actually have to deal with after DMA mapping is done.
      */
      //number of hardware segments of the request
     unsigned short nr_hw_segments;
     //tag associated with the request
     int tag;
     //data transfer buffer; NULL if the buffer is in high memory
     char *buffer;
     //reference count of the request
     int ref_count;
     //pointer to the request queue descriptor containing the request
     request_queue_t *q;
     struct request_list *rl;
     //completion used to signal the end of the data transfer
     struct completion *waiting;
     //pointer used when sending "special" requests to the device
     void *special;
 
     /*
      * when request is used as a packet command carrier
      */
      //length of the data in cmd
     unsigned int cmd_len;
     //the packet command itself
     unsigned char cmd[BLK_MAX_CDB];
     //length of the data in data
     unsigned int data_len;
     //pointer used to track the data being transferred
     void *data;
     //length of the data in the sense field
     unsigned int sense_len;
     //pointer to the output sense buffer
     void *sense;
     //request timeout
     unsigned int timeout;
 
     /*
      * For Power Management requests
      */
      //pointer to the structure used by power management commands
     struct request_pm_state *pm;
};
Both the request queue descriptor and the request descriptor are quite complex. To simplify driver design, the kernel provides an API for block device drivers to initialize a request queue: blk_init_queue(). Its code is as follows:
//rfn: the I/O handling function supplied by the driver, i.e. the queue's request_fn (the strategy routine)
//lock: the spinlock the driver provides for the request queue
request_queue_t *blk_init_queue(request_fn_proc *rfn, spinlock_t *lock)
{
     request_queue_t *q;
     static int printed;
     //allocate the request queue descriptor
     q = blk_alloc_queue(GFP_KERNEL);
     if (!q)
         return NULL;
     //initialize q->request_list
     if (blk_init_free_list(q))
         goto out_init;
 
     if (!printed) {
         printed = 1;
         printk("Using %s io scheduler\n", chosen_elevator->elevator_name);
     }
 
     //initialize the operation methods of the request queue descriptor
     q->request_fn      = rfn;
     q->back_merge_fn       = ll_back_merge_fn;
     q->front_merge_fn      = ll_front_merge_fn;
     q->merge_requests_fn   = ll_merge_requests_fn;
     q->prep_rq_fn      = NULL;
     q->unplug_fn       = generic_unplug_device;
     q->queue_flags         = (1 << QUEUE_FLAG_CLUSTER);
     q->queue_lock      = lock;
 
    
     blk_queue_segment_boundary(q, 0xffffffff);
     //set q->make_request_fn, and initialize the queue's unplug timer and work item
     blk_queue_make_request(q, __make_request);
     //set max_segment_size, max_hw_segments, max_phys_segments
     blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
     blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
     blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
 
     /*
      * all done
      */
      //set the I/O scheduler of the request queue
     if (!elevator_init(q, chosen_elevator))
         return q;
     //failure handling
     blk_cleanup_queue(q);
out_init:
     kmem_cache_free(requestq_cachep, q);
     return NULL;
}
This function initializes many operation pointers. It is the same for every block device, which gives the generic block layer a uniform interface; the driver-specific entry point is the strategy routine we pass to blk_init_queue(). Take note of the queue operations set here, as they will come up again in the analysis that follows.
The request structure also refers to the bio structure. A bio describes one block I/O operation; all I/O in the current kernel is represented by it. Its structure is as follows:
struct bio {
     //starting sector of the I/O
     sector_t      bi_sector;
     //next bio
     struct bio         *bi_next; /* request queue link */
     //block device the bio belongs to
     struct block_device    *bi_bdev;
     //flags of the bio
     unsigned long      bi_flags; /* status, command, etc */
     //read/write
     unsigned long      bi_rw;        /* bottom bits READ/WRITE,
                             * top bits priority
                             */
     //number of bio_vec entries
     unsigned short         bi_vcnt; /* how many bio_vec's */
     //index of the bio_vec currently being operated on
     unsigned short         bi_idx;       /* current index into bvl_vec */
 
     /* Number of segments in this BIO after
      * physical address coalescing is performed.
      */
      //number of segments after physical address coalescing
     unsigned short         bi_phys_segments;
 
     /* Number of segments after physical and DMA remapping
      * hardware coalescing is performed.
      */
      //number of segments after DMA remapping
     unsigned short         bi_hw_segments;
     //residual I/O count, in bytes
     unsigned int       bi_size; /* residual I/O count */
 
     /*
      * To keep track of the max hw size, we account for the
      * sizes of the first and last virtually mergeable segments
      * in this bio
      */
      //size of the first mergeable segment
     unsigned int       bi_hw_front_size;
     //size of the last mergeable segment
     unsigned int       bi_hw_back_size;
     //maximum number of bio_vec entries
     unsigned int       bi_max_vecs;  /* max bvl_vecs we can hold */
     //the bi_io_vec array
     struct bio_vec         *bi_io_vec;   /* the actual vec list */
     //method called when the I/O completes
     bio_end_io_t       *bi_end_io;
     //usage count
     atomic_t      bi_cnt;       /* pin count */
     //private data of the owner
     void          *bi_private;
     //method that destroys this bio
     bio_destructor_t   *bi_destructor;    /* destructor */
};
The bio_vec structure is as follows:
struct bio_vec {
     //page the bio_vec refers to
     struct page   *bv_page;
     //length of the data area
     unsigned int  bv_len;
     //offset within the page
     unsigned int  bv_offset;
};
關(guān)于bio與bio_vec的關(guān)系,用下圖表示:
現(xiàn)在,我們來思考一個問題:
當一個I/O請求提交給請求隊列后,它是怎么去調(diào)用塊設(shè)備驅(qū)動的策略例程去完成這次I/O的呢?還有,當一個I/O請求被提交給請求隊列時,會不會立即調(diào)用驅(qū)動中的策略例程去完成這次I/O呢,?
實際上,為了提高效率,所有的I/O都會在一個特定的延時之后才會調(diào)用策略例程去完成本次I/O.我們來看一個反面的例子,假設(shè)I/O在被提交后馬上得到執(zhí)行.例如.磁盤有磁針在磁盤12.現(xiàn)在有一個磁道1的請求.就會將磁針移動到磁道1.操作完后,又有一個請求過來了,它要操作磁道11.然后又會將磁針移到磁道11.操作完后,又有一個請求過來,要求操作磁道4.此時會將磁針移到磁道4.這個例子中,磁針移動的位置是:12->1->11->4.實際上,磁針的定位是一個很耗時的操作.這樣下去,毫無疑問會影響整個系統(tǒng)的效率.我們可以在整個延時內(nèi),將所有I/O操作按順序排列在一起,然后再調(diào)用策略例程.于是上例的磁針移動就會變成12->11->4->1.此時磁針只會往一個方向移動.
至于怎么樣排列請求和選取哪一個請求進行操作,這就是I/O調(diào)度的任務了.這部份我們在通用塊層再進行分析.
內(nèi)核中有兩個操作會完成上面的延時過程.即:激活塊設(shè)備驅(qū)動程序和撤消塊設(shè)備驅(qū)動程序.
3.6:塊設(shè)備驅(qū)動程序的激活和撤消
激活塊設(shè)備驅(qū)動程序和撤消塊設(shè)備驅(qū)動程序在內(nèi)核中對應的接口為blk_plug_device()和blk_remove_plug().分別看下它們的操作:
void blk_plug_device(request_queue_t *q)
{
     WARN_ON(!irqs_disabled());
 
     /*
      * don't plug a stopped queue, it must be paired with blk_start_queue()
      * which will restart the queueing
      */
 
     //if QUEUE_FLAG_STOPPED is set, return immediately
     if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
         return;
 
     //set QUEUE_FLAG_PLUGGED on the request queue
     if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
         //if the queue was not already QUEUE_FLAG_PLUGGED, arm the unplug timer
         mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
}
 
int blk_remove_plug(request_queue_t *q)
{
     WARN_ON(!irqs_disabled());
 
     //clear the queue's QUEUE_FLAG_PLUGGED state
     if (!test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
         //if the queue was not QUEUE_FLAG_PLUGGED, just return 0
         return 0;
     //if it was plugged, delete the unplug timer
     del_timer(&q->unplug_timer);
     return 1;
}
What happens when the queue is in the QUEUE_FLAG_PLUGGED state and the timer expires?
Recall that during queue initialization, blk_init_queue() calls blk_queue_make_request(). Its code is as follows:
void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn)
{
     ……
     ……
     q->unplug_delay = (3 * HZ) / 1000;   /* 3 milliseconds */
     if (q->unplug_delay == 0)
         q->unplug_delay = 1;
 
     INIT_WORK(&q->unplug_work, blk_unplug_work, q);
 
     q->unplug_timer.function = blk_unplug_timeout;
     q->unplug_timer.data = (unsigned long)q;
     ……
     ……
}
The code above sets the timer interval to (3*HZ)/1000, i.e. 3 milliseconds. The timer expiry handler is blk_unplug_timeout(), and its argument is the request queue itself.
The code of blk_unplug_timeout() is as follows:
static void blk_unplug_timeout(unsigned long data)
{
     request_queue_t *q = (request_queue_t *)data;
 
     kblockd_schedule_work(&q->unplug_work);
}
As the code shows, when the timer expires it schedules the q->unplug_work work queue item.
In blk_queue_make_request(), this work item was initialized as:
INIT_WORK(&q->unplug_work, blk_unplug_work, q)
That is, the work queue function is blk_unplug_work(), with the request queue as its argument. Its code is as follows:
static void blk_unplug_work(void *data)
{
     request_queue_t *q = data;
 
     q->unplug_fn(q);
}
At this point the request queue's unplug_fn() operation is invoked.
blk_init_queue() assigned this member as follows:
     q->unplug_fn       = generic_unplug_device;
generic_unplug_device() ends up in __generic_unplug_device(), whose code is as follows:
void __generic_unplug_device(request_queue_t *q)
{
     //if the queue is in the QUEUE_FLAG_STOPPED state, return
     if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
         return;
     //blk_remove_plug() returns 1 if the queue was in the QUEUE_FLAG_PLUGGED state
     if (!blk_remove_plug(q))
         return;
 
     /*
      * was plugged, fire request_fn if queue has stuff to do
      */
      //if there are requests in the queue, call its request_fn(),
      //i.e. the driver's strategy routine
     if (elv_next_request(q))
         q->request_fn(q);
}
blk_remove_plug() was analyzed above, so we do not repeat it here.
In the end, the actual I/O is always completed by calling the block device driver's strategy routine.
4: The I/O Scheduling Layer
The I/O scheduler is described by the following structure:
struct elevator_s
{
     //called when a bio is about to be inserted
     elevator_merge_fn *elevator_merge_fn;
     elevator_merged_fn *elevator_merged_fn;
     elevator_merge_req_fn *elevator_merge_req_fn;
     //get the next request
     elevator_next_req_fn *elevator_next_req_fn;
     //add a request to the request queue
     elevator_add_req_fn *elevator_add_req_fn;
     elevator_remove_req_fn *elevator_remove_req_fn;
     elevator_requeue_req_fn *elevator_requeue_req_fn;
 
     elevator_queue_empty_fn *elevator_queue_empty_fn;
     elevator_completed_req_fn *elevator_completed_req_fn;
 
     elevator_request_list_fn *elevator_former_req_fn;
     elevator_request_list_fn *elevator_latter_req_fn;
 
     elevator_set_req_fn *elevator_set_req_fn;
     elevator_put_req_fn *elevator_put_req_fn;
 
     elevator_may_queue_fn *elevator_may_queue_fn;
    
     //initialization and exit operations
     elevator_init_fn *elevator_init_fn;
     elevator_exit_fn *elevator_exit_fn;
 
     void *elevator_data;
 
     struct kobject kobj;
     struct kobj_type *elevator_ktype;
     //name of the scheduling algorithm
     const char *elevator_name;
};
We take the simplest algorithm, NOOP, as our example.
The NOOP algorithm only performs simple request merging. It is defined as follows:
elevator_t elevator_noop = {
     .elevator_merge_fn     = elevator_noop_merge,
     .elevator_merge_req_fn      = elevator_noop_merge_requests,
     .elevator_next_req_fn       = elevator_noop_next_request,
     .elevator_add_req_fn        = elevator_noop_add_request,
     .elevator_name              = "noop",
};
Let us analyze each of these operations in turn:
elevator_noop_merge(): searches the request queue for a request the bio can be merged with. Its code is as follows:
int elevator_noop_merge(request_queue_t *q, struct request **req,
              struct bio *bio)
{
     struct list_head *entry = &q->queue_head;
     struct request *__rq;
     int ret;
 
     //if the queue has a last_merge hint, first check whether that request can be merged with
     //(in NOOP, last_merge is generally not set)
     if ((ret = elv_try_last_merge(q, bio))) {
         *req = q->last_merge;
         return ret;
     }
 
     //iterate over the requests in the queue
     while ((entry = entry->prev) != &q->queue_head) {
         __rq = list_entry_rq(entry);
 
         if (__rq->flags & (REQ_SOFTBARRIER | REQ_HARDBARRIER))
              break;
         else if (__rq->flags & REQ_STARTED)
              break;
         //skip requests that are not fs requests (i.e. do not transfer actual data)
         if (!blk_fs_request(__rq))
              continue;
         //check whether the bio can be merged with this request
         if ((ret = elv_try_merge(__rq, bio))) {
              *req = __rq;
              q->last_merge = __rq;
              return ret;
         }
     }
 
     return ELEVATOR_NO_MERGE;
}
elv_try_merge() decides whether a bio can be merged with a given request. Its code is as follows:
inline int elv_try_merge(struct request *__rq, struct bio *bio)
{
     int ret = ELEVATOR_NO_MERGE;
 
     /*
      * we can merge and sequence is ok, check if it's possible
      */
      //check whether __rq and bio are compatible
     if (elv_rq_merge_ok(__rq, bio)) {
         //if the request's starting sector + its sector count equals the bio's
         //starting sector, the bio can be appended behind __rq:
         //return ELEVATOR_BACK_MERGE
         if (__rq->sector + __rq->nr_sectors == bio->bi_sector)
              ret = ELEVATOR_BACK_MERGE;
         //if the request's starting sector - the bio's sector count equals the bio's
         //starting sector, the bio can be placed in front of __rq:
          //return ELEVATOR_FRONT_MERGE
         else if (__rq->sector - bio_sectors(bio) == bio->bi_sector)
              ret = ELEVATOR_FRONT_MERGE;
     }
 
     //otherwise no merge is possible: return ELEVATOR_NO_MERGE (value 0)
     return ret;
}
The code of elv_rq_merge_ok() is as follows:
inline int elv_rq_merge_ok(struct request *rq, struct bio *bio)
{
     //check whether rq may be merged at all
     if (!rq_mergeable(rq))
         return 0;
 
     /*
      * different data direction or already started, don't merge
      */
      //the data direction (read/write) must be the same
     if (bio_data_dir(bio) != rq_data_dir(rq))
         return 0;
 
     /*
      * same device and no special stuff set, merge is ok
      */
      //both must target the same disk, with no completion waiter or special command set
     if (rq->rq_disk == bio->bi_bdev->bd_disk &&
         !rq->waiting && !rq->special)
         return 1;
 
     return 0;
}
Note: it returns 1 if the checks pass, 0 otherwise.
 
elevator_noop_merge_requests(): removes next from the request queue. Its code is as follows:
void elevator_noop_merge_requests(request_queue_t *q, struct request *req,
                     struct request *next)
{
     list_del_init(&next->queuelist);
}
As the code shows, under NOOP removing a request from the queue simply unlinks its list node; no extra work is needed.
 
elevator_noop_next_request(): gets the next request. Its code is as follows:
struct request *elevator_noop_next_request(request_queue_t *q)
{
     if (!list_empty(&q->queue_head))
         return list_entry_rq(q->queue_head.next);
 
     return NULL;
}
Very simple: take the next node of the list.
 
elevator_noop_add_request(): inserts a request into the request queue. Its code is as follows:
void elevator_noop_add_request(request_queue_t *q, struct request *rq,
                     int where)
{
     //by default, insert rq at the tail of the circular list
     struct list_head *insert = q->queue_head.prev;
     //if the request is to be inserted at the front of the queue
     if (where == ELEVATOR_INSERT_FRONT)
         insert = &q->queue_head;
 
     //note: regardless of the insertion point computed above, the new
     //request is added at the tail of the queue
     list_add_tail(&rq->queuelist, &q->queue_head);
 
     /*
      * new merges must not precede this barrier
      */
     if (rq->flags & REQ_HARDBARRIER)
         q->last_merge = NULL;
     else if (!q->last_merge)
         q->last_merge = rq;
}
 
5: The Generic Block Layer
The entry point of the generic block layer is generic_make_request(). Its code is as follows:
void generic_make_request(struct bio *bio)
{
     request_queue_t *q;
     sector_t maxsector;
     //nr_sectors: number of sectors to operate on
     int ret, nr_sectors = bio_sectors(bio);
 
     //may sleep
     might_sleep();
     /* Test device or partition size, when known. */
     //size of the device, in sectors
     maxsector = bio->bi_bdev->bd_inode->i_size >> 9;
     if (maxsector) {
         //starting sector of the bio
         sector_t sector = bio->bi_sector;
 
         //invalid if the device is smaller than the number of sectors to operate on,
         //or if the operation would run past the end of the device
         if (maxsector < nr_sectors ||
             maxsector - nr_sectors < sector) {
              char b[BDEVNAME_SIZE];
              /* This may well happen - the kernel calls
               * bread() without checking the size of the
               * device, e.g., when mounting a device. */
              printk(KERN_INFO
                     "attempt to access beyond end of device\n");
              printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n",
                     bdevname(bio->bi_bdev, b),
                     bio->bi_rw,
                     (unsigned long long) sector + nr_sectors,
                     (long long) maxsector);
 
              set_bit(BIO_EOF, &bio->bi_flags);
              goto end_io;
         }
     }
 
     /*
      * Resolve the mapping until finished. (drivers are
      * still free to implement/resolve their own stacking
      * by explicitly returning 0)
      *
      * NOTE: we don't repeat the blk_size check for each new device.
      * Stacking drivers are expected to know what they are doing.
      */
     do {
         char b[BDEVNAME_SIZE];
         //get the request queue of the block device
         q = bdev_get_queue(bio->bi_bdev);
         if (!q) {
              //the request queue does not exist
              printk(KERN_ERR
                     "generic_make_request: Trying to access "
                   "nonexistent block-device %s (%Lu)\n",
                   bdevname(bio->bi_bdev, b),
                   (long long) bio->bi_sector);
end_io:
              //eventually calls bio->bi_end_io
              bio_endio(bio, bio->bi_size, -EIO);
              break;
         }
 
         //invalid: the bio is larger than the queue can handle
         if (unlikely(bio_sectors(bio) > q->max_hw_sectors)) {
              printk("bio too big device %s (%u > %u)\n",
                   bdevname(bio->bi_bdev, b),
                   bio_sectors(bio),
                   q->max_hw_sectors);
              goto end_io;
         }
 
         //if the request queue is marked QUEUE_FLAG_DEAD,
         //bail out
         if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))
              goto end_io;
 
         /*
          * If this device has partitions, remap block n
          * of partition p to block n+start(p) of the disk.
          */
          //if the block device is a partition, remap the bio to the whole-disk device
         blk_partition_remap(bio);
         //call the queue's make_request_fn()
         ret = q->make_request_fn(q, bio);
     } while (ret);
}
 
blk_init_queue() sets the queue's make_request_fn as follows:
blk_init_queue() —> blk_queue_make_request(q, __make_request)
void blk_queue_make_request(request_queue_t * q, make_request_fn * mfn)
{
     ……
     ……
     q->make_request_fn = mfn;
     ……
}
So the request queue's make_request_fn is set to __make_request. The code of this function is as follows:
static int __make_request(request_queue_t *q, struct bio *bio)
{
     struct request *req, *freereq = NULL;
     int el_ret, rw, nr_sectors, cur_nr_sectors, barrier, err;
     sector_t sector;
 
     //bio的起始扇區(qū)
     sector = bio->bi_sector;
     //扇區(qū)數(shù)目
     nr_sectors = bio_sectors(bio);
     //當前bio中的bio_vec的扇區(qū)數(shù)目
     cur_nr_sectors = bio_cur_sectors(bio);
     //讀/寫
     rw = bio_data_dir(bio);
 
     /*
      * low level driver can indicate that it wants pages above a
      * certain limit bounced to low memory (ie for highmem, or even
      * ISA dma in theory)
      */
      //建立一個彈性回環(huán)緩存
     blk_queue_bounce(q, &bio);
 
     spin_lock_prefetch(q->queue_lock);
 
     barrier = bio_barrier(bio);
     if (barrier && !(q->queue_flags & (1 << QUEUE_FLAG_ORDERED))) {
         err = -EOPNOTSUPP;
         goto end_io;
     }
 
again:
     spin_lock_irq(q->queue_lock);
 
     //請求隊列是空的
     if (elv_queue_empty(q)) {
         //激活塊設(shè)備驅(qū)動
         blk_plug_device(q);
         goto get_rq;
     }
     if (barrier)
         goto get_rq;
     //調(diào)用I/O調(diào)度的elevator_merge_fn方法,判斷這個bio能否和其它請求合并
     //如果可以合并,req參數(shù)將返回與之合并的請求描述符
     el_ret = elv_merge(q, &req, bio);
     switch (el_ret) {
         //可以合并.且bio加到req的后面
         case ELEVATOR_BACK_MERGE:
              BUG_ON(!rq_mergeable(req));
 
              if (!q->back_merge_fn(q, req, bio))
                   break;
 
              req->biotail->bi_next = bio;
              req->biotail = bio;
              req->nr_sectors = req->hard_nr_sectors += nr_sectors;
              drive_stat_acct(req, nr_sectors, 0);
              if (!attempt_back_merge(q, req))
                   elv_merged_request(q, req);
              goto out;
         //可以合并.且bio加到req的前面
         case ELEVATOR_FRONT_MERGE:
              BUG_ON(!rq_mergeable(req));
 
              if (!q->front_merge_fn(q, req, bio))
                   break;
 
              bio->bi_next = req->bio;
              req->cbio = req->bio = bio;
              req->nr_cbio_segments = bio_segments(bio);
              req->nr_cbio_sectors = bio_sectors(bio);
 
              /*
               * may not be valid. if the low level driver said
               * it didn't need a bounce buffer then it better
               * not touch req->buffer either...
               */
              req->buffer = bio_data(bio);
              req->current_nr_sectors = cur_nr_sectors;
              req->hard_cur_sectors = cur_nr_sectors;
              req->sector = req->hard_sector = sector;
              req->nr_sectors = req->hard_nr_sectors += nr_sectors;
              drive_stat_acct(req, nr_sectors, 0);
              if (!attempt_front_merge(q, req))
                   elv_merged_request(q, req);
              goto out;
 
         /*
          * elevator says don't/can't merge. get new request
          */
          //不可以合并.申請一個新的請求,將且加入請求隊列
         case ELEVATOR_NO_MERGE:
              break;
 
         default:
              printk("elevator returned crap (%d)\n", el_ret);
              BUG();
     }
 
     /*
      * Grab a free request from the freelist - if that is empty, check
      * if we are doing read ahead and abort instead of blocking for
      * a free slot.
      */
get_rq:
     //freereq:是新分配的請求描述符
     if (freereq) {
         req = freereq;
         freereq = NULL;
     } else {
         //分配一個請求描述符
         spin_unlock_irq(q->queue_lock);
         if ((freereq = get_request(q, rw, GFP_ATOMIC)) == NULL) {
              /*
               * READA bit set
               */
               //分配失敗
               err = -EWOULDBLOCK;
              if (bio_rw_ahead(bio))
                   goto end_io;
    
              freereq = get_request_wait(q, rw);
         }
         goto again;
     }
 
     req->flags |= REQ_CMD;
 
     /*
      * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
      */
     if (bio_rw_ahead(bio) || bio_failfast(bio))
         req->flags |= REQ_FAILFAST;
 
     /*
      * REQ_BARRIER implies no merging, but lets make it explicit
      */
     if (barrier)
         req->flags |= (REQ_HARDBARRIER | REQ_NOMERGE);
 
     //初始化新分配的請求描述符
     req->errors = 0;
     req->hard_sector = req->sector = sector;
     req->hard_nr_sectors = req->nr_sectors = nr_sectors;
     req->current_nr_sectors = req->hard_cur_sectors = cur_nr_sectors;
     req->nr_phys_segments = bio_phys_segments(q, bio);
     req->nr_hw_segments = bio_hw_segments(q, bio);
     req->nr_cbio_segments = bio_segments(bio);
     req->nr_cbio_sectors = bio_sectors(bio);
     req->buffer = bio_data(bio);     /* see ->buffer comment above */
     req->waiting = NULL;
     //將bio 關(guān)聯(lián)到請求描述符
     req->cbio = req->bio = req->biotail = bio;
     req->rq_disk = bio->bi_bdev->bd_disk;
     req->start_time = jiffies;
     //請將求描述符添加到請求隊列中
     add_request(q, req);
out:
     if (freereq)
         __blk_put_request(q, freereq);
     //如果定義了BIO_RW_SYNC.
     //將調(diào)用__generic_unplug_device將塊設(shè)備驅(qū)動,它會直接調(diào)用驅(qū)動程序的策略例程
     if (bio_sync(bio))
         __generic_unplug_device(q);
 
     spin_unlock_irq(q->queue_lock);
     return 0;
 
end_io:
     bio_endio(bio, nr_sectors << 9, err);
     return 0;
}
這個函數(shù)的邏輯比較簡單,它判斷bio能否與請求隊列中存在的請求合并,如果可以合并,將其它合并到現(xiàn)有的請求.如果不能合并,則新建一個請求描述符,然后把它插入到請求隊列中.上面的代碼可以結(jié)合之前分析的NOOP算法進行理解.
重點分析一下請求描述符的分配過程:
分配一個請求描述符的過程如下所示:
         if ((freereq = get_request(q, rw, GFP_ATOMIC)) == NULL) {
              /*
               * READA bit set
               */
               //分配失敗
               err = -EWOULDBLOCK;
              if (bio_rw_ahead(bio))
                   goto end_io;
    
              freereq = get_request_wait(q, rw);
         }
在分析這段代碼之前,先來討論一下關(guān)于請求描述符的分配方式.記得我們在分析請求隊列描述符的時候,request_queue中有一個成員:struct request_list  rq;
它的數(shù)據(jù)結(jié)構(gòu)如下:
struct request_list {
     // allocation counts for read/write request descriptors
     int count[2];
     // memory pool the descriptors are allocated from
     mempool_t *rq_pool;
     // wait queues for read/write requests, used when no memory is free
     wait_queue_head_t wait[2];
};
如果當前空閑內(nèi)存不夠.則會將請求的進程掛起.如果分配成功,則將請求隊列的rl字段指向這個分配的request_list.
釋放一個請求描述符,將會將其歸還給指定的內(nèi)存池.
request_list結(jié)構(gòu)還有一個避免請求擁塞的作用:
每個請求隊列都有一個允許處理請求的最大值(request_queue->nr_requests).如果隊列中的請求超過了這個數(shù)值,則將隊列置為QUEUE_FLAG_READFULL/QUEUE_FLAG_WRITEFULL.后續(xù)試圖加入到隊列的進程就會被放置到request_list結(jié)構(gòu)所對應的等待隊列中睡眠.如果一個隊列中的睡眠進程過程也多也會影響系統(tǒng)的效率.如果待處理的請求大于request_queue-> nr_congestion_on就會認為這個隊列是擁塞的.就會試圖降低新請求的創(chuàng)建速度.如果待處理請求小于request_queue->nr_congestion_off.則會認為當前隊列是不擁塞的.
get_request()的代碼如下:
static struct request *get_request(request_queue_t *q, int rw, int gfp_mask)
{
     struct request *rq = NULL;
     struct request_list *rl = &q->rq;
     struct io_context *ioc = get_io_context(gfp_mask);
 
     spin_lock_irq(q->queue_lock);
     //如果請求數(shù)超過了請求隊列允許的最大請求值(q->nr_requests)
     //就會將后續(xù)的請求進程投入睡眠
    
     if (rl->count[rw]+1 >= q->nr_requests) {
         /*
          * The queue will fill after this allocation, so set it as
          * full, and mark this process as "batching". This process
          * will be allowed to complete a batch of requests, others
          * will be blocked.
          */
          //判斷是否將隊列置為了QUEUE_FLAG_READFULL/QUEUE_FLAG_WRITEFULL
          //如果沒有,則置此標志.并且設(shè)置當前進程為batching
         if (!blk_queue_full(q, rw)) {
              ioc_set_batching(ioc);
              blk_set_queue_full(q, rw);
         }
     }
 
     //如果隊列滿了,進程不為batching 且I/O調(diào)度程序不能忽略它
     //不能分配.直接返回
     if (blk_queue_full(q, rw)
              && !ioc_batching(ioc) && !elv_may_queue(q, rw)) {
         /*
          * The queue is full and the allocating process is not a
          * "batcher", and not exempted by the IO scheduler
          */
         spin_unlock_irq(q->queue_lock);
         goto out;
     }
 
     //要分配請求描述符了,遞增計數(shù)
     rl->count[rw]++;
     //如果待請求數(shù)量超過了request_queue-> nr_congestion_on
     //則隊列是阻塞的,設(shè)置阻塞標志
     if (rl->count[rw] >= queue_congestion_on_threshold(q))
         set_queue_congested(q, rw);
     spin_unlock_irq(q->queue_lock);
 
     //分配請求描述符
     rq = blk_alloc_request(q, gfp_mask);
     if (!rq) {
         /*
          * Allocation failed presumably due to memory. Undo anything
          * we might have messed up.
          *
          * Allocating task should really be put onto the front of the
          * wait queue, but this is pretty rare.
          */
         spin_lock_irq(q->queue_lock);
         //分配失敗了,要減小分配描述的引用計數(shù)
         freed_request(q, rw);
         spin_unlock_irq(q->queue_lock);
         goto out;
     }
 
     if (ioc_batching(ioc))
         ioc->nr_batch_requests--;
 
     //初始化請求的各字段
     INIT_LIST_HEAD(&rq->queuelist);
 
     /*
      * first three bits are identical in rq->flags and bio->bi_rw,
      * see bio.h and blkdev.h
      */
     rq->flags = rw;
 
     rq->errors = 0;
     rq->rq_status = RQ_ACTIVE;
     rq->bio = rq->biotail = NULL;
     rq->buffer = NULL;
     rq->ref_count = 1;
     rq->q = q;
     rq->rl = rl;
     rq->waiting = NULL;
     rq->special = NULL;
     rq->data_len = 0;
     rq->data = NULL;
     rq->sense = NULL;
 
out:
     //減少ioc的引用計數(shù)
     put_io_context(ioc);
     return rq;
}
由于在分配之前遞增了統(tǒng)計計數(shù),所以在分配失敗后,要把這個統(tǒng)計計數(shù)減下來,這是由freed_request()完成的.它的代碼如下:
static void freed_request(request_queue_t *q, int rw)
{
     struct request_list *rl = &q->rq;
 
     rl->count[rw]--;
     //如果分配計數(shù)小于request_queue->nr_congestion_off.隊列已經(jīng)不擁塞了
     if (rl->count[rw] < queue_congestion_off_threshold(q))
         clear_queue_congested(q, rw);
     //如果計數(shù)小于允許的最大值.那可以分配請求了,將睡眠的進程喚醒
     if (rl->count[rw]+1 <= q->nr_requests) {
         //喚醒等待進程
         if (waitqueue_active(&rl->wait[rw]))
              wake_up(&rl->wait[rw]);
          // clear QUEUE_FLAG_READFULL/QUEUE_FLAG_WRITEFULL
         blk_clear_queue_full(q, rw);
     }
}
在這里我們可以看到,如果待處理請求小于請求隊列所允許的最大值,就會將睡眠的進程喚醒.
如果請求描述符分配失敗,會怎么樣呢,?我們接著看__make_request()中的代碼:
         if ((freereq = get_request(q, rw, GFP_ATOMIC)) == NULL) {
              /*
               * READA bit set
               */
               //分配失敗
               err = -EWOULDBLOCK;
              //如果此次操作是一次預讀,且不阻塞
              if (bio_rw_ahead(bio))
                   goto end_io;
              //掛起進程
              freereq = get_request_wait(q, rw);
         }
如果分配失敗,會調(diào)用get_request_wait()將進程掛起.它的代碼如下:
static struct request *get_request_wait(request_queue_t *q, int rw)
{
     //初始化一個等待隊列
     DEFINE_WAIT(wait);
     struct request *rq;
     struct io_context *ioc;
 
     //撤消塊設(shè)備驅(qū)動.這里會直接調(diào)用塊設(shè)備驅(qū)動的策略例程
     generic_unplug_device(q);
     ioc = get_io_context(GFP_NOIO);
     do {
         struct request_list *rl = &q->rq;
 
         //將當前進程加入等待隊列.并設(shè)置進程狀態(tài)為TASK_UNINTERRUPTIBLE
         prepare_to_wait_exclusive(&rl->wait[rw], &wait,
                   TASK_UNINTERRUPTIBLE);
         //再次獲得等待隊列
         rq = get_request(q, rw, GFP_NOIO);
 
         if (!rq) {
             
              //如果還是失敗了,睡眠
              io_schedule();
 
              /*
               * After sleeping, we become a "batching" process and
               * will be able to allocate at least one request, and
               * up to a big batch of them for a small period time.
               * See ioc_batching, ioc_set_batching
               */
               //這里是被喚醒之后運行
              ioc_set_batching(ioc);
         }
         //將進程從等待隊列中刪除
         finish_wait(&rl->wait[rw], &wait);
     } while (!rq);
     put_io_context(ioc);
 
     return rq;
}
這段代碼比較簡單,相似的代碼我們在之前已經(jīng)分析過很多次了.這里不做重點分析.
 
One more thing deserves attention in __make_request(): the memory a bio refers to may live in high memory, which the kernel cannot address directly. bio_vecs that point into high memory therefore need special handling: their pages are temporarily mapped and copied into low memory. This is the so-called bounce buffer. The work is done by blk_queue_bounce(); the function is fairly simple and can be analyzed on your own.
到這里,通用塊層的處理分析就結(jié)束了.我們繼續(xù)分析其它的層次.
