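http://blog./uid-9185047-id-445133.html
2010, R.wen

I. Overview

Figure 1 shows the layered implementation of block device operations. When a process calls read() on a file, the kernel goes through the following steps. First, through the VFS layer, it checks whether the file block to be read is already cached; this cache is accessed through a buffer_head structure. If the block is not cached yet, it has to be read from the filesystem. This is the filesystem mapping layer, which is referenced through an address_space structure; it calls the filesystem's read function (readpage) to read one page of data. This read function differs from one filesystem to another. Once the data has been read from disk, the page is linked into the cache, so the next read of the same block does not have to go to disk again.

readpage() does not operate on the disk directly. It only initializes the request as a bio structure and submits it to the generic block layer, which is done through submit_bio(). The generic block layer then calls the I/O scheduler of the corresponding device. Using its scheduling algorithm, the scheduler either merges the bio into an existing request, or creates a new request and inserts it into the device's request queue. That completes the work of the I/O scheduling layer.

Finally comes the work of the block device driver. What the I/O scheduler hands to the block driver is a request queue, and the driver's job is to process the requests in that queue until the queue is empty.

II. The generic block layer

The generic block layer operates on a bio structure, whose main data fields are:

    unsigned short bi_vcnt;
    struct bio_vec *bi_io_vec; /* the actual vec list */

This is the vector of data to be read or written, and each struct bio_vec is one segment.

    // This function mainly calls generic_make_request() to do the work:
    void submit_bio(int rw, struct bio *bio)
    {
        ……
        generic_make_request(bio);
    }

    // The main job of this function is to pass the bio to the driver for processing
    void generic_make_request(struct bio *bio)
    {
        ……
        do {
            char b[BDEVNAME_SIZE];

            // Get the queue that belongs to the block device; there is one per device
            q = bdev_get_queue(bio->bi_bdev);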
            /*
             * If this device has partitions, remap block n
             * of partition p to block n+start(p) of the disk.
             */
            // Translate partition information, e.g. convert an offset relative to
            // a partition into an absolute offset on the whole block device.
            blk_partition_remap(bio);
            old_sector = bio->bi_sector;
            old_dev = bio->bi_bdev->bd_dev;
            ……
            // This is the queue's request-making function, set up when the block
            // device creates its request queue.
            // For IDE and similar devices it is __make_request(); ramdisk is different.
            ret = q->make_request_fn(q, bio); // e.g. __make_request()
        } while (ret);
    }

    // The main job of this function is to call the I/O scheduling algorithm to merge
    // the bio, or to insert it at a suitable position in the queue
    static int __make_request(request_queue_t *q, struct bio *bio)
    {
        struct request *req;
        int el_ret, nr_sectors, barrier, err;
        const unsigned short prio = bio_prio(bio);
        const int sync = bio_sync(bio);
        int rw_flags;

        nr_sectors = bio_sectors(bio);

        // Used to handle high memory
        blk_queue_bounce(q, &bio);
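        spin_lock_irq(q->queue_lock);

        // Check whether the bio can be merged; this article leaves the
        // I/O scheduling algorithm itself aside
        el_ret = elv_merge(q, &req, bio);
        switch (el_ret) {
        // The first two cases can be merged
        case ELEVATOR_BACK_MERGE:
            ……
            goto out;
        case ELEVATOR_FRONT_MERGE:
            ……
            goto out;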
        // Cannot merge; a new request has to be created.
        /* ELV_NO_MERGE: elevator says don't/can't merge. */
        default:
            ;
        }

    get_rq:
        rw_flags = bio_data_dir(bio);
        if (sync)
            rw_flags |= REQ_RW_SYNC;

        // Create a new request
        req = get_request_wait(q, rw_flags, bio);

        // Initialize the request
        init_request_from_bio(req, bio);

        spin_lock_irq(q->queue_lock);
        if (elv_queue_empty(q)) // handle the empty-queue case
            blk_plug_device(q);
        add_request(q, req); // add the new request to the queue
    out:
        if (sync) // if the bio is synchronous, process requests immediately
            __generic_unplug_device(q);

        spin_unlock_irq(q->queue_lock);
        return 0;

    end_io:
        bio_endio(bio, nr_sectors << 9, err);
        return 0;
    }

    // Triggers the block device driver to perform the real I/O
    void __generic_unplug_device(request_queue_t *q)
    {
        if (unlikely(blk_queue_stopped(q)))
            return;

        if (!blk_remove_plug(q))
            return;

        q->request_fn(q); // the device's request handler, which belongs to the driver layer
    }

III. The block device driver layer

Figure 2 shows how the block device structures relate to each other. A partition and a whole disk can each be a block_device; a disk has exactly one gendisk structure and may have several partitions, each described by an hd_struct.

Figure 2

Let us look at the driver for an IDE hard disk. We are not concerned with the IDE bus driver itself here and only list its execution path.

    static int ide_init_queue(ide_drive_t *drive)
    {
        request_queue_t *q;
        ide_hwif_t *hwif = HWIF(drive);
        int max_sectors = 256;
        int max_sg_entries = PRD_ENTRIES;

        // Allocate a request queue. The IDE bus layer does this on the device's
        // behalf, which simplifies the work of the specific block device
        q = blk_init_queue_node(do_ide_request, &ide_lock, hwif_to_node(hwif));

        // Initialize some of the queue's parameters
        q->queuedata = drive;
        blk_queue_segment_boundary(q, 0xffff);
        ……
        blk_queue_max_hw_segments(q, max_sg_entries);
        blk_queue_max_phys_segments(q, max_sg_entries);

        /* assign drive queue */
        drive->queue = q;
        return 0;
    }

    request_queue_t *
    blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
    {
        request_queue_t *q = blk_alloc_queue_node(GFP_KERNEL, node_id);

        q->node = node_id;
        if (blk_init_free_list(q)) {
            kmem_cache_free(requestq_cachep, q);
            return NULL;
        }

        q->request_fn = rfn; // as seen above, for ide-disk this is do_ide_request
        q->prep_rq_fn = NULL;
        q->unplug_fn = generic_unplug_device;
        q->queue_flags = (1 << QUEUE_FLAG_CLUSTER);
        q->queue_lock = lock;

        blk_queue_segment_boundary(q, 0xffffffff);
        blk_queue_make_request(q, __make_request);
        blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
        blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
        blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);

        /*
         * all done
         */
        if (!elevator_init(q, NULL)) { // set the queue's I/O scheduling algorithm
            blk_queue_congestion_threshold(q);
            return q;
        }

        blk_put_queue(q);
        return NULL;
    }

As shown above, when a block device's queue is unplugged, do_ide_request() is executed:

    void do_ide_request(request_queue_t *q)
    {
        ide_drive_t *drive = q->queuedata;

        ide_do_request(HWGROUP(drive), IDE_NO_IRQ);
    }

    static void ide_do_request (ide_hwgroup_t *hwgroup, int masked_irq)
    {
        ide_drive_t *drive;
        ide_hwif_t *hwif;
        struct request *rq;
        ide_startstop_t startstop;
        int loops = 0;

        /* for atari only: POSSIBLY BROKEN HERE(?) */
        ide_get_lock(ide_intr, hwgroup);

        /* caller must own ide_lock */
        BUG_ON(!irqs_disabled());

        while (!hwgroup->busy) {
            hwgroup->busy = 1;
            drive = choose_drive(hwgroup); // choose a drive
            ……
        again:
            hwif = HWIF(drive);
            if (hwgroup->hwif->sharing_irq &&
                hwif != hwgroup->hwif &&
                hwif->io_ports[IDE_CONTROL_OFFSET]) {
                /* set nIEN for previous hwif */
                SELECT_INTERRUPT(drive);
            }
            hwgroup->hwif = hwif;
            hwgroup->drive = drive;
            drive->sleeping = 0;

            /*
             * we know that the queue isn't empty, but this can happen
             * if the q->prep_rq_fn() decides to kill a request
             */
            rq = elv_next_request(drive->queue); // fetch the next request
            if (!rq) {
                hwgroup->busy = 0;
                break;
            }
            hwgroup->rq = rq;

            local_irq_enable_in_hardirq();
            /* allow other IRQs while we start this request */
            startstop = start_request(drive, rq); // start issuing this request to the disk
            spin_lock_irq(&ide_lock);
            if (masked_irq != IDE_NO_IRQ && hwif->irq != masked_irq)
                enable_irq(hwif->irq);
            if (startstop == ide_stopped)
                hwgroup->busy = 0;
        }
    }

    static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
    {
        ide_startstop_t startstop;
        sector_t block;

        block = rq->sector;
        if (blk_fs_request(rq) &&
            (drive->media == ide_disk || drive->media == ide_floppy)) {
            block += drive->sect0;
        }

        SELECT_DRIVE(drive);
        if (!drive->special.all) {
            ide_driver_t *drv;
            ……
            if (rq->cmd_type == REQ_TYPE_ATA_CMD ||
                rq->cmd_type == REQ_TYPE_ATA_TASK ||
                rq->cmd_type == REQ_TYPE_ATA_TASKFILE)
                return
                    execute_drive_cmd(drive, rq);
            else if (blk_pm_request(rq)) {
                ……
                return startstop;
            }

            drv = *(ide_driver_t **)rq->rq_disk->private_data;
            return drv->do_request(drive, rq, block);
        }
    }

Everything up to this point is the generic interface shared by devices on the IDE bus. Only from do_request() onward does a device-specific driver run, such as the drivers for IDE CD, hard disk, and floppy devices. Let us look at ide-disk:

1. Device initialization

The IDE device interface:

    static ide_driver_t idedisk_driver = {
        .gen_driver = {
            .owner = THIS_MODULE,
            .name = "ide-disk",
            .bus = &ide_bus_type,
        },
        .probe = ide_disk_probe,
        .remove = ide_disk_remove,
        .shutdown = ide_device_shutdown,
        .version = IDEDISK_VERSION,
        .media = ide_disk,
        .supports_dsc_overlap = 0,
        .do_request = ide_do_rw_disk,
        .end_request = ide_end_request,
        .error = __ide_error,
        .abort = __ide_abort,
        .proc = idedisk_proc,
    };

    static struct block_device_operations idedisk_ops = {
        .owner = THIS_MODULE,
        .open = idedisk_open,
        .release = idedisk_release,
        .ioctl = idedisk_ioctl,
        .getgeo = idedisk_getgeo,
        .media_changed = idedisk_media_changed,
        .revalidate_disk = idedisk_revalidate_disk
    };

    // Registration
    static int __init idedisk_init(void)
    {
        return driver_register(&idedisk_driver.gen_driver);
    }

    // The driver model runs this probe function at registration time
    static int ide_disk_probe(ide_drive_t *drive)
    {
        struct ide_disk_obj *idkp;
        struct gendisk *g;

        idkp = kzalloc(sizeof(*idkp), GFP_KERNEL);

        // Allocate a gendisk structure
        g = alloc_disk_node(1 << PARTN_BITS, hwif_to_node(drive->hwif));
        ide_init_disk(g, drive);

        // Register the device using the structure above
        ide_register_subdriver(drive, &idedisk_driver);

        kref_init(&idkp->kref);

        // Some initialization
        idkp->drive = drive;
        idkp->driver = &idedisk_driver;
        idkp->disk = g;
        g->private_data = &idkp->driver;
        drive->driver_data = idkp;

        idedisk_setup(drive);
        g->minors = 1 << PARTN_BITS;
        g->driverfs_dev = &drive->gendev;
        g->flags = drive->removable ?
            GENHD_FL_REMOVABLE : 0;
        set_capacity(g, idedisk_capacity(drive));
        g->fops = &idedisk_ops;
        add_disk(g); // add the disk; from this point on the device is usable
        return 0;
    }

2. Handling requests sent from the IDE bus

As shown above, the IDE bus driver calls the device's do_request() to handle a request; this is the function registered in idedisk_driver above. In ide-disk it is ide_do_rw_disk():

    /*
     * 268435455 == 137439 MB or 28bit limit
     * 320173056 == 163929 MB or 48bit addressing
     * 1073741822 == 549756 MB or 48bit addressing fake drive
     */
    static ide_startstop_t ide_do_rw_disk (ide_drive_t *drive, struct request *rq, sector_t block)
    {
        ide_hwif_t *hwif = HWIF(drive);
        ……
        if (hwif->rw_disk)
            hwif->rw_disk(drive, rq);

        return __ide_do_rw_disk(drive, rq, block);
    }

    // What follows is the driver for the specific hard disk. Since we only care about
    // the framework of block device driver programming, we do not go any deeper.
    /*
     * __ide_do_rw_disk() issues READ and WRITE commands to a disk,
     * using LBA if supported, or CHS otherwise, to address sectors.
     */
    static ide_startstop_t __ide_do_rw_disk(ide_drive_t *drive, struct request *rq, sector_t block)
    {
        ide_hwif_t *hwif = HWIF(drive);
        unsigned int dma = drive->using_dma;
        u8 lba48 = (drive->addressing == 1) ? 1 : 0;
        task_ioreg_t command = WIN_NOP;
        ata_nsector_t nsectors;

        nsectors.all = (u16) rq->nr_sectors;

        if (hwif->no_lba48_dma && lba48 && dma) {
            if (block + rq->nr_sectors > 1ULL << 28)
                dma = 0;
            else
                lba48 = 0;
        }

        if (drive->select.b.lba) {
            if (lba48) {
                ……
            } else {
                ……
            }
        } else {
            ……
        }

        if (dma) {
            ……
            /* fallback to PIO */
            ide_init_sg_cmd(drive, rq);
        }

        if (rq_data_dir(rq) == READ) {
            if (drive->mult_count) {
                hwif->data_phase = TASKFILE_MULTI_IN;
                command = lba48 ? WIN_MULTREAD_EXT : WIN_MULTREAD;
            } else {
                hwif->data_phase = TASKFILE_IN;
                command = lba48 ?
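                    WIN_READ_EXT : WIN_READ;
            }

            ide_execute_command(drive, command, &task_in_intr, WAIT_CMD, NULL);
            return ide_started;
        } else {
            ……
            return pre_task_out_intr(drive, rq);
        }
    }

3. Summary of block device drivers

As we have seen, the main work in writing a block device driver consists of allocating and initializing a gendisk structure, allocating and initializing a request queue, writing the request handler (request_fn), handling interrupts, and so on. The details differ from device to device, however. For the IDE devices shown above, most of the work is done at the IDE bus level. The bus layer takes care of a lot of tedious but necessary work and presents a uniform interface to the specific device drivers beneath it, which greatly simplifies writing a block device driver.

Readers who are interested can look at the kernel's ramdisk implementation.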