Android changed the ION driver API in the Linux 4.12 kernel; some of the original ioctl commands no longer exist.

Google's ION is, in my view, quite a large subsystem. The system heap is one way of allocating memory; other heaps use CMA and so on, and each allocation method ends up calling different Linux interfaces. This article only records my own understanding of the system heap. The ION code lives under kernel\msm-4.14\drivers\staging\android\ion.

Whichever heap ION finally allocates from, the resulting buffer is wrapped in a Linux dma-buf. dma-buf is a framework in Linux whose code I have not studied in detail, but judging from how ION uses it, every buffer ION allocates is stored in a dma_buf structure, and Google also attaches an ops function table for that buffer to the dma-buf. When the buffer is used, the dma-buf ops are what actually get called, and those ops in turn call the ops bound to the heap. The system heap, for example, binds allocate, map_user (mmap), free, shrink and so on when the heap is created, and the dma-buf ops eventually call down into these functions.
Google's implementation of the dma-buf ops can be seen in ion.c:
static const struct dma_buf_ops dma_buf_ops = {
	.map_dma_buf = ion_map_dma_buf,
	.unmap_dma_buf = ion_unmap_dma_buf,
	.release = ion_dma_buf_release,
	.attach = ion_dma_buf_attach,
	.detach = ion_dma_buf_detatch,
	.begin_cpu_access = ion_dma_buf_begin_cpu_access,
	.end_cpu_access = ion_dma_buf_end_cpu_access,
	.begin_cpu_access_umapped = ion_dma_buf_begin_cpu_access_umapped,
	.end_cpu_access_umapped = ion_dma_buf_end_cpu_access_umapped,
	.begin_cpu_access_partial = ion_dma_buf_begin_cpu_access_partial,
	.end_cpu_access_partial = ion_dma_buf_end_cpu_access_partial,
	.map_atomic = ion_dma_buf_kmap,
	.unmap_atomic = ion_dma_buf_kunmap,
	.unmap = ion_dma_buf_kunmap,
	.vmap = ion_dma_buf_vmap,
	.vunmap = ion_dma_buf_vunmap,
	.get_flags = ion_dma_buf_get_flags,
	...
};
The functions a heap must implement are defined in ion.h:
/**
 * struct ion_heap_ops - ops to operate on a given heap
 * @allocate:		allocate memory
 * @map_kernel		map memory to the kernel
 * @unmap_kernel	unmap memory to the kernel
 * @map_user		map memory to userspace
 *
 * allocate, phys, and map_user return 0 on success, -errno on error.
 * map_dma and map_kernel return pointer on success, ERR_PTR on
 * error. @free will be called with ION_PRIV_FLAG_SHRINKER_FREE set in
 * the buffer's private_flags when called from a shrinker. In that
 * case, the pages being free'd must be truly free'd back to the
 * system, not put in a page pool or otherwise cached.
 */
struct ion_heap_ops {
	int (*allocate)(struct ion_heap *heap,
			struct ion_buffer *buffer, unsigned long len,
			unsigned long flags);
	void (*free)(struct ion_buffer *buffer);
	void * (*map_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
	void (*unmap_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
	int (*map_user)(struct ion_heap *mapper, struct ion_buffer *buffer,
			struct vm_area_struct *vma);
	int (*shrink)(struct ion_heap *heap, gfp_t gfp_mask, int nr_to_scan);
};
Before getting into how memory is actually handed to ION, a few concepts are worth knowing. struct sg_table is the Linux structure that holds a scatterlist of physical pages; for details I recommend the wowotech article "Linux kernel scatterlist API介紹". The short version is that it records a list of scattered physical pages: when the system heap allocates, the physical pages do not need to be contiguous, only the virtual addresses mapped over them do. For example, if the camera asks for a 12 MB buffer, what comes out of the buddy allocator may be a number of 64 KB chunks; each 64 KB chunk is internally contiguous, but the chunks are not contiguous with each other.
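To make the scatterlist idea concrete, here is a minimal sketch of how kernel code typically walks an sg_table. This is my own illustration, not code from the ION sources; the helper name dump_sg_table is hypothetical, and only the standard scatterlist API is used.

#include <linux/printk.h>
#include <linux/scatterlist.h>

/* Illustrative only: print every physically contiguous chunk in the table. */
static void dump_sg_table(struct sg_table *table)
{
	struct scatterlist *sg;
	int i;

	/* Each sg entry describes one physically contiguous run of pages. */
	for_each_sg(table->sgl, sg, table->nents, i) {
		phys_addr_t phys = sg_phys(sg);

		pr_info("chunk %d: phys %pa len %u\n", i, &phys, sg->length);
	}
}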
Buddy system: there is plenty of material about it online and the idea is simple. The buddy allocator manages free physical memory in blocks of 2^order pages, kept on per-order free lists; an allocation of order n returns 2^n contiguous physical pages.
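As a quick illustration of the order idea (my own sketch, not ION code): asking the buddy allocator for order 4 returns 2^4 = 16 contiguous pages, i.e. 64 KB when pages are 4 KB.

#include <linux/gfp.h>
#include <linux/mm.h>

static void buddy_order_demo(void)
{
	/* order 4 -> 2^4 = 16 contiguous pages = 64 KB with 4 KB pages */
	struct page *pages = alloc_pages(GFP_KERNEL, 4);

	if (!pages)
		return;

	/* return the same 16-page block to the buddy allocator */
	__free_pages(pages, 4);
}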
File descriptor fd: what ION finally returns after an allocation is an fd. The fd is passed to other processes over binder and then mapped into each process's virtual address space. An fd is only valid within one process; handing it to another process goes through Android's binder mechanism, which, roughly speaking, allocates a new fd in the receiving process and binds it to the same kernel struct file that the sender's fd points to.
1. Allocating memory
ION allocates memory through an ioctl issued after the device has been opened:
	fd = ion_alloc_fd(data.allocation.len, data.allocation.heap_id_mask,
			  data.allocation.flags);
As you can see, ion_alloc_fd is called to produce an fd. It takes three parameters: the first is the length of the buffer to allocate; the second selects the heap — ION has many heap types, and this article only covers the system heap (the other heaps are harder to read); the third is a set of flags describing properties of the allocation, for example whether it is a camera buffer or whether secure memory is required. ion_alloc_fd is implemented as follows:
int ion_alloc_fd(size_t len, unsigned int heap_id_mask, unsigned int flags)
{
	int fd;
	struct dma_buf *dmabuf;

	dmabuf = ion_alloc_dmabuf(len, heap_id_mask, flags);
	...
	fd = dma_buf_fd(dmabuf, O_CLOEXEC);
	...
	return fd;
}
It first creates a dma_buf and then turns that dma-buf into an fd. struct dma_buf is defined in kernel\msm-4.14\include\linux\dma-buf.h, and the meaning of each field is documented in the kerneldoc comment:
/**
 * struct dma_buf - shared buffer object
 * @size: size of the buffer
 * @file: file pointer used for sharing buffers across, and for refcounting.
 * @attachments: list of dma_buf_attachment that denotes all devices attached.
 * @ops: dma_buf_ops associated with this buffer object.
 * @lock: used internally to serialize list manipulation, attach/detach and vmap/unmap
 * @vmapping_counter: used internally to refcnt the vmaps
 * @vmap_ptr: the current vmap ptr if vmapping_counter > 0
 * @exp_name: name of the exporter; useful for debugging.
 * @name: unique name for the buffer
 * @ktime: time (in jiffies) at which the buffer was born
 * @owner: pointer to exporter module; used for refcounting when exporter is a
 *         kernel module.
 * @list_node: node for dma_buf accounting and debugging.
 * @priv: exporter specific private data for this buffer object.
 * @resv: reservation object linked to this dma-buf
 * @poll: for userspace poll support
 * @cb_excl: for userspace poll support
 * @cb_shared: for userspace poll support
 *
 * This represents a shared buffer, created by calling dma_buf_export(). The
 * userspace representation is a normal file descriptor, which can be created by
 * calling dma_buf_fd().
 *
 * Shared dma buffers are reference counted using dma_buf_put() and
 * get_dma_buf().
 *
 * Device DMA access is handled by the separate &struct dma_buf_attachment.
 */
struct dma_buf {
	size_t size;
	struct file *file;
	struct list_head attachments;
	const struct dma_buf_ops *ops;
	...
	unsigned vmapping_counter;
	...
	struct list_head list_node;
	...
	struct reservation_object *resv;

	/* poll support */
	...
	struct dma_buf_poll_cb_t {
		...
	} cb_excl, cb_shared;
};
The struct file member matters most for what follows: the fd we eventually get back is tied to this struct file. Several fds can share one struct file, which is also why mmap on such fds can produce several virtual mappings of the same buffer.
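A user-space sketch makes the fd / struct file relationship visible. The snippet below assumes the post-4.12 ION UAPI (ION_IOC_ALLOC and struct ion_allocation_data from drivers/staging/android/uapi/ion.h); the heap id used in heap_id_mask is a placeholder that varies per device. dup() gives a second fd backed by the same struct file, and each fd can be mmap'ed to its own virtual address over the same buffer.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Post-4.12 ION ABI (see drivers/staging/android/uapi/ion.h). */
struct ion_allocation_data {
	uint64_t len;
	uint32_t heap_id_mask;
	uint32_t flags;
	uint32_t fd;
	uint32_t unused;
};
#define ION_IOC_MAGIC 'I'
#define ION_IOC_ALLOC _IOWR(ION_IOC_MAGIC, 0, struct ion_allocation_data)

int main(void)
{
	int ion_fd = open("/dev/ion", O_RDONLY | O_CLOEXEC);
	struct ion_allocation_data data;

	memset(&data, 0, sizeof(data));
	data.len = 4096;
	data.heap_id_mask = 1 << 0;	/* placeholder: the system heap's id is device specific */
	data.flags = 0;

	if (ion_fd < 0 || ioctl(ion_fd, ION_IOC_ALLOC, &data) < 0)
		return 1;

	int buf_fd  = data.fd;		/* dma-buf fd produced by ion_alloc_fd() */
	int buf_fd2 = dup(buf_fd);	/* second fd, same underlying struct file */

	/* Two mappings, two virtual addresses, one underlying buffer. */
	void *va1 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd, 0);
	void *va2 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd2, 0);

	printf("va1=%p va2=%p\n", va1, va2);
	return 0;
}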
ion_alloc_dmabuf lives in kernel\msm-4.14\drivers\staging\android\ion\ion.c:
struct dma_buf *ion_alloc_dmabuf(size_t len, unsigned int heap_id_mask,
				 unsigned int flags)
{
	struct ion_device *dev = internal_dev;
	struct ion_buffer *buffer = NULL;
	struct ion_heap *heap;
	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
	struct dma_buf *dmabuf;
	char task_comm[TASK_COMM_LEN];

	pr_debug("%s: len %zu heap_id_mask %u flags %x\n", __func__,
		 len, heap_id_mask, flags);

	len = PAGE_ALIGN(len);
	...
	/*
	 * traverse the list of heaps available in this system in priority
	 * order.  If the heap type is supported by the client, and matches the
	 * request of the caller allocate from it.  Repeat until allocate has
	 * succeeded or all heaps have been tried
	 */
	plist_for_each_entry(heap, &dev->heaps, node) {
		/* if the caller didn't specify this heap id */
		if (!((1 << heap->id) & heap_id_mask))
			continue;
		buffer = ion_buffer_create(heap, dev, len, flags);
		if (!IS_ERR(buffer) || PTR_ERR(buffer) == -EINTR)
			break;
	}
	...
	get_task_comm(task_comm, current->group_leader);

	exp_info.ops = &dma_buf_ops;
	exp_info.size = buffer->size;
	exp_info.exp_name = kasprintf(GFP_KERNEL, "%s-%s-%d-%s", KBUILD_MODNAME,
				      heap->name, current->tgid, task_comm);
	...
	dmabuf = dma_buf_export(&exp_info);
	if (IS_ERR(dmabuf)) {
		_ion_buffer_destroy(buffer);
		kfree(exp_info.exp_name);
	}

	return dmabuf;
}
The PAGE_ALIGN macro rounds the length up to a whole number of pages: if the requested buffer size is 5 KB it becomes 8 KB, since pages are 4 KB. There is a corresponding round-down, which would turn 5 KB into 4 KB.
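The rounding arithmetic can be reproduced in a few lines of user-space C. This is only a sketch of the same computation (the macro names here are my own; the kernel's PAGE_ALIGN is defined in terms of its own ALIGN macro):

#include <stdio.h>

#define MY_PAGE_SIZE  4096UL
#define ALIGN_UP(x)   (((x) + MY_PAGE_SIZE - 1) & ~(MY_PAGE_SIZE - 1))	/* like PAGE_ALIGN */
#define ALIGN_DOWN(x) ((x) & ~(MY_PAGE_SIZE - 1))

int main(void)
{
	unsigned long len = 5 * 1024;	/* a 5 KB request */

	printf("up:   %lu\n", ALIGN_UP(len));	/* 8192, i.e. 8 KB */
	printf("down: %lu\n", ALIGN_DOWN(len));	/* 4096, i.e. 4 KB */
	return 0;
}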
plist_for_each_entry walks all registered heaps looking for one whose id matches the requested heap mask and calls that heap's allocation path; from here on we assume the heap is the system heap.
You can inspect the system heap on a device: in an adb shell, go to /sys/kernel/debug/ion/heaps and run cat system:
uncached pool = 349003776
cached pool = 1063071744
secure pool = 0
pool total (uncached + cached + secure) = 1412075520
The system heap keeps three pools — uncached, cached and secure — which are page pools Google set up to cache physical pages. You can also add pools of your own.
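Roughly speaking, the system heap keeps one ion_page_pool per supported order for each of the three categories, with the secure pools further split per VMID. The sketch below is a paraphrase of the layout, not a verbatim copy of ion_system_heap.h:

/* simplified sketch of struct ion_system_heap (paraphrased) */
struct ion_system_heap {
	struct ion_heap heap;
	struct ion_page_pool *uncached_pools[NUM_ORDERS];
	struct ion_page_pool *cached_pools[NUM_ORDERS];
	struct ion_page_pool *secure_pools[VMID_LAST][NUM_ORDERS];
	/* ... */
};

This indexing (first by VMID, then by order) is what heap->secure_pools[vmid][order_to_index(order)] in the code later refers to.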
Once the matching heap is found, ion_buffer_create is called to create the ion buffer. struct ion_buffer is defined in kernel\msm-4.14\drivers\staging\android\ion\ion.h:
/**
 * struct ion_buffer - metadata for a particular buffer
 * @node:		node in the ion_device buffers tree
 * @dev:		back pointer to the ion_device
 * @heap:		back pointer to the heap the buffer came from
 * @flags:		buffer specific flags
 * @private_flags:	internal buffer specific flags
 * @size:		size of the buffer
 * @priv_virt:		private data to the buffer representable as
 *			a void *
 * @lock:		protects the buffers cnt fields
 * @kmap_cnt:		number of times the buffer is mapped to the kernel
 * @vaddr:		the kernel mapping if kmap_cnt is not zero
 * @sg_table:		the sg table for the buffer if dmap_cnt is not zero
 * @vmas:		list of vma's mapping this buffer
 */
struct ion_buffer {
	...
	unsigned long private_flags;
	...
	struct sg_table *sg_table;
	struct list_head attachments;
	...
};
The struct sg_table described earlier is stored inside the ion buffer and holds the scatterlist of physical pages.
/* this function should only be called while dev->lock is held */
static struct ion_buffer *ion_buffer_create(struct ion_heap *heap,
					    struct ion_device *dev,
					    unsigned long len,
					    unsigned long flags)
{
	struct ion_buffer *buffer;
	struct sg_table *table;
	struct scatterlist *sg;
	int i, ret;

	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
	...
	ret = heap->ops->allocate(heap, buffer, len, flags);
	if (ret) {
		if (!(heap->flags & ION_HEAP_FLAG_DEFER_FREE))
			goto err2;

		ion_heap_freelist_drain(heap, 0);
		ret = heap->ops->allocate(heap, buffer, len, flags);
		...
	}

	if (buffer->sg_table == NULL) {
		WARN_ONCE(1, "This heap needs to set the sgtable");
		...
	}

	spin_lock(&heap->stat_lock);
	heap->num_of_alloc_bytes += len;
	if (heap->num_of_alloc_bytes > heap->alloc_bytes_wm)
		heap->alloc_bytes_wm = heap->num_of_alloc_bytes;
	spin_unlock(&heap->stat_lock);

	table = buffer->sg_table;
	INIT_LIST_HEAD(&buffer->attachments);
	INIT_LIST_HEAD(&buffer->vmas);
	mutex_init(&buffer->lock);

	if (IS_ENABLED(CONFIG_ION_FORCE_DMA_SYNC)) {
		/*
		 * this will set up dma addresses for the sglist -- it is not
		 * technically correct as per the dma api -- a specific
		 * device isn't really taking ownership here. However, in
		 * practice on our systems the only dma_address space is
		 * physical addresses.
		 */
		for_each_sg(table->sgl, sg, table->nents, i) {
			sg_dma_address(sg) = sg_phys(sg);
			sg_dma_len(sg) = sg->length;
		}
	}

	mutex_lock(&dev->buffer_lock);
	ion_buffer_add(dev, buffer);
	mutex_unlock(&dev->buffer_lock);
	atomic_long_add(len, &heap->total_allocated);

	return buffer;

err2:
	kfree(buffer);
	return ERR_PTR(ret);
}
The key line is ret = heap->ops->allocate(heap, buffer, len, flags);, which calls the heap's own allocation function. The rest is bookkeeping: initializing lists and filling in the sg_table.
The system heap's ops live in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.c:
static struct ion_heap_ops system_heap_ops = {
	.allocate = ion_system_heap_allocate,
	.free = ion_system_heap_free,
	.map_kernel = ion_heap_map_kernel,
	.unmap_kernel = ion_heap_unmap_kernel,
	.map_user = ion_heap_map_user,
	.shrink = ion_system_heap_shrink,
};
allocate is implemented by ion_system_heap_allocate:
static int ion_system_heap_allocate(struct ion_heap *heap,
				    struct ion_buffer *buffer,
				    unsigned long size,
				    unsigned long flags)
{
	struct ion_system_heap *sys_heap = container_of(heap,
							struct ion_system_heap,
							heap);
	struct sg_table *table;
	struct sg_table table_sync = {0};
	struct scatterlist *sg;
	struct scatterlist *sg_sync;
	struct list_head pages;
	struct list_head pages_from_pool;
	struct page_info *info, *tmp_info;
	unsigned int nents_sync = 0;
	unsigned long size_remaining = PAGE_ALIGN(size);
	unsigned int max_order = orders[0];
	int vmid = get_secure_vmid(buffer->flags);
	int i = 0, ret;
	...
	if (size / PAGE_SIZE > totalram_pages / 2)
		return -ENOMEM;

	if (ion_heap_is_system_heap_type(buffer->heap->type) &&
	    is_secure_vmid_valid(vmid)) {
		pr_info("%s: System heap doesn't support secure allocations\n",
			__func__);
		return -EINVAL;
	}

	INIT_LIST_HEAD(&pages);
	INIT_LIST_HEAD(&pages_from_pool);

	while (size_remaining > 0) {
		if (is_secure_vmid_valid(vmid))
			info = alloc_from_pool_preferred(
					sys_heap, buffer, size_remaining,
					max_order);
		else
			info = alloc_largest_available(
					sys_heap, buffer, size_remaining,
					max_order);
		...
		sz = (1 << info->order) * PAGE_SIZE;
		size_remaining -= sz;
		if (info->from_pool)
			list_add_tail(&info->list, &pages_from_pool);
		else
			list_add_tail(&info->list, &pages);
		...
	}

	ret = ion_heap_alloc_pages_mem(&data);
	...
	table = kzalloc(sizeof(*table), GFP_KERNEL);
	if (!table) {
		ret = -ENOMEM;
		goto err_free_data_pages;
	}

	ret = sg_alloc_table(table, i, GFP_KERNEL);
	...
	if (nents_sync) {
		ret = sg_alloc_table(&table_sync, nents_sync, GFP_KERNEL);
		...
		sg_sync = table_sync.sgl;
	}

	/*
	 * We now have two separate lists. One list contains pages from the
	 * pool and the other pages from buddy. We want to merge these
	 * together while preserving the ordering of the pages (higher order
	 * first).
	 */
	do {
		info = list_first_entry_or_null(&pages, struct page_info, list);
		tmp_info = list_first_entry_or_null(&pages_from_pool,
						    struct page_info, list);
		if (info && tmp_info) {
			if (info->order >= tmp_info->order) {
				i = process_info(info, sg, sg_sync, &data, i);
				sg_sync = sg_next(sg_sync);
			} else {
				i = process_info(tmp_info, sg, 0, 0, i);
			}
		} else if (info) {
			i = process_info(info, sg, sg_sync, &data, i);
			sg_sync = sg_next(sg_sync);
		} else if (tmp_info) {
			i = process_info(tmp_info, sg, 0, 0, i);
		}
		sg = sg_next(sg);
	} while (sg);
	...
	if (nents_sync) {
		ret = ion_hyp_assign_sg(&table_sync, &vmid, 1, true);
		...
	}

	buffer->sg_table = table;
	...
	sg_free_table(&table_sync);
	ion_heap_free_pages_mem(&data);
	return 0;

	/* error paths (abridged) */
	...
	/* We failed to zero buffers. Bypass pool */
	buffer->private_flags |= ION_PRIV_FLAG_SHRINKER_FREE;
	...
	ion_hyp_unassign_sg(table, &vmid, 1, true, false);
	...
	for_each_sg(table->sgl, sg, table->nents, i)
		free_buffer_page(sys_heap, buffer, sg_page(sg),
				 get_order(sg->length));
	...
	sg_free_table(&table_sync);
err_free_data_pages:
	ion_heap_free_pages_mem(&data);
	...
	list_for_each_entry_safe(info, tmp_info, &pages, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);
	}
	list_for_each_entry_safe(info, tmp_info, &pages_from_pool, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);
	}
	return -ENOMEM;
}
ion_system_heap_allocate is fairly long; to my mind the heart of it is the while loop:
	while (size_remaining > 0) {
		if (is_secure_vmid_valid(vmid))
			info = alloc_from_pool_preferred(
					sys_heap, buffer, size_remaining,
					max_order);
		else
			info = alloc_largest_available(
					sys_heap, buffer, size_remaining,
					max_order);
		...
		sz = (1 << info->order) * PAGE_SIZE;
		size_remaining -= sz;
		if (info->from_pool)
			list_add_tail(&info->list, &pages_from_pool);
		else
			list_add_tail(&info->list, &pages);
	}

	ret = ion_heap_alloc_pages_mem(&data);
size_remaining starts out page-aligned: unsigned long size_remaining = PAGE_ALIGN(size);
The loop keeps pulling physical pages from the pools or from the buddy system; after each grab size_remaining is reduced by the size of the chunk, and the loop repeats until size_remaining reaches 0, i.e. the whole buffer has been gathered. When a buffer is allocated for the very first time the pools are empty, so the pages come from the buddy system through the normal Linux interfaces. A small model of this loop is sketched below.
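The effect of the loop is easy to see with a small user-space model (my own sketch, not the kernel code): it repeatedly takes the largest order that still fits into what is left, so a 12 MB + 5 KB request page-aligns to 3074 pages and decomposes into 192 chunks of 64 KB plus 2 chunks of 4 KB.

#include <stdio.h>

#define PG 4096UL
static const unsigned int orders[] = {4, 0};	/* same idea as ion_system_heap.h */
#define NUM_ORDERS (sizeof(orders) / sizeof(orders[0]))

int main(void)
{
	unsigned long size = 12UL * 1024 * 1024 + 5 * 1024;	/* 12 MB + 5 KB request */
	unsigned long remaining = (size + PG - 1) & ~(PG - 1);	/* PAGE_ALIGN */

	while (remaining > 0) {
		for (unsigned int i = 0; i < NUM_ORDERS; i++) {
			unsigned long chunk = PG << orders[i];

			if (remaining < chunk)
				continue;	/* this order is too big, try a smaller one */

			printf("take order %u chunk (%lu KB), %lu bytes left\n",
			       orders[i], chunk / 1024, remaining - chunk);
			remaining -= chunk;
			break;
		}
	}
	return 0;
}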
Inside the loop, is_secure_vmid_valid decides which allocation function is used; alloc_from_pool_preferred allocates preferentially from the secure pool.
static struct page_info *alloc_from_pool_preferred(
		struct ion_system_heap *heap, struct ion_buffer *buffer,
		unsigned long size, unsigned int max_order)
{
	struct page *page;
	struct page_info *info;
	int i;

	if (buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)
		return alloc_largest_available(heap, buffer, size, max_order);

	info = kmalloc(sizeof(*info), GFP_KERNEL);
	...
	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < order_to_size(orders[i]))
			continue;
		if (max_order < orders[i])
			continue;

		page = alloc_from_secure_pool_order(heap, buffer, orders[i]);
		if (IS_ERR(page))
			continue;
		...
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	page = split_page_from_secure_pool(heap, buffer);
	if (!IS_ERR(page)) {
		...
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	kfree(info);
	return alloc_largest_available(heap, buffer, size, max_order);
}
ION_FLAG_POOL_FORCE_ALLOC marks a forced allocation: when it is set, alloc_largest_available is used, which in the end calls straight into the Linux buddy allocator for physical pages. For an introduction to struct page, see 《Linux 物理內(nèi)存描述》 (link).
The core of alloc_from_pool_preferred is the for loop, which picks a suitable page order to allocate from. The buddy system keeps blocks of 2^order pages, and the pools are organised around the same idea, except that the supported orders are kept in an array — usually just order 0 and order 4. The array is defined in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.h:
#ifndef CONFIG_ALLOC_BUFFERS_IN_4K_CHUNKS
#if defined(CONFIG_IOMMU_IO_PGTABLE_ARMV7S)
static const unsigned int orders[] = {8, 4, 0};
#else
static const unsigned int orders[] = {4, 0};
#endif
#else
static const unsigned int orders[] = {0};
#endif

#define NUM_ORDERS ARRAY_SIZE(orders)
From my testing, current phones take the orders[] = {4, 0} branch, i.e. physical chunks of either 64 KB or 4 KB.
Back to the for loop in alloc_from_pool_preferred, assuming orders[] = {4, 0}:
static inline unsigned int order_to_size(int order)
{
return PAGE_SIZE << order;
}
PAGE_SIZE is the physical page size, normally 4 KB (ARMv8 supports 4 KB, 16 KB and 64 KB pages). Assuming 4 KB pages, order 4 means 2^4 = 16 pages, i.e. 64 KB. The check if (size < order_to_size(orders[i])) asks whether the requested size is smaller than 64 KB; if it is, this order is skipped, because this pool only holds contiguous 64 KB blocks, and using one for a smaller buffer would mean splitting it — the allocator always tries to pick the best-fitting size, hence the continue to move on to the next order. After the 64 KB order comes the 4 KB order, and since the size has already been rounded up to whole pages, nothing smaller than that is ever needed. If orders[] contained more entries (16, 8, 4, ...) the loop would walk through them all; and if the smallest entry were not order 0 (say order 1), the loop could still fail to find a fitting order, in which case the code falls out of the loop and splits a larger block into a suitable one by calling split_page_from_secure_pool.
struct page *split_page_from_secure_pool(struct ion_system_heap *heap,
					 struct ion_buffer *buffer)
{
	int i, j;
	struct page *page;
	unsigned int order;

	mutex_lock(&heap->split_page_mutex);

	/*
	 * Someone may have just split a page and returned the unused portion
	 * back to the pool, so try allocating from the pool one more time
	 * before splitting. We want to maintain large pages sizes when
	 * possible.
	 */
	page = alloc_from_secure_pool_order(heap, buffer, 0);
	if (IS_ERR(page)) {
		for (i = NUM_ORDERS - 2; i >= 0; i--) {
			order = orders[i];
			page = alloc_from_secure_pool_order(heap, buffer, order);
			if (IS_ERR(page))
				continue;

			split_page(page, order);

			/*
			 * Return the remaining order-0 pages to the pool.
			 * SetPagePrivate flag to mark memory as secure.
			 */
			for (j = 1; j < (1 << order); j++) {
				SetPagePrivate(page + j);
				free_buffer_page(heap, buffer, page + j, 0);
			}
			break;
		}
	}

	mutex_unlock(&heap->split_page_mutex);

	return page;
}
page = alloc_from_secure_pool_order(heap, buffer, 0); takes a single page from the order-0 pool, i.e. the smallest pages the pool holds. My guess at the design is that if even order 0 cannot be satisfied the function simply fails; the for loop below, as the comment says, retries from the pool at other orders before actually splitting. split_page lives in kernel\msm-4.14\mm\page_alloc.c, which contains the core buddy-allocator interfaces (the allocation functions used later also come from there); I have not fully understood the kernel's split_page implementation. The page split off by split_page_from_secure_pool ends up stored in the info structure:
	page = split_page_from_secure_pool(heap, buffer);
	...
	INIT_LIST_HEAD(&info->list);
Back in alloc_from_pool_preferred, let's look at alloc_from_secure_pool_order:
struct page *alloc_from_secure_pool_order(struct ion_system_heap *heap,
					  struct ion_buffer *buffer,
					  unsigned long order)
{
	int vmid = get_secure_vmid(buffer->flags);
	struct ion_page_pool *pool;

	if (!is_secure_vmid_valid(vmid))
		return ERR_PTR(-EINVAL);

	pool = heap->secure_pools[vmid][order_to_index(order)];
	return ion_page_pool_alloc_pool_only(pool);
}
The function is simple: it picks the pool matching the order and then calls
/*
 * Tries to allocate from only the specified Pool and returns NULL otherwise
 */
struct page *ion_page_pool_alloc_pool_only(struct ion_page_pool *pool)
{
	struct page *page = NULL;
	...
	if (mutex_trylock(&pool->mutex)) {
		if (pool->high_count)
			page = ion_page_pool_remove(pool, true);
		else if (pool->low_count)
			page = ion_page_pool_remove(pool, false);
		mutex_unlock(&pool->mutex);
	}
	...
	return page;
}
which takes a page out of the pool. The pool distinguishes highmem and lowmem pages: on a 32-bit system the kernel's 3 GB–4 GB window cannot permanently map all physical memory, and pages beyond the directly mapped range count as "high memory". Whether a page is high or low is recorded when it is handed to the pool after coming from the Linux buddy system. (The pool bookkeeping is sketched below.)
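For reference, the pool bookkeeping looks roughly like this — a paraphrase of the page-pool structure, not a verbatim copy of the header. Pages returned to the pool sit on one of two lists depending on whether they are highmem pages, and high_count / low_count are exactly what ion_page_pool_alloc_pool_only checks above:

/* simplified sketch of struct ion_page_pool (paraphrased) */
struct ion_page_pool {
	int high_count;			/* number of pages on high_items */
	int low_count;			/* number of pages on low_items */
	struct list_head high_items;	/* highmem pages cached in this pool */
	struct list_head low_items;	/* lowmem pages cached in this pool */
	struct mutex mutex;
	gfp_t gfp_mask;			/* gfp flags used when refilling from buddy */
	unsigned int order;		/* every block in this pool is 2^order pages */
};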
Back in the while loop of ion_system_heap_allocate: if the buffer is not being allocated from the secure pool, alloc_largest_available is called.
static struct page_info *alloc_largest_available(struct ion_system_heap *heap,
						 struct ion_buffer *buffer,
						 unsigned long size,
						 unsigned int max_order)
{
	struct page *page;
	struct page_info *info;
	int i;
	bool from_pool;

	info = kmalloc(sizeof(*info), GFP_KERNEL);
	if (!info)
		return NULL;

	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < order_to_size(orders[i]))
			continue;
		if (max_order < orders[i])
			continue;

		from_pool = !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC);
		page = alloc_buffer_page(heap, buffer, orders[i], &from_pool);
		if (IS_ERR(page))
			continue;

		info->page = page;
		info->order = orders[i];
		info->from_pool = from_pool;
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	kfree(info);
	return NULL;
}
Here ION_FLAG_POOL_FORCE_ALLOC again marks a forced allocation; if it is set, the pools are bypassed. alloc_buffer_page is then called.
static struct page *alloc_buffer_page(struct ion_system_heap *heap,
				      struct ion_buffer *buffer,
				      unsigned long order,
				      bool *from_pool)
{
	bool cached = ion_buffer_cached(buffer);
	struct page *page;
	struct ion_page_pool *pool;
	int vmid = get_secure_vmid(buffer->flags);
	struct device *dev = heap->heap.priv;

	if (vmid > 0)
		pool = heap->secure_pools[vmid][order_to_index(order)];
	else if (!cached)
		pool = heap->uncached_pools[order_to_index(order)];
	else
		pool = heap->cached_pools[order_to_index(order)];

	page = ion_page_pool_alloc(pool, from_pool);
	...
	if ((MAKE_ION_ALLOC_DMA_READY && vmid <= 0) || !(*from_pool))
		ion_pages_sync_for_device(dev, page, PAGE_SIZE << order,
					  DMA_BIDIRECTIONAL);

	return page;
}
Depending on the buffer's flags the right pool is chosen, and ion_page_pool_alloc is called with both the pool and the flag that says whether the page should come from the pool.
struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
{
	struct page *page = NULL;
	...
	if (fatal_signal_pending(current))
		return ERR_PTR(-EINTR);

	if (*from_pool && mutex_trylock(&pool->mutex)) {
		if (pool->high_count)
			page = ion_page_pool_remove(pool, true);
		else if (pool->low_count)
			page = ion_page_pool_remove(pool, false);
		mutex_unlock(&pool->mutex);
	}

	if (!page) {
		page = ion_page_pool_alloc_pages(pool);
		*from_pool = false;
	}
	...
	return page;
}
If allocating from the pool fails, or the pool is being bypassed, ion_page_pool_alloc_pages is called, which simply uses the Linux buddy allocator interface:
static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
	struct page *page = alloc_pages(pool->gfp_mask, pool->order);
	...
	return page;
}
Back in the while part of ion_system_heap_allocate:
		sz = (1 << info->order) * PAGE_SIZE;
		size_remaining -= sz;
		if (info->from_pool)
			list_add_tail(&info->list, &pages_from_pool);
		else
			list_add_tail(&info->list, &pages);
Every allocated chunk is returned in an info structure and, depending on whether it came from a pool, added to one of the two lists. info->order records the power of two, so multiplying it out by the page size gives the size of this chunk, which is then subtracted from the running total (size_remaining -= sz;). After the while loop the pages are assembled into the sg table.
Note that on the very first allocation the pools contain no pages at all — everything is taken from the Linux buddy system. Pages only end up in the pools when buffers are freed.
Back in ion_alloc_fd: once the dma-buf exists, an fd has to be produced from it, which is done by dma_buf_fd:
int dma_buf_fd(struct dma_buf *dmabuf, int flags)
{
	int fd;

	if (!dmabuf || !dmabuf->file)
		return -EINVAL;

	fd = get_unused_fd_flags(flags);
	if (fd < 0)
		return fd;

	fd_install(fd, dmabuf->file);

	return fd;
}
It uses the Linux helper get_unused_fd_flags to obtain a free fd number and then binds the dma-buf's struct file to that fd with fd_install.
The struct file itself was obtained earlier in ion_alloc_dmabuf: after the buffer has been created, dma_buf_export is called, and inside it
	file = anon_inode_getfile(bufname, &dma_buf_fops, dmabuf, ...);
an anonymous inode file is created and bound to dma_buf_fops, with the dmabuf (which carries the dma_buf_ops set up earlier) stored as the file's private data. So operating on the fd ends up going through this file and, from there, into the dma_buf_ops.
2. Freeing memory
void ion_system_heap_free(struct ion_buffer *buffer)
{
	struct ion_heap *heap = buffer->heap;
	struct ion_system_heap *sys_heap = container_of(heap,
							struct ion_system_heap,
							heap);
	struct sg_table *table = buffer->sg_table;
	struct scatterlist *sg;
	int i;
	int vmid = get_secure_vmid(buffer->flags);

	if (!(buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE) &&
	    !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
		...
		ion_heap_buffer_zero(buffer);
		...
		if (ion_hyp_unassign_sg(table, &vmid, 1, true, false))
			return;
	}

	for_each_sg(table->sgl, sg, table->nents, i)
		free_buffer_page(sys_heap, buffer, sg_page(sg),
				 get_order(sg->length));
	...
}
The first part is just a few checks; the important part is the for_each_sg loop, which walks the scatterlist and releases each physical page with free_buffer_page.
/*
 * For secure pages that need to be freed and not added back to the pool; the
 * hyp_unassign should be called before calling this function
 */
void free_buffer_page(struct ion_system_heap *heap,
		      struct ion_buffer *buffer, struct page *page,
		      unsigned int order)
{
	bool cached = ion_buffer_cached(buffer);
	int vmid = get_secure_vmid(buffer->flags);

	if (!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
		struct ion_page_pool *pool;

		if (vmid > 0)
			pool = heap->secure_pools[vmid][order_to_index(order)];
		else if (cached)
			pool = heap->cached_pools[order_to_index(order)];
		else
			pool = heap->uncached_pools[order_to_index(order)];

		if (buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE)
			ion_page_pool_free_immediate(pool, page);
		else
			ion_page_pool_free(pool, page);
	} else {
		__free_pages(page, order);
	}
}
It looks up the matching pool and then calls:
void ion_page_pool_free(struct ion_page_pool *pool, struct page *page)
{
	int ret;

	ret = ion_page_pool_add(pool, page);
	if (ret)
		ion_page_pool_free_pages(pool, page);
}
This puts the page back into the pool. But when the system runs low on memory, the ION heaps need to hand the pages cached in their pools back to the buddy system; that reclaim is done by the shrink callback:
static int ion_system_heap_shrink(struct ion_heap *heap, gfp_t gfp_mask,
				  int nr_to_scan)
{
	struct ion_system_heap *sys_heap;
	struct ion_page_pool *pool;
	int nr_freed;
	int i, j, nr_total = 0;

	sys_heap = container_of(heap, struct ion_system_heap, heap);

	for (i = 0; i < NUM_ORDERS; i++) {
		nr_freed = 0;

		for (j = 0; j < VMID_LAST; j++) {
			if (is_secure_vmid_valid(j))
				nr_freed += ion_secure_page_pool_shrink(
						sys_heap, j, i, nr_to_scan);
		}

		pool = sys_heap->uncached_pools[i];
		nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);

		pool = sys_heap->cached_pools[i];
		nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
		...
		nr_total += nr_freed;
	}

	return nr_total;
}
This function is also fairly simple: apart from some accounting, the real work is done by ion_page_pool_shrink, which takes pages out of the pool and then calls
static void ion_page_pool_free_pages(struct ion_page_pool *pool,
				     struct page *page)
{
	__free_pages(page, pool->order);
}
__free_pages is once again a Linux buddy-system interface, located in kernel\msm-4.14\mm\page_alloc.c.
3. Mapping memory

The system heap's memory is mapped through the dma-buf ops, which end up calling ion_heap_map_user. This function has one very important parameter, struct vm_area_struct, which describes a process's virtual memory regions; once a few of its fields are understood, the code below is straightforward. The structure is defined in kernel\msm-4.14\include\linux\mm_types.h:
/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task. A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */

	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next, *vm_prev;
	...
	/*
	 * Largest free memory gap in bytes to the left of this VMA.
	 * Either between this VMA and vma->vm_prev, or between one of the
	 * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
	 * get_unmapped_area find a free area of the right size.
	 */
	unsigned long rb_subtree_gap;

	/* Second cache line starts here. */

	struct mm_struct *vm_mm;	/* The address space we belong to. */
	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
	unsigned long vm_flags;		/* Flags, see mm.h. */

	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap interval tree.
	 *
	 * For private anonymous mappings, a pointer to a null terminated string
	 * in the user process containing the name given to the vma, or NULL
	 * if unnamed.
	 */
	union {
		struct {
			struct rb_node rb;
			unsigned long rb_subtree_last;
		} shared;
		const char __user *anon_name;
	};

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages. A MAP_SHARED vma
	 * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_sem &
					  * page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;

	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units */
	struct file *vm_file;		/* File we map to (can be NULL). */
	void *vm_private_data;		/* was vm_pte (shared mem) */
	...
	atomic_long_t swap_readahead_info;
	...
	struct vm_region *vm_region;	/* NOMMU mapping region */
	...
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
	...
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
	atomic_t vm_ref_count;		/* see vma_get(), vma_put() */
#endif
};
For background on this structure see https://linux-kernel-labs./master/labs/memory_mapping.html. It is created when a user process calls mmap, and it describes a contiguous range of the process's virtual address space with uniform access attributes whose size is a whole number of physical pages. The meaning of each member is also covered in https://blog.csdn.net/ganggexiongqi/article/details/6746248.
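The offset argument a process passes to mmap is what ends up, divided by the page size, in vm_pgoff. Here is a hedged user-space sketch; map_middle is my own helper name, and buf_fd is assumed to be a dma-buf fd like the one returned in the earlier allocation example. Mapping 10 pages starting at the 5th page of the buffer shows up in the kernel as vma->vm_pgoff == 5, with vma->vm_start equal to the address mmap returns.

#include <sys/mman.h>
#include <unistd.h>

/* Map 10 pages of the buffer starting at page 5 (illustrative helper). */
void *map_middle(int buf_fd)
{
	long page = sysconf(_SC_PAGESIZE);

	return mmap(NULL, 10 * page, PROT_READ | PROT_WRITE, MAP_SHARED,
		    buf_fd, 5 * page);
}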
vm_start is the starting virtual address of the region in the process.
int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
		      struct vm_area_struct *vma)
{
	struct sg_table *table = buffer->sg_table;
	unsigned long addr = vma->vm_start;
	unsigned long offset = vma->vm_pgoff * PAGE_SIZE;
	struct scatterlist *sg;
	int i;
	int ret;

	for_each_sg(table->sgl, sg, table->nents, i) {
		struct page *page = sg_page(sg);
		unsigned long remainder = vma->vm_end - addr;
		unsigned long len = sg->length;

		if (offset >= sg->length) {
			offset -= sg->length;
			continue;
		} else if (offset) {
			page += offset / PAGE_SIZE;
			len = sg->length - offset;
			offset = 0;
		}
		len = min(len, remainder);
		ret = remap_pfn_range(vma, addr, page_to_pfn(page), len,
				      vma->vm_page_prot);
		if (ret)
			return ret;
		addr += len;
		if (addr >= vma->vm_end)
			return 0;
	}
	return 0;
}
In the code, addr = vma->vm_start records the starting virtual address. vm_pgoff is the offset of this region within vm_file, measured in pages: if the buffer has 64 physical pages and user space maps 10 pages starting from the 5th page, vm_pgoff is 5. The for_each_sg loop takes the physical pages out of the scatterlist entry by entry and maps them. Now consider the check offset >= sg->length — why is it needed? If the offset is 6 pages but the current sg entry only holds 5 pages, the mapping clearly has to start one page into the next sg entry, so this entry must be skipped and the offset reduced accordingly. The code below does exactly that:
		if (offset >= sg->length) {
			offset -= sg->length;
			continue;
		} else if (offset) {
			page += offset / PAGE_SIZE;
			len = sg->length - offset;
			offset = 0;
		}
Suppose the next sg entry holds three physical pages. Skipping the first entry already executed offset -= sg->length, reducing the offset from 6 pages to 1 page, so in this entry we only need to advance page by offset / PAGE_SIZE = 1 page. len becomes sg->length - offset, i.e. 3 pages minus 1 page = 2 pages, and offset is set to 0 because it is no longer needed. The remaining pages are then mapped with the kernel's remap_pfn_range function, for which plenty of documentation exists online. With that, the mapping into user space is complete. (A small model of this offset walk follows.)
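The same walk can be modelled in a few lines of user-space C (my own sketch): given the per-entry chunk lengths and a starting offset, it reports where each remap would begin, mirroring the offset handling in ion_heap_map_user.

#include <stdio.h>

#define PG 4096UL

int main(void)
{
	/* chunk lengths of the sg entries, in bytes (5 pages, 3 pages, 4 pages) */
	unsigned long sg_len[] = {5 * PG, 3 * PG, 4 * PG};
	unsigned long offset = 6 * PG;	/* vm_pgoff = 6 */

	for (int i = 0; i < 3; i++) {
		unsigned long len = sg_len[i];
		unsigned long first_page = 0;

		if (offset >= len) {		/* mapping starts beyond this chunk */
			offset -= len;
			continue;
		} else if (offset) {		/* mapping starts inside this chunk */
			first_page = offset / PG;
			len -= offset;
			offset = 0;
		}
		printf("sg %d: map %lu page(s) starting at page %lu of the chunk\n",
		       i, len / PG, first_page);
	}
	return 0;
}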