Android changed the ION driver API in the Linux 4.12 kernel; some of the original ioctl commands no longer exist.

Google's ION is, in my view, quite a large subsystem. The system heap is one way of allocating memory; other heaps use CMA and so on, and each allocation method ends up calling different Linux interfaces. This article only records my own understanding of the system heap. The ION code lives under kernel\msm-4.14\drivers\staging\android\ion.

Whichever heap ION finally allocates from, the resulting buffer is wrapped in a Linux dma-buf. dma-buf is a framework in Linux whose code I have not studied in detail, but judging from how ION uses it, every buffer ION allocates is stored in a dma_buf structure, and Google also attaches an ops function table for that buffer to the dma-buf. When the buffer is used, the dma-buf ops are what actually get called, and those ops in turn call the ops bound to the heap. The system heap, for example, binds allocate, map_user (mmap), free, shrink and so on when the heap is created, and the dma-buf ops eventually call down into these functions.
Google's implementation of the dma-buf ops can be seen in ion.c:
static const struct dma_buf_ops dma_buf_ops = {
	.map_dma_buf = ion_map_dma_buf,
	.unmap_dma_buf = ion_unmap_dma_buf,
	.release = ion_dma_buf_release,
	.attach = ion_dma_buf_attach,
	.detach = ion_dma_buf_detatch,
	.begin_cpu_access = ion_dma_buf_begin_cpu_access,
	.end_cpu_access = ion_dma_buf_end_cpu_access,
	.begin_cpu_access_umapped = ion_dma_buf_begin_cpu_access_umapped,
	.end_cpu_access_umapped = ion_dma_buf_end_cpu_access_umapped,
	.begin_cpu_access_partial = ion_dma_buf_begin_cpu_access_partial,
	.end_cpu_access_partial = ion_dma_buf_end_cpu_access_partial,
	.map_atomic = ion_dma_buf_kmap,
	.unmap_atomic = ion_dma_buf_kunmap,
	.unmap = ion_dma_buf_kunmap,
	.vmap = ion_dma_buf_vmap,
	.vunmap = ion_dma_buf_vunmap,
	.get_flags = ion_dma_buf_get_flags,
	...
};
The functions a heap must implement are defined in ion.h:
/**
 * struct ion_heap_ops - ops to operate on a given heap
 * @allocate:		allocate memory
 * @map_kernel		map memory to the kernel
 * @unmap_kernel	unmap memory to the kernel
 * @map_user		map memory to userspace
 *
 * allocate, phys, and map_user return 0 on success, -errno on error.
 * map_dma and map_kernel return pointer on success, ERR_PTR on
 * error. @free will be called with ION_PRIV_FLAG_SHRINKER_FREE set in
 * the buffer's private_flags when called from a shrinker. In that
 * case, the pages being free'd must be truly free'd back to the
 * system, not put in a page pool or otherwise cached.
 */
struct ion_heap_ops {
	int (*allocate)(struct ion_heap *heap,
			struct ion_buffer *buffer, unsigned long len,
			unsigned long flags);
	void (*free)(struct ion_buffer *buffer);
	void * (*map_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
	void (*unmap_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
	int (*map_user)(struct ion_heap *mapper, struct ion_buffer *buffer,
			struct vm_area_struct *vma);
	int (*shrink)(struct ion_heap *heap, gfp_t gfp_mask, int nr_to_scan);
};
Before getting into how memory is actually handed to ION, a few concepts are worth knowing. struct sg_table is the Linux structure that holds a scatterlist of physical pages; for details I recommend the wowotech article "Linux kernel scatterlist API介紹". The short version is that it records a list of scattered physical pages: when the system heap allocates, the physical pages do not need to be contiguous, only the virtual addresses mapped over them do. For example, if the camera asks for a 12 MB buffer, what comes out of the buddy allocator may be a number of 64 KB chunks; each 64 KB chunk is internally contiguous, but the chunks are not contiguous with each other.
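To make the scatterlist idea concrete, here is a minimal sketch of how kernel code typically walks an sg_table. This is my own illustration, not code from the ION sources; the helper name dump_sg_table is hypothetical, and only the standard scatterlist API is used.

#include <linux/printk.h>
#include <linux/scatterlist.h>

/* Illustrative only: print every physically contiguous chunk in the table. */
static void dump_sg_table(struct sg_table *table)
{
	struct scatterlist *sg;
	int i;

	/* Each sg entry describes one physically contiguous run of pages. */
	for_each_sg(table->sgl, sg, table->nents, i) {
		phys_addr_t phys = sg_phys(sg);

		pr_info("chunk %d: phys %pa len %u\n", i, &phys, sg->length);
	}
}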
Buddy system: there is plenty of material about it online and the idea is simple. The buddy allocator manages free physical memory in blocks of 2^order pages, kept on per-order free lists; an allocation of order n returns 2^n contiguous physical pages.
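As a quick illustration of the order idea (my own sketch, not ION code): asking the buddy allocator for order 4 returns 2^4 = 16 contiguous pages, i.e. 64 KB when pages are 4 KB.

#include <linux/gfp.h>
#include <linux/mm.h>

static void buddy_order_demo(void)
{
	/* order 4 -> 2^4 = 16 contiguous pages = 64 KB with 4 KB pages */
	struct page *pages = alloc_pages(GFP_KERNEL, 4);

	if (!pages)
		return;

	/* return the same 16-page block to the buddy allocator */
	__free_pages(pages, 4);
}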
File descriptor fd: what ION finally returns after an allocation is an fd. The fd is passed to other processes over binder and then mapped into each process's virtual address space. An fd is only valid within one process; handing it to another process goes through Android's binder mechanism, which, roughly speaking, allocates a new fd in the receiving process and binds it to the same kernel struct file that the sender's fd points to.
1. Allocating memory
ION allocates memory through an ioctl issued after the device has been opened:
	fd = ion_alloc_fd(data.allocation.len, data.allocation.heap_id_mask,
			  data.allocation.flags);
As you can see, ion_alloc_fd is called to produce an fd. It takes three parameters: the first is the length of the buffer to allocate; the second selects the heap — ION has many heap types, and this article only covers the system heap (the other heaps are harder to read); the third is a set of flags describing properties of the allocation, for example whether it is a camera buffer or whether secure memory is required. ion_alloc_fd is implemented as follows:
int ion_alloc_fd(size_t len, unsigned int heap_id_mask, unsigned int flags)
{
	int fd;
	struct dma_buf *dmabuf;

	dmabuf = ion_alloc_dmabuf(len, heap_id_mask, flags);
	...
	fd = dma_buf_fd(dmabuf, O_CLOEXEC);
	...
	return fd;
}
It first creates a dma_buf and then turns that dma-buf into an fd. struct dma_buf is defined in kernel\msm-4.14\include\linux\dma-buf.h, and the meaning of each field is documented in the kerneldoc comment:
/**
 * struct dma_buf - shared buffer object
 * @size: size of the buffer
 * @file: file pointer used for sharing buffers across, and for refcounting.
 * @attachments: list of dma_buf_attachment that denotes all devices attached.
 * @ops: dma_buf_ops associated with this buffer object.
 * @lock: used internally to serialize list manipulation, attach/detach and vmap/unmap
 * @vmapping_counter: used internally to refcnt the vmaps
 * @vmap_ptr: the current vmap ptr if vmapping_counter > 0
 * @exp_name: name of the exporter; useful for debugging.
 * @name: unique name for the buffer
 * @ktime: time (in jiffies) at which the buffer was born
 * @owner: pointer to exporter module; used for refcounting when exporter is a
 *         kernel module.
 * @list_node: node for dma_buf accounting and debugging.
 * @priv: exporter specific private data for this buffer object.
 * @resv: reservation object linked to this dma-buf
 * @poll: for userspace poll support
 * @cb_excl: for userspace poll support
 * @cb_shared: for userspace poll support
 *
 * This represents a shared buffer, created by calling dma_buf_export(). The
 * userspace representation is a normal file descriptor, which can be created by
 * calling dma_buf_fd().
 *
 * Shared dma buffers are reference counted using dma_buf_put() and
 * get_dma_buf().
 *
 * Device DMA access is handled by the separate &struct dma_buf_attachment.
 */
struct dma_buf {
	size_t size;
	struct file *file;
	struct list_head attachments;
	const struct dma_buf_ops *ops;
	...
	unsigned vmapping_counter;
	...
	struct list_head list_node;
	...
	struct reservation_object *resv;

	/* poll support */
	...
	struct dma_buf_poll_cb_t {
		...
	} cb_excl, cb_shared;
};
The struct file member matters most for what follows: the fd we eventually get back is tied to this struct file. Several fds can share one struct file, which is also why mmap on such fds can produce several virtual mappings of the same buffer.
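A user-space sketch makes the fd / struct file relationship visible. The snippet below assumes the post-4.12 ION UAPI (ION_IOC_ALLOC and struct ion_allocation_data from drivers/staging/android/uapi/ion.h); the heap id used in heap_id_mask is a placeholder that varies per device. dup() gives a second fd backed by the same struct file, and each fd can be mmap'ed to its own virtual address over the same buffer.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Post-4.12 ION ABI (see drivers/staging/android/uapi/ion.h). */
struct ion_allocation_data {
	uint64_t len;
	uint32_t heap_id_mask;
	uint32_t flags;
	uint32_t fd;
	uint32_t unused;
};
#define ION_IOC_MAGIC 'I'
#define ION_IOC_ALLOC _IOWR(ION_IOC_MAGIC, 0, struct ion_allocation_data)

int main(void)
{
	int ion_fd = open("/dev/ion", O_RDONLY | O_CLOEXEC);
	struct ion_allocation_data data;

	memset(&data, 0, sizeof(data));
	data.len = 4096;
	data.heap_id_mask = 1 << 0;	/* placeholder: the system heap's id is device specific */
	data.flags = 0;

	if (ion_fd < 0 || ioctl(ion_fd, ION_IOC_ALLOC, &data) < 0)
		return 1;

	int buf_fd  = data.fd;		/* dma-buf fd produced by ion_alloc_fd() */
	int buf_fd2 = dup(buf_fd);	/* second fd, same underlying struct file */

	/* Two mappings, two virtual addresses, one underlying buffer. */
	void *va1 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd, 0);
	void *va2 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd2, 0);

	printf("va1=%p va2=%p\n", va1, va2);
	return 0;
}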
ion_alloc_dmabuf lives in kernel\msm-4.14\drivers\staging\android\ion\ion.c:
struct dma_buf *ion_alloc_dmabuf(size_t len, unsigned int heap_id_mask,
				 unsigned int flags)
{
	struct ion_device *dev = internal_dev;
	struct ion_buffer *buffer = NULL;
	struct ion_heap *heap;
	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
	struct dma_buf *dmabuf;
	char task_comm[TASK_COMM_LEN];

	pr_debug("%s: len %zu heap_id_mask %u flags %x\n", __func__,
		 len, heap_id_mask, flags);

	len = PAGE_ALIGN(len);
	...
	/*
	 * traverse the list of heaps available in this system in priority
	 * order.  If the heap type is supported by the client, and matches the
	 * request of the caller allocate from it.  Repeat until allocate has
	 * succeeded or all heaps have been tried
	 */
	plist_for_each_entry(heap, &dev->heaps, node) {
		/* if the caller didn't specify this heap id */
		if (!((1 << heap->id) & heap_id_mask))
			continue;
		buffer = ion_buffer_create(heap, dev, len, flags);
		if (!IS_ERR(buffer) || PTR_ERR(buffer) == -EINTR)
			break;
	}
	...
	get_task_comm(task_comm, current->group_leader);

	exp_info.ops = &dma_buf_ops;
	exp_info.size = buffer->size;
	exp_info.exp_name = kasprintf(GFP_KERNEL, "%s-%s-%d-%s", KBUILD_MODNAME,
				      heap->name, current->tgid, task_comm);
	...
	dmabuf = dma_buf_export(&exp_info);
	if (IS_ERR(dmabuf)) {
		_ion_buffer_destroy(buffer);
		kfree(exp_info.exp_name);
	}

	return dmabuf;
}
The PAGE_ALIGN macro rounds the length up to a whole number of pages: if the requested buffer size is 5 KB it becomes 8 KB, since pages are 4 KB. There is a corresponding round-down, which would turn 5 KB into 4 KB.
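The rounding arithmetic can be reproduced in a few lines of user-space C. This is only a sketch of the same computation (the macro names here are my own; the kernel's PAGE_ALIGN is defined in terms of its own ALIGN macro):

#include <stdio.h>

#define MY_PAGE_SIZE  4096UL
#define ALIGN_UP(x)   (((x) + MY_PAGE_SIZE - 1) & ~(MY_PAGE_SIZE - 1))	/* like PAGE_ALIGN */
#define ALIGN_DOWN(x) ((x) & ~(MY_PAGE_SIZE - 1))

int main(void)
{
	unsigned long len = 5 * 1024;	/* a 5 KB request */

	printf("up:   %lu\n", ALIGN_UP(len));	/* 8192, i.e. 8 KB */
	printf("down: %lu\n", ALIGN_DOWN(len));	/* 4096, i.e. 4 KB */
	return 0;
}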
plist_for_each_entry walks all registered heaps looking for one whose id matches the requested heap mask and calls that heap's allocation path; from here on we assume the heap is the system heap.
You can inspect the system heap on a device: in an adb shell, go to /sys/kernel/debug/ion/heaps and run cat system:
uncached pool = 349003776
cached pool = 1063071744
secure pool = 0
pool total (uncached + cached + secure) = 1412075520
The system heap keeps three pools — uncached, cached and secure — which are page pools Google set up to cache physical pages. You can also add pools of your own.
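Roughly speaking, the system heap keeps one ion_page_pool per supported order for each of the three categories, with the secure pools further split per VMID. The sketch below is a paraphrase of the layout, not a verbatim copy of ion_system_heap.h:

/* simplified sketch of struct ion_system_heap (paraphrased) */
struct ion_system_heap {
	struct ion_heap heap;
	struct ion_page_pool *uncached_pools[NUM_ORDERS];
	struct ion_page_pool *cached_pools[NUM_ORDERS];
	struct ion_page_pool *secure_pools[VMID_LAST][NUM_ORDERS];
	/* ... */
};

This indexing (first by VMID, then by order) is what heap->secure_pools[vmid][order_to_index(order)] in the code later refers to.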
Once the matching heap is found, ion_buffer_create is called to create the ion buffer. struct ion_buffer is defined in kernel\msm-4.14\drivers\staging\android\ion\ion.h:
/**
 * struct ion_buffer - metadata for a particular buffer
 * @node:		node in the ion_device buffers tree
 * @dev:		back pointer to the ion_device
 * @heap:		back pointer to the heap the buffer came from
 * @flags:		buffer specific flags
 * @private_flags:	internal buffer specific flags
 * @size:		size of the buffer
 * @priv_virt:		private data to the buffer representable as
 *			a void *
 * @lock:		protects the buffers cnt fields
 * @kmap_cnt:		number of times the buffer is mapped to the kernel
 * @vaddr:		the kernel mapping if kmap_cnt is not zero
 * @sg_table:		the sg table for the buffer if dmap_cnt is not zero
 * @vmas:		list of vma's mapping this buffer
 */
struct ion_buffer {
	...
	unsigned long private_flags;
	...
	struct sg_table *sg_table;
	struct list_head attachments;
	...
};
The struct sg_table described earlier is stored inside the ion buffer and holds the scatterlist of physical pages.
/* this function should only be called while dev->lock is held */
static struct ion_buffer *ion_buffer_create(struct ion_heap *heap,
					    struct ion_device *dev,
					    unsigned long len,
					    unsigned long flags)
{
	struct ion_buffer *buffer;
	struct sg_table *table;
	struct scatterlist *sg;
	int i, ret;

	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
	...
	ret = heap->ops->allocate(heap, buffer, len, flags);
	if (ret) {
		if (!(heap->flags & ION_HEAP_FLAG_DEFER_FREE))
			goto err2;

		ion_heap_freelist_drain(heap, 0);
		ret = heap->ops->allocate(heap, buffer, len, flags);
		...
	}

	if (buffer->sg_table == NULL) {
		WARN_ONCE(1, "This heap needs to set the sgtable");
		...
	}

	spin_lock(&heap->stat_lock);
	heap->num_of_alloc_bytes += len;
	if (heap->num_of_alloc_bytes > heap->alloc_bytes_wm)
		heap->alloc_bytes_wm = heap->num_of_alloc_bytes;
	spin_unlock(&heap->stat_lock);

	table = buffer->sg_table;
	INIT_LIST_HEAD(&buffer->attachments);
	INIT_LIST_HEAD(&buffer->vmas);
	mutex_init(&buffer->lock);

	if (IS_ENABLED(CONFIG_ION_FORCE_DMA_SYNC)) {
		/*
		 * this will set up dma addresses for the sglist -- it is not
		 * technically correct as per the dma api -- a specific
		 * device isn't really taking ownership here. However, in
		 * practice on our systems the only dma_address space is
		 * physical addresses.
		 */
		for_each_sg(table->sgl, sg, table->nents, i) {
			sg_dma_address(sg) = sg_phys(sg);
			sg_dma_len(sg) = sg->length;
		}
	}

	mutex_lock(&dev->buffer_lock);
	ion_buffer_add(dev, buffer);
	mutex_unlock(&dev->buffer_lock);
	atomic_long_add(len, &heap->total_allocated);

	return buffer;

err2:
	kfree(buffer);
	return ERR_PTR(ret);
}
The key line is ret = heap->ops->allocate(heap, buffer, len, flags);, which calls the heap's own allocation function. The rest is bookkeeping: initializing lists and filling in the sg_table.
The system heap's ops live in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.c:
static struct ion_heap_ops system_heap_ops = {
	.allocate = ion_system_heap_allocate,
	.free = ion_system_heap_free,
	.map_kernel = ion_heap_map_kernel,
	.unmap_kernel = ion_heap_unmap_kernel,
	.map_user = ion_heap_map_user,
	.shrink = ion_system_heap_shrink,
};
allocate is implemented by ion_system_heap_allocate:
static int ion_system_heap_allocate(struct ion_heap *heap,
				    struct ion_buffer *buffer,
				    unsigned long size,
				    unsigned long flags)
{
	struct ion_system_heap *sys_heap = container_of(heap,
							struct ion_system_heap,
							heap);
	struct sg_table *table;
	struct sg_table table_sync = {0};
	struct scatterlist *sg;
	struct scatterlist *sg_sync;
	struct list_head pages;
	struct list_head pages_from_pool;
	struct page_info *info, *tmp_info;
	unsigned int nents_sync = 0;
	unsigned long size_remaining = PAGE_ALIGN(size);
	unsigned int max_order = orders[0];
	int vmid = get_secure_vmid(buffer->flags);
	int i = 0, ret;
	...
	if (size / PAGE_SIZE > totalram_pages / 2)
		return -ENOMEM;

	if (ion_heap_is_system_heap_type(buffer->heap->type) &&
	    is_secure_vmid_valid(vmid)) {
		pr_info("%s: System heap doesn't support secure allocations\n",
			__func__);
		return -EINVAL;
	}

	INIT_LIST_HEAD(&pages);
	INIT_LIST_HEAD(&pages_from_pool);

	while (size_remaining > 0) {
		if (is_secure_vmid_valid(vmid))
			info = alloc_from_pool_preferred(
					sys_heap, buffer, size_remaining,
					max_order);
		else
			info = alloc_largest_available(
					sys_heap, buffer, size_remaining,
					max_order);
		...
		sz = (1 << info->order) * PAGE_SIZE;
		size_remaining -= sz;
		if (info->from_pool)
			list_add_tail(&info->list, &pages_from_pool);
		else
			list_add_tail(&info->list, &pages);
		...
	}

	ret = ion_heap_alloc_pages_mem(&data);
	...
	table = kzalloc(sizeof(*table), GFP_KERNEL);
	if (!table) {
		ret = -ENOMEM;
		goto err_free_data_pages;
	}

	ret = sg_alloc_table(table, i, GFP_KERNEL);
	...
	if (nents_sync) {
		ret = sg_alloc_table(&table_sync, nents_sync, GFP_KERNEL);
		...
		sg_sync = table_sync.sgl;
	}

	/*
	 * We now have two separate lists. One list contains pages from the
	 * pool and the other pages from buddy. We want to merge these
	 * together while preserving the ordering of the pages (higher order
	 * first).
	 */
	do {
		info = list_first_entry_or_null(&pages, struct page_info, list);
		tmp_info = list_first_entry_or_null(&pages_from_pool,
						    struct page_info, list);
		if (info && tmp_info) {
			if (info->order >= tmp_info->order) {
				i = process_info(info, sg, sg_sync, &data, i);
				sg_sync = sg_next(sg_sync);
			} else {
				i = process_info(tmp_info, sg, 0, 0, i);
			}
		} else if (info) {
			i = process_info(info, sg, sg_sync, &data, i);
			sg_sync = sg_next(sg_sync);
		} else if (tmp_info) {
			i = process_info(tmp_info, sg, 0, 0, i);
		}
		sg = sg_next(sg);
	} while (sg);
	...
	if (nents_sync) {
		ret = ion_hyp_assign_sg(&table_sync, &vmid, 1, true);
		...
	}

	buffer->sg_table = table;
	...
	sg_free_table(&table_sync);
	ion_heap_free_pages_mem(&data);
	return 0;

	/* error paths (abridged) */
	...
	/* We failed to zero buffers. Bypass pool */
	buffer->private_flags |= ION_PRIV_FLAG_SHRINKER_FREE;
	...
	ion_hyp_unassign_sg(table, &vmid, 1, true, false);
	...
	for_each_sg(table->sgl, sg, table->nents, i)
		free_buffer_page(sys_heap, buffer, sg_page(sg),
				 get_order(sg->length));
	...
	sg_free_table(&table_sync);
err_free_data_pages:
	ion_heap_free_pages_mem(&data);
	...
	list_for_each_entry_safe(info, tmp_info, &pages, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);
	}
	list_for_each_entry_safe(info, tmp_info, &pages_from_pool, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);
	}
	return -ENOMEM;
}
ion_system_heap_allocate is fairly long; to my mind the heart of it is the while loop:
	while (size_remaining > 0) {
		if (is_secure_vmid_valid(vmid))
			info = alloc_from_pool_preferred(
					sys_heap, buffer, size_remaining,
					max_order);
		else
			info = alloc_largest_available(
					sys_heap, buffer, size_remaining,
					max_order);
		...
		sz = (1 << info->order) * PAGE_SIZE;
		size_remaining -= sz;
		if (info->from_pool)
			list_add_tail(&info->list, &pages_from_pool);
		else
			list_add_tail(&info->list, &pages);
	}

	ret = ion_heap_alloc_pages_mem(&data);
size_remaining starts out page-aligned: unsigned long size_remaining = PAGE_ALIGN(size);
The loop keeps pulling physical pages from the pools or from the buddy system; after each grab size_remaining is reduced by the size of the chunk, and the loop repeats until size_remaining reaches 0, i.e. the whole buffer has been gathered. When a buffer is allocated for the very first time the pools are empty, so the pages come from the buddy system through the normal Linux interfaces. A small model of this loop is sketched below.
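The effect of the loop is easy to see with a small user-space model (my own sketch, not the kernel code): it repeatedly takes the largest order that still fits into what is left, so a 12 MB + 5 KB request page-aligns to 3074 pages and decomposes into 192 chunks of 64 KB plus 2 chunks of 4 KB.

#include <stdio.h>

#define PG 4096UL
static const unsigned int orders[] = {4, 0};	/* same idea as ion_system_heap.h */
#define NUM_ORDERS (sizeof(orders) / sizeof(orders[0]))

int main(void)
{
	unsigned long size = 12UL * 1024 * 1024 + 5 * 1024;	/* 12 MB + 5 KB request */
	unsigned long remaining = (size + PG - 1) & ~(PG - 1);	/* PAGE_ALIGN */

	while (remaining > 0) {
		for (unsigned int i = 0; i < NUM_ORDERS; i++) {
			unsigned long chunk = PG << orders[i];

			if (remaining < chunk)
				continue;	/* this order is too big, try a smaller one */

			printf("take order %u chunk (%lu KB), %lu bytes left\n",
			       orders[i], chunk / 1024, remaining - chunk);
			remaining -= chunk;
			break;
		}
	}
	return 0;
}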
Inside the loop, is_secure_vmid_valid decides which allocation function is used; alloc_from_pool_preferred allocates preferentially from the secure pool.
static struct page_info *alloc_from_pool_preferred(
		struct ion_system_heap *heap, struct ion_buffer *buffer,
		unsigned long size, unsigned int max_order)
{
	struct page *page;
	struct page_info *info;
	int i;

	if (buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)
		return alloc_largest_available(heap, buffer, size, max_order);

	info = kmalloc(sizeof(*info), GFP_KERNEL);
	...
	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < order_to_size(orders[i]))
			continue;
		if (max_order < orders[i])
			continue;

		page = alloc_from_secure_pool_order(heap, buffer, orders[i]);
		if (IS_ERR(page))
			continue;
		...
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	page = split_page_from_secure_pool(heap, buffer);
	if (!IS_ERR(page)) {
		...
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	kfree(info);
	return alloc_largest_available(heap, buffer, size, max_order);
}
ION_FLAG_POOL_FORCE_ALLOC marks a forced allocation: when it is set, alloc_largest_available is used, which in the end calls straight into the Linux buddy allocator for physical pages. For an introduction to struct page, see 《Linux 物理內(nèi)存描述》 (link).
The core of alloc_from_pool_preferred is the for loop, which picks a suitable page order to allocate from. The buddy system keeps blocks of 2^order pages, and the pools are organised around the same idea, except that the supported orders are kept in an array — usually just order 0 and order 4. The array is defined in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.h:
#ifndef CONFIG_ALLOC_BUFFERS_IN_4K_CHUNKS
#if defined(CONFIG_IOMMU_IO_PGTABLE_ARMV7S)
static const unsigned int orders[] = {8, 4, 0};
#else
static const unsigned int orders[] = {4, 0};
#endif
#else
static const unsigned int orders[] = {0};
#endif

#define NUM_ORDERS ARRAY_SIZE(orders)
From my testing, current phones take the orders[] = {4, 0} branch, i.e. physical chunks of either 64 KB or 4 KB.
Back to the for loop in alloc_from_pool_preferred, assuming orders[] = {4, 0}:
static inline unsigned int order_to_size(int order)
{
return PAGE_SIZE << order;
}
PAGE_SIZE is the physical page size, normally 4 KB (ARMv8 supports 4 KB, 16 KB and 64 KB pages). Assuming 4 KB pages, order 4 means 2^4 = 16 pages, i.e. 64 KB. The check if (size < order_to_size(orders[i])) asks whether the requested size is smaller than 64 KB; if it is, this order is skipped, because this pool only holds contiguous 64 KB blocks, and using one for a smaller buffer would mean splitting it — the allocator always tries to pick the best-fitting size, hence the continue to move on to the next order. After the 64 KB order comes the 4 KB order, and since the size has already been rounded up to whole pages, nothing smaller than that is ever needed. If orders[] contained more entries (16, 8, 4, ...) the loop would walk through them all; and if the smallest entry were not order 0 (say order 1), the loop could still fail to find a fitting order, in which case the code falls out of the loop and splits a larger block into a suitable one by calling split_page_from_secure_pool.
struct page *split_page_from_secure_pool(struct ion_system_heap *heap,
					 struct ion_buffer *buffer)
{
	int i, j;
	struct page *page;
	unsigned int order;

	mutex_lock(&heap->split_page_mutex);

	/*
	 * Someone may have just split a page and returned the unused portion
	 * back to the pool, so try allocating from the pool one more time
	 * before splitting. We want to maintain large pages sizes when
	 * possible.
	 */
	page = alloc_from_secure_pool_order(heap, buffer, 0);
	if (IS_ERR(page)) {
		for (i = NUM_ORDERS - 2; i >= 0; i--) {
			order = orders[i];
			page = alloc_from_secure_pool_order(heap, buffer, order);
			if (IS_ERR(page))
				continue;

			split_page(page, order);

			/*
			 * Return the remaining order-0 pages to the pool.
			 * SetPagePrivate flag to mark memory as secure.
			 */
			for (j = 1; j < (1 << order); j++) {
				SetPagePrivate(page + j);
				free_buffer_page(heap, buffer, page + j, 0);
			}
			break;
		}
	}

	mutex_unlock(&heap->split_page_mutex);

	return page;
}
page = alloc_from_secure_pool_order(heap, buffer, 0); takes a single page from the order-0 pool, i.e. the smallest pages the pool holds. My guess at the design is that if even order 0 cannot be satisfied the function simply fails; the for loop below, as the comment says, retries from the pool at other orders before actually splitting. split_page lives in kernel\msm-4.14\mm\page_alloc.c, which contains the core buddy-allocator interfaces (the allocation functions used later also come from there); I have not fully understood the kernel's split_page implementation. The page split off by split_page_from_secure_pool ends up stored in the info structure:
	page = split_page_from_secure_pool(heap, buffer);
	...
	INIT_LIST_HEAD(&info->list);
Back in alloc_from_pool_preferred, let's look at alloc_from_secure_pool_order:
struct page *alloc_from_secure_pool_order(struct ion_system_heap *heap,
					  struct ion_buffer *buffer,
					  unsigned long order)
{
	int vmid = get_secure_vmid(buffer->flags);
	struct ion_page_pool *pool;

	if (!is_secure_vmid_valid(vmid))
		return ERR_PTR(-EINVAL);

	pool = heap->secure_pools[vmid][order_to_index(order)];
	return ion_page_pool_alloc_pool_only(pool);
}
The function is simple: it picks the pool matching the order and then calls
/*
 * Tries to allocate from only the specified Pool and returns NULL otherwise
 */
struct page *ion_page_pool_alloc_pool_only(struct ion_page_pool *pool)
{
	struct page *page = NULL;
	...
	if (mutex_trylock(&pool->mutex)) {
		if (pool->high_count)
			page = ion_page_pool_remove(pool, true);
		else if (pool->low_count)
			page = ion_page_pool_remove(pool, false);
		mutex_unlock(&pool->mutex);
	}
	...
	return page;
}
which takes a page out of the pool. The pool distinguishes highmem and lowmem pages: on a 32-bit system the kernel's 3 GB–4 GB window cannot permanently map all physical memory, and pages beyond the directly mapped range count as "high memory". Whether a page is high or low is recorded when it is handed to the pool after coming from the Linux buddy system. (The pool bookkeeping is sketched below.)
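For reference, the pool bookkeeping looks roughly like this — a paraphrase of the page-pool structure, not a verbatim copy of the header. Pages returned to the pool sit on one of two lists depending on whether they are highmem pages, and high_count / low_count are exactly what ion_page_pool_alloc_pool_only checks above:

/* simplified sketch of struct ion_page_pool (paraphrased) */
struct ion_page_pool {
	int high_count;			/* number of pages on high_items */
	int low_count;			/* number of pages on low_items */
	struct list_head high_items;	/* highmem pages cached in this pool */
	struct list_head low_items;	/* lowmem pages cached in this pool */
	struct mutex mutex;
	gfp_t gfp_mask;			/* gfp flags used when refilling from buddy */
	unsigned int order;		/* every block in this pool is 2^order pages */
};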
Back in the while loop of ion_system_heap_allocate: if the buffer is not being allocated from the secure pool, alloc_largest_available is called.
static struct page_info *alloc_largest_available(struct ion_system_heap *heap,
						 struct ion_buffer *buffer,
						 unsigned long size,
						 unsigned int max_order)
{
	struct page *page;
	struct page_info *info;
	int i;
	bool from_pool;

	info = kmalloc(sizeof(*info), GFP_KERNEL);
	if (!info)
		return NULL;

	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < order_to_size(orders[i]))
			continue;
		if (max_order < orders[i])
			continue;

		from_pool = !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC);
		page = alloc_buffer_page(heap, buffer, orders[i], &from_pool);
		if (IS_ERR(page))
			continue;

		info->page = page;
		info->order = orders[i];
		info->from_pool = from_pool;
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	kfree(info);
	return NULL;
}
Here ION_FLAG_POOL_FORCE_ALLOC again marks a forced allocation; if it is set, the pools are bypassed. alloc_buffer_page is then called.
static struct page *alloc_buffer_page(struct ion_system_heap *heap,
				      struct ion_buffer *buffer,
				      unsigned long order,
				      bool *from_pool)
{
	bool cached = ion_buffer_cached(buffer);
	struct page *page;
	struct ion_page_pool *pool;
	int vmid = get_secure_vmid(buffer->flags);
	struct device *dev = heap->heap.priv;

	if (vmid > 0)
		pool = heap->secure_pools[vmid][order_to_index(order)];
	else if (!cached)
		pool = heap->uncached_pools[order_to_index(order)];
	else
		pool = heap->cached_pools[order_to_index(order)];

	page = ion_page_pool_alloc(pool, from_pool);
	...
	if ((MAKE_ION_ALLOC_DMA_READY && vmid <= 0) || !(*from_pool))
		ion_pages_sync_for_device(dev, page, PAGE_SIZE << order,
					  DMA_BIDIRECTIONAL);

	return page;
}
Depending on the buffer's flags the right pool is chosen, and ion_page_pool_alloc is called with both the pool and the flag that says whether the page should come from the pool.
struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
{
	struct page *page = NULL;
	...
	if (fatal_signal_pending(current))
		return ERR_PTR(-EINTR);

	if (*from_pool && mutex_trylock(&pool->mutex)) {
		if (pool->high_count)
			page = ion_page_pool_remove(pool, true);
		else if (pool->low_count)
			page = ion_page_pool_remove(pool, false);
		mutex_unlock(&pool->mutex);
	}

	if (!page) {
		page = ion_page_pool_alloc_pages(pool);
		*from_pool = false;
	}
	...
	return page;
}
If allocating from the pool fails, or the pool is being bypassed, ion_page_pool_alloc_pages is called, which simply uses the Linux buddy allocator interface:
static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
	struct page *page = alloc_pages(pool->gfp_mask, pool->order);
	...
	return page;
}
Back in the while part of ion_system_heap_allocate:
		sz = (1 << info->order) * PAGE_SIZE;
		size_remaining -= sz;
		if (info->from_pool)
			list_add_tail(&info->list, &pages_from_pool);
		else
			list_add_tail(&info->list, &pages);
Every allocated chunk is returned in an info structure and, depending on whether it came from a pool, added to one of the two lists. info->order records the power of two, so multiplying it out by the page size gives the size of this chunk, which is then subtracted from the running total (size_remaining -= sz;). After the while loop the pages are assembled into the sg table.
Note that on the very first allocation the pools contain no pages at all — everything is taken from the Linux buddy system. Pages only end up in the pools when buffers are freed.
Back in ion_alloc_fd: once the dma-buf exists, an fd has to be produced from it, which is done by dma_buf_fd:
int dma_buf_fd(struct dma_buf *dmabuf, int flags)
{
	int fd;

	if (!dmabuf || !dmabuf->file)
		return -EINVAL;

	fd = get_unused_fd_flags(flags);
	if (fd < 0)
		return fd;

	fd_install(fd, dmabuf->file);

	return fd;
}
It uses the Linux helper get_unused_fd_flags to obtain a free fd number and then binds the dma-buf's struct file to that fd with fd_install.
The struct file itself was obtained earlier in ion_alloc_dmabuf: after the buffer has been created, dma_buf_export is called, and inside it
	file = anon_inode_getfile(bufname, &dma_buf_fops, dmabuf, ...);
an anonymous inode file is created and bound to dma_buf_fops, with the dmabuf (which carries the dma_buf_ops set up earlier) stored as the file's private data. So operating on the fd ends up going through this file and, from there, into the dma_buf_ops.
2. Freeing memory
void ion_system_heap_free(struct ion_buffer *buffer)
{
	struct ion_heap *heap = buffer->heap;
	struct ion_system_heap *sys_heap = container_of(heap,
							struct ion_system_heap,
							heap);
	struct sg_table *table = buffer->sg_table;
	struct scatterlist *sg;
	int i;
	int vmid = get_secure_vmid(buffer->flags);

	if (!(buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE) &&
	    !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
		...
		ion_heap_buffer_zero(buffer);
		...
		if (ion_hyp_unassign_sg(table, &vmid, 1, true, false))
			return;
	}

	for_each_sg(table->sgl, sg, table->nents, i)
		free_buffer_page(sys_heap, buffer, sg_page(sg),
				 get_order(sg->length));
	...
}
The first part is just a few checks; the important part is the for_each_sg loop, which walks the scatterlist and releases each physical page with free_buffer_page.
/*
 * For secure pages that need to be freed and not added back to the pool; the
 * hyp_unassign should be called before calling this function
 */
void free_buffer_page(struct ion_system_heap *heap,
		      struct ion_buffer *buffer, struct page *page,
		      unsigned int order)
{
	bool cached = ion_buffer_cached(buffer);
	int vmid = get_secure_vmid(buffer->flags);

	if (!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
		struct ion_page_pool *pool;

		if (vmid > 0)
			pool = heap->secure_pools[vmid][order_to_index(order)];
		else if (cached)
			pool = heap->cached_pools[order_to_index(order)];
		else
			pool = heap->uncached_pools[order_to_index(order)];

		if (buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE)
			ion_page_pool_free_immediate(pool, page);
		else
			ion_page_pool_free(pool, page);
	} else {
		__free_pages(page, order);
	}
}
It looks up the matching pool and then calls:
void ion_page_pool_free(struct ion_page_pool *pool, struct page *page)
{
	int ret;

	ret = ion_page_pool_add(pool, page);
	if (ret)
		ion_page_pool_free_pages(pool, page);
}
This puts the page back into the pool. But when the system runs low on memory, the ION heaps need to hand the pages cached in their pools back to the buddy system; that reclaim is done by the shrink callback:
static int ion_system_heap_shrink(struct ion_heap *heap, gfp_t gfp_mask,
				  int nr_to_scan)
{
	struct ion_system_heap *sys_heap;
	struct ion_page_pool *pool;
	int nr_freed;
	int i, j, nr_total = 0;

	sys_heap = container_of(heap, struct ion_system_heap, heap);

	for (i = 0; i < NUM_ORDERS; i++) {
		nr_freed = 0;

		for (j = 0; j < VMID_LAST; j++) {
			if (is_secure_vmid_valid(j))
				nr_freed += ion_secure_page_pool_shrink(
						sys_heap, j, i, nr_to_scan);
		}

		pool = sys_heap->uncached_pools[i];
		nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);

		pool = sys_heap->cached_pools[i];
		nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
		...
		nr_total += nr_freed;
	}

	return nr_total;
}
This function is also fairly simple: apart from some accounting, the real work is done by ion_page_pool_shrink, which takes pages out of the pool and then calls
static void ion_page_pool_free_pages(struct ion_page_pool *pool,
				     struct page *page)
{
	__free_pages(page, pool->order);
}
__free_pages is once again a Linux buddy-system interface, located in kernel\msm-4.14\mm\page_alloc.c.
3. Mapping memory

The system heap's memory is mapped through the dma-buf ops, which end up calling ion_heap_map_user. This function has one very important parameter, struct vm_area_struct, which describes a process's virtual memory regions; once a few of its fields are understood, the code below is straightforward. The structure is defined in kernel\msm-4.14\include\linux\mm_types.h:
/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task. A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */

	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next, *vm_prev;
	...
	/*
	 * Largest free memory gap in bytes to the left of this VMA.
	 * Either between this VMA and vma->vm_prev, or between one of the
	 * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
	 * get_unmapped_area find a free area of the right size.
	 */
	unsigned long rb_subtree_gap;

	/* Second cache line starts here. */

	struct mm_struct *vm_mm;	/* The address space we belong to. */
	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
	unsigned long vm_flags;		/* Flags, see mm.h. */

	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap interval tree.
	 *
	 * For private anonymous mappings, a pointer to a null terminated string
	 * in the user process containing the name given to the vma, or NULL
	 * if unnamed.
	 */
	union {
		struct {
			struct rb_node rb;
			unsigned long rb_subtree_last;
		} shared;
		const char __user *anon_name;
	};

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages. A MAP_SHARED vma
	 * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_sem &
					  * page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;

	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units */
	struct file *vm_file;		/* File we map to (can be NULL). */
	void *vm_private_data;		/* was vm_pte (shared mem) */
	...
	atomic_long_t swap_readahead_info;
	...
	struct vm_region *vm_region;	/* NOMMU mapping region */
	...
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
	...
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
	atomic_t vm_ref_count;		/* see vma_get(), vma_put() */
#endif
};
For background on this structure see https://linux-kernel-labs./master/labs/memory_mapping.html. It is created when a user process calls mmap, and it describes a contiguous range of the process's virtual address space with uniform access attributes whose size is a whole number of physical pages. The meaning of each member is also covered in https://blog.csdn.net/ganggexiongqi/article/details/6746248.
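The offset argument a process passes to mmap is what ends up, divided by the page size, in vm_pgoff. Here is a hedged user-space sketch; map_middle is my own helper name, and buf_fd is assumed to be a dma-buf fd like the one returned in the earlier allocation example. Mapping 10 pages starting at the 5th page of the buffer shows up in the kernel as vma->vm_pgoff == 5, with vma->vm_start equal to the address mmap returns.

#include <sys/mman.h>
#include <unistd.h>

/* Map 10 pages of the buffer starting at page 5 (illustrative helper). */
void *map_middle(int buf_fd)
{
	long page = sysconf(_SC_PAGESIZE);

	return mmap(NULL, 10 * page, PROT_READ | PROT_WRITE, MAP_SHARED,
		    buf_fd, 5 * page);
}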
vm_start is the starting virtual address of the region in the process.
int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
		      struct vm_area_struct *vma)
{
	struct sg_table *table = buffer->sg_table;
	unsigned long addr = vma->vm_start;
	unsigned long offset = vma->vm_pgoff * PAGE_SIZE;
	struct scatterlist *sg;
	int i;
	int ret;

	for_each_sg(table->sgl, sg, table->nents, i) {
		struct page *page = sg_page(sg);
		unsigned long remainder = vma->vm_end - addr;
		unsigned long len = sg->length;

		if (offset >= sg->length) {
			offset -= sg->length;
			continue;
		} else if (offset) {
			page += offset / PAGE_SIZE;
			len = sg->length - offset;
			offset = 0;
		}
		len = min(len, remainder);
		ret = remap_pfn_range(vma, addr, page_to_pfn(page), len,
				      vma->vm_page_prot);
		if (ret)
			return ret;
		addr += len;
		if (addr >= vma->vm_end)
			return 0;
	}
	return 0;
}
In the code, addr = vma->vm_start records the starting virtual address. vm_pgoff is the offset of this region within vm_file, measured in pages: if the buffer has 64 physical pages and user space maps 10 pages starting from the 5th page, vm_pgoff is 5. The for_each_sg loop takes the physical pages out of the scatterlist entry by entry and maps them. Now consider the check offset >= sg->length — why is it needed? If the offset is 6 pages but the current sg entry only holds 5 pages, the mapping clearly has to start one page into the next sg entry, so this entry must be skipped and the offset reduced accordingly. The code below does exactly that:
		if (offset >= sg->length) {
			offset -= sg->length;
			continue;
		} else if (offset) {
			page += offset / PAGE_SIZE;
			len = sg->length - offset;
			offset = 0;
		}
Suppose the next sg entry holds three physical pages. Skipping the first entry already executed offset -= sg->length, reducing the offset from 6 pages to 1 page, so in this entry we only need to advance page by offset / PAGE_SIZE = 1 page. len becomes sg->length - offset, i.e. 3 pages minus 1 page = 2 pages, and offset is set to 0 because it is no longer needed. The remaining pages are then mapped with the kernel's remap_pfn_range function, for which plenty of documentation exists online. With that, the mapping into user space is complete. (A small model of this offset walk follows.)
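The same walk can be modelled in a few lines of user-space C (my own sketch): given the per-entry chunk lengths and a starting offset, it reports where each remap would begin, mirroring the offset handling in ion_heap_map_user.

#include <stdio.h>

#define PG 4096UL

int main(void)
{
	/* chunk lengths of the sg entries, in bytes (5 pages, 3 pages, 4 pages) */
	unsigned long sg_len[] = {5 * PG, 3 * PG, 4 * PG};
	unsigned long offset = 6 * PG;	/* vm_pgoff = 6 */

	for (int i = 0; i < 3; i++) {
		unsigned long len = sg_len[i];
		unsigned long first_page = 0;

		if (offset >= len) {		/* mapping starts beyond this chunk */
			offset -= len;
			continue;
		} else if (offset) {		/* mapping starts inside this chunk */
			first_page = offset / PG;
			len -= offset;
			offset = 0;
		}
		printf("sg %d: map %lu page(s) starting at page %lu of the chunk\n",
		       i, len / PG, first_page);
	}
	return 0;
}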