> For the complete documentation index, see [llms.txt](https://zpliu.gitbook.io/booknote/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://zpliu.gitbook.io/booknote/python/duo-jin-cheng.md).

# 简单实现python多进程

在处理特别大的数据的时候，尤其是多重循环。例如对2万个基因进行一个计算，每个基因的计算需要进行1亿次比较。仅仅使用一个主进程进行计算就显得十分吃力；于是学习了python多进程的的处理，极大的缩短了脚本的运行时间。

![小猪](https://s1.ax1x.com/2020/06/22/NJ9bMd.png)

* 首先根据进程数，来分配任务
* 将所有的进程加入进程池
* 启动多个进程任务时，阻塞当前主进程
* 待多进程任务完成后，在主进程中将结果输出

## 分配任务

* `ProcessNum`从命令行从获取进程数
* 根据进程数，平均分配任务给每个进程；最后一个进程负责除不尽的任务
* `p.apply_async`给每个进程指定调用的函数和参数
* `p.close()`所有进程任务指定完毕，开始执行进程任务
* `p.join()`阻塞主进程，等待子进程任务完成

```python
import multiprocessing
with open(args.AS, 'r') as File:
    data = File.readlines()
    ProcessNum = int(args.p)
    average = int(len(data)/ProcessNum)
    p = multiprocessing.Pool(16)
    for processId in range(0, ProcessNum):
        if processId == ProcessNum-1:
            start = processId*average
            end = len(data)
        else:
            start = processId*average
            end = (processId+1)*average
        out.append(p.apply_async(mulProcessPSI,
                                 (data[start:end], processId+1))) ##这里传给子进程函数参数时候，要用逗号
    p.close()
    p.join()
```

## 获取进程结果

* `out`数组中存着每个进程计算的结果
* `get()`对每个进程调用get方法获得结果

```python
with open(args.o, 'w') as File:
    for result in out:
        for item in result.get():
            File.write(item)
```

![运行结果](https://s1.ax1x.com/2020/06/22/NJA8P0.png)

## 仍需学习的

* 多个进程进行文件写入时，涉及谁先写，谁后写的问题；需要使用到文件锁
* 进程间通信问题
* 内存溢出问题，多个进程的结果合并后太大了

## 参考

1. <https://zhuanlan.zhihu.com/p/64702600>
2. <https://www.cnblogs.com/jiangfan95/p/11439207.html>


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://zpliu.gitbook.io/booknote/python/duo-jin-cheng.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
