Python
multiprocessing, threading, multithreading
DS-Lee
2020. 9. 11. 20:33
Multiprocessing
Pros
- Separate memory space
- Code is usually straightforward
- Takes advantage of multiple CPUs & cores
- Avoids GIL limitations for cPython
- Eliminates most needs for synchronization primitives unless you use shared memory (instead, it's more of a communication model for IPC)
- Child processes are interruptible/killable
- Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
- A must with cPython for CPU-bound processing
- Multiprocessing achieves true parallelism and is used for CPU-bound tasks
  - Multithreading cannot achieve this because the GIL prevents threads from running in parallel
  - Multithreading is concurrent and is used for I/O-bound tasks
Cons
- IPC a little more complicated with more overhead (communication model vs. shared memory/objects)
- Larger memory footprint
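The CPU-bound point above can be sketched with the standard library's `multiprocessing.Pool` (a minimal example of my own, not from the original post): each worker is a separate process with its own interpreter and GIL, so CPU-bound work runs truly in parallel.

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work; each worker process has its own GIL
    return n * n

if __name__ == '__main__':  # required guard for multiprocessing
    with Pool(processes=4) as pool:
        # map() distributes the inputs across the worker processes
        # and returns results in input order
        results = pool.map(square, [1, 2, 3, 4])
    print(results)  # [1, 4, 9, 16]
```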
Threading
Pros
- Lightweight - low memory footprint
- Shared memory - makes access to state from another context easier
- Allows you to easily make responsive UIs
- cPython C extension modules that properly release the GIL will run in parallel
- Great option for I/O-bound applications
Cons
- cPython - subject to the GIL
- Not interruptible/killable
- If not following a command queue/message-pump model (using the queue module), then manual use of synchronization primitives becomes a necessity (decisions are needed about the granularity of locking)
- Code is usually harder to understand and to get right - the potential for race conditions increases dramatically
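As an illustration of the synchronization point above (my own sketch, not from the original post): incrementing a shared counter from several threads is a read-modify-write that can race, and a `threading.Lock` makes the result deterministic.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # without the lock, counter += 1 can race between threads
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all four threads to finish
print(counter)  # 400000
```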
[NOTE] The GIL only applies while a thread is executing on the CPU; while one thread has finished its CPU work and is performing I/O, another thread can run on the CPU at the same time. Therefore, in programs that do little CPU work and a lot of I/O, multithreading alone can deliver a large performance gain.
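To make the note concrete, here is a small timing sketch (my own illustration, using `time.sleep` as a stand-in for I/O): four "I/O" waits of 0.5 s each overlap when run on threads, so the total wall time stays close to 0.5 s rather than 2 s.

```python
import threading
import time

def io_task():
    time.sleep(0.5)  # simulated I/O; the GIL is released while waiting

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f'{elapsed:.2f}s')  # roughly 0.5s, not 2.0s
```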
reference: https://monkey3199.github.io/develop/python/2018/12/04/python-pararrel.html
Template Code
Multi-processing
def add(queue, num1, num2):
    result = num1 + num2
    queue.put(result)

if __name__ == '__main__':  # multiprocessing code must run under this guard
    from multiprocessing import Process, Queue
    num_sets = ((1, 1), (2, 2), (3, 3), (4, 4))
    queue = Queue()
    procs = []
    for num_set in num_sets:
        # args must be a tuple; a single argument needs a trailing comma: (x,)
        proc = Process(target=add, args=(queue, num_set[0], num_set[1]))
        procs.append(proc)
        proc.start()
    for p in procs:
        p.join()  # wait until every child process has finished
    # check results in the queue
    print(queue.qsize())
    for i in range(queue.qsize()):
        print(queue.get())
Single threading
from threading import Thread
import time

def logger(result, fname, delay):
    time.sleep(delay)
    with open(fname, 'w') as f:
        f.write('{}\n'.format(result))
    print('* logging is finished.')

num1 = 1
num2 = 2
result = num1 + num2
# log with a thread
thd = Thread(target=logger, args=(result, 'C:/temp01/logger.txt', 5))
thd.start()
# while the thread is logging, the main script goes on.
print(result)
Multi threading
from threading import Thread
import time

def logger(result, fname, delay):
    time.sleep(delay)
    with open(fname, 'w') as f:
        f.write('{}\n'.format(result))
    print('* logging is finished.')

num1 = 1
num2 = 2
result = num1 + num2
# log with two threads
thd1 = Thread(target=logger, args=(result, 'C:/temp01/logger1.txt', 3))
thd2 = Thread(target=logger, args=(result, 'C:/temp01/logger2.txt', 5))
thd1.start()
thd2.start()
# To block until every thread has finished, call .join() on each.
# While the threads are logging, the main script goes on.
print(result)
Multi threading: fetching thousands of images from a website
Get+imgs+on+the+internet+with+multi-threading.ipynb
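The attached notebook is not reproduced here; a minimal sketch of the same idea (my assumption about its contents, not the notebook's actual code) uses `concurrent.futures.ThreadPoolExecutor`, since downloading images is I/O-bound and the threads overlap their network waits. The URLs and the `fetch`/`fetch_all` names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # the GIL is released while the socket waits, so fetches overlap
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def fetch_all(urls, fetch_fn=fetch, max_workers=8):
    # map() returns results in the same order as the input URLs
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_fn, urls))

# usage (hypothetical URLs):
# images = fetch_all(['https://example.com/a.jpg', 'https://example.com/b.jpg'])
```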