Python

Multiprocessing, threading, and multithreading

DS-Lee 2020. 9. 11. 20:33

Multiprocessing

Pros

  • Separate memory space
  • Code is usually straightforward
  • Takes advantage of multiple CPUs & cores
  • Avoids GIL limitations in CPython
  • Eliminates most needs for synchronization primitives unless you use shared memory (instead, it is more of a communication model built on IPC)
  • Child processes are interruptible/killable
  • The Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
  • A must with CPython for CPU-bound processing
  • Multiprocessing achieves true parallelism and is used for CPU-bound tasks
    - Multithreading cannot achieve this in CPython because the GIL allows only one thread to execute Python bytecode at a time.
    - Multithreading is concurrent rather than parallel and is used for I/O-bound tasks

Cons

  • IPC is a little more complicated and has more overhead (communication model vs. shared memory/objects)
  • Larger memory footprint
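
As a rough illustration of the CPU-bound case described above, the sketch below spreads a deliberately slow calculation over several worker processes. The function names (count_primes, count_primes_parallel) and the input sizes are made up for this example; it is one possible way to do it with the standard concurrent.futures module, not the only one.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=4):
    """Evaluate count_primes for each limit in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # hypothetical workload: four independent CPU-bound jobs
    print(count_primes_parallel([20_000, 30_000, 40_000, 50_000]))
```

Each job runs in its own interpreter with its own GIL, so the jobs can occupy separate cores. The arguments and return values are pickled and sent between processes, which is the IPC overhead listed under Cons.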

 

Threading

Pros

  • Lightweight - low memory footprint
  • Shared memory - makes access to state from another context easier
  • Allows you to easily make responsive UIs
  • CPython C extension modules that properly release the GIL will run in parallel
  • Great option for I/O-bound applications

Cons

  • CPython - subject to the GIL
  • Not interruptible/killable
  • If not following a command queue/message pump model (using the queue module, named Queue in Python 2), then manual use of synchronization primitives becomes a necessity (decisions are needed for the granularity of locking)
  • Code is usually harder to understand and to get right - the potential for race conditions increases dramatically
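
To make the point about synchronization primitives concrete, here is a minimal sketch of a shared counter protected by threading.Lock. The Counter class and run_workers helper are hypothetical names for this example. Without the lock, `self.value += 1` is a read-modify-write sequence that can interleave between threads and lose updates.

```python
import threading


class Counter:
    """A counter that several threads can increment safely."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below atomic
        # with respect to other threads calling increment().
        with self._lock:
            self.value += 1


def run_workers(counter, n_threads=8, n_increments=10_000):
    """Start n_threads threads that each increment the counter n_increments times."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the total
    return counter.value


if __name__ == "__main__":
    print(run_workers(Counter()))
```

The lock here is as fine-grained as it can be (one increment per acquisition). Holding it around the whole loop would reduce locking overhead but serialize the threads completely, which is the granularity decision mentioned above.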

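[NOTE] The GIL applies to CPU work. While one thread has finished its CPU work and is waiting on I/O, other threads can run their CPU work in the meantime. So for programs that do little CPU work and mostly I/O, multithreading alone can give a large performance gain.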
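
The effect described in the note can be sketched with simulated I/O. In the example below, time.sleep stands in for a blocking I/O call (it releases the GIL while waiting, as blocking I/O does); fake_io, run_sequential, run_threaded, and timed are hypothetical helper names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Perform the waits one after another."""
    return [fake_io(d) for d in delays]


def run_threaded(delays):
    """Perform the waits in parallel threads, one thread per task."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_io, delays))


def timed(fn):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.5] * 4
    _, t_seq = timed(lambda: run_sequential(delays))
    _, t_thr = timed(lambda: run_threaded(delays))
    print('sequential: {:.2f}s, threaded: {:.2f}s'.format(t_seq, t_thr))
```

The sequential version takes roughly the sum of the delays, while the threaded version takes roughly the longest single delay, because the waits overlap.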

reference: https://monkey3199.github.io/develop/python/2018/12/04/python-pararrel.html


Template Code

Multi-processing

from multiprocessing import Process, Queue


def add(queue, num1, num2):
    result = num1 + num2
    queue.put(result)  # send the result back to the parent process


if __name__ == '__main__':  # guard process creation so child processes can import this module without re-running it
    num_sets = ((1, 1), (2, 2), (3, 3), (4, 4))

    queue = Queue()
    procs = []
    for num1, num2 in num_sets:
        proc = Process(target=add, args=(queue, num1, num2))
        procs.append(proc)
        proc.start()

    # collect one result per process before joining; results arrive in completion order
    # (queue.qsize() is avoided because it is not implemented on some platforms, e.g. macOS)
    results = [queue.get() for _ in procs]

    for p in procs:
        p.join()  # the parent waits here until each child process has finished

    # check the results
    print(len(results))
    for result in results:
        print(result)
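
The template above passes results back through a Queue. As an alternative sketch, multiprocessing.Pool can return the results directly, which removes the manual bookkeeping. Here add is redefined without the queue argument, and add_all is a hypothetical helper name.

```python
from multiprocessing import Pool


def add(num1, num2):
    return num1 + num2


def add_all(num_sets, workers=2):
    """Apply add() to every (num1, num2) pair using a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # starmap unpacks each tuple into add(num1, num2)
        # and returns the results in input order
        return pool.starmap(add, num_sets)


if __name__ == '__main__':
    print(add_all(((1, 1), (2, 2), (3, 3), (4, 4))))  # prints [2, 4, 6, 8]
```

Unlike the Queue version, the order of the results matches the order of the inputs, and the pool is cleaned up when the with block exits.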

Single threading

from threading import Thread
import time

def logger(result, fname, delay):
    time.sleep(delay)
    with open(fname, 'w') as f:
        f.write('{}\n'.format(result))
    print('* logging is finished.')


num1 = 1
num2 = 2
result = num1 + num2

# log with a thread
thd = Thread(target=logger, args=(result, 'C:/temp01/logger.txt', 5))
thd.start()

# while logging with the thread, the script goes on.
print(result)

Multi threading

from threading import Thread
import time

def logger(result, fname, delay):
    time.sleep(delay)
    with open(fname, 'w') as f:
        f.write('{}\n'.format(result))
    print('* logging is finished.')


num1 = 1
num2 = 2
result = num1 + num2

# log with a thread
thd1 = Thread(target=logger, args=(result, 'C:/temp01/logger1.txt', 3))
thd2 = Thread(target=logger, args=(result, 'C:/temp01/logger2.txt', 5))
thd1.start()
thd2.start()

# To make the main thread wait until both threads have finished, call thd1.join() and thd2.join().

# while the threads are logging, the script goes on.
print(result)
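
As a small follow-up to the template above, the sketch below shows join() together with a thread-safe queue.Queue for collecting results from several threads. The worker and run_and_collect names are hypothetical, and the short sleeps stand in for real work such as the file logging above.

```python
import queue
import threading
import time


def worker(task_id, delay, results):
    """Simulate some work, then report back through the shared queue."""
    time.sleep(delay)
    results.put((task_id, delay))


def run_and_collect(delays):
    """Run one thread per delay and return the sorted (task_id, delay) results."""
    results = queue.Queue()  # thread-safe, so no explicit lock is needed
    threads = [
        threading.Thread(target=worker, args=(i, d, results))
        for i, d in enumerate(delays)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main thread blocks here until each thread has finished

    # all producers are done, so the queue can be drained safely
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)


if __name__ == '__main__':
    print(run_and_collect([0.3, 0.1, 0.2]))
```

Note that join() does not make the threads finish at the same time. It only makes the caller wait, so the total wait is about as long as the slowest thread.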

 

Multi threading: fetching thousands of images from a website

Attachment: Get+imgs+on+the+internet+with+multi-threading.ipynb
[Figures: single threading; multi threading (#threads: 16)]
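
The notebook itself is not reproduced here, so the following is only a sketch of how such a download job might look with 16 threads, not the notebook's actual code. The fetch_bytes and fetch_all names and the example URLs are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch=fetch_bytes, workers=16):
    """Download every URL with a pool of threads; returns {url: bytes}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The threads spend most of their time waiting on the network,
        # so the GIL is released and the downloads overlap.
        return dict(zip(urls, pool.map(fetch, urls)))


if __name__ == '__main__':
    # hypothetical image URLs; replace with real ones
    urls = ['https://example.com/img/{}.jpg'.format(i) for i in range(3)]
    images = fetch_all(urls)
    print({url: len(data) for url, data in images.items()})
```

If any download raises an exception, pool.map re-raises it when that result is consumed, so a real job would likely want per-URL error handling. The fetch parameter exists so the function can be tried with a fake downloader without touching the network.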