Python Multiprocessing Pool Share Read Only Data


a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

  • Thread starter: Valery

Valery

  • #1

Hi all,

Q: how to organize parallel accesses to a huge common read-only Python
data structure?

Details:

I have a huge data structure that takes >50% of RAM.
My goal is to have many computational threads (or processes) that can
have efficient read-access to the huge and complex data structure.

"Efficient" in particular means "without serialization" and "without
unneeded locking on read-only data"

From what I see, there are the following strategies:

1. multi-processing
=> a. child-processes get their own *copies* of the huge data structure
-- bad and not possible at all in my case;
=> b. child-processes often communicate with the parent process via
some IPC -- bad (serialization);
=> c. child-processes access the huge structure via some shared
memory approach -- feasible without serialization?! (copy-on-write is
not working well here in CPython/Linux!! -- see the sketch after this
list);

2. multi-threading
=> d. CPython is said to have issues here because of the GIL -- any
comments?
=> e. GIL-less implementations have their own issues -- any hot
recommendations?
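
A minimal sketch of why copy-on-write fails in 1c: CPython keeps each
object's reference count inside the object itself, so even a pure read
bumps a counter on the shared page and forces the kernel to copy it.
sys.getrefcount makes those writes visible:

import sys

parent_x = [1./i for i in xrange(1, 11)]  # tiny stand-in for the huge data

print sys.getrefcount(parent_x[0])  # typically 2 here
y = parent_x[0]                     # a pure "read" of the shared data...
print sys.getrefcount(parent_x[0])  # ...now 3: the object's page was written to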

I am a big fan of the parallel map() approach -- either
multiprocessing.Pool.map or, even better, pprocess.pmap. Yet this
doesn't work straightforwardly anymore when "huge data" means >50% of
RAM
;-)

Comments and ideas are highly welcome!!

Here is a workbench example of my case:

######################
import time
from multiprocessing import Pool

def f(_):
    time.sleep(5)  # only to emulate the time used by my computation
    res = sum(parent_x)  # my sophisticated formula goes here
    return res

if __name__ == '__main__':
    parent_x = [1./i for i in xrange(1, 10000000)]  # my huge read-only data :o)
    p = Pool(7)
    res = list(p.map(f, xrange(10)))
    # switch to ps and see how fast your free memory is getting wasted...
    print res
######################

Kind regards
Valery


Klauss

  • #2

Hi all,

Q: how to organize parallel accesses to a huge common read-only Python
data structure?

Details:

I have a huge data structure that takes >50% of RAM.
My goal is to have many computational threads (or processes) that can
have efficient read-access to the huge and complex data structure.

<snip>

1. multi-processing
=> a. child-processes get their own *copies* of the huge data structure
-- bad and not possible at all in my case;

How's the layout of your data, in terms of # of objects vs. bytes used?
Just to have an idea of the overhead involved in refcount
externalization (you know, what I mentioned here:
http://groups.google.com/group/unladen-swallow/browse_thread/thread/9d2af1ac3628dc24
)

Valery

  • #3

Hi Klauss,

How's the layout of your data, in terms of # of objects vs. bytes used?

dict (or list) of 10K-100K objects. The objects are lists or dicts.
The whole structure eats up to 2+ GB of RAM.

Just to have an idea of the overhead involved in refcount
externalization (you know, what I mentioned here:http://groups.google.com/group/unladen-swallow/browse_thread/thread/9...
)

yep, I've understood the idea explained by you there.

regards,
Valery

Emile van Sebille

  • #4

On 12/9/2009 6:58 AM Valery said...

Hi all,

Q: how to organize parallel accesses to a huge common read-only Python
data structure?

I have such a structure which I buried in a Zope process which keeps it
in memory and is accessed through HTTP requests. This was done about 8
years ago, and I think today I'd check out Pyro.
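
A minimal sketch of that serve-it-from-one-process idea with Pyro,
using the Pyro4 API (the DataService class and its lookup method are
made up for illustration; note every remote call does go through
serialization):

import Pyro4

@Pyro4.expose
class DataService(object):
    def __init__(self):
        # the huge structure lives in this one process only
        self.huge = dict((str(i), [i, i * 10]) for i in xrange(100000))

    def lookup(self, key):
        return self.huge[key]

if __name__ == '__main__':
    daemon = Pyro4.Daemon()
    uri = daemon.register(DataService())
    print uri  # clients then call Pyro4.Proxy(uri).lookup("42")
    daemon.requestLoop()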

Emile

Aaron Watters

  • #5

Hi all,

Q: how to organize parallel accesses to a huge common read-only Python
data structure?

Use a BTree on disk in a file. A good file system will keep most of
the pages you need in RAM whenever the data is "warm". This works
for Python or any other programming language. Generally you can
always get to any piece of data in about 4 seeks at most anyhow,
so if your disk is fast your app will be fast too. The file can
be accessed concurrently without issues by any number of processes
or threads.
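
A minimal sketch of that BTree-in-a-file approach, using the bsddb
module from the Python 2 standard library (the file name and key
layout are made up):

import bsddb

# build the on-disk BTree once, in a single writer process
db = bsddb.btopen('huge_data.db', 'c')
for i in xrange(1000000):
    db[str(i)] = str(1./(i + 1))
db.close()

# any number of processes or threads can then open it read-only;
# the OS page cache keeps the "warm" pages in RAM
db = bsddb.btopen('huge_data.db', 'r')
print db['42']
db.close()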

-- Aaron Watters
http://listtree.appspot.com
http://whiffdoc.appspot.com
===
less is more

Antoine Pitrou

  • #6

On Wednesday, 09 Dec 2009 06:58:11 -0800, Valery wrote:

I have a huge data structure that takes >50% of RAM. My goal is to have
many computational threads (or processes) that can have efficient
read-access to the huge and complex data structure.

"Efficient" in particular means "without serialization" and "without
unneeded locking on read-only data"

I was going to suggest memcached but it probably serializes non-atomic
types. It doesn't mean it will be slow, though. Serialization implemented
in C may well be faster than any "smart" non-serializing scheme
implemented in Python.
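
A minimal sketch of the memcached route with the python-memcached
client (assumes a memcached server already running on localhost; the
key name is made up). Values are pickled behind the scenes, which is
exactly the serialization trade-off under discussion:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

mc.set('parent_x_chunk_0', [1./i for i in xrange(1, 100001)])  # pickled on the way in
chunk = mc.get('parent_x_chunk_0')                             # unpickled on the way out
print sum(chunk)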

2. multi-threading
=> d. CPython is said to have issues here because of the GIL -- any
comments?

What do you call "problems because of the GIL"? It is quite a vague
statement, and an answer would depend on your OS, the number of threads
you're willing to run, and whether you want to extract throughput from
multiple threads or are just concerned about latency.

In any case, you have to do some homework and compare the various
approaches on your own data, and decide whether the numbers are
satisfying to you.

I am a big fan of the parallel map() approach

I don't see what map() has to do with accessing data. map() is for
*processing* data. In other words, whether or not you use a map()-like
primitive does not say anything about how the underlying data should be
accessed.


Klauss

  • #7

I was going to suggest memcached but it probably serializes non-atomic
types.

Atomic too.
memcached communicates through sockets[3] (albeit perhaps unix
sockets, which are faster than TCP ones).

multiprocessing has shared memory schemes, but does a lot of internal
copying (uses ctypes)... and they are particularly unhelpful when your
shared data is highly structured, since you can't share objects, only
primitive types.
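
A minimal sketch of what those shared memory schemes do offer (sizes
made up; relies on fork, as on Linux) -- fine for a flat array of
doubles, but there is no way to put a dict or a list of lists in there:

from multiprocessing import Pool, Array

# one flat, typed block of shared memory; lock=False since it is read-only
shared = Array('d', [1./i for i in xrange(1, 1000001)], lock=False)

def f(_):
    return sum(shared)  # all workers read the same pages, no copies

if __name__ == '__main__':
    p = Pool(4)
    print p.map(f, xrange(8))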

I finished a patch that pushes reference counters into packed pools.
It has lots of drawbacks, but manages to solve this particular
problem, if the data is predominantly non-numeric (ie: lists and dicts,
as mentioned before). Of the drawbacks, maybe the biggest is a bigger
memory footprint - yep... I don't believe there's anything that can be
done to change that. It can be optimized, to make the overhead a
little less, though.

This test code[1] consumes roughly 2G of RAM on an x86_64 with python
2.6.1; with the patch, it *should* use 2.3G of RAM (as specified by
its output), so you can see the footprint overhead... but better page
sharing makes it consume about 6 times less - roughly 400M... which is
the size of the dataset. Ie: near-optimal data sharing.

This patch[2] has other optimizations intermingled - if there's
interest in the patch without those (which are both unproven and
nonportable) I could try to separate them. I will have to, anyhow, to
upload it for inclusion into CPython (if I manage to fix the
shortcomings, and if it gets approved).

The most important shortcomings of the refcount patch are:
1) Tripled memory overhead of reference counting. Before, it was a
single Py_ssize_t per object. Now, it's two pointers plus the
Py_ssize_t. This could perhaps be optimized (by getting rid of the
arena pointer, for instance).
2) Increased code output for Py_INCREF/DECREF. It's small, but it
adds up to a lot. Timings on test_decimal.py (a small numeric
benchmark I use, which might not be representative at all) show a 10%
performance loss in CPU time. Again, this might be optimized with a
lot of work and creativity.
3) Breaks binary compatibility, and in weird cases source
compatibility with extension modules. PyObject layout is different, so
statically-initialized variables need to stick to using CPython's
macros (I've seen cases when they don't), and code should use Py_REFCNT
() for accessing the refcount, but many just do ob->ob_refcnt, which
will break with the patch.
4) I'm also not really sure (haven't tested) what happens when
CPython runs out of memory - I tried real hard not to segfault, even
recover nicely, but you know how hard that is...

[3] http://code.google.com/p/memcached/...mpare_to_a_server_local_cache?_(PHP's_APC,_mm
[2] http://www.deeplayer.com/claudio/misc/Python-2.6.1-refcount.patch
[1] test code below

import time
from multiprocessing import Pool

def usoMemoria():
    import os
    import subprocess
    pid = os.getpid()
    cmd = "ps -o vsz=,rss=,share= -p %s --ppid %s" % (pid, pid)
    p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    info = p.stdout.readlines()
    # sum the rss column over this process and all its children
    s = sum( int(r) for v,r,s in map(str.split, map(str.strip, info)) )
    return s

def f(_):
    return sum(int(x) for d in huge_global_data for x in d
               if x != "state")  # my sophisticated formula goes here

if __name__ == '__main__':
    huge_global_data = []
    for i in xrange(500000):
        d = {}
        d[str(i)] = str(i*10)
        d[str(i+1)] = str(i)
        d["state"] = 3
        huge_global_data.append(d)

    p = Pool(7)
    res = list(p.map(f, xrange(20)))

    print "%.2fM" % (usoMemoria() / 1024.0)

Valery

  • #8

Hi Antoine

I was going to suggest memcached but it probably serializes non-atomic
types. It doesn't mean it will be slow, though. Serialization implemented
in C may well be faster than any "smart" non-serializing scheme
implemented in Python.

No serializing could be faster than NO serializing at all :)

If a child process could directly read the parent's RAM -- what could be
better?

What do you call "problems because of the GIL"? It is quite a vague
statement, and an answer would depend on your OS, the number of threads
you're willing to run, and whether you want to extract throughput from
multiple threads or are just concerned about latency.

it seems to be a known fact that only one CPython interpreter thread
will be running at a time, because a thread acquires the GIL during
execution and the other threads within the same process are then just
waiting for the GIL to be released.
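
A quick way to check that on your own box (a rough sketch; absolute
timings will vary): a CPU-bound function run in two threads takes
about as long as, or longer than, running it twice serially, because
only one thread can hold the GIL at a time:

import time
import threading

def burn():
    n = 0
    for i in xrange(10000000):
        n += i

start = time.time()
burn(); burn()
print "serial:   %.2fs" % (time.time() - start)

start = time.time()
threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print "threaded: %.2fs" % (time.time() - start)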

In any case, you have to do some homework and compare the various
approaches on your own data, and decide whether the numbers are
satisfying to you.

well, to me the least evil is to pack-unpack things into array.array
and/or similarly NumPy.
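
A minimal sketch of that packing workaround (array size made up): the
packed array is one big buffer with no per-element refcounts, so
forked workers can read it without dirtying the copy-on-write pages:

from array import array
from multiprocessing import Pool

# pack once in the parent: one buffer instead of millions of float objects
parent_x = array('d', (1./i for i in xrange(1, 10000001)))

def f(_):
    return sum(parent_x)  # pure reads; the pages stay shared after fork

if __name__ == '__main__':
    p = Pool(7)
    print p.map(f, xrange(10))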

I do hope that Klauss' patch will be accepted, because it will allow me
to forget a lot of that unneeded packing-unpacking.

I don't see what map() has to do with accessing data. map() is for
*processing* data. In other words, whether or not you use a map()-like
primitive does not say anything about how the underlying data should be
accessed.

right. However, saying "a big fan" had another focus here: if you
write your code based on maps then it takes only a tiny effort to convert
your code into a MULTIprocessing one :)

just that.

Kind regards.
Valery

garyrob

  • #9

One thing I'm not clear on regarding Klauss' patch. He says it's
applicable where the data is primarily non-numeric. In trying to
understand why that would be the case, I'm thinking that the increased
per-object memory overhead for reference-counting would outweigh the
space gains from the shared memory.

Klauss's test code stores a large number of dictionaries which each
contain just 3 items. The stored items are strings, but short ones...
it looks like they take up less space than double floats(?).

So my understanding is that the point is that the overhead for the
dictionaries is large enough that the patch is very helpful even though
the stored items are small. And that the patch would be less and less
effective as the number of items stored in each dictionary became
greater and greater, until eventually the patch might use more
space for reference counting than it saved by shared memory.

Is this understanding right? (I'm hoping not, because for some
applications, I'd like to be able to use it for large dictionaries
containing lots of numbers.)

Thanks,
Gary


Klauss

  • #10

One thing I'm not clear on regarding Klauss' patch. He says it's
applicable where the data is primarily non-numeric. In trying to
understand why that would be the case, I'm thinking that the increased
per-object memory overhead for reference-counting would outweigh the
space gains from the shared memory.

Klauss's test code stores a large number of dictionaries which each
contain just 3 items. The stored items are strings, but short ones...
it looks like they take up less space than double floats(?).

So my understanding is that the point is that the overhead for the
dictionaries is large enough that the patch is very helpful even though
the stored items are small. And that the patch would be less and less
effective as the number of items stored in each dictionary became
greater and greater, until eventually the patch might use more
space for reference counting than it saved by shared memory.

Not really.
The real difference is that numbers (ints and floats) are allocated
out of small contiguous pools. So even if a great percentage of those
objects would remain read-only, there are probably holes in those pools
left by the irregular access pattern during initialization, and those
holes would be written to eventually as the pool gets used.

In essence, those pools aren't read-only for reasons other than
reference counting.

Dictionaries, tuples and lists (and many other types) don't exhibit
that behavior.
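
A small sketch to make that visible (CPython 2.x specific, and not
guaranteed behavior): freeing an int leaves a hole in its allocation
pool, and the next int allocated typically fills it, writing to the page:

x = 123456
a = x + x            # a fresh int from the int allocation pool
addr = id(a)
del a                # leaves a hole in the pool
b = x * 3            # the next int allocated typically reuses that hole
print id(b) == addr  # usually True on CPython 2.x: the page was written to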




Source: https://www.thecodingforums.com/threads/a-huge-shared-read-only-data-in-parallel-accesses-how-multithreading-multiprocessing.708125/
