python >> Inefficient summing

by Bruno Desthuilliers » Thu, 09 Oct 2008 02:35:38 GMT

beginner a rit :
> Hi All,
>
> I have a list of records like below:
>
> rec=[{"F1":1, "F2":2}, {"F1":3, "F2":4} ]
>
> Now I want to write code to find out the ratio of the sums of the two
> fields.
>
> One thing I can do is:
>
> sum(r["F1"] for r in rec)/sum(r["F2"] for r in rec)
>
> But this is slow because I have to iterate through the list twice.
> Also, in the case where rec is an iterator, it does not work.
>
> I can also do this:
>
> sum1, sum2= reduce(lambda x, y: (x[0]+y[0], x[1]+y[1]), ((r["F1"],
> r["F2"]) for r in rec))
> sum1/sum2
>
> This loops through the list only once, and is probably more efficient,
> but it is less readable.
>
> I can of course use an old-fashioned loop. This is more readable, but
> also more verbose.
>
> What is the best way, I wonder?

The best way is the one that give you the correct answer while being
fast enough - for a definition of 'fast enough' that is highly
project-specific - and still readable - for a definition of 'readable'
that may be somehow subjective, but clearly favours a plain for loop
over your 'I can also do this' snippet.



python >> Inefficient summing

by beginner » Thu, 09 Oct 2008 04:23:08 GMT


Hi All,

I have a list of records like below:

rec=[{"F1":1, "F2":2}, {"F1":3, "F2":4} ]

Now I want to write code to find out the ratio of the sums of the two
fields.

One thing I can do is:

sum(r["F1"] for r in rec)/sum(r["F2"] for r in rec)

But this is slow because I have to iterate through the list twice.
Also, in the case where rec is an iterator, it does not work.

I can also do this:

sum1, sum2= reduce(lambda x, y: (x[0]+y[0], x[1]+y[1]), ((r["F1"],
r["F2"]) for r in rec))
sum1/sum2

This loops through the list only once, and is probably more efficient,
but it is less readable.

I can of course use an old-fashioned loop. This is more readable, but
also more verbose.

What is the best way, I wonder?


-a new python programmer



python >> Inefficient summing

by Chris Rebert » Thu, 09 Oct 2008 04:47:05 GMT

I personally would probably do:

from collections import defaultdict

label2sum = defaultdict(lambda: 0)
for r in rec:
for key, value in r.iteritems():
label2sum[key] += value

ratio = label2sum["F1"] / label2sum["F2"]

This iterates through each 'r' only once, and (imho) is pretty
readable provided you know how defaultdicts work. Not everything has
to unnecessarily be made a one-liner. Coding is about readability
first, optimization second. And optimized code should not be
abbreviated, which would make it even harder to understand.

I probably would have gone with your second solution if performance
was no object.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com




Inefficient summing

by bearophileHUGS » Thu, 09 Oct 2008 05:22:41 GMT

beginner:

In such situation the old loop seems the best solution. Short code is
good only when it doesn't make the code too much slow/difficult to
understand. Keeping the code quite readable is very important. So I
think a simple solution is the best in this situation. The following
code can be understood quickly:

records = [{"F1": 1, "F2": 2}, {"F1": 3, "F2": 4}]

f1sum, f2sum = 0, 0
for rec in records:
f1sum += rec["F1"]
f2sum += rec["F2"]
ratio = f1sum / float(f2sum)
print ratio

Output:
0.666666666667

Note that I allowed myself to use this line of code:
f1sum, f2sum = 0, 0
because the two values on the right are equal, so you don't need one
bit of brain to understand where each value goes :-)

You can of course generalize the code in various ways, for example:

keys = ["F1", "F2"]
totals = [0] * len(keys)

for rec in records:
for i, key in enumerate(keys):
totals[i] += rec[key]

ratio = totals[0] / float(totals[1])
print ratio

But that already smells of over-engineering. Generally it's better to
use the simpler solution that works in all your cases at a speed that
is acceptable for you (my variant of the KISS principle).

Bye,
bearophile


Inefficient summing

by Matt Nordhoff » Thu, 09 Oct 2008 15:40:18 GMT




FWIW, you can just use:

label2sum = defaultdict(int)

You don't need a lambda.

--


Inefficient summing

by bieffe62 » Thu, 09 Oct 2008 16:30:36 GMT




The loop way is probably the right choice.
OTHA, you could try to make more readable the 'reduce' approach,
writing it like this:

def add_r( sums, r ): return sums[0]+r['F1'], sums[1]+r['F2']
sum_f1, sum_f2 = reduce( add_r, rec, (0,0) )
result = sum_f1/sum_f2

Less verbose than the for loop, but IMO almost as understandable : one
only needs to know the semantic
of 'reduce' (which for a python programmer is not big thing) and most
important the code does only one thing per line.


Ciao
-----
FB


Inefficient summing

by bearophileHUGS » Thu, 09 Oct 2008 18:58:59 GMT

FB:

Until this feature vanishes I think it's better to use it (untested):

add_r = lambda (a, b), r: (a + r['F1'], b + r['F2'])

Bye,
bearophile


Inefficient summing

by Terry Reedy » Fri, 10 Oct 2008 02:47:29 GMT





Indeed, in this case, with two known keys, the defaultdict is not needed
either, since the following should work as well to initialize

label2sum = {'F1':0,'F2':0}




Inefficient summing

by Alexander Schmolck » Fri, 10 Oct 2008 04:53:52 GMT

beginner < XXXX@XXXXX.COM > writes:


how about:


ratio = (lambda c: c.real/c.imag)(sum(complex(r["F1"], r["F2"] for r in rec)))

?

:)



Inefficient summing

by beginner » Fri, 10 Oct 2008 07:00:46 GMT




Neat, but I will have a problem if I am dealing with three fields,
right?


Inefficient summing

by Alexander Schmolck » Fri, 10 Oct 2008 16:43:33 GMT

beginner < XXXX@XXXXX.COM > writes:



Sure but then how often do you want to take the ratio of 3 numbers? :)

More seriously if you often find yourself doing similar operations and are
(legimately) worried about performance, numpy and pytables might be worth a
look. By "legitimately" I mean that I wouldn't be bothered by iterating twice
over rec; it doesn't affect the algorithmic complexity at all and I woudn't be
surprised if sum(imap(itemgetter("F1"),rec))/sum(imap(itemgetter("F2"),rec))
weren't faster than the explicit loop version for the cases you care about
(timeit will tell you). You're right that you loose some generality in not
being able to deal with arbitrary iterables in that case though.

'as


Similar Threads

1. Decorator for validation - inefficient?

I want my business objects to be able to do this:
class Person(base):

    def __init__(self):
        self.name = None

    @base.validator
    def validate_name(self):
        if not self.name: return ['Name cannot be empty']

p = Person()
print p.invalid  # Prints ['Name cannot be empty']
p.name = 'foo'
print p.invalid  # Prints []
print bool(p.invalid)  # Prints False

The invalid attribute and validator decorator would be in the base
class:
class base(object):

    @staticmethod  # so child can say: @base.validator
    def validator(func):
        """Mark the function as a validator."""
        func._validator = True
        return func

    def _get_invalid(self):
    """Collect all validation results from registered validators"""
        result = []
        for attrName in dir(self):
            # Prevent recursive calls
            if attrName == 'get_invalid' or attrName == 'invalid':
continue

            attr =  eval('self.' + attrName)  # Get attribute

            if str(type(attr)) == "<type 'instancemethod'>":  # Check
if is function
                if hasattr(attr, '_validator'):  # Check if function
is a validator
                    valerr = attr()  # Get result of validation
                    # Validation result can be a single string, list
of strings, or None.
                    # If the validation fails, it will be a string or
list of strings
                    # which describe what the validation errors are.
                    # If the validation succeeds, None is returned.
                    if type(valerr) == type([]):
                        for err in valerr:
                            result.append(err)
                    else:
                        if valerr != None: result.append(valerr)
        return result  # List of validation error strings
    invalid = property(_get_invalid, None, None, "List of validation
errors")  # Read-only, so no fset or fdel

I don't really like the _get_invalid() logic that reflects over each
attribute and does ugly string comparisons.  Is there a cleaner/more
pythonic way to do this reflection?

Also, I am using a decorator to simply mark a function as being a
validator.  This works, but I must enumerate all of the object's
attributes to find all the validators.  My original plan was to use a
decorator to "register" a function as being a validator.  Then the
_get_invalid() call would only need to enumerate the registered
functions.  I ran into problems when I couldn't figure out how to use
a decorator to store a function in an attribute of the function's
class:
# Decorator to register function as a validator
def validator(func):
    """Save func in a list of validator functions on the object that
contains func"""
    self._validation_functions.append(func)  # ERROR: Cannot access
self this way
    return func

I appreciate your feedback.  I am relatively new to python but have
become completely enamored by it after looking for alternatives to the
MS languages I use to develop business apps.

2. Parsing data from pyserial, an (inefficient) solution

3. Iterating command switches from a data file - have a working solution but it seems inefficient

News wrote:

> OT: I saw several references to "OP". What does this mean?

It's a TOOTKA :-)

You -  The Original Post/Poster.

Gerard

4. Iterating command switches from a data file - have a working solution but it seems inefficient

5. Iterating command switches from a data file - have a working solution but it seems inefficient

News wrote:
> Hi everyone,
> 
> My goal is to pull command switches/options from a file and then assign
> the values to select variables which would eventually be included in a
> class object.
> 
> The data file looks something like this but the switches could be in any
> order and not all may be used.
> 
> -m quemanager -s server -p port -k key -o object -c 20 -t  XXXX@XXXXX.COM 
> 
> Also, please keep in mind that the source code will have more than one
> line in it and each has to be treaded separately.

I think you could just use getopt or optparse, passing in each line 
after doing a simple .split() on it.  The resulting handling of the 
arguments would be much simpler and cleaner than what you have.

That won't work perfectly well if you can have quoted arguments with 
spaces in them, however, but the example above does not.

-Peter

6. Is shelve/dbm supposed to be this inefficient?

7. Python 'for' loop is memory inefficient

8. max(), sum(), next()