PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Monday, November 7, 2022

[FIXED] How are lists converted into sets in Python 3? And what's the performance/complexity of this conversion?

 November 07, 2022     big-o, list, python, set, time-complexity     No comments   

Issue

Maybe I missed something, but I couldn't find an answer to this on SO or the Python docs.

Got curious about what actually happens under the hood when we write the following code:

maybe_smaller_set = set(some_huge_list)

Since sets don't have duplicates, I imagine there's something like a list comprehension or some clever trick (maybe using map() or filter()?) that occurs to avoid iterating over everything in some_huge_list.

So my questions are these:

  1. How does Python actually do this conversion from a list instance into a set instance?
  2. And how costly is this conversion in terms of algorithmic complexity (ie, big-O)?

Solution

Python sets are based on hash tables. As such, insertion operations, per-element, are average/expected case O(1), though in pathological cases they can approach O(n).

There is no trick that can be used to avoid iterating the input list though; you can't iterate all the elements without iterating all the elements in a general purpose data structure, no matter the language. The expected case for constructing a set from a list will be O(n), and the theoretical worst case scenario (if literally every element's hash code collides with every other element, but all elements are unique) would be O(n²).

As for how Python does it, it's nothing special; internally, it's roughly equivalent to:

newself = set()
newself.update(iterable)

which is in turn roughly equivalent to:

newself = set()
for elem in iterable:
    newself.add(elem)

it just pushes all the work to the C layer (on the CPython reference interpreter), bypassing the Python layer work such a loop would entail.

To correct an additional misapprehension: There are no clever tricks for map or filter either. In the best case scenario (when called with a built-in function implemented in C), both of them just avoid a little Python level overhead that the equivalent generator expression would entail. But they're fully general purpose; they use no special tricks for list or set inputs or outputs, they just steadily grind through the input iterator, mapping/filtering each element as they're requested (when the map/filter object is itself iterated). They're definitionally O(n).



Answered By - ShadowRanger
Answer Checked By - Marie Seifert (PHPFixing Admin)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing