PHPFixing
Showing posts with label time-complexity. Show all posts

Monday, November 7, 2022

[FIXED] What is time complexity of a list to set conversion?

 November 07, 2022     hash, list, python, set, time-complexity     No comments   

Issue

I've seen the table of time complexities for set operations on the official Python website. But I just want to ask: what is the time complexity of converting a list to a set? For instance,

l = [1, 2, 3, 4, 5]
s = set(l)

I kind of know that a set is actually backed by a hash table, but how exactly does the conversion work? Is it O(n) then?


Solution

Yes. Iterating over a list is O(n) and adding each element to the hash set is O(1), so the total operation is O(n).
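If you want to sanity-check the linear behaviour empirically, a rough timing sketch like the one below (illustrative only; the absolute numbers depend on your machine and Python build) should show the construction time roughly doubling whenever the list size doubles:

import timeit

# Build a set from lists of increasing size; O(n) behaviour means the
# measured time should grow roughly in proportion to n.
for n in (100_000, 200_000, 400_000):
    data = list(range(n))
    t = timeit.timeit(lambda: set(data), number=10)
    print(f"n={n:>7}: {t:.4f}s")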



Answered By - Mad Physicist
Answer Checked By - Mildred Charles (PHPFixing Admin)

[FIXED] How are lists converted into sets in Python 3? And what's the performance/complexity of this conversion?

 November 07, 2022     big-o, list, python, set, time-complexity     No comments   

Issue

Maybe I missed something, but I couldn't find an answer to this on SO or the Python docs.

Got curious about what actually happens under the hood when we write the following code:

maybe_smaller_set = set(some_huge_list)

Since sets don't have duplicates, I imagine there's something like a list comprehension or some clever trick (maybe using map() or filter()?) going on under the hood to avoid iterating over everything in some_huge_list.

So my questions are these:

  1. How does Python actually do this conversion from a list instance into a set instance?
  2. And how costly is this conversion in terms of algorithmic complexity (ie, big-O)?

Solution

Python sets are based on hash tables. As such, insertion operations, per-element, are average/expected case O(1), though in pathological cases they can approach O(n).

There is no trick that can be used to avoid iterating the input list though; you can't know which elements are unique without examining every element of a general-purpose data structure, no matter the language. The expected case for constructing a set from a list will be O(n), and the theoretical worst case scenario (if literally every element's hash code collides with every other element, but all elements are unique) would be O(n²).
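To make the pathological case concrete, here is a minimal sketch (the BadHash class is purely illustrative, not anything CPython uses internally) in which every element hashes to the same bucket, so each insertion must be compared against everything inserted so far:

class BadHash:
    # Illustrative worst case: every instance lands in the same bucket.
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42  # constant hash forces every insertion to collide
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value

items = [BadHash(i) for i in range(2000)]
s = set(items)  # degrades toward O(n²); compare with the near-instant set(range(2000))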

As for how Python does it, it's nothing special; internally, it's roughly equivalent to:

newself = set()
newself.update(iterable)

which is in turn roughly equivalent to:

newself = set()
for elem in iterable:
    newself.add(elem)

It just pushes all the work to the C layer (on the CPython reference interpreter), bypassing the Python-level work such a loop would entail.

To correct an additional misapprehension: There are no clever tricks for map or filter either. In the best case scenario (when called with a built-in function implemented in C), both of them just avoid a little Python level overhead that the equivalent generator expression would entail. But they're fully general purpose; they use no special tricks for list or set inputs or outputs, they just steadily grind through the input iterator, mapping/filtering each element as they're requested (when the map/filter object is itself iterated). They're definitionally O(n).
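To illustrate that lazy, element-at-a-time behaviour, here is a trivial example (not tied to any particular input type):

lazy = map(str.upper, ["a", "b", "c"])  # nothing is computed yet
print(next(lazy))                       # 'A' -- produced only when requested
print(list(lazy))                       # ['B', 'C'] -- the rest, on demand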



Answered By - ShadowRanger
Answer Checked By - Marie Seifert (PHPFixing Admin)

[FIXED] How to write a python3 function that merges two lists of sets so that the output list has sets of elements that were present in both input sets?

 November 07, 2022     data-structures, merge, python, set, time-complexity     No comments   

Issue

I have two lists of sets, let's say: [{1, 2, 3}, {4, 5}, {6, 7}] and [{1, 2}, {3, 4}, {5, 6, 7}]

No set in either list shares an element with another set in the same list, and the union of all the sets is the same for both lists. The function should check whether the sets in both lists hold the same elements; where there are differences, those elements should be put into another set.

So above example should return: [{1, 2}, {3}, {4}, {5}, {6, 7}]

I work on large sets, so I would need this function to be as efficient as possible.

Here is the example code and how I would like it to work:

def mergeSets(x, y):
    out = set()
    for i in x:
        out = out.union(i)
        # this gives me the set of all elements, but here is where my mind stops working
        # the problem sounds simple, but after hours of thinking I cannot come up with a good algorithm for this issue :(
        # I found set.intersection(), but it works on single sets only, not on lists of sets
    return out


x = mergeSets([{1, 2, 3}, {4, 5}, {6, 7}], [{1, 2}, {3, 4}, {5, 6, 7}])
print(x)
# [{1, 2}, {3}, {4}, {5}, {6, 7}]
x = mergeSets([{1, 2}, {3, 4, 5, 6, 7}, {8}], [{1}, {2, 3, 4}, {5, 6, 7, 8}])
print(x)
# [{1}, {2}, {3, 4}, {5, 6, 7}, {8}]

EDIT: the data doesn't have to be sorted and may even be of types other than integer

EDIT2: the input lists don't have to be sorted, so sets may appear in random order


Solution

Given that each value occurs in exactly two sets (one per input list), you could collect an index pair for each value, indicating in which two sets (at which index in each list) that value occurs.

Unique pairs indicate unique sets in the output, so a dictionary of such pairs can serve to populate the result:

from collections import defaultdict

def merge_sets(lista, listb):
    index_in_a = {
        val: idx
        for idx, elem in enumerate(lista) for val in elem
    }
    set_by_key = defaultdict(set)
    for idx, elem in enumerate(listb):
        for val in elem:
            set_by_key[(index_in_a[val], idx)].add(val)
    return list(set_by_key.values())

This looks O(n) to me.

NB: since the order of iteration over a set is not defined, the order of the output can look a bit mingled, but I assume the order of where sets appear in the output is not significant.
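For completeness, calling the function above on the question's sample inputs reproduces the expected groupings (the order of the sets in the returned list follows insertion order and should be treated as incidental):

print(merge_sets([{1, 2, 3}, {4, 5}, {6, 7}], [{1, 2}, {3, 4}, {5, 6, 7}]))
# e.g. [{1, 2}, {3}, {4}, {5}, {6, 7}]
print(merge_sets([{1, 2}, {3, 4, 5, 6, 7}, {8}], [{1}, {2, 3, 4}, {5, 6, 7, 8}]))
# e.g. [{1}, {2}, {3, 4}, {5, 6, 7}, {8}]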



Answered By - trincot
Answer Checked By - Dawn Plyler (PHPFixing Volunteer)

Tuesday, November 1, 2022

[FIXED] What is the time complexity of HashMap.containsKey() in java?

 November 01, 2022     hashmap, java, performance, time-complexity     No comments   

Issue

I need to know: what is the time complexity of HashMap.containsKey() in Java?


Solution

From the API doc of HashMap:

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

Since containsKey() is just a get() that throws away the retrieved value, it's O(1) (assuming the hash function works properly, again).



Answered By - Michael Borgwardt
Answer Checked By - David Goodson (PHPFixing Volunteer)

Monday, October 31, 2022

[FIXED] How do I profile a Python script?

 October 31, 2022     optimization, performance, profiling, python, time-complexity     No comments   

Issue

Project Euler and other coding contests often have a maximum time to run or people boast of how fast their particular solution runs. With Python, sometimes the approaches are somewhat kludgey - i.e., adding timing code to __main__.

What is a good way to profile how long a Python program takes to run?


Solution

Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately, and tells you how many times each function was called, making it easy to determine where you should make optimizations.

You can call it from within your code, or from the interpreter, like this:

import cProfile
cProfile.run('foo()')

Even more usefully, you can invoke cProfile when running a script:

python -m cProfile myscript.py
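cProfile's command line also accepts a couple of handy flags (shown here against the same myscript.py): -s sorts the report by a column such as cumtime, and -o writes the raw stats to a file for later inspection:

python -m cProfile -s cumtime myscript.py
python -m cProfile -o myscript.prof myscript.py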

To make it even easier, I made a little batch file called 'profile.bat':

python -m cProfile %1

So all I have to do is run:

profile euler048.py

And I get this:

1007 function calls in 0.061 CPU seconds

Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.061    0.061 <string>:1(<module>)
 1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
    1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
    1    0.000    0.000    0.061    0.061 {execfile}
    1    0.002    0.002    0.053    0.053 {map}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    1    0.000    0.000    0.000    0.000 {range}
    1    0.003    0.003    0.003    0.003 {sum}
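If you need more than this flat report, one option (a minimal sketch, reusing the foo() call from the earlier snippet) is to dump the raw stats to a file and explore them with the pstats module:

import cProfile
import pstats

cProfile.run('foo()', 'foo.prof')        # write raw stats to foo.prof

stats = pstats.Stats('foo.prof')
stats.sort_stats('cumulative').print_stats(10)   # 10 most expensive calls by cumulative time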

EDIT: Updated link to a good video resource from PyCon 2013 titled Python Profiling, also available via YouTube.



Answered By - Chris Lawlor
Answer Checked By - David Goodson (PHPFixing Volunteer)

Tuesday, June 28, 2022

[FIXED] When should us use normal BFS over bidirectional BFS?

 June 28, 2022     algorithm, breadth-first-search, graph, graph-theory, time-complexity     No comments   

Issue

I understand that Bidirectional BFS has a lot of advantage over using normal BFS, as it theoretically halves the time to discover the shortest path between two nodes and the time to find if a node is reachable from another node.

Also, I understand that we should use bidirectional BFS only if both the start and goal nodes are uniquely defined.

Is there any situation when we should prefer a normal BFS over bidirectional BFS?


Solution

I understand that bidirectional BFS consists of, given start and goal nodes, alternately expanding layers of nodes from the start and from the goal until a node in the middle has been reached from both ends. The shortest path from start to goal is then understood to be the shortest path from start to that middle node, followed by the shortest path from the middle node to the goal. I can see that fewer nodes may need to be expanded compared with a standard BFS approach. However,

  • It is easier to implement standard BFS (sBFS) than to implement a bidirectional BFS (bBFS for short). Simple is often good, as it is easier to code and to later verify its correctness.
  • If the graph is directed and unweighted, sBFS guarantees that it will find the shortest path from start to goal in minimal steps; bBFS is not guaranteed to work with directed graphs.
  • After running sBFS, you can reconstruct shortest paths from the start to all nodes that are at most 1 step before the goal node (that is, before the search was stopped). This may be valuable in and of itself. Running bBFS does not generate such a list.

With this in mind, I would argue the bBFS is only useful for a very narrow case (where, depending on the graph, it is expected to perform better than sBFS), and sBFS is both simpler and useful in a larger range of scenarios.
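As a point of reference, here is a minimal sketch of the sBFS variant described above (assuming the graph is an adjacency dict mapping each node to an iterable of neighbours); the predecessor map it records is exactly what makes the path reconstruction in the third bullet possible:

from collections import deque

def bfs_predecessors(graph, start):
    # Standard BFS: the first time a node is reached is via a shortest path,
    # so recording which node reached it lets us rebuild shortest paths later.
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return prev

def shortest_path(prev, goal):
    # Walk the predecessor map back from the goal; returns [] if unreachable.
    if goal not in prev:
        return []
    path = []
    while goal is not None:
        path.append(goal)
        goal = prev[goal]
    return path[::-1]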



Answered By - tucuxi
Answer Checked By - Marilyn (PHPFixing Volunteer)

[FIXED] How should I change my Graph structure (very slow insertion)?

 June 28, 2022     c, graph, hashtable, space-complexity, time-complexity     No comments   

Issue

This program I'm doing is about a social network, which means there are users and their profiles. The profiles structure is UserProfile.

Now, there are various possible Graph implementations and I don't think I'm using the best one. I have a Graph structure and inside, there's a pointer to a linked list of type Vertex. Each Vertex element has a value, a pointer to the next Vertex and a pointer to a linked list of type Edge. Each Edge element has a value (so I can define weights and whatever it's needed), a pointer to the next Edge and a pointer to the Vertex owner.

I have 2 sample files with data to process (in CSV style) and insert into the Graph. The first one is the user data (one user per line); the second one is the user relations (for the graph). The first file is quickly inserted into the graph because I always insert at the head and there are ~18000 users. The second file takes ages, even though I still insert the edges at the head. The file has about ~520000 lines of user relations and takes between 13-15 minutes to insert into the Graph. I made a quick test and reading the data is pretty quick, instantaneous really. The problem is in the insertion.

This problem exists because I have a Graph implemented with linked lists for the vertices. Every time I need to insert a relation, I need to look up 2 vertices so I can link them together. This is the problem... doing this for ~520000 relations takes a while.

How should I solve this?

Solution 1) Some people recommended that I implement the Graph (the vertices part) as an array instead of a linked list. This way I have direct access to every vertex and the insertion time is probably going to drop considerably. But I don't like the idea of allocating an array with [18000] elements. How practical is this? My sample data has ~18000 users, but what if I need much less or much more? The linked list approach has that flexibility: I can have whatever size I want as long as there's memory for it. But the array doesn't, so how am I going to handle such a situation? What are your suggestions?

Using linked lists is good for space complexity but bad for time complexity. And using an array is good for time complexity but bad for space complexity.

Any thoughts about this solution?

Solution 2) This project also demands that I have some sort of data structure that allows quick lookup based on a name index and an ID index. For this I decided to use Hash Tables. My tables are implemented with separate chaining as collision resolution, and when a load factor of 0.70 is reached, I normally recreate the table. I base the next table size on this Link.

Currently, both Hash Tables hold a pointer to the UserProfile instead of duplicating the user profile itself. That would be stupid; changing data would require 3 changes and it's really dumb to do it that way. So I just save the pointer to the UserProfile. The same user profile pointer is also saved as the value in each Graph Vertex.

So, I have 3 data structures, one Graph and two Hash Tables, and every single one of them points to the same exact UserProfile. The Graph structure will serve the purpose of finding the shortest path and stuff like that, while the Hash Tables serve as quick indexes by name and ID.

What I'm thinking of doing to solve my Graph problem is, instead of having the Hash Table values point to the UserProfile, having them point to the corresponding Vertex. It's still a pointer, no more and no less space is used, I just change what I point to.

Like this, I can easily and quickly lookup for each Vertex I need and link them together. This will insert the ~520000 relations pretty quickly.

I thought of this solution because I already have the Hash Tables and I need to have them; so why not take advantage of them for indexing the Graph vertices instead of the user profiles? It's basically the same thing: I can still access the UserProfile pretty quickly, just go to the Vertex and then to the UserProfile.

But, do you see any cons to this second solution compared with the first one? Or only pros that outweigh the pros and cons of the first solution?

Other Solution) If you have any other solution, I'm all ears. But please explain the pros and cons of that solution over the previous 2. I really don't have much time to waste with this right now, I need to move on with this project, so, if I'm going to make such a change, I need to understand exactly what to change and whether that's really the way to go.

Hopefully no one fell asleep reading this and closed the browser, sorry for the big testament. But I really need to decide what to do about this and I really need to make a change.

P.S.: When answering my proposed solutions, please enumerate them as I did so I know exactly what you are talking about and don't confuse myself more than I already am.


Solution

Since the main issue here is speed, I would prefer the array approach (your solution 1).

You should, of course, maintain the hash table for the name-index lookup.

If I understood correctly, you only process the data once, so there is no dynamic data insertion.

To deal with the space allocation problem, I would recommend:

1 - Read the file once to get the number of vertices.

2 - Allocate that space.

If your data is dynamic, you could implement some simple method to grow the array in steps of 50%.

3 - In the Edges, substitute your linked list with an array. This array should also be grown dynamically in steps of 50%.

Even with the "extra" space allocated, when you grow the size in steps of 50%, the total size used by the array should only be marginally larger than the size of the linked list.

I hope I could help.



Answered By - Khalid Salomão
Answer Checked By - David Goodson (PHPFixing Volunteer)
Copyright © PHPFixing