Showing posts with label for-loop. Show all posts

Friday, December 16, 2022

[FIXED] What is the correct syntax when calling a variable name from within a function

December 16, 2022 data.table, for-loop, function, r, syntax No comments

Issue

I have data as follows:

library(data.table)
set.seed(1) 
year = c(rep(2000,5), rep(2001,5),  rep(2002,5),  rep(2003,5),  rep(2004,5))
DT <- data.table(panelID = sample(10,10),                                                   
                      some_type = as.factor(sample(0:5, 6)),                                             
                      some_other_type = as.factor(sample(0:5, 6)),         
                      Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      wt = 15*round(runif(100)/10,2),
                      Income = round(rnorm(10,-5,5),2),
                      Income_proxy = round(rnorm(10,-6,6),2),
                      year = rep(year,4),
                      Happiness = sample(10,10),
                      Sex = round(rnorm(10,0.75,0.3),2),
                      Age = sample(100,100),
                      Height= 150*round(rnorm(10,0.75,0.3),2))

I am trying to write a function that automatically creates certain calculations, just by providing the grouping variables.

calulate_relative_dev <- function(DT, varA="Income", varB="Income_proxy", groups, years=NULL) {
  if (is.null(years)) {
    out_names <- paste0("rel_deviation_", groups[i]) 
    for (i in seq_along(groups)) {
      setDT(DT)[, (out_names[i]) := 100*mean((varA - varB) / varA), by=eval(groups[i])]
    }
  } else if (!is.null(years))
    out_names <- paste0("rel_deviation_", groups[i], years[i]) 
    for (i in seq_along(groups)) {
      for (j in seq_along(years)) {
        setDT(DT)[, (out_names[i]) := 100*mean((varA - varB) / varA), by=eval(groups[i], years[i])]
      }
    }
}

In order to do:

calulate_relative_dev(DT, groups = c("Group","some_type"))

and

calulate_relative_dev(DT, groups = c("Group","some_type"), years=year))

But when I do, I get the following error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mean': object 'Income' not found
Called from: h(simpleError(msg, call))

If I try to put Income in quotes, I get:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'mean': non-numeric argument to binary operator
Called from: h(simpleError(msg, call))

How should I write the syntax here?

Solution

Based on your comment/reply to my question, I understand years is intended to be a logical. Here is one possible function:

calulate_relative_dev <- function(DT, varA="Income", varB="Income_proxy", groups, year=FALSE) {
  dt <- copy(DT)
  setnames(dt, old = c(varA, varB), new = c("varA", "varB"))
  for (i in seq_along(groups)) {
    out_names <- paste0("rel_deviation_", groups)
    if(year) out_names <- paste0(out_names, "_by_year")
    dt[, c(out_names[i]) := 100*mean((varA - varB) / varA), by=c(groups[i], if(year){"year"})]
  }
  setnames(dt, old = c("varA", "varB"), new = c(varA, varB))
  return(dt[])
}

calulate_relative_dev(DT, groups = c("Group","some_type"))
calulate_relative_dev(DT, groups = c("Group","some_type"), year=TRUE)

I did temporary renames to make the data.table code simpler to read/write. Returning dt[] ensures the data.table is printed after the function is evaluated.

Answered By - Hutch3232

Answer Checked By - Katrina (PHPFixing Volunteer)

[FIXED] what does this line of syntax mean in c++?

December 14, 2022 c++, for-loop, syntax No comments

Issue

this is a quick question, Im translating a program that's in C++ to C, and I saw this line of code,

for (int v : adj[u]) {

referenced in this article: link

and I am not really sure what it does. I tried googling it and got results for range based for loops in C++, but cannot find anything that has this exact syntax and what it means. Help would be much appreciated.

Solution

It's a very simple for loop that iterates over the elements of adj[u], going 1 by 1.

Answered By - Dennis Kozevnikoff

Answer Checked By - Cary Denson (PHPFixing Admin)

[FIXED] What does "for(;;)" mean?

December 14, 2022 c, c++, for-loop, syntax No comments

Issue

In C/C++, what does the following mean?

for(;;){
    ...
}

Solution

It's an infinite loop, equivalent to while(true). When no termination condition is provided, the condition defaults to false (i.e., the loop will not terminate).

Answered By - Justin Ardini

Answer Checked By - Cary Denson (PHPFixing Admin)

[FIXED] How to avoid for-loops with multiple criteria in function which()

November 22, 2022 for-loop, multiple-conditions, performance, r No comments

Issue

I have a 25 years data set that looks similar to the following:

        date name        value tag
1 2014-12-01    f -0.338578654  12
2 2014-12-01    a  0.323379254   4
3 2014-12-01    f  0.004163806   9
4 2014-12-01    f  1.365219477   2
5 2014-12-01    l -1.225602543   7
6 2014-12-01    d -0.308544089   9

This is how to replicate it:

set.seed(9)
date <- rep(seq(as.Date("1990-01-01"), as.Date("2015-01-1"), by="months"), each=50)
N <- length(date)
name <- sample(letters, N, replace=T)
value <- rnorm(N)
tag <- sample(c(1:50), N, replace=T)
mydata <- data.frame(date, name, value, tag)
head(mydata)

I would like to create a new matrix that stores values that satisfy multiple criteria. For instance, the sum of values that have a name j and a tag i. I use two for-loops and the which() function to filter out the correct values. Like this:

S <- matrix(data=NA, nrow=length(unique(mydata$tag)), ncol=length(unique(mydata$name)))
for(i in 1:nrow(S)){
  for (j in 1:ncol(S)){
    foo <- which(mydata$tag == unique(mydata$tag)[i] & mydata$name == unique(mydata$name)[j])
    S[i,j] <- sum(mydata$value[foo])
  }
}

This is ok for small data sets, but too slow for larger ones. Is it possible to avoid the for-loops or somehow speed up the process?

Solution

You can use dcast from package reshape2, with a custom function to sum your values:

library(reshape2)
dcast(mydata, name~tag, value.var='value', fun.aggregate=sum)

Or simply xtabs, base R:

xtabs(value~name+tag, mydata)

Some benchmark:

funcPer = function(){
    S <- matrix(data=NA, nrow=length(unique(mydata$tag)), ncol=length(unique(mydata$name)))
    for(i in 1:nrow(S)){
      for (j in 1:ncol(S)){
        foo <- which(mydata$tag == unique(mydata$tag)[i] & mydata$name == unique(mydata$name)[j])
        S[i,j] <- sum(mydata$value[foo])
      }
    }
}

colonel1 = function() dcast(mydata, name~tag, value.var='value', fun.aggregate=sum)

colonel2 = function() xtabs(value~name+tag, mydata)

#> system.time(colonel1())
#  user  system elapsed 
#   0.01    0.00    0.01 
#> system.time(colonel2())
#   user  system elapsed 
#   0.05    0.00    0.05 
#> system.time(funcPer())
#   user  system elapsed 
#   4.67    0.00    4.82

Answered By - Colonel Beauvel

Answer Checked By - Mildred Charles (PHPFixing Admin)

[FIXED] How can I put multiple statements in one line in python?without using ; and exec

November 05, 2022 for-loop, lambda, python No comments

Issue

I want to write this code in one line without using ; and exec

input_string = str(input())
array = []
for i in range(len(input_string)):
    if (ord(input_string[i]) - 97) % 2 == 0:
       array.append(input_string[i])
    else:
       array.append(input_string[i].upper())
array.sort(reverse=True)
answer = ' '.join(array)
print(answer)

and couldn't do that so i came up with 4 line like this

input_string = str(input())
array = []
for i in range(len(input_string)): array.append(input_string[i]) if (ord(input_string[i]) -97) % 2 == 0 else array.append(input_string[i].upper())
print(' '.join(sorted(array,reverse=True)))

please help me to write this code in one line. thank you all in advance.

Solution

Done.

print(' '.join(sorted([letter if (ord(letter) -97) % 2 == 0 else letter.upper() for letter in str(input())],reverse=True)))

Answered By - Elahe M

Answer Checked By - Katrina (PHPFixing Volunteer)

[FIXED] How to cast lambda parameter to char in the ifPresentOrElse() method

November 04, 2022 for-loop, java, java-stream, lambda, stack No comments

Issue

How do I fix this code block (in ifPresentOrElse())?

I'm stuck here with:

Inconvertible types; cannot cast '<lambda parameter>' to 'char'

Please advise how to get this compiled and running.

public static boolean isBracketsInOrder1(String bracket) {
    
    Stack<Character> charStack = new Stack<>();
    static Map<Character, Character> leftToRightBracketsMap = 
                                       Map.of('{', '}', '[', ']', '(', ')');
    bracket.chars()
        .filter(i -> leftToRightBracketsMap.containsKey((char) i))
        .findFirst()
        .ifPresentOrElse((i) -> charStack.push((char) i),
            (i) -> {
                // code does not COMPILE at equals((char) i)
                return leftToRightBracketsMap.get(charStack.pop()).equals((char) i);
            });
    return true;
}

And this is the working code using for loop representing what I'm trying to implement above using streams above.

public static boolean isBracketsInOrder(String bracket) {
    Stack<Character> charStack = new Stack<>();
    static Map<Character, Character> leftToRightBracketsMap = 
                                       Map.of('{', '}', '[', ']', '(', ')');
    boolean matchFound = true;
    for (char c : bracket.toCharArray()) {
        if (leftToRightBracketsMap.containsKey(c)) charStack.push(c);
        else {
            char leftBrack = charStack.pop();
            char correctBrack = leftToRightBracketsMap.get(leftBrack);
            if (c != correctBrack) return false;
        }
    }
    return true;
}

Solution

You've introduced the code for a very basic algorithmic question - validate a string of brackets.

There are many mistakes in your code:

A stream doesn't act precisely like a loop, when it hits the terminal operation (which is findFirst() in your code), it's done. Code inside ifPresentOrElse() would not be executed multiple times (you probably expected the opposite). It would be invoked only once on an optional result returned by the findFirst().
As its second argument ifPresentOrElse() expects an instance of the Runnable interface. Method run() neither expects any arguments, no returns a value. Therefore, this attempt to define a Runnable is incorrect: (i) -> { return something; }.
Any lambda expressions should to conform to a particular functional interface (see). It can't appear out of nowhere.
Class Stack is legacy, it's still in the JDK for backward compatibility reasons. Implementations of the Deque interface should be used instead.
You are not checking whether the stack is empty, which can cause an EmptyStackException. If you would replace the Stack with ArrayDeque the problem will remain, method pop() will throw NoSuchElementException. You need to make sure that stack is not empty.
Returning true in the imperative solution is not correct. Instead, you need to return charStack.isEmpty(), because if there are some elements in the stack - sequence is not valid, there are brackets that haven't been closed.

Implementing this problem using streams requires far more efforts than a solution using a plain loop.

According to the documentation, the only place where mutation can occur in a stream is inside the collector. As a general rule, all functions used in a stream pipeline should not operate via side effects and accumulate the stated outside the stream (apart from some edge cases, see the link). Only collector's mutable container should maintain a state.

We can contract such a collector using Collecor.of() method:

public static boolean isValidBracketSequence(String brackets) {
    return brackets.chars()
        .mapToObj(c -> (char) c)
        .collect(getBracketCollector());
}

public static Collector<Character, ?, Boolean> getBracketCollector() {
    
    return Collector.of(
        BracketContainer::new,
        BracketContainer::add,
        (left, right) -> { throw new AssertionError("should not be executed in parallel"); },
        bracketContainer -> bracketContainer.isValid() && bracketContainer.isEmpty()
    );
}

That's how a mutable container might look like:

class BracketContainer {
    public static final Map<Character, Character> leftToRightBracketsMap =
        Map.of('(', ')', '[', ']', '{', '}');
    
    private Deque<Character> stack = new ArrayDeque<>();
    private boolean isValid = true;
    
    public void add(Character next) {
        if (!isValid) return;
        
        if (leftToRightBracketsMap.containsKey(next)) {
            stack.push(next);
        } else {
            compare(next);
        }
    }
    
    public void compare(Character next) {
        
        this.isValid = !isEmpty() && leftToRightBracketsMap.get(stack.pop()).equals(next);
    }
    
    public boolean isEmpty() {
        return stack.isEmpty();
    }
    
    public boolean isValid() {
        return isValid;
    }
}

main()

public static void main(String[] args) {
    System.out.println("(([])) -> " + isValidBracketSequence("(([]))")); // true
    System.out.println("(([]]) -> " + isValidBracketSequence("(([]])")); // false
    System.out.println("(([})) -> " + isValidBracketSequence("(([}))")); // false
    System.out.println("({[])) -> " + isValidBracketSequence("({[]))")); // false
    System.out.println("({[]}) -> " + isValidBracketSequence("({[]})")); // true
}

Output:

(([])) -> true
(([]]) -> false
(([})) -> false
({[])) -> false
({[]}) -> true

A link to Online Demo

Answered By - Alexander Ivanchenko

Answer Checked By - Pedro (PHPFixing Volunteer)

[FIXED] How to extract insights from facebook action dataset and covert all values into each column

November 03, 2022 dictionary, facebook-graph-api, for-loop, json, pandas No comments

Issue

Here is dataset as shown in below and I want to convert it into each data column with their values as

i want to append the values in columns and I tried this code

y = data['actions'].apply(lambda x: str(x).replace("'",'"'))
json.loads(y[0])
json.loads(y[1])

it gives output like as shown in below

[{'action_type': 'post_reaction', 'value': '2'},
 {'action_type': 'link_click', 'value': '42'},
 {'action_type': 'comment', 'value': '1'},
 {'action_type': 'post_engagement', 'value': '45'},
 {'action_type': 'page_engagement', 'value': '45'},
 {'action_type': 'onsite_conversion.lead_grouped', 'value': '6'},
 {'action_type': 'leadgen_grouped', 'value': '6'},
 {'action_type': 'lead', 'value': '6'}]

[{'action_type': 'onsite_conversion.post_save', 'value': '1'},
 {'action_type': 'post_reaction', 'value': '4'},
 {'action_type': 'link_click', 'value': '62'},
 {'action_type': 'post_engagement', 'value': '67'},
 {'action_type': 'page_engagement', 'value': '67'},
 {'action_type': 'onsite_conversion.lead_grouped', 'value': '6'},
 {'action_type': 'leadgen_grouped', 'value': '6'},
 {'action_type': 'lead', 'value': '6'}]

I want to create the dataframe that gives each action type as column and append their values in respective columns and if there is no value it appends zero like

| post_reaction| link click | comment |---------------------
| --------     | -----------|---------|
| 2            | 42         |1        |
|  4           | 62         |67       |

Solution

If no lists in data use ast.literal_eval for converting first and then with list comprehension create DataFrame:

import ast

y = data['actions'].apply(ast.literal_eval)

df = pd.DataFrame([{z['action_type']:z['value'] for z in x} for x in y]).fillna(0)

If lists in data use only list comprehension:

df = (pd.DataFrame([{z['action_type']:z['value'] for z in x} for x in data['actions']])
        .fillna(0))

Answered By - jezrael

Answer Checked By - Gilberto Lyons (PHPFixing Admin)

[FIXED] How to perform same operation on multiple text files and save the output in different files using python?

November 02, 2022 file, for-loop, nlp, python, text-files No comments

Issue

I have written a code which extracts stop words from a text file and outputs two new text files. One file contains the stop words from that text file and another file contains the data without stop words. Now I have more than 100 text file in a folder, I would like to perform the same operation on all those file simultaneously.

For example there is a Folder A which contains 100 text file the code should be executed on all those text files simultaneously. The output should be two new text files such as 'Stop_Word_Consist_Filename.txt' and 'Stop_word_not_Filename.txt' which should be stored in a separate folder.That means for every 100 text files there will 200 output text files stored in a new folder. Please note the 'Filename' in both these output file is the actual name of the text file meaning 'Walmart.txt' should have 'Stop_Word_Consist_Walmart.txt' and 'Stop_word_not_Walmart.txt'. I did try few things and I know loop in involved giving the path directory but I didn't get any success.

Apologies for such a long question.

Following is the code for 1 file.

import numpy as np
import pandas as pd

# Pathes of source files and that for after-modifications
files_path = os.getcwd()
# another folder, your should create first to store files after modifications in
files_after_path = os.getcwd() + '/' + 'Stopwords_folder'
os.makedirs(files_after_path, exist_ok=True)
text_files = os.listdir(files_path)
data = pd.DataFrame(text_files)
data.columns = ["Review_text"]

import re
import nltk
import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

def clean_text(df):
    all_reviews = list()
    #lines = df["Review_text"].values.tolist()
    lines = data.values.tolist()

    for text in lines:
        #text = text.lower()
        text = [word.lower() for word in text]

        pattern = re.compile('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
        text = pattern.sub('', str(text))
        
        emoji = re.compile("["
                           u"\U0001F600-\U0001FFFF"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           u"\U00002702-\U000027B0"
                           u"\U000024C2-\U0001F251"
                           "]+", flags=re.UNICODE)
        text = emoji.sub(r'', text)
        
        text = re.sub(r"i'm", "i am", text)
        text = re.sub(r"he's", "he is", text)
        text = re.sub(r"she's", "she is", text)
        text = re.sub(r"that's", "that is", text)        
        text = re.sub(r"what's", "what is", text)
        text = re.sub(r"where's", "where is", text) 
        text = re.sub(r"\'ll", " will", text)  
        text = re.sub(r"\'ve", " have", text)  
        text = re.sub(r"\'re", " are", text)
        text = re.sub(r"\'d", " would", text)
        text = re.sub(r"\'ve", " have", text)
        text = re.sub(r"won't", "will not", text)
        text = re.sub(r"don't", "do not", text)
        text = re.sub(r"did't", "did not", text)
        text = re.sub(r"can't", "can not", text)
        text = re.sub(r"it's", "it is", text)
        text = re.sub(r"couldn't", "could not", text)
        text = re.sub(r"have't", "have not", text)
        
        text = re.sub(r"[,.\"!@#$%^&*(){}?/;`~:<>+=-]", "", text)
        tokens = word_tokenize(text)
        table = str.maketrans('', '', string.punctuation)
        stripped = [w.translate(table) for w in tokens]
        words = [word for word in stripped if word.isalpha()]
        stop_words = set(stopwords.words("english"))
        stop_words.discard("not")
        PS = PorterStemmer()
        words = [PS.stem(w) for w in words if not w in stop_words]
        words = ' '.join(words)
        all_reviews.append(words)
    return all_reviews,stop_words

for entry in data:
    #all_reviews , stop_words = clean_text(entry)
    for r in all_reviews: 
        if not r in stop_words: 
            appendFile = open(f'No_Stopwords{entry}.txt','a') 
            appendFile.write(" "+r) 
            appendFile.close() 
    
    for r in stop_words: 
        appendFile = open(f'Stop_Word_Consist{entry}.txt','a') 
        appendFile.write(" "+r) 
        appendFile.close() 
        
    all_reviews , stop_words = clean_text(entry)

UPDATE :

So I have made changes to the code. I did got two output files Stop_Word_Consist and No_Stop_word. But I am not getting the required Data inside. Meaning Stop_word consist does not have the stop words I am looking for. I am pretty sure I made some mistakes in indentation. I would appreciate the help.

Solution

You can use OS.listdir to get the number of text files, and use a for loop to run each time. To assign a name to the output file you can use an f-string in its creation so it looks like f'Stop_Word_Consist_{fileName}':

for entry in OS.listdir(folder location):
    all_reviews , stop_words = clean_text(data_1)
    all_reviews[:]

for r in all_reviews: 
    if not r in stop_words: 
    appendFile = open('Stop_Word_hrb02-phil-usa.txt.txt','a') 
    appendFile.write(" "+r) 
    appendFile.close() 

for r in stop_words: 
    appendFile = open(f'Stop_Word_Consist{entry}.txt','a') 
    appendFile.write(" "+r) 
    appendFile.close()

Answered By - Le_Me

Answer Checked By - Terry (PHPFixing Volunteer)

[FIXED] How can i optimize my Python loop for speed

November 01, 2022 for-loop, ocr, performance, python, python-tesseract No comments

Issue

I wrote some code that uses OCR to extract text from screenshots of follower lists and then transfer them into a data frame.

The reason I have to do the hustle with "name" / "display name" and removing blank lines is that the initial text extraction looks something like this:

Screenname 1

name 1

Screenname 2

name 2

(and so on)

So I know in which order each extraction will be. My code works well for 1-30 images, but if I take more than that its gets a bit slow. My goal is to run around 5-10k screenshots through it at once. I'm pretty new to programming so any ideas/tips on how to optimize the speed would be very appreciated! Thank you all in advance :)


from PIL import Image
from pytesseract import pytesseract
import os
import pandas as pd
from itertools import chain

list_final = [""]
list_name = [""]
liste_anzeigename = [""]
list_raw = [""]
anzeigename = [""]
name = [""]
sort = [""]
f = r'/Users/PycharmProjects/pythonProject/images'
myconfig = r"--psm 4 --oem 3"

os.listdir(f)
for file in os.listdir(f):
    f_img = f+"/"+file
    img = Image.open(f_img)
    img = img.crop((240, 400, 800, 2400))
    img.save(f_img)

for file in os.listdir(f):
    f_img = f + "/" + file
    test = pytesseract.image_to_string(PIL.Image.open(f_img), config=myconfig)

    lines = test.split("\n")
    list_raw = [line for line in lines if line.strip() != ""]
    sort.append(list_raw)

    name = {list_raw[0], list_raw[2], list_raw[4],
            list_raw[6], list_raw[8], list_raw[10],
            list_raw[12], list_raw[14], list_raw[16]}
    list_name.append(name)

    anzeigename = {list_raw[1], list_raw[3], list_raw[5],
                   list_raw[7], list_raw[9], list_raw[11],
                   list_raw[13], list_raw[15], list_raw[17]}
    liste_anzeigename.append(anzeigename)

reihenfolge_name = list(chain.from_iterable(list_name))
index_anzeigename = list(chain.from_iterable(liste_anzeigename))
sortieren = list(chain.from_iterable(sort))

print(list_raw)
sort_name = sorted(reihenfolge_name, key=sortieren.index)
sort_anzeigename = sorted(index_anzeigename, key=sortieren.index)

final = pd.DataFrame(zip(sort_name, sort_anzeigename), columns=['name', 'anzeigename'])
print(final)

Solution

Use a multiprocessing.Pool.

Combine the code under the for-loops, and put it into a function process_file. This function should accept a single argument; the name of a file to process.

Next using listdir, create a list of files to process. Then create a Pool and use its map method to process the list;

import multiprocessing as mp

def process_file(name):
    # your code goes here.
    return anzeigename # Or watever the result should be.


if __name__ is "__main__":
    f = r'/Users/PycharmProjects/pythonProject/images'
    p = mp.Pool()
    liste_anzeigename = p.map(process_file, os.listdir(f))

This will run your code in parallel in as many cores as your CPU has. For a N-core CPU this will take approximately 1/N times the time as doing it without multiprocessing.

Note that the return value of the worker function should be pickleable; it has to be returned from the worker process to the parent process.

Answered By - Roland Smith

Answer Checked By - Mary Flores (PHPFixing Volunteer)

[FIXED] Why is in my case For loop faster vs Map, Reduce and List comprehension

October 31, 2022 for-loop, list-comprehension, mapreduce, performance, python No comments

Issue

I wrote a simple script that test the speed and this is what I found out. Actually for loop was fastest in my case. That really suprised me, check out bellow (was calculating sum of squares). Is that because it holds list in memory or is that intended? Can anyone explain this.

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum3(numbers):
    sqrt = lambda x: x**2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([i**2 for i in numbers]))


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.290000 #List comprehension`

Update - when I tried longer loops there are the results.

time_it(square_sum1, 100, range(1000))
time_it(square_sum2, 100, range(1000))
time_it(square_sum3, 100, range(1000))
time_it(square_sum4, 100, range(1000))

0:00:00.068992
0:00:00.062955
0:00:00.069022
0:00:00.057446

Solution

Python function calls have overheads which make them relatively slow, so code that uses a simple expression will always be faster than code that wraps that expression in a function; it doesn't matter whether it's a normal def function or a lambda. For that reason, it's best to avoid map or reduce if you are going to pass them a Python function if you can do the equivalent job with a plain expression in a for loop or a comprehension or generator expression.

There are a couple of minor optimizations that will speed up some of your functions. Don't make unnecessary assignments. Eg,

def square_sum2a(numbers):
    a = 0
    for i in numbers:
        a += i ** 2
    return a

Also, i * i is quite a bit faster than i ** 2 because multiplication is faster than exponentiation.

As I mentioned in the comments, it's more efficient to pass sum a generator than a list comprehension, especially if the loop is large; it probably won't make difference with a small list of length 8, but it will be quite noticeable with large lists.

sum(i*i for i in numbers)

As Kelly Bundy mentions in the comments, the generator expression version isn't actually faster than the equivalent list comprehension. Generator expressions are more efficient than list comps in terms of RAM use, but they're not necessarily faster. And when the sequence length is small, the RAM usage differences are negligible, although there is also the time required to allocate & free the RAM used.

I just ran a few tests, with a larger data list. The list comp is still the winner (usually), but the speed differences are generally around 5-10%.

BTW, you shouldn't use sum or next as variable names as that masks the built-in functions with the same names. It won't hurt anything here, but it's still not a good idea, and it makes your code look odd in an editor with more comprehensive syntax highlighting than the SO syntax highlighter.

Here's a new version of your code that uses the timeit module. It does 3 repetitions of 10,000 loops each and sorts the results. As explained in the timeit docs, the important figure to look at in the series of the repetitions is the minimum one.

In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in.

from timeit import Timer
from functools import reduce

def square_sum1(numbers):
    return reduce(lambda total, u: total + u**2, numbers, 0)

def square_sum1a(numbers):
    return reduce(lambda total, u: total + u*u, numbers, 0)

def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum2a(numbers):
    a = 0
    for i in numbers:
        a += i * i
    return a

def square_sum3(numbers):
    sqr = lambda x: x**2
    return sum(map(sqr, numbers))

def square_sum3a(numbers):
    sqr = lambda x: x*x
    return sum(map(sqr, numbers))

def square_sum4(numbers):
    return(sum([i**2 for i in numbers]))

def square_sum4a(numbers):
    return(sum(i*i for i in numbers))

funcs = (
    square_sum1,
    square_sum1a,
    square_sum2,
    square_sum2a,
    square_sum3,
    square_sum3a,
    square_sum4,
    square_sum4a,
)

data = [1, 2, 5, 3, 1, 2, 5, 3]

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import data, ' + fname
        cmd = fname + '(data)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:14} {1}'.format(fname, result))

loops, reps = 10000, 3
time_test(loops, reps)

output

square_sum2a   [0.03815755599862314, 0.03817843700016965, 0.038571521999983815]
square_sum4a   [0.06384095800240175, 0.06462285799716483, 0.06579178199899616]
square_sum3a   [0.07395686000018031, 0.07405958899835241, 0.07463337299850537]
square_sum1a   [0.07867341000019223, 0.0788448769999377, 0.07908406700153137]
square_sum2    [0.08781023399933474, 0.08803317899946705, 0.08846573399932822]
square_sum4    [0.10260082300010254, 0.10360279499946046, 0.10415067900248687]
square_sum3    [0.12363515399920288, 0.12434166299863136, 0.1273790529994585]
square_sum1    [0.1276186039976892, 0.13786184099808452, 0.16315817699796753]

The results were obtained on an old single core 32 bit 2GHz machine running Python 3.6.0 on Linux.

Answered By - PM 2Ring

Answer Checked By - Clifford M. (PHPFixing Volunteer)

[FIXED] How to divide data into groups in a fast way

October 31, 2022 dataframe, for-loop, performance, python, r No comments

Issue

I have a large matrix with 12 columns and approximately 1.000.000 rows. Each column represents the money spent by a client in a given month, so with the 12 columns I have information for 1 full year. Each row represents one client.

I need to divide the people into groups based on how much money they spent each month, and I consider the following intervals:

money=0
0<money<=25
25<money<=50
50<money<=75

So for example group1 would be formed by clients that spent 0$ each month for the whole year, group2 would be clients who spent between 0 and 25$ the first month, and 0$ the rest of the months, and so on. In the end I have 12 months, and 4 intervals, so I need to divide data into 4^12=16.777.216 groups (I know this yields to more groups than observations, and that many of the groups will be empty or with very few clients, but that is another problem, so far I am interested in doing this division into groups)

I am currently working in R although I could also switch to Python if required (those are the programming languages I control best), and so far my only idea has been to use nested for loops, one for loop for each month. But this is very, very slow.

So my question is: is there a faster way to do this?

Here I provide a small example with fake data, 10 observations (instead of the 1.000.000), 5 columns (instead of 12) and a simplified version of my current code for doing the grouping.

set.seed(5)
data = data.frame(id=1:10, matrix(rnorm(50), nrow=10, ncol=5))

intervals = c(-4, -1, 0, 1, 4)

id_list = c()
group_list = c()

group_idx = 0

for(idx1 in 1:(length(intervals)-1))
{
  data1 = data[(data[, 2] >= intervals[idx1]) & (data[, 2] < intervals[idx1+1]),]
  for(idx2 in 1:(length(intervals))-1)
  {
    data2 = data1[(data1[, 3] >= intervals[idx2]) & (data1[, 3] < intervals[idx2+1]),]
    for(idx3 in 1:(length(intervals)-1))
    {
      data3 = data2[(data2[, 4] >= intervals[idx3]) & (data2[, 4] < intervals[idx3+1]),]
      for(idx4 in 1:(length(intervals)-1))
      {
        data4 = data3[(data3[, 5] >= intervals[idx4]) & (data3[, 5] < intervals[idx4+1]),]
        for(idx5 in 1:(length(intervals)-1))
        {
          data5 = data4[(data4[, 6] >= intervals[idx5]) & (data4[, 6] < intervals[idx5+1]),]
          group_idx = group_idx + 1
          id_list = c(id_list, data5$id)
          group_list = c(group_list, rep(group_idx, nrow(data5)))
        }
      }
    }
  }
}

Solution

If you do need to do this--which I certainly have my doubts about--I would suggest creating a matrix with the classification for each cell of the original data, and then pasting them together to make a group label.

Doing this we can set the group labels to be human readable, which might be nice.

I would recommend simply adding this grouping column to the original data and then using dplyr or data.table to do grouped operations for your next steps, but if you really want separate data frames for each you can then split the original data based on these group labels.

## I redid your sample data to put it on the same general scale as 
## your actual data
set.seed(5)
data = data.frame(id=1:10, matrix(rnorm(50, mean = 50, sd = 20), nrow=10, ncol=5))

my_breaks = c(0, 25 * 1:3, Inf)
## you could use default labels, but this seems nicer
my_labs = c("Low", "Med", "High", "Extreme")

## classify each value from the data
grouping = vapply(
  data[-1], \(x) as.character(cut(x, breaks = my_breaks)),
  FUN.VALUE = character(nrow(data))
)

## create labels for the groups
group_labels = apply(grouping, 2, \(x) paste(1:(ncol(data) - 1), x, sep = ":", collapse = " | "))

## either add the grouping value to the original data or split the data based on groups
data$group = group_labels
result = split(data, group_labels)

result
# $`1:(25,50] | 2:(75,Inf] | 3:(0,25] | 4:(50,75] | 5:(75,Inf] | 1:(25,50] | 2:(25,50] | 3:(25,50] | 4:(25,50] | 5:(50,75]`
#   id       X1       X2       X3       X4       X5
# 1  1 33.18289 74.55261 68.01024 56.31830 81.00121
# 6  6 37.94184 47.22028 44.13036 69.03148 61.24447
# 
# $`1:(50,75] | 2:(25,50] | 3:(25,50] | 4:(25,50] | 5:(25,50] | 1:(25,50] | 2:(25,50] | 3:(0,25] | 4:(50,75] | 5:(25,50]`
#   id       X1       X2       X3       X4       X5
# 2  2 77.68719 33.96441 68.83739 72.19388 33.95154
# 7  7 40.55667 38.05374 78.37178 29.80935 32.25983
# 
# $`1:(50,75] | 2:(50,75] | 3:(75,Inf] | 4:(50,75] | 5:(50,75] | 1:(25,50] | 2:(75,Inf] | 3:(75,Inf] | 4:(25,50] | 5:(25,50]`
#   id       X1        X2       X3        X4       X5
# 3  3 24.89016 28.392148 79.35924 94.309211 48.50842
# 8  8 37.29257  6.320665 79.97548  9.990545 40.79511
# 
# $`1:(50,75] | 2:(50,75] | 3:(75,Inf] | 4:(50,75] | 5:(75,Inf] | 1:(50,75] | 2:(25,50] | 3:(0,25] | 4:(0,25] | 5:(25,50]`
#   id       X1       X2       X3       X4       X5
# 4  4 51.40286 46.84931 64.13522 74.34207 87.91336
# 9  9 44.28453 54.81635 36.85836 14.75628 35.51343
# 
# $`1:(75,Inf] | 2:(25,50] | 3:(25,50] | 4:(75,Inf] | 5:(25,50] | 1:(50,75] | 2:(25,50] | 3:(25,50] | 4:(25,50] | 5:(25,50]`
#    id       X1       X2       X3       X4       X5
# 5   5 84.22882 28.56480 66.38018 79.58444 40.86862
# 10 10 52.76216 44.81289 32.94409 47.14784 48.61578

Answered By - Gregor Thomas

Answer Checked By - Cary Denson (PHPFixing Admin)

[FIXED] How to get the return value from For loop and pass it to .body(StringBody(session => in Gatling using Scala

October 31, 2022 for-loop, gatling, performance, scala, scala-gatling No comments

Issue

How to get the return value from For loop and pass it to .body(StringBody(session => in Gatling using Scala

I have created a method with for loop to generate String Array in gatling with scala

def main(args: Array[String]): Unit = {
    var AddTest: Array[String] = Array[String]()
    for (i <- 0 to 3) {
      val TestBulk: String =
        s"""{ "name": "Perftest ${Random.alphanumeric.take(6).mkString}",
            "testID": "00000000-0000-0000-0000-000000006017",
            "typeId": "00000000-0000-0000-0000-000000011001",
            "statusId": "00000000-0000-0000-0000-000000005058"};"""
      AddTest = TestBulk.split(",")
      //  val TestBulk2: Nothing = AddTest.mkString(",").replace(';', ',')
      // println(TestBulk)
    }
  }

now I want to pass the return value to .body(StringBody(session =>

    .exec(http("PerfTest Bulk Json")
      .post("/PerfTest/bulk")
      .body(StringBody(session =>
        s"""[(Value from the for loop).mkString(",").replace(';', ',')
]""".stripMargin)).asJson

Please help me with the possibilities Please let me know if

Solution

You don't need for loop (or var or split) for this. You also do not have ; anywhere, so last replace is pointless.

    val ids = """
       "testId": "foo", 
       "typeId": "bar", 
       "statusId": "baz"
    """

    val data = (1 to 3)
     .map { _ => Random.alphanumeric.take(6).mkString }
     .map { r => s""""name": "Perftest $r"""" }
     .map { s => s"{ $s, $ids }" }
     .mkString("[", ",", "]")

   exec("foo").post("/bar").body(_ => StringBody(data)).asJson

(I added [ and ] around your generated string to make it look like valid json).

Alternatively, you probably have some library that converts maps and lists to json out-of-the box (I don't know gatling, but there must be something), a bit cleaner way to do this would be with something like this:

    val ids = Map(
       "testId" -> "foo", 
       "typeId" ->  "bar", 
       "statusId" ->  "baz"
    )

    val data = (1 to 3)
     .map { _ => Random.alphanumeric.take(6).mkString }
     .map { r => ids + ("name" -> s"Perftest $r")  }
  

   exec("foo").post("/bar").body(_ => StringBody(toJson(data))).asJson

Answered By - Dima

Answer Checked By - Robin (PHPFixing Admin)

[FIXED] How to make python for loops faster

October 31, 2022 arrays, bigdata, for-loop, performance, python No comments

Issue

I have a list of dictionaries, like this:

[{'user': '123456', 'db': 'db1', 'size': '8628'}
{'user': '123456', 'db': 'db1', 'size': '7168'}
{'user': '123456', 'db': 'db1', 'size': '38160'}
{'user': '222345', 'db': 'db3', 'size': '8628'}
{'user': '222345', 'db': 'db3', 'size': '8628'}
{'user': '222345', 'db': 'db5', 'size': '840'}
{'user': '34521', 'db': 'db6', 'size': '12288'}
{'user': '34521', 'db': 'db6', 'size': '476'}
{'user': '2345156', 'db': 'db7', 'size': '5120'}.....]

This list contains millions of entries. Each user can be found in multiple dbs, each user can have multiple entires in the same db. I want to sum up how much is the size occupied by each user, per each db. I don't want to use pandas. At the moment I do it this way:

I create 2 lists of unique users and unique dbs
Use those lists to iterate through the big list and sum up where user and db are the same

result = []
for user in unique_users:
    for db in unique_dbs:
        total_size = 0
        for i in big_list:
            if (i['user'] == user and i['db'] == db):
                total_size += float(i['size'])
        if(total_size) > 0:
            row = {}
            row['user'] = user
            row['db'] = db
            row['size'] = total_size
            result.append(row)

The problem is that this triple for loop develops into something very large (hundreds of billions of iterations) which takes forever to sum up the result. If the big_list is small, this works very well.

How should I approach this in order to keep it fast and simple? Thanks a lot!

Solution

There are two main issue with the current approach: the inefficient algorithm and the inefficient data structure.

The first is that the algorithm used is clearly inefficient as it iterates many times over the big list. There is not need to iterate over the whole list to filter a unique user and db. You can iterate over the big list once and aggregate data using a dictionary. The key of the target dictionary is simply a (user, db) tuple. The value of the dictionary is total_size. Here is an untested example:

# Aggregation part
# Note: a default dict can be used instead to make the code possibly simpler
aggregate_dict = dict()
for i in big_list:
    key = (i['user'], i['db'])
    value = float(i['size'])
    if key in aggregate_dict:
        aggregate_dict[key] += value
    else:
        aggregate_dict[key] = value

# Fast creation of `result`
result = []
for user in unique_users:
    for db in unique_dbs:
        total_size = aggregate_dict.get((user, key))
        if total_size is not None and total_size > 0:
            result.append({'user': user, 'db': db, 'size': total_size})

The other issue is the inefficient data structure: for each row, the keys are replicated while tuples can be used instead. In fact, a better data structure is to store a dictionary of (column, items) key-values where items is a list of items for the target column. This way of storing data is called a dataframe. This is roughly what Pandas uses internally (except it is a Numpy array which is even better as it is more compact and generally more efficient than a list for most operations). Using this data structure for both the input and the output should result in a significant speed up (if combined with Numpy) and a lower memory footprint.

Answered By - Jérôme Richard

Answer Checked By - Terry (PHPFixing Volunteer)

[FIXED] How do I improve this for loop in python, maybe numpy arrays?

October 31, 2022 for-loop, loops, performance, python No comments

Issue

This code works but is too slow, any ideas for improvement would be appreciated. Numpy arrays?, other?

Estatus=Vigentes[['UUID','Estatus']]
MOV_10 = MOV_09.copy()
MOV_10['Estatus'] = ""
for i in range(0, len(MOV_10[['UUID']])):
    u = MOV_10.loc[i][0]
    w = MOV_10.loc[i][1]
    tempu = Estatus.loc[Estatus['UUID'] == u]
    tempw = Estatus.loc[Estatus['UUID'] == w]
    try:
        if w == 'N/A':
            MOV_10.loc[i, 'Estatus'] = int(tempu.iloc[0, 1])
        else:
            MOV_10.loc[i, 'Estatus'] = int(tempu.iloc[0, 1]) \
                * int(tempw.iloc[0, 1])
    except IndexError:
        MOV_10.loc[i, 'Estatus'] = 0

#Estatus table, Mov_09 Table, Mov_10 Table, expected result

	UUID	Estatus
0	a	0
1	b	1
2	x	1
3	y	1

	UUID	UIID_2	estatus
0	a	x
1	b	y

	UUID	UIID_2	estatus
0	a	x	0*1
1	b	y	1*1

Solution

You should be able to do much better than your existing method. I assume your existing data structure is a pandas dataframe. If so, it's very straightforward swap to use vector operations for a lot of the calculations. This approach should also scale much better than your approach.

uuid_index = Estatus.set_index('UUID').rename(columns={'Estatus': 'val'})
out = pd.DataFrame({ 'UUID': MOV_09.UUID.values, 'UIID2': MOV_09.UIID2.values }).join(uuid_index, on=['UUID']).join(uuid_index, on=['UIID2'], rsuffix='_uiid2')
out['Estatus'] = 0
out.loc[out.val_uiid2 != 0, 'Estatus'] = out.val / out.val_uiid2

using this approach gives the following performance improvement for a dataset of with 1000 entries in the MOV_09 table:

Method	Time	Relative
Original	8.066573400050402	193.82507958031653
Swapping to joining dataframes	0.04161780001595616	1.0

I have attached the test code below:

import pandas as pd
import numpy as np
import random
import timeit

# generate test data
random.seed(1)
iterations = 10
uuid_count = 1000
mov_count = 1000

uuid_values = [(hex(i), random.random(), random.randint(0, 1)) for i in range(uuid_count)]
uuid_values.sort(key=lambda x: x[1])

def rand_uuid():
    return uuid_values[random.randint(0, uuid_count - 1)][0]

mov_values = set()
for i in range(mov_count):
    uuid = rand_uuid()
    while not ((uiid2 := rand_uuid()) and not (pair := (uuid, uiid2)) in mov_values): pass
    mov_values.add(pair)

Estatus = pd.DataFrame({
    'UUID': [v[0] for v in uuid_values],
    'Estatus': [v[2] for v in uuid_values],
})

MOV_09 = pd.DataFrame({
    'UUID': [t[0] for t in mov_values],
    'UIID2': [t[1] for t in mov_values],
})

# base method
def method0():
    MOV_10 = MOV_09.copy()
    MOV_10['Estatus'] = ""
    for i in range(0, len(MOV_10[['UUID']])):
        u = MOV_10.loc[i][0]
        w = MOV_10.loc[i][1]
        tempu = Estatus.loc[Estatus['UUID'] == u]
        tempw = Estatus.loc[Estatus['UUID'] == w]
        try:
            if w == 'N/A':
                MOV_10.loc[i, 'Estatus'] = int(tempu.iloc[0, 1])
            else:
                MOV_10.loc[i, 'Estatus'] = int(tempu.iloc[0, 1]) \
                    * int(tempw.iloc[0, 1])
        except IndexError:
            MOV_10.loc[i, 'Estatus'] = 0
    return MOV_10

# updated method
def method1():
    uuid_index = Estatus.set_index('UUID').rename(columns={'Estatus': 'val'})
    out = pd.DataFrame({ 'UUID': MOV_09.UUID.values, 'UIID2': MOV_09.UIID2.values }).join(uuid_index, on=['UUID']).join(uuid_index, on=['UIID2'], rsuffix='_uiid2')
    out['Estatus'] = 0
    out.loc[out.val_uiid2 != 0, 'Estatus'] = out.val / out.val_uiid2
    return out[['UUID', 'UIID2', 'Estatus']]

m0 = method0()
m0['Estatus'] = m0.Estatus.astype(np.int64)
pd.testing.assert_frame_equal(m0, method1())

t0 = timeit.timeit(lambda: method0(), number=iterations)
t1 = timeit.timeit(lambda: method1(), number=iterations)

tmin = min((t0, t1))

print(f'| Method                                  | Time | Relative      |')
print(f'|------------------                       |----------------------|')
print(f'| Original                                | {t0} | {t0 / tmin}   |')
print(f'| Swap to joining dataframes              | {t1} | {t1 / tmin}   |')

Answered By - John M.

Answer Checked By - Clifford M. (PHPFixing Volunteer)

[FIXED] Why is a `for` loop so much faster to count True values?

October 31, 2022 for-loop, performance, python, python-3.x, sum No comments

Issue

I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which turned out to be the fastest, so far):

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

In addition I looked at using a list comprehension and list.count:

def count_even_digits_spyr03_list(n):
    return [c in "02468" for c in str(n)].count(True)

The first two functions are essentially the same, except that the first one uses an explicit counting loop, while the second one uses the built-in sum. I would have expected the second one to be faster (based on e.g. this answer), and it is what I would have recommended turning the former into if asked for a review. But, it turns out it is the other way around. Testing it with some random numbers with increasing number of digits (so the chance that any single digit is even is about 50%) I get the following timings:

Why is the manual for loop so much faster? It's almost a factor two faster than using sum. And since the built-in sum should be about five times faster than manually summing a list (as per the linked answer), it means that it is actually ten times faster! Is the saving from only having to add one to the counter for half the values, because the other half gets discarded, enough to explain this difference?

Using an if as a filter like so:

def count_even_digits_spyr03_sum2(n):
    return sum(1 for c in str(n) if c in "02468")

Improves the timing only to the same level as the list comprehension.

When extending the timings to larger numbers, and normalizing to the for loop timing, they asymptotically converge for very large numbers (>10k digits), probably due to the time str(n) takes:

Solution

sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:

The use of a generator expression causes overhead for constantly pausing and resuming the generator.
Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.
Adding booleans instead of ints prevents sum from using its integer fast path.

Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.

If we replace the genexp with a list comprehension:

In [66]: def f1(x):
   ....:     return sum(c in '02468' for c in str(x))
   ....: 
In [67]: def f2(x):
   ....:     return sum([c in '02468' for c in str(x)])
   ....: 
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop

we see an immediate speedup, at the cost of wasting a bunch of memory on a list.

If you look at your genexp version:

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

you'll see it has no if. It just throws booleans into sum. In constrast, your loop:

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

only adds anything if the digit is even.

If we change the f2 defined earlier to also incorporate an if, we see another speedup:

In [71]: def f3(x):
   ....:     return sum([True for c in str(x) if c in '02468'])
   ....: 
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop

f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.

It probably looked pretty awkward using True instead of 1 in f3. I wrote True there because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:

if (PyLong_CheckExact(item)) {

Once we make the final change, changing True to 1:

In [73]: def f4(x):
   ....:     return sum([1 for c in str(x) if c in '02468'])
   ....: 
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop

we see one last small speedup.

So after all that, do we beat the explicit loop?

In [75]: def explicit_loop(x):
   ....:     count = 0
   ....:     for c in str(x):
   ....:         if c in '02468':
   ....:             count += 1
   ....:     return count
   ....: 
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop

Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.

Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:

def f5(x):
    return len([1 for c in str(x) if c in '02468'])

Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:

def f6(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:

>>> import timeit
>>> def f(x):
...     return sum([1 for c in str(x) if c in '02468'])
... 
>>> def g(x):
...     return len([1 for c in str(x) if c in '02468'])
... 
>>> def h(x):
...     s = str(x)
...     return sum(s.count(c) for c in '02468')
... 
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
...     count = 0
...     for c in str(x):
...         if c in '02468':
...             count += 1
...     return count
... 
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128

Answered By - user2357112

Answer Checked By - Cary Denson (PHPFixing Admin)

[FIXED] How to return an empty array Java

October 28, 2022 arrays, for-loop, is-empty No comments

Issue

Hi, how are you? =) I'm new to Java, currently, I'm learning arrays and loops, and I'm really struggling with them at the moment.

Here is my homework: Write a public method int[] findMinMaxPrices(int[] price). It takes an array of prices and returns a new array.

Empty, if the array is empty. Returns only one element, if the maximum and minimum prices in the prices array are the same. Returns only two elements if the price array contains both the minimum and maximum prices. The minimum price should go first, then the maximum.

Only for loop can be used.

I would use some help here: How to return an empty array in this case? I made it many times in draft, unfortunately, it doesn't work here.

How can I use a for loop here?

Can you give me some hints or advice?

Thank you in advance!)

import java.util.Arrays;

public class QuadraticEquationSolver {
    
        public int[] findMinMaxPrices(int[] prices) { // I know it's a mess, but I'm just learning =)
        
        Arrays.sort(prices);
        
        int empty [] = {};
        int first [] = Arrays.copyOf(prices, 1);
        
        int a = prices[0];
        int b = prices[prices.length-1];
        
        int second [] = new int[] {a,b};        
        
            if(prices[0] == prices[prices.length-1]) {
                    return first;
            }   
            else if(prices[0] < prices[prices.length-1]) {
                    return second;
                        
            }else{ 
                return empty;
                //return new int[0]; I tried to use this here, didn't work =(
            }       
    }
   
    public static void main(String[] args) {
        QuadraticEquationSolver shop = new QuadraticEquationSolver();

        //Should be [50, 1500]
        int[] prices = new int[] {100, 1500, 300, 50};
        int[] minMax = shop.findMinMaxPrices(prices);
        System.out.println(Arrays.toString(minMax));

//findMaxPrices(new int[] {10, 50, 3, 1550}), returns [3, 1550]
//findMaxPrices(new int[] {}), returns []
//findMaxPrices(new int[] {50, 50}), returns [50]


    }
}

Solution

You return an empty array the same way you create an empty array when you call findMaxPrices(new int[]{}), just use new int[]{}.

For your requirement you don't need to sort the array, because you can store the minimum and maximum value in a local variable and just add minimum before maximum to the array that you return.

One essential thing you are missing is checking for the array length of prices. Using prices.length you get how many values are in the array. Always check the length of an array before you try to access it using an index, otherwise you risk getting an IndexOutOfBounds exception. Using this you can immediately return when prices.length == 0 because then there are no values in the array and you need to return an empty array. If prices.length == 1 we only have one value in the array. This value must be minimum and maximum so we can return an array containing this value. In all other cases we can use a for loop to loop over all elements in the array. If we find a value that is smaller/ greater than the current minimum/ maximum we set that as new maximum. For this to work we need to initialize minimum and maximum to the biggest/ smallest possible value first. Java has the constants Integer.MAX_VALUE and Integer.MIN_VALUE for this kind of thing. Last but not least we need to check if maximum and minimum are the same. In that case we only return one element, otherwise we return both, but minimum before maximum.

public class Application {
    public static void main(String[] args) {
        System.out.println(Arrays.toString(findMinMaxPrices(new int[]{100, 1500, 300, 50})));
        System.out.println(Arrays.toString(findMinMaxPrices(new int[]{})));
        System.out.println(Arrays.toString(findMinMaxPrices(new int[]{50, 50})));
    }

    public static int[] findMinMaxPrices(int[] prices) {
        // we can simply check the array length. If it is zero we return an empty array
        if(prices.length == 0) return new int[]{};
        else if(prices.length == 1) return new int[prices[0]]; // if we only have one element that one element must be the minimum and maximum value
        // array length is not zero or one -> we need to find minimum and maximum
        int min = Integer.MAX_VALUE; // set to maximal possible int value
        int max = Integer.MIN_VALUE; // set to minimal possible int value
        for (int i = 0; i < prices.length; i++) {
            int currentPrice = prices[i];
            if(currentPrice < min) min = currentPrice;
            if(currentPrice > max) max = currentPrice;
        }
        // now we have minimum and a maxium value
        // if they are the same only return one value
        if(min == max) return new int[]{max};
        // otherwise return both, but minumum first
        return new int[]{min, max};
    }
}

Expected output:

[50, 1500]
[]
[50]

Answered By - Mushroomator

Answer Checked By - Marilyn (PHPFixing Volunteer)

[FIXED] How do I iterate over every value in a key where the value is the instance of a class?

October 26, 2022 dictionary, for-loop, oop, python No comments

Issue

I want to iterate over every instance i stored as a value to a number stored as a key in a dictionary. Where if I were to make an account named jason, it would be assigned 1, then if I were to make a new one, it would be assigned 2. That part is already done but the iteration part is very confusing for me. Why does it only go through the first key value pair in the dictionary?

Ps: I am new to oop this is my first oop thing where i did not follow any guides so that id would actually learn. Thank you <3


class Bank:
  serialnum = 0
  username = ""
  email = ""
  password = ""
  bal = 0
  
  
  def __init__(self,count):
    self.serialnum = count
    self.username = input("What's your username? \n")
    self.email = input("What's your email? \n")
    self.password = input("What's your password \n")
    self.bal = input("How much balance do you have \n")

    def withdraw(money):
      self.bal= bal-money 
      print (bal)
      

global count
count = 0 #counts and is the serial num
accounts = {} #accounts dictionary

def accountcreate(): #gets called when account does not exist
  global count
  while True:
    serial = int(count)
    account = Bank(count)
    print("The serial is {}".format(count))
    count += 1
    accounts[serial] = account
    print("Your account has been created, please use the username registered. ")
    break
  accountaccess()
    
    
def accountverify(name):#if accountverify returns false, prompts the accountcreate function
  username = ""
  start = 0
  balance = 0
  if 0 in accounts: #essentially means if the first account has been made
      for key in accounts: #loops through accounts in accounts dictionary
         #sets the keys gotten and sets the un to the username attribute of every key
      
        if hasattr((accounts[key]),"username") == name:
          print("logged in as ", name, "Password is \n", 
                (getattr((accounts[key]), "password")), 
                "Account balance is ", getattr((accounts[key]), "bal"))

          action = input("What do you want to do? \n -Withdraw \n -Deposit \n -Transfer \n -Make another account \n")
          if "make" in action:
            print("Making another account... \n Please enter your credentials")
            makeaccount = accountcreate()
          
       
        else: #if username does not exist
          print("First item in list is ",(getattr((accounts[key]),"username")))
          print(accounts)
          accountask = input("Account does not exist, make a new account? Yes or No \n").lower()
          if accountask == "yes":
            makeAccount = accountcreate()
       
  else: #makes first account
    ask1 = (input("Account does not exist, would you like to make an account? Yes or No \n")).lower()
    if ask1 == "yes":
      makeAccount =  accountcreate()

def accountaccess(): #accesses account
  ask = (input("Do you want to access an account? Yes, or no. "))
  if ask == "yes":
    getname = (input("What is your username? ")).lower()
    ver = accountverify(getname) 
    loop = False
  
    
loop = True
while loop == True: #mainloop
  ask = (input("Do you want to access an account? Yes, or no. \n")).lower()
  if ask == "yes":
    getname = (input("What is your username? ")).lower()
    ver = accountverify(getname) 
    loop = False

The replit link

It would also be helpful to know how to store the username as the name of the value since what is shown there is incredibly cryptic

In this image, every new username registered is a new instance of the Bank class. But the for loop only goes on the first one

Solution

The part of your code that is causing the issue is

if hasattr((accounts[key]),"username") == name:
          print("logged in as ", name, "Password is \n", 
                (getattr((accounts[key]), "password")), 
                "Account balance is ", getattr((accounts[key]), "bal"))

The return from hasattr is a boolean and cannot be compared to name.

Try changing it too

if hasattr(accounts[key],"username"):
    if accounts[key].username == name:
        ....

Also your use of getattr() is incorrect and unnecessary since you can simply access those attributes directly.

For example:

account = accounts[key]
print(account.username)
print(account.password)
print(account.bal)

with all of that in mind your accountsverify function should look more like this:

def accountverify(name):
  start = 0
  balance = 0
  if 0 in accounts:
      for key in accounts:  
        account = accounts[key]
        if account.username == name:
          print(f"logged in as {name} Password is \n {account.password} \n Account balance is {account.bal}")

          action = input("What do you want to do? \n -Withdraw \n -Deposit \n -Transfer \n -Make another account \n")
          if "make" in action:
            print("Making another account... \n Please enter your credentials")
            makeaccount = accountcreate()
          
       
        else: #if username does not exist
          print("First item in list is ",account.username)
          print(accounts)
          accountask = input("Account does not exist, make a new account? Yes or No \n").lower()
          if accountask == "yes":
            makeAccount = accountcreate()

As far as making the Bank class print the accounts name you just need to overwrite the __str__ method.

class Bank:
    def __init__(self, name):
        self.name = name
    
    def __str__(self):
        return self.name

Answered By - Alexander

Answer Checked By - Candace Johnson (PHPFixing Volunteer)

[FIXED] How can I find number of iterations from this function in R

October 18, 2022 for-loop, function, iteration, r No comments

Issue

Maybe you can help me with this code. I got only 4 values, which are right and needed, but also I need to make a table with all iterations, till I get those right values, but I don't know how to get number of those iterations from this function I have: example provided here enter image description here

E <- matrix(c(1, 0, 0, 0,
              0, 1, 0, 0, 
              0, 0, 1, 0, 
              0, 0, 0, 1), byrow = TRUE, nrow = 4)

B <- matrix(c(1.2, 2, 3, 1.5), nrow = 4, byrow = TRUE)

D <- matrix(c(0.18, 0.57, 0.38, 0.42,
              0.57, 0.95, 0.70, 0.44,
              0.38, 0.70, 0.37, 0.18,
              0.42, 0.44, 0.18, 0.40), byrow = TRUE, nrow = 4)

#my matrix
A <- D + 0.1*(8+3)*E
A

# Define a convenience function matching `numpy.inner`
inner <- function(a, b) as.numeric(as.numeric(a) %*% as.numeric(b))

conjugate_grad <- function(A, b) {    
  n <- dim(A)[1]
  x <- numeric(n)
  z <- b - A %*% x
  p <- z
  z_old <- inner(z, z)
  for (i in 1:60) {
    teta <- z_old / inner(p, (A %*% p))
    x <- x + teta * p
    z <- z - teta * A %*% p
    z_new <- inner(z, z)
    if (sqrt(z_new) < 0.001)
      break
    beta <- z_new / z_old
    p <- z + beta * p
    z_old <- z_new
  }
  return(x) 
}

conjugate_grad(A,B)

Thank you in advance!

Solution

I think this modification of your code gives you what you want:

conjugate_grad <- function(A, b) {    
  n <- dim(A)[1]
  x <- numeric(n)
  history <- NULL   # Insert this line 
  z <- b - A %*% x
  p <- z
  z_old <- inner(z, z)
  for (i in 1:60) {
    teta <- z_old / inner(p, (A %*% p))
    x <- x + teta * p
    history <- cbind(history, x)   # Insert this line
    z <- z - teta * A %*% p
    z_new <- inner(z, z)
    if (sqrt(z_new) < 0.001)
      break
    beta <- z_new / z_old
    p <- z + beta * p
    z_old <- z_new
  }
  return(history)      # return history instead of x
}

conjugate_grad(A,B)
#           [,1]      [,2]       [,3]       [,4]
# [1,] 0.4326431 0.1288703 0.09609509 0.08061088
# [2,] 0.7210718 0.1695202 0.15988971 0.16900513
# [3,] 1.0816077 1.8689058 1.85211220 1.85311415
# [4,] 0.5408039 0.6284172 0.70703113 0.70548042

You have to accumulate the results as you compute them in an object, here called history.

Answered By - dcarlson

Answer Checked By - David Goodson (PHPFixing Volunteer)

[FIXED] How do I set a level based on the required XP using an int and an xp table array

October 17, 2022 arrays, c#, for-loop, integer No comments

Issue

Say I have an array with the require xp for each level so for instance

0 == Level 1
83 == Level 2
174 == Level 3
276 == Level 4
388 == Level 5
512 == Level 6
650 == Level 7
801 == Level 8
969 == Level 9
...

What's the most efficient way to increment an int to the corresponding level?

So I'd have an int like this

ìnt currentLevel = 1;

and then then say I added 25 xp, that would still put me at level 1, but then I added 800 xp which would set me at 825xp which then would put me at level 8

This was my solution but it keeps running at 98 and won't stop since it never levels up to 99. Not sure how to fix it.

internal class Program
{
    private const int XPGain = 500;
    static int currentLevel = 1;
    static int currentXP = 0;

    static void Main(string[] args)
    {
        string[] levels = File.ReadAllLines("XPTable.txt");

        while (currentLevel < 99)
        {
            currentXP += XPGain;
            for (int i = currentLevel; i < levels.Length; i++)
            {
                if (currentXP >= Convert.ToInt32(levels[i]))
                {
                    //Set appropriate level
                    for (int l = 1; l < levels.Length + 1; l++)
                    {
                        if (currentXP <= Convert.ToInt32(levels[l - 1]))
                        {
                            currentLevel = l - 1;
                            Console.WriteLine("Level up!");
                            break;
                        }
                    }
                    break;
                }
            }
            Console.WriteLine($"[Action] - Current Level: {currentLevel}. Added XP: {XPGain} - Current XP: {currentXP}");
        }
    }
}

Here's the XP table I'm using. https://www.klgrth.io/paste/g79ht

Solution

There are a few improvements you can make

internal class Program
{
    private const int XPGain = 500;
    static int currentLevel = 1;
    static int currentXP = 0;

    static void Main(string[] args)
    {
        while (currentLevel < 99)
        {
            int oldLevel = currentLevel;
            currentXP += XPGain;
            // 1 dont need to increment this one at a time, just set it when you know the value
            currentLevel = CalculateLevel(currentXP);
            for (int i = oldLevel; i < currentLevel; i++)
            {
                Console.WriteLine("Level up!");
            }
            Console.WriteLine($"[Action] - Current Level: {currentLevel}. Added XP: {XPGain} - Current XP: {currentXP}");
        }
    }

    static int CalculateLevel(int currentXP)
    {
        string[] levels = File.ReadAllLines("D:\\Workspace\\FormTest\\ConsoleApp1\\XPTable.txt");

        // 2 just go through each line and figure out the level.
        for (int i = 1; i < levels.Length; i++) // level 1 to 98
        {
            // maxXP for level 1 is on line 2, level 2 is on line 3 and so on...
            int maxXPForLevel = int.Parse(levels[i]);
            if (currentXP < maxXPForLevel)
                return i;
        }
        return levels.Length; // 3. im greater than the highest values
    }
}

You can programmatically get currentLevel from currentXP you dont need to use this accumulator logic and instead compute this on demand.
You can use a single for loop to figure out level ups and new level gains.
Your solution doesnt get to lvl 99 because in your code you need to check between the lower bound and the upper bound for that level, and level 99 doesn't have an upper bound. I think you tweaked the loop indexes to get it to work until level 98 but you need to somewhere write the behavior at lvl 99.

Answered By - Slack Groverglow

Answer Checked By - Gilberto Lyons (PHPFixing Admin)

[FIXED] how can I make a loop print out missing elements in my int array?

October 17, 2022 arrays, c, comparison, for-loop, integer No comments

Issue

I've been trying to solve an assignment where you:

Enter the size N of an array of ints
Enter the max value M of the ints that you want to populate the array with
Enter N values that are of a value between 1 and M, into a second array.
compare these two and print out the missing numbers...

Like this:

Size of array? 10        // => N = 10
Max value in array? 8    // => M = 8

Please enter 10 values between 1 and 8:
4 1 3 1 7 3 4 4 6 1

Missing values: 2 5 8

for some reason, my for loop just prints out all the numbers between 1 and M instead, no matter what I try... What am I missing??

code:

#include <stdio.h>

int main(void)
{
    int aSize, mValue;



    printf("Size of array? ");
    scanf(" %d", &aSize);
    printf("Max value in array: ");
    scanf(" %d", &mValue);

    int table[aSize];
    int values[mValue];

    for (int i = 0; i < aSize; i++)
    {
        table[i] = i+1;
        if ((i+1) > mValue)
        {
            table[i] = 0;
        }
    }

    printf("Please enter %d values between 1 and %d:\n", aSize, mValue);

    for (int i = 0; i < mValue; i++)
    {
        scanf(" %d", &values[i]);
    }

    for(int i = 0; i < aSize; i++)
    {
        for (int j = 0; j < mValue; j++)
        {
            if(table[i] != values[j] && table[i] != 0)
            {
                printf("%d ", table[i]);
                break;
            }
        }
    }
}

Solution

#include <stdio.h>
   int main()
   {
    int aSize, mValue;
    printf("Size of array? ");
    scanf(" %d", &aSize);
    printf("Max value in array: ");
    scanf(" %d", &mValue);

    int table[aSize];
    int values[aSize]; // not 'mSize' because it is just max value not size of array

    for (int i = 0; i < aSize; i++)
    {
        table[i] = i+1;
        if ((i+1) > mValue)
        {
            table[i] = 0;
        }
    }

    printf("Please enter %d values between 1 and %d:\n", aSize, mValue);

    for (int i = 0; i < aSize; i++)
    {
        scanf(" %d", &values[i]);
    }

    for(int i = 0; i < aSize; i++)
    {
        int flag=0;
        for (int j = 0; j < aSize; j++)
        {
            if(table[i] == 0 || table[i] == values[j]) // numbers in common or zero
            {
                flag=1;
                break;
            }
        }
        if(flag == 0) printf("%d",table[i]); // missing numbers
    }
}

Answered By - Prabhand Reddy

Answer Checked By - Terry (PHPFixing Volunteer)

Friday, December 16, 2022

Issue

Solution

Wednesday, December 14, 2022

Issue

Solution

Issue

Solution

Tuesday, November 22, 2022

Issue

Solution

Saturday, November 5, 2022

Issue

Solution

Friday, November 4, 2022

Issue

Solution

Thursday, November 3, 2022

Issue

Solution

Wednesday, November 2, 2022

Issue

Solution

Tuesday, November 1, 2022

Issue

Solution

Monday, October 31, 2022

Issue

Solution

Issue

Solution

Issue

Solution

Issue

Solution

Issue

Solution

Issue

Solution

Friday, October 28, 2022

Issue

Solution

Wednesday, October 26, 2022

Issue

Solution

Tuesday, October 18, 2022

Issue

Solution

Monday, October 17, 2022

Issue

Solution

Issue

Solution

Total Pageviews

Featured Post

Subscribe To