PHPFixing

Thursday, May 12, 2022

[FIXED] How to append PySpark FOR loop output into a single dataframe (spark.sql)

Tags: apache-spark-sql, append, dataframe, pyspark, python

Issue

I have a PySpark for loop that iterates over a "customer" variable. I want to append the output of each loop iteration so that the final dataframe contains all the rows produced by the loop. The code works except for the append portion. I have also tried using "union", but without success.

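from pyspark.sql.functions import lit

df_output = []  # collects one DataFrame per customer

for customer in ['customer_abc', 'customer_xyz']:
  df = spark.sql(f"""
  SELECT sale, sum(amt) as total_sales
  FROM {customer}.salestable
  GROUP BY sale
  """).withColumn('Customer', lit(customer))
  df_output.append(df)

# At this point df_output is a Python list of DataFrames, not a single DataFrame
display(df_output)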

Solution

With your approach, you can combine the list of DataFrames into one with reduce:

from functools import reduce
unioned_df = reduce(lambda x, y: x.union(y), df_output)
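
As a variation on the reduce approach, here is a minimal sketch of a helper that guards against an empty list and matches columns by name. The helper name `union_all` is hypothetical, not part of the original answer. It relies on `DataFrame.unionByName`, whereas the plain `union` used in the answer matches columns by position.

```python
from functools import reduce


def union_all(frames):
    """Combine a non-empty list of DataFrames into one, matching columns by name."""
    if not frames:
        # reduce() without an initial value would fail on an empty list
        raise ValueError("need at least one DataFrame to union")
    # unionByName aligns columns by name; plain union() aligns them by position
    return reduce(lambda left, right: left.unionByName(right), frames)
```

With the list built in the question, this would be called as `unioned_df = union_all(df_output)`, and `unioned_df` can then be passed to `display`.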

Alternatively, instead of initializing df_output as a list, you can initialize it as a Spark dataframe and keep unioning each iteration's result onto it, as mentioned by @Luiz.
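
The running-union idea could look roughly like the sketch below. It is not the original answer's code: the names `build_combined` and `load_one` are hypothetical, and it starts from `None` rather than from an empty Spark dataframe, which avoids having to declare a schema up front.

```python
def build_combined(customers, load_one):
    """Union one DataFrame per customer as the loop runs.

    load_one is a hypothetical callable: customer name -> DataFrame.
    """
    combined = None  # nothing accumulated yet
    for customer in customers:
        df = load_one(customer)
        # The first result becomes the accumulator; later ones are unioned onto it
        combined = df if combined is None else combined.union(df)
    return combined
```

In the question's setting, `load_one` would wrap the `spark.sql(...).withColumn('Customer', lit(customer))` call from the loop body. Note that the function returns `None` when the customer list is empty.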



Answered By - anky
Answer Checked By - Gilberto Lyons (PHPFixing Admin)