Issue
I have a PySpark for loop that iterates over a "customer" variable. I want to append the output of each iteration so that the final DataFrame contains all the rows produced by the loop. The code works except for the append step. I have also tried using "union", without success.
from pyspark.sql.functions import lit

df_output = []
for customer in ['customer_abc', 'customer_xyz']:
    df = spark.sql(f"""
        SELECT sale, sum(amt) as total_sales
        FROM {customer}.salestable
        GROUP BY sale
    """).withColumn('Customer', lit(customer))
    df_output.append(df)
# df_output is a Python list of DataFrames here, not a single DataFrame
display(df_output)
Solution
With your approach, you can combine the list of DataFrames with reduce:
from functools import reduce
unioned_df = reduce(lambda x, y: x.union(y), df_output)
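The reduce call above can be wrapped in a small helper to make the folding explicit. This is only a sketch: `union_all` is a hypothetical name, not a PySpark function, and it assumes every DataFrame in the list has the same columns in the same order, since `DataFrame.union` matches columns by position.

```python
from functools import reduce


def union_all(frames):
    """Fold a non-empty list of DataFrames into one by chaining union().

    `union_all` is an illustrative helper, not part of PySpark. It only
    relies on each item having a .union(other) method.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("union_all needs at least one DataFrame")
    # reduce applies the lambda pairwise, left to right:
    # frames[0].union(frames[1]).union(frames[2]) ...
    return reduce(lambda left, right: left.union(right), frames)


# With the list built in the question's loop (requires a Spark session):
# unioned_df = union_all(df_output)
# display(unioned_df)
```

If the per-customer DataFrames could differ in column order, `unionByName` would likely be the safer call inside the lambda.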
Or, instead of initializing df_output as a list, you can initialize it as a Spark DataFrame and keep unioning each iteration's result onto it, as mentioned by @Luiz.
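A minimal sketch of the union-as-you-go option follows. It starts the accumulator as `None` rather than as an empty DataFrame, which is an assumption made here to avoid having to declare a matching schema up front. `collect_union` and `build_frame` are hypothetical names: `build_frame` stands for whatever produces one customer's DataFrame, such as the `spark.sql(...).withColumn(...)` expression from the question.

```python
def collect_union(customers, build_frame):
    """Union the DataFrame built for each customer into a single result.

    `build_frame` is a caller-supplied function (hypothetical here) that
    takes a customer name and returns that customer's DataFrame.
    Returns None if `customers` is empty.
    """
    result = None
    for customer in customers:
        df = build_frame(customer)
        # The first iteration seeds the accumulator; later ones union onto it.
        result = df if result is None else result.union(df)
    return result


# Usage with the question's query (requires a Spark session and `lit`):
# df_output = collect_union(
#     ['customer_abc', 'customer_xyz'],
#     lambda c: spark.sql(f"""
#         SELECT sale, sum(amt) as total_sales
#         FROM {c}.salestable
#         GROUP BY sale
#     """).withColumn('Customer', lit(c)),
# )
# display(df_output)
```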
Answered By - anky Answer Checked By - Gilberto Lyons (PHPFixing Admin)