Issue
I have this code, which generates a list by means of a for loop. I want to pass the values printed by println to a DataFrame so that I can manipulate the resulting data in Scala.
for (l <- ListArchive) {
  val LastModified: (String, String) = (l, getLastModifiedLCO(l))
  println(LastModified)
}
Output of println:
(LCO_2014-12-09_3.XML.gz,Tue Dec 09 07:48:30 UTC 2014)
(LCO_2014-12-09_1.XML.gz,Tue Dec 09 07:48:30 UTC 2014)
Solution
Rewrite it to build a list/sequence of tuples with map instead of printing inside the loop, and then turn that sequence into a DataFrame. Something like this:
import spark.implicits._
val df = ListArchive.map(l => (l, getLastModifiedLCO(l)))
  .toDF("col1Name", "col2Name")
If the list is very big, you can instead turn it into an RDD via parallelize and then apply a similar map to it; in that case the map runs in a distributed manner.
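A minimal sketch of that parallelize variant, under some assumptions: the getLastModifiedLCO body here is only a stub standing in for the function from the question, and the names LastModifiedToDF, toDataFrame, fileName, and lastModified, as well as the local[*] master, are made up for illustration. Because the map runs on the executors, the real getLastModifiedLCO would need to be serializable and able to reach the files from the worker nodes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LastModifiedToDF {

  // Stub standing in for the question's getLastModifiedLCO (assumption:
  // it takes a file name and returns the timestamp as a String).
  def getLastModifiedLCO(name: String): String =
    "Tue Dec 09 07:48:30 UTC 2014"

  // Distribute the list as an RDD, map each name to a (name, timestamp)
  // tuple on the executors, then convert the RDD to a DataFrame.
  def toDataFrame(spark: SparkSession, files: Seq[String]): DataFrame = {
    import spark.implicits._ // brings toDF into scope for RDDs of tuples
    spark.sparkContext
      .parallelize(files)
      .map(l => (l, getLastModifiedLCO(l)))
      .toDF("fileName", "lastModified") // hypothetical column names
  }

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying this out on one machine.
    val spark = SparkSession.builder()
      .appName("last-modified-sketch")
      .master("local[*]")
      .getOrCreate()

    val ListArchive = Seq("LCO_2014-12-09_3.XML.gz", "LCO_2014-12-09_1.XML.gz")
    val df = toDataFrame(spark, ListArchive)
    df.show(truncate = false)

    spark.stop()
  }
}
```

For a list that fits comfortably in driver memory, the plain map followed by toDF shown earlier is simpler and avoids shipping the function to the executors.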
Answered By - Alex Ott Answer Checked By - David Marino (PHPFixing Volunteer)