PHPFixing

Tuesday, November 1, 2022

[FIXED] How to map one column to multiple binary columns in Spark?

Tags: apache-spark, binary, multiple-columns, pivot, pyspark

Issue

This might be related to pivoting, but I am not sure. Basically, what I want to achieve is the following binary transformation:

+-----------------+
| C1     | C2     |
+--------|--------+
| A      | xxx    |
| B      | yyy    |
| A      | yyy    |
| B      | www    |
| B      | xxx    |
| A      | zzz    |
| A      | xxx    |
| A      | yyy    |
+-----------------+

to

+--------------------------------------------+
| C1     | www    | xxx    | yyy    | zzz    |
+--------|--------|--------|--------|--------|
| A      |   0    |   1    |   1    |   1    |
| B      |   1    |   1    |   1    |   0    |
+--------------------------------------------+

How does one attain this in PySpark? Presence is 1 and absence is 0.


Solution

Yes, this calls for a pivot. For the aggregation, F.first(F.lit(1)) is enough in your case: every (C1, C2) combination that occurs in the data gets a 1, and combinations that never occur come out as null. Replace those nulls with 0 using df.fillna(0).

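from pyspark.sql import functions as F

# Sample data from the question (assumes an active SparkSession named `spark`)
df = spark.createDataFrame(
    [('A', 'xxx'),
     ('B', 'yyy'),
     ('A', 'yyy'),
     ('B', 'www'),
     ('B', 'xxx'),
     ('A', 'zzz'),
     ('A', 'xxx'),
     ('A', 'yyy')],
    ['C1', 'C2'])

# One row per C1 and one column per distinct C2 value:
# 1 where the (C1, C2) pair exists, null where it does not
df = df.groupBy('C1').pivot('C2').agg(F.first(F.lit(1)))
# Turn the nulls for absent pairs into 0
df = df.fillna(0)

df.show()  # row order is not guaranteed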
# +---+---+---+---+---+
# | C1|www|xxx|yyy|zzz|
# +---+---+---+---+---+
# |  B|  1|  1|  1|  0|
# |  A|  0|  1|  1|  1|
# +---+---+---+---+---+
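
To make it clearer what the pivot computes, here is a minimal plain-Python sketch of the same logic. It is only an illustration and not Spark code: the helper name to_binary_columns is hypothetical, and the sketch sorts the new column names to match the output shown above.

```python
# Plain-Python sketch of the pivot-to-binary logic (no Spark required).
# `to_binary_columns` is a hypothetical helper, not part of any library.
rows = [('A', 'xxx'), ('B', 'yyy'), ('A', 'yyy'), ('B', 'www'),
        ('B', 'xxx'), ('A', 'zzz'), ('A', 'xxx'), ('A', 'yyy')]


def to_binary_columns(pairs):
    """Return (columns, table); table maps each key to a list of 0/1 flags."""
    # Distinct C2 values become the new column names (sorted here)
    columns = sorted({value for _, value in pairs})
    # Collect the set of C2 values seen for each C1 key
    seen = {}
    for key, value in pairs:
        seen.setdefault(key, set()).add(value)
    # Presence is 1, absence is 0 (the role fillna(0) plays in the Spark answer)
    table = {key: [1 if col in values else 0 for col in columns]
             for key, values in seen.items()}
    return columns, table


columns, table = to_binary_columns(rows)
print(columns)
for key in sorted(table):
    print(key, table[key])
```

Duplicate rows such as the two ('A', 'xxx') entries collapse into a single 1, which is the same effect F.first(F.lit(1)) has in the Spark version.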


Answered By - ZygD
Answer Checked By - Robin (PHPFixing Admin)