PHPFixing
Showing posts with label azure-databricks. Show all posts

Sunday, October 23, 2022

[FIXED] How to update two columns in PySpark satisfying the same condition?

 October 23, 2022     apache-spark, apache-spark-sql, azure-databricks, pyspark, sql-update     No comments   

Issue

I have a table with 4 columns: "ID", "FLAG_A", "FLAG_B", "FLAG_C". I want to translate the SQL query below into PySpark. Both "FLAG_A" and "FLAG_B" must be updated, and only in rows that satisfy the same two conditions. How can I do this in PySpark?

UPDATE STATUS_TABLE SET STATUS_TABLE.[FLAG_A] = "JAVA", 
STATUS_TABLE.FLAG_B = "PYTHON"
WHERE (((STATUS_TABLE.[FLAG_A])="PROFESSIONAL_CODERS") AND 
((STATUS_TABLE.FLAG_C) Is Null)); 

Is it possible to write this in PySpark as a single statement that applies both conditions and updates both the "FLAG_A" and "FLAG_B" columns?


Solution

I can't think of a way to run this as a single UPDATE statement like the one you have in mind. I tried running the UPDATE query in Spark SQL, but UPDATE is not supported there:

: java.lang.UnsupportedOperationException: UPDATE TABLE is not supported temporarily.

The following does exactly the same as your UPDATE query:

Input:

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [(1, 'PROFESSIONAL_CODERS', 'X', None),
     (2, 'KEEP', 'KEEP', 'KEEP')],
    ['ID', 'FLAG_A', 'FLAG_B', 'FLAG_C'])

Script:

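# Rows to update: FLAG_A is 'PROFESSIONAL_CODERS' and FLAG_C is null
cond = (F.col('FLAG_A') == 'PROFESSIONAL_CODERS') & F.isnull('FLAG_C')
# Update FLAG_B first: cond reads FLAG_A, so FLAG_A must be overwritten last
df = df.withColumn('FLAG_B', F.when(cond, 'PYTHON').otherwise(F.col('FLAG_B')))
df = df.withColumn('FLAG_A', F.when(cond, 'JAVA').otherwise(F.col('FLAG_A')))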

df.show()
# +---+------+------+------+
# | ID|FLAG_A|FLAG_B|FLAG_C|
# +---+------+------+------+
# |  1|  JAVA|PYTHON|  null|
# |  2|  KEEP|  KEEP|  KEEP|
# +---+------+------+------+


Answered By - ZygD
Answer Checked By - Mary Flores (PHPFixing Volunteer)

[FIXED] How to translate MS Access UPDATE query which uses inner join into PySpark?

 October 23, 2022     apache-spark, azure-databricks, join, pyspark, sql-update     No comments   

Issue

I have two MS Access SQL queries which I want to convert into PySpark. The queries look like this (we have two tables Employee and Department):

UPDATE EMPLOYEE INNER JOIN [DEPARTMENT] ON
EMPLOYEE.STATEPROVINCE = [DEPARTMENT].[STATE_LEVEL] 
SET EMPLOYEE.STATEPROVINCE = [DEPARTMENT]![STATE_ABBREVIATION];
UPDATE EMPLOYEE INNER JOIN [DEPARTMENT] ON
EMPLOYEE.STATEPROVINCE = [DEPARTMENT].[STATE_LEVEL] 
SET EMPLOYEE.MARKET = [DEPARTMENT]![MARKET];

Solution

Test dataframes:

from pyspark.sql import functions as F

df_emp = spark.createDataFrame([(1, 'a'), (2, 'bb')], ['EMPLOYEE', 'STATEPROVINCE'])
df_emp.show()
# +--------+-------------+
# |EMPLOYEE|STATEPROVINCE|
# +--------+-------------+
# |       1|            a|
# |       2|           bb|
# +--------+-------------+

df_dept = spark.createDataFrame([('bb', 'b')], ['STATE_LEVEL', 'STATE_ABBREVIATION'])
df_dept.show()
# +-----------+------------------+
# |STATE_LEVEL|STATE_ABBREVIATION|
# +-----------+------------------+
# |         bb|                 b|
# +-----------+------------------+

Running your SQL query in Microsoft Access does the following:

[Image: screenshot of the query result in Microsoft Access]

In PySpark, you can get it like this:

df = (df_emp.alias('a')
    .join(df_dept.alias('b'), df_emp.STATEPROVINCE == df_dept.STATE_LEVEL, 'left')
    .select(
        *[c for c in df_emp.columns if c != 'STATEPROVINCE'],
        F.coalesce('b.STATE_ABBREVIATION', 'a.STATEPROVINCE').alias('STATEPROVINCE')
    )
)
df.show()
# +--------+-------------+
# |EMPLOYEE|STATEPROVINCE|
# +--------+-------------+
# |       1|            a|
# |       2|            b|
# +--------+-------------+

First you do a left join. Then, select.

The select has 2 parts.

  • First, you select everything from df_emp except for "STATEPROVINCE".
  • Then, for the new "STATEPROVINCE", you select "STATE_ABBREVIATION" from df_dept, but in case it's null (i.e. the row has no match in df_dept), you take "STATEPROVINCE" from df_emp.

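For your second query, you only need to change the column names in the select statement (this assumes both dataframes have a "MARKET" column, which the test dataframes above don't include):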

df = (df_emp.alias('a')
    .join(df_dept.alias('b'), df_emp.STATEPROVINCE == df_dept.STATE_LEVEL, 'left')
    .select(
        *[c for c in df_emp.columns if c != 'MARKET'],
        F.coalesce('b.MARKET', 'a.MARKET').alias('MARKET')
    )
)


Answered By - ZygD
Answer Checked By - Dawn Plyler (PHPFixing Volunteer)

Monday, August 1, 2022

[FIXED] How to connect and read files from Azure FTP folder using Python in Azure Databricks?

 August 01, 2022     azure-databricks, databricks, ftp, python     No comments   

Issue

I need to use Python in Azure Databricks to do the following:

  1. Merge multiple text files stored in an Azure FTP folder (\VMAZR1\ABCDFiles). Here, 'VMAZR1' is the server name and 'ABCDFiles' is the folder name.
  2. Store the merged file in the same location under a new name.

I can write the code to do the merging, but I need help connecting to the Azure FTP folder and reading only the text file names. Can someone please assist?


Solution

You can rely on this answer. Just change the method from storing to retrieving, e.g. retrbinary or retrlines, and use mlsd to get the list of file names (all three are methods of ftplib.FTP in the Python standard library).
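
As a rough sketch of how those calls could fit together with ftplib, the helpers below list the text files, merge them, and upload the result. The function names, the ".txt" filter, the merged file name and the placeholder credentials are all assumptions for illustration. The sketch also assumes the server supports the MLSD command; if it does not, nlst may be an alternative for getting the names.

```python
import io
from ftplib import FTP


def list_text_files(ftp, folder):
    """Return the sorted names of .txt files in `folder`, using the MLSD listing."""
    names = []
    for name, facts in ftp.mlsd(folder, facts=["type"]):
        # Skip directories and the "." / ".." entries; keep regular files only
        if facts.get("type") == "file" and name.lower().endswith(".txt"):
            names.append(name)
    return sorted(names)


def merge_text_files(ftp, folder, names):
    """Download each file with RETR and concatenate the raw bytes in order.

    No separator is added, so a file without a trailing newline runs
    straight into the next one.
    """
    merged = io.BytesIO()
    for name in names:
        ftp.retrbinary(f"RETR {folder}/{name}", merged.write)
    return merged.getvalue()


def upload_bytes(ftp, folder, name, data):
    """Store `data` as `folder/name` on the server with STOR."""
    ftp.storbinary(f"STOR {folder}/{name}", io.BytesIO(data))


# Hypothetical usage; host, user and password are placeholders:
# with FTP("VMAZR1") as ftp:
#     ftp.login("user", "password")
#     names = list_text_files(ftp, "ABCDFiles")
#     data = merge_text_files(ftp, "ABCDFiles", names)
#     upload_bytes(ftp, "ABCDFiles", "merged.txt", data)
```

The helpers take the connection as an argument so that they can be tried against a stand-in object before pointing them at the real server.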



Answered By - Phuri Chalermkiatsakul
Answer Checked By - Willingham (PHPFixing Volunteer)