[SOLVED] How to subtract 2 string columns in a Pyspark dataframe

Issue

Below is the scenario:
Consider a PySpark DataFrame with two columns, like below:

{
fullname: facebook,
lastname: book
}

I want a new column, firstname, obtained by removing lastname from fullname (i.e. "subtracting" one string from the other), like below:

{
firstname: face,
lastname: book
}

Solution

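from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# One row: fullname = 'facebook', lastname = 'book'
df = spark.createDataFrame([('facebook', 'book')], ['fullname', 'lastname'])

# regexp_replace treats the lastname value as a regex pattern and
# replaces every match inside fullname with an empty string
df.withColumn('firstname', F.expr("regexp_replace(fullname, lastname, '')")).show()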
+--------+--------+---------+
|fullname|lastname|firstname|
+--------+--------+---------+
|facebook|    book|     face|
+--------+--------+---------+
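Note that regexp_replace compiles the value of lastname as a (Java) regular expression and removes every match, not only a trailing one. It can therefore give surprising results when lastname contains regex metacharacters such as `.`, or when lastname occurs more than once in fullname. The plain-Python sketch below (no Spark needed) roughly mimics that behaviour with `re.sub` and contrasts it with a literal suffix strip. The sample rows and the helper names `replace_all` and `strip_suffix` are hypothetical, and Python's `re` module differs from Java regex in some details.

```python
import re

# Hypothetical sample rows (fullname, lastname), chosen to show edge cases
rows = [
    ("facebook", "book"),  # lastname appears once, as a suffix
    ("bookbook", "book"),  # lastname appears twice
    ("ann.", "."),         # '.' is a regex metacharacter
]


def replace_all(fullname: str, lastname: str) -> str:
    """Roughly what regexp_replace(fullname, lastname, '') does:
    lastname is compiled as a regex and every match is removed."""
    return re.sub(lastname, "", fullname)


def strip_suffix(fullname: str, lastname: str) -> str:
    """Remove lastname only when it is a literal suffix of fullname."""
    if lastname and fullname.endswith(lastname):
        return fullname[: len(fullname) - len(lastname)]
    return fullname


for full, last in rows:
    # Columns: fullname, lastname, regex-style result, suffix-strip result
    print(full, last, repr(replace_all(full, last)), repr(strip_suffix(full, last)))
```

For the original facebook/book row both helpers return face; they differ on the other two rows. In Spark SQL, anchoring the pattern, e.g. `regexp_replace(fullname, concat(lastname, '$'), '')`, should restrict removal to a trailing occurrence, although any metacharacters in lastname are still interpreted as regex.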

Answered By – Luiz Viola

Answer Checked By – Marilyn (BugsFixing Volunteer)
