Issue
Consider a PySpark DataFrame with two columns, as below:
{
fullname: facebook,
lastname: book
}
I want to add a new column, firstname, obtained by removing lastname from fullname, as below:
{
firstname:face,
lastname:book
}
Solution
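from pyspark.sql import functions as F

# Assumes an active SparkSession named `spark` (as in the pyspark shell).
df = spark.createDataFrame(
    [
        ('facebook', 'book')
    ], ['fullname', 'lastname'])
# lastname is used as a regex pattern; every match in fullname is removed.
df.withColumn('firstname', F.expr("regexp_replace(fullname, lastname, '')")).show()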
+--------+--------+---------+
|fullname|lastname|firstname|
+--------+--------+---------+
|facebook|    book|     face|
+--------+--------+---------+
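One caveat with the approach above: regexp_replace treats lastname as a regular expression and removes every match, not only the one at the end of fullname. The plain-Python sketch below illustrates the difference. It uses Python's re.sub as a stand-in for Spark's regexp_replace (an assumption; Spark uses Java regular expressions, which differ in some details), and the helper names strip_all_matches and strip_suffix are hypothetical, introduced only for illustration.

```python
import re


def strip_all_matches(fullname: str, lastname: str) -> str:
    # Mirrors the regexp_replace idea: lastname is used as a regex pattern
    # and every match is removed, wherever it occurs in fullname.
    return re.sub(lastname, "", fullname)


def strip_suffix(fullname: str, lastname: str) -> str:
    # Hypothetical alternative: remove lastname only when it is a literal
    # suffix of fullname; otherwise return fullname unchanged.
    if lastname and fullname.endswith(lastname):
        return fullname[: len(fullname) - len(lastname)]
    return fullname


# Both approaches agree on the example from the question.
print(strip_all_matches("facebook", "book"))
print(strip_suffix("facebook", "book"))

# They differ when lastname also occurs earlier in fullname.
print(strip_all_matches("bookfacebook", "book"))
print(strip_suffix("bookfacebook", "book"))
```

If only the trailing lastname should be removed, the same idea could likely be expressed in Spark by combining the built-in substring and length functions inside F.expr, rather than regexp_replace.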
Answered By – Luiz Viola
Answer Checked By – Marilyn (BugsFixing Volunteer)