When a PySpark DataFrame stores dates in a string column, you can convert that column to a date type with the to_date() function.
This guide walks through the conversion step by step.
Convert String to Date in PySpark with the to_date() Function
General Formula
As the name suggests, the to_date(column, format) function in PySpark converts a string column in a DataFrame into a date column. Dates are displayed as yyyy-MM-dd, for example 2018-07-18.
The function takes the column that holds the date strings as its first argument and the pattern describing how those strings are laid out as its second argument. The pattern letters are case-sensitive: yyyy is the year, MM the month, and dd the day of the month.
The syntax looks as follows:
to_date(col("string_column_name"), "yyyy-MM-dd")
To apply to_date(column, format) to an actual DataFrame, first import the functions needed for the string-to-date conversion.
from pyspark.sql.functions import col, to_date
df2 = df1.select(col("column_name"), to_date(col("column_name"), "yyyy-MM-dd").alias("to_date"))
To display the result:
df2.show()
An explanation of the syntax:
- df1: The DataFrame containing the string column you want to convert.
- df2: The new DataFrame created by the conversion.
- to_date: The function used for the conversion.
- yyyy-MM-dd: The pattern describing the input strings. Depending on the region, the input may instead follow MM-dd-yyyy or dd-MM-yyyy.
- alias: A column method that gives the resulting column a shorter, more readable name. The new DataFrame uses this alias as the column name.
Sample Code to Convert String to Date in PySpark
Now, let’s try the formula with a simple DataFrame:
df1=spark.createDataFrame(
data = [ ("1","Angela","2018-18-07 14:01:23.000"),("2","Amandy","2018-21-07 13:04:29.000"),("3","Michalle","2018-24-07 06:03:13.009")],
schema=["Id","CustomerName","timestamp"])
df1.printSchema()
The schema lists timestamp as a string column. Its values follow the pattern yyyy-dd-MM HH:mm:ss.SSS, with the day written before the month.
We will now convert it into a date column, starting by selecting the "timestamp" column.
from pyspark.sql.functions import col, to_date
df2 = df1.select(col("timestamp"), to_date(col("timestamp"), "yyyy-dd-MM HH:mm:ss.SSS").alias("to_date"))
df2.show()
This sample code parses each string and returns the result in the to_date column of the new DataFrame (show() truncates long strings by default):
+--------------------+----------+
|           timestamp|   to_date|
+--------------------+----------+
|2018-18-07 14:01:...|2018-07-18|
|2018-21-07 13:04:...|2018-07-21|
|2018-24-07 06:03:...|2018-07-24|
+--------------------+----------+
Convert String to Date in PySpark SQL
You can also call to_date inside a PySpark SQL query to convert a specific date string.
General Formula
spark.sql("select to_date('string_value', 'yyyy-MM-dd') as to_date").show()
Sample Code to Convert String to Date in SQL
Now we can try converting the date from Angela's timestamp:
spark.sql("select to_date('2018-18-07', 'yyyy-dd-MM') as to_date").show()
The string is parsed and returned as the date 2018-07-18:
+----------+
|   to_date|
+----------+
|2018-07-18|
+----------+
Conclusion
You can convert a string to a date in PySpark or PySpark SQL with the to_date() function, using the sample code above. We recommend the first approach, since it converts a whole DataFrame column in one step.