I am learning PySpark and trying to connect to a MySQL database, but I get a java.lang.ClassNotFoundException: com.mysql.jdbc.Driver exception when I run the code. I have spent a whole day trying to fix it, so any help would be appreciated 🙂
I am using PyCharm Community Edition with Anaconda and Python 3.6.3.
Here is my code:
from pyspark import SparkContext, SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://192.168.0.11:3306/my_db_name",
    driver="com.mysql.jdbc.Driver",
    dbtable="billing",
    user="root",
    password="root").load()
Here is the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o27.load.
: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
This was asked 9 months ago at the time of writing, but since there is no answer yet, here goes. I was in the same situation, searched Stack Overflow over and over and tried different suggestions, but the answer turned out to be absurdly simple: the ClassNotFoundException means the JVM that Spark starts cannot find the driver class on its classpath, so you just have to copy the MySQL driver JAR into the “jars” folder of Spark!
Download it here: https://dev.mysql.com/downloads/connector/j/5.1.html
I’m using version 5.1 even though 8.0 exists, because I had some other problems when running the latest version with Spark 2.3.2 (I also had other problems running Spark 2.4 on Windows 10).
Once it is downloaded, just copy the driver JAR into the jars folder of your Spark installation:
E:\spark232_hadoop27\jars\ (use your own drive:\folder_name; this is just an example)
You should have two files (the original post showed them in a screenshot).
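One way to confirm that the JAR ended up where Spark will look for it is to list the connector files in the jars folder. This is only a minimal sketch: the helper name `find_mysql_connector_jars` is made up for illustration, and it assumes that the connector file name starts with `mysql-connector` and that the `SPARK_HOME` environment variable points at your Spark folder.

```python
import os
from pathlib import Path


def find_mysql_connector_jars(spark_home):
    """Return the names of MySQL connector JARs found in <spark_home>/jars.

    Hypothetical helper: assumes the JAR file name starts with "mysql-connector".
    """
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        # Wrong folder, or Spark is installed somewhere else.
        return []
    return sorted(p.name for p in jars_dir.glob("mysql-connector*.jar"))


# Assumption: SPARK_HOME is set, e.g. to E:\spark232_hadoop27
spark_home = os.environ.get("SPARK_HOME", "")
if spark_home:
    print(find_mysql_connector_jars(spark_home))
else:
    print("SPARK_HOME is not set")
```

An empty list means the copy did not land in the folder Spark is actually using, which would produce the same ClassNotFoundException as before.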
After that, the following code, launched from PyCharm or a Jupyter notebook, should work (as long as you have a MySQL database set up, that is):
import findspark
findspark.init()

import pyspark  # only run after findspark.init()
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dataframe_mysql = spark.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/uoc2",
    driver="com.mysql.jdbc.Driver",
    dbtable="company",
    user="root",
    password="password").load()

dataframe_mysql.show()
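The `driver` string has to match the connector JAR that sits in the jars folder: Connector/J 5.1 provides `com.mysql.jdbc.Driver`, while Connector/J 8.0 renamed the class to `com.mysql.cj.jdbc.Driver`. The sketch below builds the options dictionary from the connector's major version. The helper `mysql_jdbc_options` and its `connector_major` parameter are hypothetical names introduced here, not part of PySpark.

```python
def mysql_jdbc_options(host, port, database, table, user, password,
                       connector_major=5):
    """Build the keyword options for spark.read.format("jdbc").options(...).

    Hypothetical helper: connector_major is the major version of the
    MySQL Connector/J JAR you copied into Spark's jars folder.
    """
    # Connector/J 8.0 moved the driver class into the "cj" package.
    if connector_major >= 8:
        driver = "com.mysql.cj.jdbc.Driver"
    else:
        driver = "com.mysql.jdbc.Driver"
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": driver,
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Same values as the example above (Connector/J 5.1).
opts = mysql_jdbc_options("localhost", 3306, "uoc2", "company",
                          "root", "password")
print(opts["driver"])
```

With a running session, the dictionary can be passed as `spark.read.format("jdbc").options(**opts).load()`.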
Bear in mind that I’m currently working locally with my Spark setup, so there are no real clusters involved and no “production” code that gets submitted to such a cluster. For something more elaborate, this answer could help: MySQL read with PySpark
Answered By – Kondado
Answer Checked By – Terry (BugsFixing Volunteer)