DanielSnipes/39e6a847bc2e97ea9ba90c02bd11e7be

Revisions

  1. Mark Vervuurt revised this gist Aug 11, 2016. 1 changed file (spark_pandas_dataframes.py) with 2 additions and 0 deletions.
    @@ -1,4 +1,6 @@
    import pandas as pd
    from pyspark.sql.types import *

    #Create Pandas DataFrame
    pd_person = pd.DataFrame({'PERSONID':'0','LASTNAME':'Doe','FIRSTNAME':'John','ADDRESS':'Museumplein','CITY':'Amsterdam'}, index=[0])

  2. Mark Vervuurt revised this gist Aug 10, 2016. 1 changed file (spark_pandas_dataframes.py) with 2 additions and 0 deletions.
    @@ -7,6 +7,8 @@

    #Create Spark DataFrame from Pandas
    df_person = sqlContext.createDataFrame(pd_person, p_schema)
    #Important: order the columns as they appear in the target table
    df_person = df_person.select("PERSONID", "LASTNAME", "FIRSTNAME", "CITY", "ADDRESS")

    #Writing Spark DataFrame to local Oracle Express Edition 11.2.0.2
    #This uses the relatively older Spark jdbc DataFrameWriter API
  3. Mark Vervuurt revised this gist Aug 10, 2016. 1 changed file (spark_pandas_dataframes.py) with 4 additions and 1 deletion.
    @@ -2,8 +2,11 @@
    #Create Pandas DataFrame
    pd_person = pd.DataFrame({'PERSONID':'0','LASTNAME':'Doe','FIRSTNAME':'John','ADDRESS':'Museumplein','CITY':'Amsterdam'}, index=[0])

    #Create PySpark DataFrame Schema
    p_schema = StructType([StructField('ADDRESS',StringType(),True),StructField('CITY',StringType(),True),StructField('FIRSTNAME',StringType(),True),StructField('LASTNAME',StringType(),True),StructField('PERSONID',DecimalType(),True)])

    #Create Spark DataFrame from Pandas
    df_person = sqlContext.createDataFrame(pd_person)
    df_person = sqlContext.createDataFrame(pd_person, p_schema)

    #Writing Spark DataFrame to local Oracle Express Edition 11.2.0.2
    #This uses the relatively older Spark jdbc DataFrameWriter API
  4. Mark Vervuurt created this gist Aug 9, 2016. 1 file (spark_pandas_dataframes.py) with 10 additions and 0 deletions.
    @@ -0,0 +1,10 @@
    import pandas as pd
    #Create Pandas DataFrame
    pd_person = pd.DataFrame({'PERSONID':'0','LASTNAME':'Doe','FIRSTNAME':'John','ADDRESS':'Museumplein','CITY':'Amsterdam'}, index=[0])

    #Create Spark DataFrame from Pandas
    df_person = sqlContext.createDataFrame(pd_person)

    #Writing Spark DataFrame to local Oracle Express Edition 11.2.0.2
    #This uses the relatively older Spark jdbc DataFrameWriter API
    df_person.write.jdbc(url='jdbc:oracle:thin:@127.0.0.1:1521:XE', table='HR.PERSONS', mode='append', properties={'driver':'oracle.jdbc.driver.OracleDriver', 'user' : 'SYSTEM', 'password' : 'password'})
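Revision 2 reorders the columns with `select` before writing, presumably because the values are bound to the table's columns by position rather than by name. The pure-Python sketch below illustrates that failure mode; `positional_insert` is a hypothetical helper (not a Spark function), and the table's column order is assumed to be the one passed to `select`:

```python
# Assumed column order of HR.PERSONS, copied from the select() call in revision 2
TARGET_COLUMNS = ["PERSONID", "LASTNAME", "FIRSTNAME", "CITY", "ADDRESS"]


def positional_insert(table, columns, record):
    """Build an INSERT that names no columns, plus its bind parameters.

    Hypothetical helper: because the SQL lists no column names, the
    database assigns the i-th parameter to the table's i-th column, so
    the parameters must follow the table's column order exactly.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} VALUES ({placeholders})"
    # Pull values by name so the record's own key order cannot leak in
    params = tuple(record[name] for name in columns)
    return sql, params


person = {"ADDRESS": "Museumplein", "CITY": "Amsterdam", "FIRSTNAME": "John",
          "LASTNAME": "Doe", "PERSONID": "0"}
sql, params = positional_insert("HR.PERSONS", TARGET_COLUMNS, person)
print(sql)     # INSERT INTO HR.PERSONS VALUES (?, ?, ?, ?, ?)
print(params)  # ('0', 'Doe', 'John', 'Amsterdam', 'Museumplein')
```

An INSERT that names its columns would not depend on order at all; whether a particular Spark version's JDBC writer generates one is worth checking in that version's documentation.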
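Revision 3 declares `PERSONID` as `DecimalType()`, which in PySpark defaults to precision 10 and scale 0, while the pandas frame still holds the string `'0'`. When a `StructType` schema is supplied, PySpark can check each value against its field type and expects a `decimal.Decimal` for a decimal field, so the string may be rejected depending on the Spark version. A minimal sketch of converting such values beforehand, using a hypothetical helper `to_decimal` that enforces `DECIMAL(10,0)`-style bounds:

```python
from decimal import Decimal


def to_decimal(value, precision=10, scale=0):
    """Convert value to a Decimal that fits DECIMAL(precision, scale).

    Hypothetical helper; the defaults mirror PySpark's DecimalType().
    """
    # Round to `scale` fractional digits: Decimal(1).scaleb(-2) == Decimal("0.01")
    d = Decimal(str(value)).quantize(Decimal(1).scaleb(-scale))
    # Only precision - scale digits may sit left of the decimal point
    if abs(d) >= Decimal(10) ** (precision - scale):
        raise ValueError(f"{value!r} does not fit DECIMAL({precision},{scale})")
    return d


print(repr(to_decimal("0")))  # Decimal('0')
```

Applied to the gist's frame, this would be something like mapping `to_decimal` over the `PERSONID` column before calling `createDataFrame` with `p_schema`.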
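Revision 3 also passes `p_schema` to `createDataFrame`. As far as I know, PySpark pairs pandas columns with schema fields by position, not by name. The alphabetical field order of `p_schema` matches pandas releases before 0.23, which sorted dict keys when building a DataFrame; from pandas 0.23 on Python 3.6+, dict insertion order is kept, so `PERSONID` comes first and the values would be mislabeled. A sketch of aligning rows with the schema by name first; `rows_for_schema` and `SCHEMA_FIELDS` are hypothetical names, with the field list copied from `p_schema`:

```python
import pandas as pd

# Field names in p_schema order
SCHEMA_FIELDS = ["ADDRESS", "CITY", "FIRSTNAME", "LASTNAME", "PERSONID"]


def rows_for_schema(pdf, field_names):
    """Return plain tuples whose i-th value belongs to field_names[i]."""
    missing = [name for name in field_names if name not in pdf.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # Selecting by name reorders the columns before values are read positionally
    return list(pdf[field_names].itertuples(index=False, name=None))


pd_person = pd.DataFrame({'PERSONID': '0', 'LASTNAME': 'Doe', 'FIRSTNAME': 'John',
                          'ADDRESS': 'Museumplein', 'CITY': 'Amsterdam'}, index=[0])
print(list(pd_person.columns))  # insertion order on pandas >= 0.23
print(rows_for_schema(pd_person, SCHEMA_FIELDS))
```

Either the reordered frame `pd_person[SCHEMA_FIELDS]` or the returned tuples can then be passed to `createDataFrame` together with `p_schema`.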
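The original revision writes with `DataFrameWriter.jdbc(url, table, mode, properties)`. The same append can be expressed through the generic `format("jdbc")` data source, where the connection details become `option` calls. A sketch wrapping that in hypothetical helpers `oracle_thin_url` and `append_to_oracle`; running it needs a Spark session, the Oracle JDBC driver on the classpath, and a reachable database, and the `ORACLE_PASSWORD` environment variable in the usage comment is an assumption made to avoid hard-coding credentials:

```python
def oracle_thin_url(host, port, sid):
    """Oracle thin-driver URL in the host:port:SID form used by the gist."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


def append_to_oracle(df, url, table, user, password,
                     driver="oracle.jdbc.driver.OracleDriver"):
    """Append a Spark DataFrame to an existing table via the JDBC data source."""
    (df.write
       .format("jdbc")
       .option("url", url)
       .option("dbtable", table)
       .option("user", user)
       .option("password", password)
       .option("driver", driver)
       .mode("append")
       .save())


# Usage inside a PySpark session (ORACLE_PASSWORD is a hypothetical variable):
# import os
# append_to_oracle(df_person, oracle_thin_url("127.0.0.1", 1521, "XE"),
#                  "HR.PERSONS", "SYSTEM", os.environ["ORACLE_PASSWORD"])
```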