Commit 6aeea11a authored by Dave Wentzel

temp

parent 8dca0255
delta: versioned parquet files
handles upserts, but must use MERGE INTO syntax
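a minimal MERGE INTO sketch (the events/updates table names and the eventId join key are made up for illustration, not from these notes):
MERGE INTO events AS t
USING updates AS s
ON t.eventId = s.eventId
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *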
WORM: underlying files are write once, read many; updates write new file versions instead of modifying files in place
compaction: run OPTIMIZE
to remove stale, no-longer-referenced file versions, quiesce the system and run VACUUM
VACUUM events RETAIN 24 HOURS
OPTIMIZE events ZORDER BY (eventType, city)
Z-ordering = multi-dimensional clustering (co-locates related values in the same set of files)
OPTIMIZE events
WHERE date >= current_timestamp() - INTERVAL 1 day
ZORDER BY (eventType)
change "CREATE TABLE ... USING parquet" to
"CREATE TABLE ... USING delta"
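e.g. (hypothetical schema, not from these notes):
CREATE TABLE events (
  date DATE,
  eventId STRING,
  eventType STRING,
  city STRING
) USING delta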
likewise, change dataframe.write.format("parquet").save("/data/events") to
dataframe.write.format("delta").save("/data/events")
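the path can then be queried directly in SQL (a sketch, using the same /data/events path as above):
SELECT * FROM delta.`/data/events`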
display(spark.sql("DESCRIBE HISTORY quickstart_delta"))
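since every write is versioned, an older snapshot can be queried (the version number is illustrative; VERSION AS OF syntax per Databricks/Delta SQL):
SELECT * FROM quickstart_delta VERSION AS OF 0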
Only when you are working with a HiveContext can DataFrames be saved as persistent tables; DataFrames created from a plain SQLContext cannot be saved as Hive tables. Here is an example of a DataFrame created from an existing RDD; note that the DataFrame was created using the HiveContext:
val flightsDF = hiveContext.createDataFrame(resultRDD)
flightsDF.write.saveAsTable("FlightDelaysSummaryRDD")