Parquet stores data in a columnar format and is highly optimized in Spark. The best format for performance is Parquet with Snappy compression, which is the default in Spark 2.x. Spark supports many formats out of the box, such as CSV, JSON, XML, Parquet, ORC, and Avro, and it can be extended to support many more through external data sources; for more information, see Apache Spark packages.

When choosing an API, keep the trade-offs in mind. RDDs get no query optimization through Catalyst and add serialization/deserialization overhead, so you don't need to use RDDs unless you are building a new custom RDD. Datasets, by contrast, are developer-friendly, providing domain-object programming and compile-time type checks.