This article collects typical usage examples of the Python method pyspark.sql.HiveContext.setConf. If you are wondering what HiveContext.setConf does, or how to use it in practice, the hand-picked code examples below may help. You can also read more about the class it belongs to, pyspark.sql.HiveContext.
Four code examples of HiveContext.setConf are shown below, sorted by popularity by default.
Example 1: get_context_test
# Required imports:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

def get_context_test():
    conf = SparkConf()
    sc = SparkContext('local[1]', conf=conf)
    sql_context = HiveContext(sc)
    sql_context.sql("""use fex_test""")
    sql_context.setConf("spark.sql.shuffle.partitions", "1")
    return sc, sql_context
Example 2: get_context
# Required imports:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

def get_context():
    conf = SparkConf()
    conf.set("spark.executor.instances", "4")
    conf.set("spark.executor.cores", "4")
    conf.set("spark.executor.memory", "8g")
    # Note: the original passed appName="__file__" (a literal string);
    # the module path __file__ was almost certainly intended.
    sc = SparkContext(appName=__file__, conf=conf)
    sql_context = HiveContext(sc)
    sql_context.sql("""use fex""")
    sql_context.setConf("spark.sql.shuffle.partitions", "32")
    return sc, sql_context
Example 3: SparkContext
# Required imports:
# Fails after 2+ hours; the problem appears to be "(Too many open files)".
# Likely several thousand files are open at one time.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)
# Snappy compression is recommended for Arrow.
# Interesting: snappy output is slightly smaller than gz for the 10 test rows.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
# Testing:
# pems = sqlContext.sql("SELECT * FROM pems LIMIT 10")
# This works:
# pems = sqlContext.sql("SELECT * FROM pems WHERE station IN (402265, 402264, 402263, 402261, 402260)")
pems = sqlContext.sql("SELECT * FROM pems ORDER BY station")
# No options for file chunk sizes are visible here; the size probably comes
# from an environment variable.
# Later Spark versions support:
# pems.write.parquet("pems_sorted", compression="snappy")
# pems.write.parquet("pems_station", partitionBy="station")
Example 4: SparkConf
# Required imports:
# from pyspark import SparkConf, SparkContext
# from pyspark.sql import HiveContext

def main(sc, sqlContext):
    # (truncated: in the original snippet this sqlContext.sql call opens
    # mid-way through a CREATE TABLE statement)
        adjclose float
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    """)
    sqlContext.sql(""" use fex """)
    df = sqlContext.sql("""
        SELECT
            *
        FROM
            eod_spx
        WHERE
            symbol = "SPX"
            AND date >= "2010-01-01"
            AND date <= "2010-06-30"
        """)
    sqlContext.sql(""" use fex_test """)
    df.repartition(1).insertInto("eod_spx", True)

if __name__ == "__main__":
    conf = SparkConf()
    conf.set("spark.executor.instances", "4")
    conf.set("spark.executor.cores", "4")
    conf.set("spark.executor.memory", "8g")
    sc = SparkContext(appName=__file__, conf=conf)
    sqlContext = HiveContext(sc)
    sqlContext.setConf("spark.sql.shuffle.partitions", "1")
    main(sc, sqlContext)
    sc.stop()