

Python streaming.StreamingContext Code Examples

This article collects typical usage examples of the pyspark.streaming.StreamingContext class in Python. If you are wondering how streaming.StreamingContext is used in practice, or are looking for working examples of it, the selected code examples here may help. You can also look further into usage examples for pyspark.streaming, the module that provides it.


A total of 8 code examples of streaming.StreamingContext are shown below, sorted by popularity by default.

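Example 1: main

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
import time

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
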
def main():
    # Adapted from https://github.com/apache/spark/tree/master/examples/src/main/python/streaming
    sc = SparkContext(appName='PythonStreamingQueue')
    ssc = StreamingContext(sc, 1)

    # Create the queue through which RDDs can be pushed to
    # a QueueInputDStream
    rddQueue = []
    for _ in range(5):
        rddQueue += [ssc.sparkContext.parallelize([j for j in range(1, 1001)], 10)]

    # Create the QueueInputDStream and use it to do some processing
    inputStream = ssc.queueStream(rddQueue)
    mappedStream = inputStream.map(lambda x: (x % 10, 1))
    reducedStream = mappedStream.reduceByKey(lambda a, b: a + b)
    reducedStream.pprint()

    ssc.start()
    time.sleep(6)
    ssc.stop(stopSparkContext=True, stopGraceFully=True) 
Developer: DataDog, Project: integrations-core, Lines of code: 22, Source file: app.py
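
To see what Example 1 should print per batch without starting Spark, the map and reduceByKey steps can be emulated in plain Python. This is only a sketch of the arithmetic, not Spark code: the helper name `count_by_last_digit` is hypothetical, and `collections.Counter` stands in for `reduceByKey`.

```python
from collections import Counter


def count_by_last_digit(values):
    """Emulate map(lambda x: (x % 10, 1)) followed by reduceByKey(a + b)."""
    counts = Counter()
    for x in values:
        counts[x % 10] += 1  # same key as in Example 1, summed per key
    return dict(counts)


# One queued RDD in Example 1 holds the integers 1..1000.
batch = range(1, 1001)
result = count_by_last_digit(batch)
```

Each of the ten keys (0 through 9) ends up with a count of 100, so every batch taken from the queue should print ten pairs such as (0, 100), in no guaranteed order.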

Example 2: bluecoat_parse

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def bluecoat_parse(zk, topic, db, db_table, num_of_workers, batch_size):
    """
    Parse and save bluecoat logs.

    :param zk: Apache ZooKeeper quorum
    :param topic: Apache Kafka topic (application name)
    :param db: Apache Hive database to save into
    :param db_table: table of `db` to save into
    :param num_of_workers: number of Apache Kafka workers
    :param batch_size: batch size for Apache Spark streaming context
    """
    app_name = topic
    wrks = int(num_of_workers)

    # create spark context
    sc = SparkContext(appName=app_name)
    ssc = StreamingContext(sc, int(batch_size))
    sqc = HiveContext(sc)

    tp_stream = KafkaUtils.createStream(ssc, zk, app_name, {topic: wrks}, keyDecoder=spot_decoder, valueDecoder=spot_decoder)

    proxy_data = (
        tp_stream.map(lambda row: row[1])
        .flatMap(lambda row: row.split("\n"))
        .filter(lambda row: rex_date.match(row))
        .map(lambda row: row.strip("\n").strip("\r").replace("\t", " ").replace("  ", " "))
        .map(lambda row: split_log_entry(row))
        .map(lambda row: proxy_parser(row))
    )
    saved_data = proxy_data.foreachRDD(lambda row: save_data(row, sqc, db, db_table, topic))
    ssc.start()
    ssc.awaitTermination() 
Developer: apache, Project: incubator-spot, Lines of code: 27, Source file: bluecoat.py
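
The string clean-up inside the `proxy_data` chain of Example 2 can be tried on its own. The sketch below copies that lambda into a hypothetical helper named `normalize_row`; the sample strings are made up for illustration and are not real Bluecoat log lines.

```python
def normalize_row(row):
    """Same clean-up as the lambda in Example 2: trim line endings,
    turn tabs into spaces, then replace each double space with one."""
    return row.strip("\n").strip("\r").replace("\t", " ").replace("  ", " ")


# A tab-separated line with a Windows-style line ending.
tab_separated = normalize_row("2016-01-01\tGET  /index\r\n")

# A run of three spaces, to show the single-pass behaviour of replace().
three_spaces = normalize_row("a   b")
```

Note that `replace("  ", " ")` makes a single pass, so a run of three spaces is reduced to two rather than one. If collapsing any run of whitespace is what is wanted, `" ".join(row.split())` is one alternative, though it also trims leading and trailing whitespace.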

Example 3: create_streaming_context

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def create_streaming_context(spark_context, config):
    """
    Create a streaming context with a custom Streaming Listener
    that will log every event.
    :param spark_context: Spark context
    :type spark_context: pyspark.SparkContext
    :param config: dict
    :return: Returns a new streaming context from the given context.
    :rtype: pyspark.streaming.StreamingContext
    """
    ssc = streaming.StreamingContext(spark_context, config[
        "spark_config"]["streaming"]["batch_interval"])
    ssc.addStreamingListener(DriverStreamingListener)
    directory = os_path.expanduser("~/checkpointing")
    logger.info("Checkpointing to `{}`".format(directory))
    # Commented out to fix a crash occurring when
    # phase 1 is used. The reason of the crash is still unclear
    # but Spark complains about the SSC being transferred
    # to workers.
    # ssc.checkpoint(directory)
    return ssc 
Developer: openstack, Project: monasca-analytics, Lines of code: 23, Source file: streaming_context.py

Example 4: streaming_listener

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def streaming_listener(**kwargs):
    '''
        Initialize the Spark job.
    '''
    Util.get_logger('SPOT.INGEST', kwargs.pop('log_level'))

    logger  = logging.getLogger('SPOT.INGEST.COMMON.LISTENER')
    logger.info('Initializing Spark Streaming Listener...')

    dbtable = '{0}.{1}'.format(kwargs.pop('database'), kwargs['type'])
    topic   = kwargs.pop('topic')

    sc      = SparkContext(appName=kwargs['app_name'] or topic)
    logger.info('Connect to Spark Cluster as job "{0}" and broadcast variables on it.'
        .format(kwargs.pop('app_name') or topic))
    ssc     = StreamingContext(sc, batchDuration=kwargs['batch_duration'])
    logger.info('Streaming data will be divided into batches of {0} seconds.'
        .format(kwargs.pop('batch_duration')))
    hsc     = HiveContext(sc)
    logger.info('Read Hive\'s configuration to integrate with data stored in it.')

    import pipelines
    module  = getattr(pipelines, kwargs.pop('type'))
    stream  = module.StreamPipeline(ssc, kwargs.pop('zkquorum'),
                kwargs.pop('group_id') or topic, { topic: int(kwargs.pop('partitions')) })

    schema  = stream.schema
    segtype = stream.segtype

    stream.dstream\
        .map(lambda x: module.StreamPipeline.parse(x))\
        .filter(lambda x: bool(x))\
        .foreachRDD(lambda x: store(x, hsc, dbtable, topic, schema, segtype))

    ssc.start()
    logger.info('Start the execution of the streams.')
    ssc.awaitTermination() 
Developer: apache, Project: incubator-spot, Lines of code: 39, Source file: listener.py

Example 5: setUp

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def setUp(self):
        self.sc = SparkContext('local[4]', "MLlib tests")
        self.ssc = StreamingContext(self.sc, 1.0) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines of code: 5, Source file: tests.py

Example 6: main

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def main():
    """Run Spark Streaming"""
    conf = SparkConf()
    sc = SparkContext(appName='Ozymandias', conf=conf)
    sc.setLogLevel('WARN')
    
    with open(ROOT + 'channels.json', 'r') as f:
        channels = json.load(f)
    topics = [t['topic'] for t in channels['channels']]
    
    n_secs = 0.5
    ssc = StreamingContext(sc, n_secs)
    stream = KafkaUtils.createDirectStream(ssc, topics, {
                        'bootstrap.servers':'localhost:9092', 
                        'group.id':'ozy-group', 
                        'fetch.message.max.bytes':'15728640',
                        'auto.offset.reset':'largest'})
    
    stream.map(
            deserializer
        ).map(
            image_detector
        ).foreachRDD(
            message_sender)
    
    ssc.start()
    ssc.awaitTermination() 
Developer: pambot, Project: ozymandias, Lines of code: 29, Source file: ozy_streaming.py

Example 7: isException

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def isException(machine, signal):
  # Assumption: should be passed in as a parameter or read dynamically from a source
  exceptions = [(11,19)]
  return (int(machine), signal) in exceptions 

# Create a local StreamingContext with two worker threads and a batch interval of 1 second
Developer: matteoredaelli, Project: pyspark-examples, Lines of code: 8, Source file: signals.py
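
The snippet in Example 7 stops at a comment about creating a local StreamingContext with two worker threads and a 1-second batch interval. A minimal sketch of that step might look as follows; the helper names, the default application name, and the thread-count check are assumptions added here, not part of the original project.

```python
def local_master(n_threads=2):
    """Build a local master URL such as 'local[2]'.

    Assumption for this sketch: at least two threads are requested, since a
    receiver-based input stream occupies one thread and another is needed
    to process the received data.
    """
    if n_threads < 2:
        raise ValueError("use at least 2 local threads for streaming")
    return "local[{}]".format(n_threads)


def create_local_streaming_context(app_name="SignalsExample", n_threads=2,
                                   batch_seconds=1):
    """Create a SparkContext on a local master and wrap it in a
    StreamingContext with the given batch interval (in seconds)."""
    # Imported here so local_master() can be used without pyspark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(local_master(n_threads), app_name)
    return StreamingContext(sc, batch_seconds)
```

Calling `ssc = create_local_streaming_context()` requires pyspark to be installed. Input streams are then defined on `ssc` before `ssc.start()` is called, as in the other examples.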

Example 8: streaming_context

# Required module import: from pyspark import streaming [as alias]
# Or: from pyspark.streaming import StreamingContext [as alias]
def streaming_context(sc):
    return StreamingContext(sc, 1) 
Developer: logicalclocks, Project: maggy, Lines of code: 4, Source file: conftest.py


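Note: the pyspark.streaming.StreamingContext examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their respective developers, and copyright of the source code belongs to the original authors. For distribution and use, please refer to the license of the corresponding project. Do not reproduce without permission.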