

Python SparkContext._active_spark_context Code Examples

This article compiles typical usage examples of pyspark.SparkContext._active_spark_context in Python. If you are wondering what SparkContext._active_spark_context does, how it is used, or what real code that uses it looks like, the curated examples below may help. You can also browse further usage examples of pyspark.SparkContext, the class this attribute belongs to.


The sections below show 15 code examples of SparkContext._active_spark_context, sorted by popularity by default. You can upvote the examples you like or find useful; your votes help the system recommend better Python code examples.

Example 1: from_vocabulary

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def from_vocabulary(cls, vocabulary, inputCol, outputCol=None, minTF=None, binary=None):
        """
        Construct the model directly from a vocabulary list of strings,
        requires an active SparkContext.
        """
        sc = SparkContext._active_spark_context
        java_class = sc._gateway.jvm.java.lang.String
        jvocab = CountVectorizerModel._new_java_array(vocabulary, java_class)
        model = CountVectorizerModel._create_from_java_class(
            "org.apache.spark.ml.feature.CountVectorizerModel", jvocab)
        model.setInputCol(inputCol)
        if outputCol is not None:
            model.setOutputCol(outputCol)
        if minTF is not None:
            model.setMinTF(minTF)
        if binary is not None:
            model.setBinary(binary)
        model._set(vocabSize=len(vocabulary))
        return model 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 21, Source: feature.py
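
A minimal usage sketch for from_vocabulary (assuming an active SparkSession named spark and pyspark 2.4+; the vocabulary and column names are illustrative):

from pyspark.ml.feature import CountVectorizerModel

# Build the model from a fixed vocabulary instead of fitting a CountVectorizer.
cvm = CountVectorizerModel.from_vocabulary(
    ["a", "b", "c"], inputCol="words", outputCol="features", binary=False)

df = spark.createDataFrame([(["a", "b", "b"],), (["c", "a"],)], ["words"])
cvm.transform(df).show(truncate=False)  # term counts over the fixed vocabulary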

Example 2: from_labels

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def from_labels(cls, labels, inputCol, outputCol=None, handleInvalid=None):
        """
        Construct the model directly from an array of label strings,
        requires an active SparkContext.
        """
        sc = SparkContext._active_spark_context
        java_class = sc._gateway.jvm.java.lang.String
        jlabels = StringIndexerModel._new_java_array(labels, java_class)
        model = StringIndexerModel._create_from_java_class(
            "org.apache.spark.ml.feature.StringIndexerModel", jlabels)
        model.setInputCol(inputCol)
        if outputCol is not None:
            model.setOutputCol(outputCol)
        if handleInvalid is not None:
            model.setHandleInvalid(handleInvalid)
        return model 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: feature.py
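
A minimal usage sketch for from_labels (assuming an active SparkSession named spark and pyspark 2.4+; labels and column names are illustrative):

from pyspark.ml.feature import StringIndexerModel

# Build the indexer with a predefined label order: index = position in the list.
sim = StringIndexerModel.from_labels(
    ["red", "green", "blue"], inputCol="color", outputCol="color_idx",
    handleInvalid="keep")

df = spark.createDataFrame([("red",), ("blue",), ("purple",)], ["color"])
sim.transform(df).show()  # "purple" receives the extra index because handleInvalid="keep"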

Example 3: _new_java_array

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def _new_java_array(pylist, java_class):
        """
        Create a Java array of given java_class type. Useful for
        calling a method with a Scala Array from Python with Py4J.

        :param pylist:
          Python list to convert to a Java Array.
        :param java_class:
          Java class to specify the type of Array. Should be in the
          form of sc._gateway.jvm.* (sc is a valid Spark Context).
        :return:
          Java Array of converted pylist.

        Example primitive Java classes:
          - basestring -> sc._gateway.jvm.java.lang.String
          - int -> sc._gateway.jvm.java.lang.Integer
          - float -> sc._gateway.jvm.java.lang.Double
          - bool -> sc._gateway.jvm.java.lang.Boolean
        """
        sc = SparkContext._active_spark_context
        java_array = sc._gateway.new_array(java_class, len(pylist))
        # Note: pyspark's wrapper.py aliases `xrange` to `range` when running under Python 3.
        for i in xrange(len(pylist)):
            java_array[i] = pylist[i]
        return java_array 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 26, Source: wrapper.py
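
The same conversion can be reproduced directly through the Py4J gateway; a sketch assuming an active SparkContext is reachable via SparkContext._active_spark_context:

from pyspark import SparkContext

sc = SparkContext._active_spark_context
string_class = sc._gateway.jvm.java.lang.String

# Allocate a java.lang.String[] and copy the Python list element by element.
pylist = ["spark", "py4j"]
jarr = sc._gateway.new_array(string_class, len(pylist))
for i, item in enumerate(pylist):
    jarr[i] = item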

Example 4: _transfer_params_to_java

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def _transfer_params_to_java(self):
        """
        Transfers the embedded params to the companion Java object.
        """
        pair_defaults = []
        for param in self.params:
            if self.isSet(param):
                pair = self._make_java_param_pair(param, self._paramMap[param])
                self._java_obj.set(pair)
            if self.hasDefault(param):
                pair = self._make_java_param_pair(param, self._defaultParamMap[param])
                pair_defaults.append(pair)
        if len(pair_defaults) > 0:
            sc = SparkContext._active_spark_context
            pair_defaults_seq = sc._jvm.PythonUtils.toSeq(pair_defaults)
            self._java_obj.setDefault(pair_defaults_seq) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: wrapper.py
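
_transfer_params_to_java is internal plumbing that pyspark normally invokes for you during fit() and transform(); the hedged sketch below calls it directly on a Tokenizer purely to illustrate the effect (not something user code usually needs to do):

from pyspark.ml.feature import Tokenizer

tok = Tokenizer(inputCol="text", outputCol="words")
tok._transfer_params_to_java()       # push the Python-side params onto the JVM object
print(tok._java_obj.getInputCol())   # the Java side now reports 'text'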

Example 5: _transfer_params_from_java

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def _transfer_params_from_java(self):
        """
        Transfers the embedded params from the companion Java object.
        """
        sc = SparkContext._active_spark_context
        for param in self.params:
            if self._java_obj.hasParam(param.name):
                java_param = self._java_obj.getParam(param.name)
                # SPARK-14931: Only check set params back to avoid default params mismatch.
                if self._java_obj.isSet(java_param):
                    value = _java2py(sc, self._java_obj.getOrDefault(java_param))
                    self._set(**{param.name: value})
                # SPARK-10931: Temporary fix for params that have a default in Java
                if self._java_obj.hasDefault(java_param) and not self.isDefined(param):
                    value = _java2py(sc, self._java_obj.getDefault(java_param)).get()
                    self._setDefault(**{param.name: value}) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: wrapper.py
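
For the opposite direction, a hedged sketch continuing the Tokenizer example above: a param set on the Java object is pulled back into the Python wrapper.

# Change a param on the Java side, then synchronize the Python wrapper.
tok._java_obj.setInputCol("sentence")
tok._transfer_params_from_java()
print(tok.getInputCol())             # 'sentence'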

Example 6: approx_count_distinct

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def approx_count_distinct(col, rsd=None):
    """Aggregate function: returns a new :class:`Column` for approximate distinct count of
    column `col`.

    :param rsd: maximum estimation error allowed (default = 0.05). For rsd < 0.01, it is more
        efficient to use :func:`countDistinct`

    >>> df.agg(approx_count_distinct(df.age).alias('distinct_ages')).collect()
    [Row(distinct_ages=2)]
    """
    sc = SparkContext._active_spark_context
    if rsd is None:
        jc = sc._jvm.functions.approx_count_distinct(_to_java_column(col))
    else:
        jc = sc._jvm.functions.approx_count_distinct(_to_java_column(col), rsd)
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: functions.py
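
A small usage sketch (assuming an active SparkSession named spark; the data is illustrative):

from pyspark.sql.functions import approx_count_distinct

df = spark.createDataFrame([(2, "Alice"), (5, "Bob"), (5, "Bob")], ["age", "name"])
df.agg(approx_count_distinct("name", rsd=0.01).alias("users")).show()  # ~2 distinct names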

Example 7: grouping_id

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def grouping_id(*cols):
    """
    Aggregate function: returns the level of grouping, equal to

       (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)

    .. note:: The list of columns should exactly match the grouping columns, or be empty
        (meaning all of the grouping columns).

    >>> df.cube("name").agg(grouping_id(), sum("age")).orderBy("name").show()
    +-----+-------------+--------+
    | name|grouping_id()|sum(age)|
    +-----+-------------+--------+
    | null|            1|       7|
    |Alice|            0|       2|
    |  Bob|            0|       5|
    +-----+-------------+--------+
    """
    sc = SparkContext._active_spark_context
    jc = sc._jvm.functions.grouping_id(_to_seq(sc, cols, _to_java_column))
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 23, Source: functions.py
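
A small usage sketch showing how the bitmask can be read (assuming an active SparkSession named spark; the data is illustrative):

from pyspark.sql.functions import grouping_id, sum

df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])
# gid = 1 on the rows where "name" has been aggregated away by the cube.
df.cube("name").agg(grouping_id().alias("gid"), sum("age").alias("total")) \
  .orderBy("gid").show()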

Example 8: rand

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def rand(seed=None):
    """Generates a random column with independent and identically distributed (i.i.d.) samples
    from U[0.0, 1.0].

    .. note:: The function is non-deterministic in the general case.

    >>> df.withColumn('rand', rand(seed=42) * 3).collect()
    [Row(age=2, name=u'Alice', rand=1.1568609015300986),
     Row(age=5, name=u'Bob', rand=1.403379671529166)]
    """
    sc = SparkContext._active_spark_context
    if seed is not None:
        jc = sc._jvm.functions.rand(seed)
    else:
        jc = sc._jvm.functions.rand()
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: functions.py
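
A small usage sketch (assuming an active SparkSession named spark):

from pyspark.sql.functions import rand

# A fixed seed makes the column reproducible for the same plan and partitioning.
spark.range(5).withColumn("u", rand(seed=42)).show()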

Example 9: randn

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def randn(seed=None):
    """Generates a column with independent and identically distributed (i.i.d.) samples from
    the standard normal distribution.

    .. note:: The function is non-deterministic in the general case.

    >>> df.withColumn('randn', randn(seed=42)).collect()
    [Row(age=2, name=u'Alice', randn=-0.7556247885860078),
    Row(age=5, name=u'Bob', randn=-0.0861619008451133)]
    """
    sc = SparkContext._active_spark_context
    if seed is not None:
        jc = sc._jvm.functions.randn(seed)
    else:
        jc = sc._jvm.functions.randn()
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: functions.py
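
A small usage sketch (assuming an active SparkSession named spark):

from pyspark.sql.functions import randn

# Add zero-mean Gaussian noise, e.g. for jittering values in a simulation.
spark.range(5).withColumn("noise", randn(seed=0) * 0.1).show()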

Example 10: when

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def when(condition, value):
    """Evaluates a list of conditions and returns one of multiple possible result expressions.
    If :func:`Column.otherwise` is not invoked, None is returned for unmatched conditions.

    :param condition: a boolean :class:`Column` expression.
    :param value: a literal value, or a :class:`Column` expression.

    >>> df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).collect()
    [Row(age=3), Row(age=4)]

    >>> df.select(when(df.age == 2, df.age + 1).alias("age")).collect()
    [Row(age=3), Row(age=None)]
    """
    sc = SparkContext._active_spark_context
    if not isinstance(condition, Column):
        raise TypeError("condition should be a Column")
    v = value._jc if isinstance(value, Column) else value
    jc = sc._jvm.functions.when(condition._jc, v)
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 21, Source: functions.py
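
A small usage sketch chaining several conditions (assuming an active SparkSession named spark; the data is illustrative):

from pyspark.sql.functions import when, col

df = spark.createDataFrame([(2,), (5,), (9,)], ["age"])
df.select(
    when(col("age") < 3, "toddler")
    .when(col("age") < 7, "child")
    .otherwise("other")
    .alias("group")).show()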

Example 11: log

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def log(arg1, arg2=None):
    """Returns the first argument-based logarithm of the second argument.

    If there is only one argument, then this takes the natural logarithm of the argument.

    >>> df.select(log(10.0, df.age).alias('ten')).rdd.map(lambda l: str(l.ten)[:7]).collect()
    ['0.30102', '0.69897']

    >>> df.select(log(df.age).alias('e')).rdd.map(lambda l: str(l.e)[:7]).collect()
    ['0.69314', '1.60943']
    """
    sc = SparkContext._active_spark_context
    if arg2 is None:
        jc = sc._jvm.functions.log(_to_java_column(arg1))
    else:
        jc = sc._jvm.functions.log(arg1, _to_java_column(arg2))
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 19, Source: functions.py
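
A small usage sketch with an explicit base and the natural logarithm (assuming an active SparkSession named spark):

from pyspark.sql.functions import log

df = spark.createDataFrame([(1.0,), (8.0,)], ["x"])
df.select(log(2.0, df.x).alias("log2_x"),  # base-2 logarithm
          log(df.x).alias("ln_x")).show()  # natural logarithm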

Example 12: ntile

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def ntile(n):
    """
    Window function: returns the ntile group id (from 1 to `n` inclusive)
    in an ordered window partition. For example, if `n` is 4, the first
    quarter of the rows will get value 1, the second quarter will get 2,
    the third quarter will get 3, and the last quarter will get 4.

    This is equivalent to the NTILE function in SQL.

    :param n: an integer
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.ntile(int(n)))


# ---------------------- Date/Timestamp functions ------------------------------ 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 18, Source: functions.py
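
ntile is only meaningful over an ordered window; a small usage sketch (assuming an active SparkSession named spark; the data is illustrative):

from pyspark.sql.functions import ntile
from pyspark.sql.window import Window

df = spark.createDataFrame([("a", 1), ("a", 2), ("a", 3), ("a", 4)], ["grp", "value"])
w = Window.partitionBy("grp").orderBy("value")
df.withColumn("quartile", ntile(4).over(w)).show()  # one row per quartile here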

Example 13: months_between

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def months_between(date1, date2, roundOff=True):
    """
    Returns the number of months between dates `date1` and `date2`.
    If `date1` is later than `date2`, then the result is positive.
    If `date1` and `date2` fall on the same day of the month, or both are the last day of
    the month, the result is an integer (the time of day is ignored).
    The result is rounded off to 8 digits unless `roundOff` is set to `False`.

    >>> df = spark.createDataFrame([('1997-02-28 10:30:00', '1996-10-30')], ['date1', 'date2'])
    >>> df.select(months_between(df.date1, df.date2).alias('months')).collect()
    [Row(months=3.94959677)]
    >>> df.select(months_between(df.date1, df.date2, False).alias('months')).collect()
    [Row(months=3.9495967741935485)]
    """
    sc = SparkContext._active_spark_context
    return Column(sc._jvm.functions.months_between(
        _to_java_column(date1), _to_java_column(date2), roundOff)) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 19, Source: functions.py
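
A small usage sketch with two month-end dates (assuming an active SparkSession named spark):

from pyspark.sql.functions import months_between, to_date

df = spark.createDataFrame([("2019-03-31", "2019-01-31")], ["d1", "d2"])
# Both dates are month-ends, so the result is a whole number of months (2.0).
df.select(months_between(to_date(df.d1), to_date(df.d2)).alias("months")).show()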

Example 14: to_date

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def to_date(col, format=None):
    """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or
    :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.DateType`
    using the optionally specified format. Specify formats according to
    `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_.
    By default, it follows casting rules to :class:`pyspark.sql.types.DateType` if the format
    is omitted (equivalent to ``col.cast("date")``).

    >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
    >>> df.select(to_date(df.t).alias('date')).collect()
    [Row(date=datetime.date(1997, 2, 28))]

    >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
    >>> df.select(to_date(df.t, 'yyyy-MM-dd HH:mm:ss').alias('date')).collect()
    [Row(date=datetime.date(1997, 2, 28))]
    """
    sc = SparkContext._active_spark_context
    if format is None:
        jc = sc._jvm.functions.to_date(_to_java_column(col))
    else:
        jc = sc._jvm.functions.to_date(_to_java_column(col), format)
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 24, Source: functions.py
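
A small sketch with a non-default date pattern (assuming an active SparkSession named spark):

from pyspark.sql.functions import to_date

df = spark.createDataFrame([("28/02/1997",)], ["t"])
df.select(to_date(df.t, "dd/MM/yyyy").alias("date")).show()  # 1997-02-28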

Example 15: to_timestamp

# Required import: from pyspark import SparkContext [as alias]
# Or: from pyspark.SparkContext import _active_spark_context [as alias]
def to_timestamp(col, format=None):
    """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or
    :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.TimestampType`
    using the optionally specified format. Specify formats according to
    `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_.
    By default, it follows casting rules to :class:`pyspark.sql.types.TimestampType` if the format
    is omitted (equivalent to ``col.cast("timestamp")``).

    >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
    >>> df.select(to_timestamp(df.t).alias('dt')).collect()
    [Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]

    >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
    >>> df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()
    [Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]
    """
    sc = SparkContext._active_spark_context
    if format is None:
        jc = sc._jvm.functions.to_timestamp(_to_java_column(col))
    else:
        jc = sc._jvm.functions.to_timestamp(_to_java_column(col), format)
    return Column(jc) 
Developer: runawayhorse001, Project: LearningApacheSpark, Lines: 24, Source: functions.py
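
A small sketch with an explicit timestamp pattern (assuming an active SparkSession named spark):

from pyspark.sql.functions import to_timestamp

df = spark.createDataFrame([("1997-02-28 10:30",)], ["t"])
df.select(to_timestamp(df.t, "yyyy-MM-dd HH:mm").alias("ts")).show()  # 1997-02-28 10:30:00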


Note: The pyspark.SparkContext._active_spark_context examples in this article were compiled by 純淨天空 from GitHub, MSDocs, and other open-source code and documentation platforms. The snippets were selected from open-source projects contributed by many developers, and the copyright of the source code belongs to the original authors. Please observe the corresponding project's license when distributing or using the code, and do not reproduce this article without permission.