This page collects typical usage examples of the Python method thunder.rdds.fileio.seriesloader.SeriesLoader.fromText, for readers who want to see how SeriesLoader.fromText is called in practice. The containing class, thunder.rdds.fileio.seriesloader.SeriesLoader, also provides the other loaders that appear in these examples.
Two code examples of SeriesLoader.fromText are shown below, sorted by popularity by default.
Example 1: loadSeries
# Required import: from thunder.rdds.fileio.seriesloader import SeriesLoader
def loadSeries(self, datapath, nkeys=None, nvalues=None, inputformat='binary', minPartitions=None,
               conffile='conf.json', keytype=None, valuetype=None):
    """
    Loads a Series object from data stored as text or binary files.

    Supports single files or multiple files stored on a local file system, a networked file system (mounted
    and available on all cluster nodes), Amazon S3, or HDFS.

    Parameters
    ----------
    datapath: string
        Path to data files or directory, specified as either a local filesystem path or in a URI-like format,
        including scheme. A datapath argument may include a single '*' wildcard character in the filename. Examples
        of valid datapaths include "a/local/relative/directory/*.stack", "s3n:///my-s3-bucket/data/mydatafile.tif",
        "/mnt/my/absolute/data/directory/", or "file:///mnt/another/data/directory/".

    nkeys: int, optional (but required if `inputformat` is 'text')
        Dimensionality of data keys. (For instance, (x,y,z) keyed data for 3-dimensional image timeseries data.)
        For text data, number of keys must be specified in this parameter; for binary data, number of keys must be
        specified either in this parameter or in a configuration file named by the 'conffile' argument if this
        parameter is not set.

    nvalues: int, optional (but required if `inputformat` is 'text')
        Number of values expected to be read. For binary data, nvalues must be specified either in this parameter
        or in a configuration file named by the 'conffile' argument if this parameter is not set.

    inputformat: {'text', 'binary'}, optional, default 'binary'
        Format of data to be read.

    minPartitions: int, optional
        Explicitly specify minimum number of Spark partitions to be generated from this data. Used only for
        text data. Default is to use minParallelism attribute of Spark context object.

    conffile: string, optional, default 'conf.json'
        Path to JSON file with configuration options including 'nkeys', 'nvalues', 'keytype', and 'valuetype'.
        If a file is not found at the given path, then the base directory given in 'datapath'
        will also be checked. Parameters `nkeys` or `nvalues` that are specified as explicit arguments to this
        method will take priority over those found in conffile if both are present.

    Returns
    -------
    data: thunder.rdds.Series
        A newly-created Series object, wrapping an RDD of series data. This RDD will have as keys an n-tuple
        of int, with n given by `nkeys` or the configuration passed in `conffile`. RDD values will be a numpy
        array of length `nvalues` (or as specified in the passed configuration file).
    """
    checkparams(inputformat, ['text', 'binary'])

    from thunder.rdds.fileio.seriesloader import SeriesLoader
    loader = SeriesLoader(self._sc, minPartitions=minPartitions)

    if inputformat.lower() == 'text':
        data = loader.fromText(datapath, nkeys=nkeys)
    else:
        # checkparams has already restricted inputformat to 'text' or 'binary', so this branch is 'binary'
        data = loader.fromBinary(datapath, conffilename=conffile, nkeys=nkeys, nvalues=nvalues,
                                 keytype=keytype, valuetype=valuetype)
    return data
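The docstring above says that text input needs nkeys and that each resulting record pairs an n-tuple of int keys with an array of values. The following is a minimal sketch of that split for a single line of text. It assumes whitespace-separated numeric fields with the keys first, which the example does not confirm, and it returns a plain list where the docstring describes a numpy array. The helper name parse_text_record is hypothetical and is not part of thunder.

```python
def parse_text_record(line, nkeys):
    """Split one text record into (keys, values).

    Hypothetical helper for illustration only; not part of thunder.
    Assumes whitespace-separated numeric fields with the keys first.
    """
    if nkeys is None or nkeys < 0:
        raise ValueError("nkeys must be a non-negative integer for text input")
    fields = line.split()
    if len(fields) < nkeys:
        raise ValueError("record has %d fields, fewer than nkeys=%d" % (len(fields), nkeys))
    # The leading fields are integer keys, e.g. (x, y, z) coordinates.
    keys = tuple(int(float(field)) for field in fields[:nkeys])
    # The remaining fields are the series values for that key.
    values = [float(field) for field in fields[nkeys:]]
    return keys, values


record = parse_text_record("1 2 3 0.5 1.5 2.5", nkeys=3)
print(record)  # ((1, 2, 3), [0.5, 1.5, 2.5])
```

Because nothing in a line of text marks where the keys end, the count has to come from the caller. That is consistent with nkeys being required for text input.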
Example 2: loadSeries
# Required import: from thunder.rdds.fileio.seriesloader import SeriesLoader
def loadSeries(self, dataPath, nkeys=None, nvalues=None, inputFormat='binary', minPartitions=None,
               confFilename='conf.json', keyType=None, valueType=None, keyPath=None, varName=None):
    """
    Loads a Series object from data stored as binary, text, npy, or mat.

    For binary and text, supports single files or multiple files stored on a local file system,
    a networked file system (mounted and available on all cluster nodes), Amazon S3, or HDFS.
    For local formats (npy and mat), only local file systems are currently supported.

    Parameters
    ----------
    dataPath: string
        Path to data files or directory, as either a local filesystem path or a URI.
        May include a single '*' wildcard in the filename. Examples of valid dataPaths include
        "local/directory/*.stack", "s3n:///my-s3-bucket/data/", or "file:///mnt/another/directory/".

    nkeys: int, optional (required if `inputFormat` is 'text'), default = None
        Number of keys per record (e.g. 3 for (x, y, z) coordinate keys). Must be specified for
        text data; can be specified here or in a configuration file for binary data.

    nvalues: int, optional (required if `inputFormat` is 'text')
        Number of values per record. Must be specified here or in a configuration file for binary data.

    inputFormat: {'text', 'binary', 'npy', 'mat'}, optional, default = 'binary'
        Format of data to be read.

    minPartitions: int, optional, default = SparkContext.minParallelism
        Minimum number of Spark partitions to use, only for text.

    confFilename: string, optional, default 'conf.json'
        Path to JSON file with configuration options including 'nkeys', 'nvalues',
        'keyType', and 'valueType'. If a file is not found at the given path, then the base
        directory in 'dataPath' will be checked. Parameters will override the conf file.

    keyType: string or numpy dtype, optional, default = None
        Numerical type of keys, will override conf file.

    valueType: string or numpy dtype, optional, default = None
        Numerical type of values, will override conf file.

    keyPath: string, optional, default = None
        Path to file with keys when loading from npy or mat.

    varName: str, optional, default = None
        Variable name to load (for MAT files only).

    Returns
    -------
    data: thunder.rdds.Series
        A Series object, wrapping an RDD, with (n-tuples of ints) : (numpy array) pairs
    """
    checkParams(inputFormat, ['text', 'binary', 'npy', 'mat'])

    from thunder.rdds.fileio.seriesloader import SeriesLoader
    loader = SeriesLoader(self._sc, minPartitions=minPartitions)

    if inputFormat.lower() == 'binary':
        data = loader.fromBinary(dataPath, confFilename=confFilename, nkeys=nkeys, nvalues=nvalues,
                                 keyType=keyType, valueType=valueType)
    elif inputFormat.lower() == 'text':
        # Text records carry no metadata, so the number of keys must come from the caller.
        if nkeys is None:
            raise Exception('Must provide number of keys per record for loading from text')
        data = loader.fromText(dataPath, nkeys=nkeys)
    elif inputFormat.lower() == 'npy':
        data = loader.fromNpyLocal(dataPath, keyPath)
    else:
        # Only 'mat' remains after checkParams; a variable name is needed to select the array.
        if varName is None:
            raise Exception('Must provide variable name for loading MAT files')
        data = loader.fromMatLocal(dataPath, varName, keyPath)
    return data
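Both docstrings state that, for binary input, explicitly passed parameters take priority over values found in the JSON configuration file. The sketch below illustrates only that precedence rule, using the standard library. The helper name resolve_series_params is hypothetical and the values in the sample file are invented for the demonstration. It does not reproduce thunder's own implementation, and it leaves out the fallback lookup in the data directory.

```python
import json
import os
import tempfile


def resolve_series_params(conf_path, nkeys=None, nvalues=None, key_type=None, value_type=None):
    """Merge explicit arguments over values read from a JSON conf file.

    Hypothetical helper for illustration only; not part of thunder.
    The conf key names follow the docstring ('nkeys', 'nvalues', 'keyType', 'valueType').
    """
    params = {}
    if os.path.isfile(conf_path):
        with open(conf_path) as handle:
            params.update(json.load(handle))
    explicit = {'nkeys': nkeys, 'nvalues': nvalues, 'keyType': key_type, 'valueType': value_type}
    # Only arguments the caller actually supplied override the conf file.
    params.update({name: value for name, value in explicit.items() if value is not None})
    missing = [name for name in ('nkeys', 'nvalues') if params.get(name) is None]
    if missing:
        raise ValueError('Missing required parameters: %s' % ', '.join(missing))
    return params


with tempfile.TemporaryDirectory() as tmp:
    conf_path = os.path.join(tmp, 'conf.json')
    with open(conf_path, 'w') as handle:
        # Sample values invented for this demonstration.
        json.dump({'nkeys': 3, 'nvalues': 240, 'valueType': 'int16'}, handle)
    # nvalues is passed explicitly, so it wins over the 240 stored in the file.
    resolved = resolve_series_params(conf_path, nvalues=100)

print(resolved)  # {'nkeys': 3, 'nvalues': 100, 'valueType': 'int16'}
```

Treating None as "not supplied" mirrors the None defaults in the loadSeries signatures. It also means a caller cannot use None to clear a value that the conf file sets.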