This article collects typical usage examples of the Python method disco.core.Disco.tag. If you are wondering how Disco.tag is used in practice, the example selected here may help. You can also look up usage examples for the class this method belongs to, disco.core.Disco.
One code example of the Disco.tag method is shown below.
Example 1: fit_predict
# Required import: from disco.core import Disco [as alias]
# In this example, tag is called on the DDFS client of a Disco instance: Disco().ddfs.tag(...)
def fit_predict(training_data, fitting_data, tau=1, samples_per_job=0, save_results=True, show=False):
    """
    training_data - training samples
    fitting_data - dataset to be fitted to the training data
    tau - controls how quickly the weight of a training sample falls off with the distance of its x(i) from the query point x
    samples_per_job - number of samples processed in a single mapreduce job. If 0, the algorithm calculates it from the number of attributes.
    """
    from disco.worker.pipeline.worker import Worker, Stage
    from disco.core import Job, result_iterator
    from disco.core import Disco

    # simple_init, map_predict and _fit_predict are helpers defined elsewhere
    # in the same module; they are not shown in this example.

    try:
        tau = float(tau)
        if tau <= 0:
            raise Exception("Parameter tau should be > 0.")
    except (TypeError, ValueError):
        raise Exception("Parameter tau should be numerical.")

    if fitting_data.params["id_index"] == -1:
        raise Exception("Predict data should have id_index set.")

    # first job: read the samples that are to be fitted
    job = Job(worker=Worker(save_results=save_results))
    job.pipeline = [
        ("split", Stage("map", input_chain=fitting_data.params["input_chain"], init=simple_init, process=map_predict))
    ]
    job.params = fitting_data.params
    job.run(name="lwlr_read_data", input=fitting_data.params["data_tag"])

    samples = {}
    results = []
    tau = float(2 * tau ** 2)  # calculate 2 * tau^2 once

    for test_id, x in result_iterator(job.wait(show=show)):
        if samples_per_job == 0:
            # calculate the number of samples per job
            if len(x) <= 100:  # at most 100 attributes
                samples_per_job = 100  # at most 100 samples per job
            else:
                # more than 100 attributes: linear function of the attribute count,
                # truncated to an integer and kept at least 1 so that the
                # batch-size check below can trigger
                samples_per_job = max(1, int(len(x) * -25 / 900.0 + 53))

        samples[test_id] = x
        if len(samples) >= samples_per_job:
            # the batch is full: fit it in its own job and start a new batch
            results.append(_fit_predict(training_data, samples, tau, save_results, show))
            samples = {}

    if len(samples) > 0:  # some samples are left in the dictionary
        results.append(_fit_predict(training_data, samples, tau, save_results, show))

    # merge the results of every iteration into a single tag
    ddfs = Disco().ddfs
    ddfs.tag(job.name, [[list(ddfs.blobs(tag))[0][0]] for tag in results])

    return ["tag://" + job.name]
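
The docstring describes tau only informally. The sketch below shows one way to read it, assuming the Gaussian kernel that is commonly used for locally weighted linear regression. The _fit_predict helper is not shown in the example, so whether it computes exactly this weight is an assumption, and gaussian_weight is a hypothetical name used only for illustration. The denominator 2 * tau ** 2 matches the value that fit_predict precomputes before calling _fit_predict.

```python
import math


def gaussian_weight(x_i, x_query, tau):
    """Weight of training sample x_i relative to the query point x_query.

    Hypothetical helper: assumes the Gaussian kernel
    w = exp(-||x_i - x_query||^2 / (2 * tau^2)).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_query))
    return math.exp(-sq_dist / (2.0 * tau ** 2))


if __name__ == "__main__":
    training = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    query = [0.0, 0.0]
    # A larger tau gives distant samples more weight (a smoother fit);
    # a smaller tau makes the fit depend mostly on nearby samples.
    for tau in (0.5, 1.0, 5.0):
        weights = [gaussian_weight(x_i, query, tau) for x_i in training]
        print(tau, [round(w, 4) for w in weights])
```

Under this reading, a sample that coincides with the query point always gets weight 1, and the weight of every other sample grows towards 1 as tau increases.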