Python SciPy optimize.least_squares用法及代碼示例

本文簡要介紹 python 語言中 scipy.optimize.least_squares 的用法。

用法: scipy.optimize.least_squares(fun, x0, jac='2-point', bounds=(-inf, inf), method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, x_scale=1.0, loss='linear', f_scale=1.0, diff_step=None, tr_solver=None, tr_options={}, jac_sparsity=None, max_nfev=None, verbose=0, args=(), kwargs={})#

求解具有變量界限的非線性最小二乘問題。

給定殘差 f(x)(n 個實變量的 m-D 實函數)和損失函數 rho(s)(標量函數)，least_squares 找到成本函數 F(x) 的局部最小值：

minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
subject to lb <= x <= ub

損失函數 rho(s) 的目的是減少異常值對解的影響。

參數：：

fun：可調用的

計算殘差向量的函數，簽名為 fun(x, *args, **kwargs) ，即最小化相對於其第一個參數進行。傳遞給此函數的參數x 是一個形狀為 (n,) 的 ndarray(絕不是標量，即使 n=1 也是如此)。它必須分配並返回一個形狀為 (m,) 或標量的一維數組。如果參數 x 是複數或函數 fun 返回複數殘差，則必須將其包裝在實參數的實函數中，如示例部分末尾所示。

x0：數組形狀為 (n,) 或浮點數

對自變量的初步猜測。如果是浮點數，它將被視為具有一個元素的一維數組。

jac： {‘2-point’, ‘3-point’, ‘cs’, callable}, 可選

計算雅可比矩陣的方法(m-by-n 矩陣，其中元素 (i, j) 是 f[i] 對 x[j] 的偏導數)。關鍵字選擇用於數值估計的有限差分方案。方案“3-point”更準確，但需要的操作數是“2-point”(默認)的兩倍。方案‘cs’ 使用複雜的步驟，雖然可能是最準確的，但它僅適用於樂趣正確處理複雜的輸入，並且可以分析地繼續到複雜的平麵。方法 ‘lm’ 始終使用“2 點”方案。如果可調用，則用作jac(x, *args, **kwargs)並且應該返回雅可比行列式的良好近似值(或精確值)作為數組(應用 np.atleast_2d)、稀疏矩陣(csr_matrix 首選性能)或scipy.sparse.linalg.LinearOperator.

bounds：數組或 Bounds 的 2 元組，可選

指定邊界有兩種方法：

Instance of Bounds class

Lower and upper bounds on independent variables. Defaults to no bounds. Each array must match the size of x0 or be a scalar, in the latter case a bound will be the same for all variables. Use np.inf with an appropriate sign to disable bounds on all or some variables.

method： {‘trf’, ‘dogbox’, ‘lm’}，可選

執行最小化的算法。

‘trf’ : Trust Region Reflective algorithm, particularly suitable for large sparse problems with bounds. Generally robust method.

‘dogbox’ : dogleg algorithm with rectangular trust regions, typical use case is small problems with bounds. Not recommended for problems with rank-deficient Jacobian.

‘lm’ : Levenberg-Marquardt algorithm as implemented in MINPACK. Doesn’t handle bounds and sparse Jacobians. Usually the most efficient method for small unconstrained problems.

默認為‘trf’。有關詳細信息，請參閱注釋。

ftol：浮點數或無，可選

因成本函數變化而終止的容差。默認值為 1e-8。當 dF < ftol * F 時停止優化過程，並且在最後一步中局部二次模型與真實模型之間存在足夠的一致性。

如果 None 並且 ‘method’ 不是 ‘lm’，則禁用此條件的終止。如果‘method’ 是‘lm’，則此容差必須高於機器 epsilon。

xtol：浮點數或無，可選

因自變量變化而終止的容差。默認值為 1e-8。確切的條件取決於使用的方法：

For ‘trf’ and ‘dogbox’ : norm(dx) < xtol * (xtol + norm(x)).

For ‘lm’ : Delta < xtol * norm(xs), where Delta is a trust-region radius and xs is the value of x scaled according to x_scale parameter (see below).

如果 None 並且 ‘method’ 不是 ‘lm’，則禁用此條件的終止。如果‘method’ 是‘lm’，則此容差必須高於機器 epsilon。

gtol：浮點數或無，可選

梯度範數終止的容差。默認值為 1e-8。確切的條件取決於使用的方法：

For ‘trf’ : norm(g_scaled, ord=np.inf) < gtol, where g_scaled is the value of the gradient scaled to account for the presence of the bounds [STIR].

For ‘dogbox’ : norm(g_free, ord=np.inf) < gtol, where g_free is the gradient with respect to the variables which are not in the optimal state on the boundary.

For ‘lm’ : the maximum absolute value of the cosine of angles between columns of the Jacobian and the residual vector is less than gtol, or the residual vector is zero.

如果 None 並且 ‘method’ 不是 ‘lm’，則禁用此條件的終止。如果‘method’ 是‘lm’，則此容差必須高於機器 epsilon。

x_scale：數組或‘jac’，可選

每個變量的特征尺度。環境x_scale相當於用縮放變量重新表述問題xs = x / x_scale.另一種觀點是，信任區域沿第 j 維的大小與x_scale[j].可以通過設置來提高收斂性x_scale這樣沿著任何縮放變量的給定大小的步長對成本函數都有類似的影響。如果設置為‘jac’，則使用雅可比矩陣列的逆範數迭代更新比例(如在[JJ更多])。

loss： str 或可調用，可選

確定損失函數。允許使用以下關鍵字值：

‘linear’ (default) : rho(z) = z. Gives a standard least-squares problem.

‘soft_l1’ : rho(z) = 2 * ((1 + z)**0.5 - 1). The smooth approximation of l1 (absolute value) loss. Usually a good choice for robust least squares.

‘huber’ : rho(z) = z if z <= 1 else 2*z**0.5 - 1. Works similarly to ‘soft_l1’.

‘cauchy’ : rho(z) = ln(1 + z). Severely weakens outliers influence, but may cause difficulties in optimization process.

‘arctan’ : rho(z) = arctan(z). Limits a maximum loss on a single residual, has properties similar to ‘cauchy’.

如果可調用，它必須采用一維 ndarray z=f**2 並返回形狀為 (3, m) 的數組，其中第 0 行包含函數值，第 1 行包含一階導數，第 2 行包含二階導數。方法 ‘lm’ 僅支持 ‘linear’ 損失。

f_scale：浮點數，可選

內部和異常殘差之間的軟邊距值，默認為 1.0。損失函數評估如下rho_(f**2) = C**2 * rho(f**2 / C**2)，其中C是f_scale，和rho由失利範圍。該參數與loss='linear', 但對於其他失利值是至關重要的。

max_nfev：無或整數，可選

終止前函數評估的最大次數。如果 None (默認)，則自動選擇該值：

For ‘trf’ and ‘dogbox’ : 100 * n.

For ‘lm’ : 100 * n if jac is callable and 100 * n * (n + 1) otherwise (because ‘lm’ counts function calls in Jacobian estimation).

diff_step：無或類似數組，可選

確定雅可比行列式的有限差分逼近的相對步長。實際步長計算為x * diff_step.如果無(默認)，則diff_step被視為使用的有限差分方案的機器 epsilon 的常規 “optimal” 冪[NR].

tr_solver： {無，‘exact’, ‘lsmr’}，可選

解決trust-region子問題的方法，僅與‘trf’和‘dogbox’方法相關。

‘exact’ is suitable for not very large problems with dense Jacobian matrices. The computational complexity per iteration is comparable to a singular value decomposition of the Jacobian matrix.

‘lsmr’ is suitable for problems with sparse and large Jacobian matrices. It uses the iterative procedure scipy.sparse.linalg.lsmr for finding a solution of a linear least-squares problem and only requires matrix-vector product evaluations.

如果 None (默認)，則根據第一次迭代返回的 Jacobian 類型選擇求解器。

tr_options：字典，可選

傳遞給 trust-region 求解器的關鍵字選項。

tr_solver='exact': tr_options are ignored.

tr_solver='lsmr': options for scipy.sparse.linalg.lsmr. Additionally, method='trf' supports ‘regularize’ option (bool, default is True), which adds a regularization term to the normal equation, which improves convergence if the Jacobian is rank-deficient [Byrd] (eq. 3.4).

jac_sparsity： {無，數組，稀疏矩陣}，可選

定義用於有限差分估計的雅可比矩陣的稀疏結構，其形狀必須為 (m, n)。如果雅可比矩陣中隻有少數非零元素每個行，提供稀疏結構將大大加快計算速度[柯蒂斯].零條目意味著雅可比行列式中的相應元素相同為零。如果提供，則強製使用 ‘lsmr’ trust-region 求解器。如果無(默認)，則將使用密集差分。對‘lm’ 方法無效。

verbose： {0, 1, 2}，可選

算法的詳細程度：

0 (default) : work silently.

1 : display a termination report.

2 : display progress during iterations (not supported by ‘lm’ method).

args, kwargs：元組和字典，可選

傳遞給的附加參數樂趣和江淮.默認情況下兩者都是空的。調用簽名是fun(x, *args, **kwargs)和同樣的江淮.

result： OptimizeResult

OptimizeResult 定義了以下字段：

x ndarray, shape (n,)
Solution found.

cost float
Value of the cost function at the solution.

fun ndarray, shape (m,)
Vector of residuals at the solution.

jac ndarray, sparse matrix or LinearOperator, shape (m, n)
Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.

grad ndarray, shape (m,)
Gradient of the cost function at the solution.

optimality float
First-order optimality measure. In unconstrained problems, it is always the uniform norm of the gradient. In constrained problems, it is the quantity which was compared with gtol during iterations.

active_mask ndarray of int, shape (n,)
Each component shows whether a corresponding constraint is active (that is, whether a variable is at the bound):

0 : a constraint is not active.

-1 : a lower bound is active.

1 : an upper bound is active.

Might be somewhat arbitrary for ‘trf’ method as it generates a sequence of strictly feasible iterates and active_mask is determined within a tolerance threshold.

nfev int
Number of function evaluations done. Methods ‘trf’ and ‘dogbox’ do not count function calls for numerical Jacobian approximation, as opposed to ‘lm’ method.

njev int or None
Number of Jacobian evaluations done. If numerical Jacobian approximation is used in ‘lm’ method, it is set to None.

status int
The reason for algorithm termination:

-1 : improper input parameters status returned from MINPACK.

0 : the maximum number of function evaluations is exceeded.

1 : gtol termination condition is satisfied.

2 : ftol termination condition is satisfied.

3 : xtol termination condition is satisfied.

4 : Both ftol and xtol termination conditions are satisfied.

message str
Verbal description of the termination reason.

success bool
True if one of the convergence criteria is satisfied (status > 0).

注意：

方法 ‘lm’ (Levenberg-Marquardt) 調用 MINPACK(lmder、lmdif)中實現的最小二乘算法的包裝器。它運行製定為 trust-region 類型算法的 Levenberg-Marquardt 算法。該實現基於論文[JJMore]，它非常健壯和高效，並且有很多聰明的技巧。對於無約束問題，它應該是您的首選。請注意，它不支持邊界。而且，當 m < n 時它不起作用。

方法‘trf’(信任區域反射)的動機是求解方程組的過程，該方程組構成了[STIR]中闡述的bound-constrained最小化問題的first-order最優條件。該算法迭代地解決由特殊對角二次項增強的 trust-region 子問題，並具有由距邊界的距離和梯度方向確定的 trust-region 形狀。此增強函數有助於避免直接進入邊界並有效地探索整個變量空間。為了進一步提高收斂性，該算法考慮了邊界反映的搜索方向。為了滿足理論要求，算法保持迭代嚴格可行。對於稠密雅可比行列式trust-region，子問題可以通過一種與 [JJMore] 中說明的方法非常相似的精確方法來解決(並在 MINPACK 中實現)。與 MINPACK 實現的不同之處在於每次迭代都會進行雅可比矩陣的奇異值分解，而不是 QR 分解和一係列吉文斯旋轉消除。對於大型稀疏雅可比行列式，使用解決 trust-region 子問題的二維子空間方法 [STIR]、[Byrd]。該子空間由縮放梯度和 scipy.sparse.linalg.lsmr 提供的近似 Gauss-Newton 解決方案組成。當不施加任何約束時，該算法與 MINPACK 非常相似，並且通常具有可比的性能。該算法在無界和有界問題中都非常穩健，因此被選為默認算法。

方法‘dogbox’在trust-region框架中運行，但考慮矩形信任區域而不是傳統的橢球體[Voglis]。當前信任區域和初始邊界的交集再次是矩形，因此在每次迭代中，受邊界約束的二次最小化問題可以通過鮑威爾狗腿法[NumOpt]近似解決。對於密集雅可比行列式，所需的 Gauss-Newton 步驟可以精確計算，對於大型稀疏雅可比行列式，可以通過 scipy.sparse.linalg.lsmr 近似計算。當雅可比行列式的秩小於變量的數量時，該算法可能會表現出緩慢的收斂性。在具有少量變量的有界問題中，該算法通常優於‘trf’。

魯棒損失函數的實現如[BA]中所述。這個想法是在每次迭代時修改殘差向量和雅可比矩陣，使得計算的梯度和Gauss-Newton Hessian 近似與成本函數的真實梯度和 Hessian 近似相匹配。然後算法以正常方式進行，即魯棒損失函數被實現為標準最小二乘算法的簡單包裝。

參考：

[STIR] (1,2,3)

M. A. Branch、T. F. Coleman 和 Y. Li，“Large-Scale Bound-Constrained 最小化問題的子空間、內部和共軛梯度方法”，SIAM 科學計算雜誌，卷。 21，第 1 期，第 1-23 頁，1999 年。

[NR]

威廉 H. 出版社等。 al.，“數字食譜。科學計算的藝術。第三版”，秒。 5.7.

[伯德] (1,2)

R. H. Byrd、R. B. Schnabel 和 G. A. Shultz，“通過在二維子空間上最小化的信任域問題的近似解”，數學。編程，40，第 247-263 頁，1988 年。

[柯蒂斯]

A. Curtis、M. J. D. Powell 和 J. Reid，“關於稀疏雅可比矩陣的估計”，數學研究所及其應用雜誌，13，第 117-120 頁，1974 年。

[JJ更多] (1,2,3)

J. J. More，“Levenberg-Marquardt 算法：實現和理論”，數值分析，編輯。 G. A. Watson，數學講義 630，Springer Verlag，第 105-116 頁，1977 年。

[沃格利斯]

C. Voglis 和 I. E. Lagaris，“A Rectangular Trust Region Dogleg Approach for Unconstrained and Bound Constrained Nonlinear Optimization”，WSEAS 國際應用數學會議，希臘科孚島，2004 年。

[NumOpt]

J. Nocedal 和 S. J. Wright，“數值優化，第 2 版”，第 4 章。

[BA]

B.特裏格斯等。 al.，“Bundle Adjustment - A Modern Synthesis”，視覺算法國際研討會論文集：理論與實踐，第 298-372 頁，1999 年。

例子：

在這個例子中，我們找到了一個對自變量沒有限製的 Rosenbrock 函數的最小值。

>>> import numpy as np
>>> def fun_rosenbrock(x):
...     return np.array([10 * (x[1] - x[0]**2), (1 - x[0])])

請注意，我們隻提供殘差的向量。該算法將成本函數構造為殘差的平方和，從而給出 Rosenbrock 函數。確切的最小值在 x = [1.0, 1.0] 。

>>> from scipy.optimize import least_squares
>>> x0_rosenbrock = np.array([2, 2])
>>> res_1 = least_squares(fun_rosenbrock, x0_rosenbrock)
>>> res_1.x
array([ 1.,  1.])
>>> res_1.cost
9.8669242910846867e-30
>>> res_1.optimality
8.8928864934219529e-14

我們現在約束變量，使之前的解決方案變得不可行。具體來說，我們要求x[1] >= 1.5，和x[0]不受約束。為此，我們指定界限參數為least_squares在表格中bounds=([-np.inf, 1.5], np.inf).

我們還提供解析雅可比行列式：

>>> def jac_rosenbrock(x):
...     return np.array([
...         [-20 * x[0], 10],
...         [-1, 0]])

把這一切放在一起，我們看到新的解決方案是有限的：

>>> res_2 = least_squares(fun_rosenbrock, x0_rosenbrock, jac_rosenbrock,
...                       bounds=([-np.inf, 1.5], np.inf))
>>> res_2.x
array([ 1.22437075,  1.5       ])
>>> res_2.cost
0.025213093946805685
>>> res_2.optimality
1.5885401433157753e-07

現在我們求解一個包含 100000 個變量的 Broyden 三對角函數 vector-valued 的方程組(即，成本函數至少應為零)：

>>> def fun_broyden(x):
...     f = (3 - x) * x + 1
...     f[1:] -= x[:-1]
...     f[:-1] -= 2 * x[1:]
...     return f

對應的雅可比矩陣是稀疏的。我們告訴算法通過有限差分來估計它，並提供雅可比的稀疏結構來顯著加速這個過程。

>>> from scipy.sparse import lil_matrix
>>> def sparsity_broyden(n):
...     sparsity = lil_matrix((n, n), dtype=int)
...     i = np.arange(n)
...     sparsity[i, i] = 1
...     i = np.arange(1, n)
...     sparsity[i, i - 1] = 1
...     i = np.arange(n - 1)
...     sparsity[i, i + 1] = 1
...     return sparsity
...
>>> n = 100000
>>> x0_broyden = -np.ones(n)
...
>>> res_3 = least_squares(fun_broyden, x0_broyden,
...                       jac_sparsity=sparsity_broyden(n))
>>> res_3.cost
4.5687069299604613e-23
>>> res_3.optimality
1.1650454296851518e-11

讓我們還使用穩健的損失函數來解決曲線擬合問題，以處理數據中的異常值。將模型函數定義為 y = a + b * exp(c * t) ，其中 t 是預測變量，y 是觀察值，a、b、c 是要估計的參數。

首先，定義生成帶有噪聲和異常值的數據的函數，定義模型參數並生成數據：

>>> from numpy.random import default_rng
>>> rng = default_rng()
>>> def gen_data(t, a, b, c, noise=0., n_outliers=0, seed=None):
...     rng = default_rng(seed)
...
...     y = a + b * np.exp(t * c)
...
...     error = noise * rng.standard_normal(t.size)
...     outliers = rng.integers(0, t.size, n_outliers)
...     error[outliers] *= 10
...
...     return y + error
...
>>> a = 0.5
>>> b = 2.0
>>> c = -1
>>> t_min = 0
>>> t_max = 10
>>> n_points = 15
...
>>> t_train = np.linspace(t_min, t_max, n_points)
>>> y_train = gen_data(t_train, a, b, c, noise=0.1, n_outliers=3)

定義用於計算殘差和參數初始估計的函數。

>>> def fun(x, t, y):
...     return x[0] + x[1] * np.exp(x[2] * t) - y
...
>>> x0 = np.array([1.0, 1.0, 0.0])

計算標準最小二乘解：

>>> res_lsq = least_squares(fun, x0, args=(t_train, y_train))

現在計算兩個具有兩個不同魯棒損失函數的解決方案。參數 f_scale 設置為 0.1，這意味著內部殘差不應顯著超過 0.1(使用的噪聲水平)。

>>> res_soft_l1 = least_squares(fun, x0, loss='soft_l1', f_scale=0.1,
...                             args=(t_train, y_train))
>>> res_log = least_squares(fun, x0, loss='cauchy', f_scale=0.1,
...                         args=(t_train, y_train))

最後，繪製所有曲線。我們看到，通過選擇適當的損失，即使存在強異常值，我們也可以獲得接近最優的估計值。但請記住，通常建議先嘗試‘soft_l1’ 或‘huber’ 損失(如果有必要)，因為其他兩個選項可能會導致優化過程中的困難。

>>> t_test = np.linspace(t_min, t_max, n_points * 10)
>>> y_true = gen_data(t_test, a, b, c)
>>> y_lsq = gen_data(t_test, *res_lsq.x)
>>> y_soft_l1 = gen_data(t_test, *res_soft_l1.x)
>>> y_log = gen_data(t_test, *res_log.x)
...
>>> import matplotlib.pyplot as plt
>>> plt.plot(t_train, y_train, 'o')
>>> plt.plot(t_test, y_true, 'k', linewidth=2, label='true')
>>> plt.plot(t_test, y_lsq, label='linear loss')
>>> plt.plot(t_test, y_soft_l1, label='soft_l1 loss')
>>> plt.plot(t_test, y_log, label='cauchy loss')
>>> plt.xlabel("t")
>>> plt.ylabel("y")
>>> plt.legend()
>>> plt.show()

在下一個示例中，我們將展示如何使用 least_squares() 優化複變量的 complex-valued 殘差函數。考慮以下函數：

>>> def f(z):
...     return z - (0.5 + 0.5j)

我們將其包裝成一個實變量函數，該函數通過簡單地將實部和虛部作為自變量處理來返回實數殘差：

>>> def f_wrap(x):
...     fx = f(x[0] + 1j*x[1])
...     return np.array([fx.real, fx.imag])

因此，我們優化 2n 個實變量的 2m-D 實函數，而不是原來的 m-D n 個複變量的複函數：

>>> from scipy.optimize import least_squares
>>> res_wrapped = least_squares(f_wrap, (0.1, 0.1), bounds=([0, 0], [1, 1]))
>>> z = res_wrapped.x[0] + res_wrapped.x[1]*1j
>>> z
(0.49999999999925893+0.49999999999925893j)

相關用法

注：本文由純淨天空篩選整理自scipy.org大神的英文原創作品 scipy.optimize.least_squares。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。

用法:

參數 ：：

返回 ：：

注意：

參考：

例子：

參數：：

返回：：