shap.utils.hclust

shap.utils.hclust(X: _ArrayLike, y: _ArrayLike | None = None, linkage: Literal['single', 'complete', 'average'] = 'single', metric: str = 'auto', random_state: int | np.random.RandomState = 0) → npt.NDArray[Any]

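Fit a hierarchical clustering model for the features X relative to the target variable y.

For more information on clustering methods, see scipy.cluster.hierarchy.linkage().

For more information on scipy distance metrics, see scipy.spatial.distance.pdist().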

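Parameters:

X: 2d-array-like

The features to cluster.

y: array-like or None

The target variable.

linkage: str

Defines the method used to calculate the distance between clusters. Must be one of "single", "complete" or "average".

metric: str

A scipy distance metric, "xgboost_distances_r2" or "auto".

If xgboost_distances_r2, shap.utils.xgboost_distances_r2() is used to estimate the redundancy distances between the features in X relative to the target variable y.
Otherwise, the distances between features are calculated with the given distance metric.
If auto (default), xgboost_distances_r2 is used when a target variable is provided; otherwise the cosine distance metric is used.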

random_state: int or np.random.RandomState

Numpy random state. Defaults to 0.
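
The "auto" rule for the metric parameter can be sketched in a few lines. This is an illustrative sketch only: resolve_metric is a hypothetical helper written for this page, not part of the shap API, and it mirrors nothing more than the rule stated in the parameter description.

```python
def resolve_metric(metric, y):
    """Hypothetical helper: mirror the documented "auto" rule for `metric`."""
    if metric == "auto":
        # A target was supplied: use the model-based redundancy distance.
        # No target: fall back to the cosine distance between features.
        return "xgboost_distances_r2" if y is not None else "cosine"
    # Any explicit metric name is passed through unchanged.
    return metric


print(resolve_metric("auto", y=None))
print(resolve_metric("auto", y=[0, 1, 1]))
print(resolve_metric("euclidean", y=[0, 1, 1]))
```

Under this rule, passing an explicit scipy metric name is the way to avoid the model-based distance even when y is available.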

Returns:

clustering: np.array: The hierarchical clustering encoded as a linkage matrix.
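
To show what a linkage matrix looks like, the sketch below builds one directly with scipy for a small random feature matrix. It is an approximation of the non-xgboost path under stated assumptions (features are the columns of X, cosine distance, single linkage) and calls only scipy, not shap; the exact preprocessing inside shap.utils.hclust may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.RandomState(0)
X = rng.randn(50, 4)  # 50 samples, 4 features (made-up data)

# Features are clustered, so pairwise distances are taken between columns.
dist = pdist(X.T, metric="cosine")

# Each row of Z records one merge: the two cluster ids that were joined,
# the distance between them, and the size of the newly formed cluster.
Z = linkage(dist, method="single")

print(Z.shape)
```

For n features the matrix has n - 1 rows and 4 columns, and the last row describes the final merge that contains every feature.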