Run DLBCLone KNN Classification
Weighted KNN on a feature (mutation) matrix with optional upweighting of user-specified "core" features, optional exclusion of "hidden" features, and optional optimization of an explicit outgroup (e.g. "Other"). WARNING: This function is not one of the core DLBCLone functions. You should probably be using DLBCLone_predict instead!
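The upweighting of core features and exclusion of hidden features can be pictured as a column transformation applied before any distances are computed. The following is a minimal base-R sketch under stated assumptions: prepare_features is a hypothetical helper, not a DLBCLone function, and it assumes the features are later compared with a Euclidean-type distance. The gene names are toy column labels only.

```r
# Hypothetical helper, NOT part of DLBCLone: sketch of dropping hidden
# features and upweighting core features before distances are computed.
prepare_features <- function(features_df,
                             core_features = NULL,
                             core_feature_multiplier = 1.5,
                             hidden_features = NULL) {
  m <- as.matrix(features_df)
  # Drop hidden features entirely so they cannot influence distances
  keep <- setdiff(colnames(m), hidden_features)
  m <- m[, keep, drop = FALSE]
  # Scale core feature columns; under a Euclidean distance this multiplies
  # their contribution to the squared distance by multiplier^2
  core <- intersect(colnames(m), core_features)
  if (length(core) > 0) {
    m[, core] <- m[, core, drop = FALSE] * core_feature_multiplier
  }
  m
}

# Toy mutation matrix: rows are samples, columns are features
features <- matrix(c(1, 0, 1,
                     0, 1, 1),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("s1", "s2"), c("EZH2", "MYD88", "CD79B")))
prepared <- prepare_features(features,
                             core_features = "MYD88",
                             hidden_features = "CD79B")
print(prepared)
```

Scaling columns rather than changing the distance function keeps the neighbour search itself unchanged; whether DLBCLone applies the multiplier this way is an assumption.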
Usage
DLBCLone_KNN(
features_df,
metadata,
core_features = NULL,
core_feature_multiplier = 1.5,
hidden_features = NULL,
min_k = 5,
max_k = 60,
truth_column = "lymphgen",
truth_classes = c("EZB", "BN2", "ST2", "MCD", "N1", "Other"),
other_class = "Other",
optimize_for_other = TRUE,
predict_unlabeled = FALSE,
plot_samples = NULL,
DLBCLone_KNN_out = NULL,
seed = 12345,
epsilon = 0.001,
weighted_votes = TRUE,
skip_umap = FALSE
)
Arguments
- features_df
Numeric matrix/data.frame (rows = samples, cols = features). Row names must be sample IDs.
- metadata
Data frame with at least sample_id and the ground-truth label column given in truth_column.
- core_features
Character vector of feature names to upweight (optional).
- core_feature_multiplier
Numeric multiplier applied to core_features.
- hidden_features
Character vector of feature names to drop (optional).
- min_k, max_k
Integer bounds of the range of K (number of neighbors) explored when optimizing.
- truth_column
Name of metadata column with ground-truth class labels.
- truth_classes
Character vector of all classes to consider (including other_class if you intend to optimize for it).
- other_class
Name of the explicit outgroup class (default: "Other").
- optimize_for_other
Logical; if TRUE, computes a separate "other" score (ratio) and searches for the best purity threshold; if FALSE, treats all classes symmetrically.
- predict_unlabeled
If TRUE, re-runs KNN to classify samples that are present in features_df but not in metadata.
- plot_samples
Optional vector of sample_ids to keep in example plots.
- DLBCLone_KNN_out
Optional prior result; if supplied, its learned parameters are reused and optimization is skipped.
- seed
Random seed.
- epsilon
Small value added to distances before weighting.
- weighted_votes
If FALSE, neighbors are unweighted (equal votes).
- skip_umap
If TRUE, skip layout optimization plots at the end.
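To make the interplay of min_k/max_k, epsilon, weighted_votes and the outgroup threshold more concrete, here is a minimal leave-one-out sketch in base R. It is not the DLBCLone implementation: the helpers knn_vote_shares, predict_loo and search_k_threshold are hypothetical, and the Euclidean distance, the 1 / (d + epsilon) vote weights, plain accuracy as the metric, and the rule "call other_class when the winning class's vote share is below the threshold" are all assumptions. The class labels in the toy data are placeholders, not real subtype profiles.

```r
# Hypothetical sketch, NOT the DLBCLone implementation. Assumptions:
# Euclidean distance, inverse-distance weights 1 / (d + epsilon),
# plain accuracy, and "Other" called when the top vote share < threshold.

# Vote shares per class among the k nearest training samples
knn_vote_shares <- function(train, labels, query, k,
                            epsilon = 0.001, weighted_votes = TRUE) {
  d <- sqrt(rowSums(sweep(train, 2, query)^2))   # Euclidean distances
  nn <- order(d)[seq_len(min(k, length(d)))]     # indices of the k nearest
  w <- if (weighted_votes) 1 / (d[nn] + epsilon) else rep(1, length(nn))
  shares <- tapply(w, labels[nn], sum)           # summed weight per class
  shares / sum(shares)
}

# Leave-one-out predictions for a given k and purity threshold
predict_loo <- function(x, labels, k, threshold, other_class = "Other",
                        epsilon = 0.001, weighted_votes = TRUE) {
  vapply(seq_len(nrow(x)), function(i) {
    shares <- knn_vote_shares(x[-i, , drop = FALSE], labels[-i], x[i, ], k,
                              epsilon, weighted_votes)
    top <- names(shares)[which.max(shares)]
    if (max(shares) < threshold) other_class else top
  }, character(1))
}

# Grid search over k in [min_k, max_k] and candidate thresholds;
# ties keep the first (smallest) k and threshold encountered
search_k_threshold <- function(x, labels, min_k = 5, max_k = 60,
                               thresholds = seq(0.5, 0.9, by = 0.1),
                               other_class = "Other",
                               epsilon = 0.001, weighted_votes = TRUE) {
  max_k <- min(max_k, nrow(x) - 1)   # at most n - 1 neighbours in LOO
  stopifnot(min_k >= 1, min_k <= max_k)
  best <- list(k = NA_integer_, threshold = NA_real_, accuracy = -Inf)
  for (k in seq(min_k, max_k)) {
    for (t in thresholds) {
      pred <- predict_loo(x, labels, k, t, other_class, epsilon, weighted_votes)
      acc <- mean(pred == labels)
      if (acc > best$accuracy) best <- list(k = k, threshold = t, accuracy = acc)
    }
  }
  best
}

# Toy binary data: two groups plus a few mixed samples labelled "Other"
set.seed(12345)
x <- rbind(matrix(rbinom(40, 1, 0.8), ncol = 4),
           matrix(rbinom(40, 1, 0.2), ncol = 4),
           matrix(rbinom(16, 1, 0.5), ncol = 4))
labels <- rep(c("EZB", "MCD", "Other"), times = c(10, 10, 4))
best <- search_k_threshold(x, labels, min_k = 3, max_k = 7)
str(best)
```

Setting weighted_votes = FALSE in this sketch gives every neighbor one vote, mirroring the documented behavior of that argument; epsilon only matters in the weighted case, where it keeps a zero distance from producing an infinite weight.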
Value
A list with fields including:
- predictions
Per-sample vote/score summary and predicted labels
- DLBCLone_k_best_k
Best K found
- DLBCLone_k_purity_threshold
Best purity threshold (if applicable)
- DLBCLone_k_accuracy
Best accuracy metric achieved
- truth_classes, truth_column
Echoed arguments
- unlabeled_predictions
Predictions for unlabeled samples (if requested)
- df
Annotated layout for plotting (when built in this run)
- plot_truth, plot_predicted
ggplot objects (when built in this run)
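The unlabeled_predictions field depends on how samples split between features_df and metadata. As a minimal sketch of that split, assuming only what the argument descriptions state (row names of features_df are sample IDs, metadata has a sample_id column), samples with features but no metadata row are the ones predict_unlabeled = TRUE is documented to classify. The toy data below are hypothetical.

```r
# Toy inputs: row names of the feature table are sample IDs (as required)
features_df <- data.frame(EZH2 = c(1, 0, 1), MYD88 = c(0, 1, 1),
                          row.names = c("s1", "s2", "s3"))
metadata <- data.frame(sample_id = c("s1", "s2"),
                       lymphgen = c("EZB", "MCD"))

# Samples with metadata carry a ground-truth label in truth_column
labeled <- intersect(rownames(features_df), metadata$sample_id)
# Samples with features but no metadata row have no ground-truth label
unlabeled <- setdiff(rownames(features_df), metadata$sample_id)
print(unlabeled)
```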