Ensemble aggregation of spatial summaries (e.g. Ripley's K) using a combination of random weighting and weighting by the number of points

Usage

combo.weight.avg(
  data,
  group,
  outcome,
  cens = NULL,
  model = "survival",
  adjustments = NULL,
  n.ensemble = 1000,
  seed = NULL
)

Arguments

data: A data.frame or tibble containing patient IDs, image IDs, outcomes, and spatial summaries.
group: Grouping variable for images, e.g. patient IDs.
outcome: Column name for outcome in data.
cens: Column name for the event indicator. If not using a survival outcome, leave as NULL.
model: Model type. Options are "logistic" for logistic regression and "survival" for a Cox proportional hazards model.
adjustments: Names of columns in data to adjust for in each ensemble replication.
n.ensemble: Number of ensemble replications to run. Default is 1000.
seed: A seed for reproducible results. If left NULL, the function will generate a seed for you and return it.
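
To make the idea of a single ensemble replication more concrete, here is a minimal sketch of one weighted average per group. It is not the package's implementation: the helper `weighted_avg_once`, the toy data, and in particular the rule for blending the two weightings (multiplying a uniform random weight by the point-count weight and renormalizing) are assumptions made for illustration only.

```r
# Hypothetical toy data: one spatial summary and one point count per image
toy <- data.frame(
  PID     = c(1, 1, 2, 2, 2),
  spatial = c(10, 20, 5, 15, 25),
  npoints = c(100, 300, 50, 50, 100)
)

# One replication: a weighted average of the spatial summary within each group.
# ASSUMPTION: the blend rule below (random weight x point-count weight) is
# illustrative and may differ from what combo.weight.avg() does internally.
weighted_avg_once <- function(df, group = "PID") {
  parts <- split(df, df[[group]])
  vapply(parts, function(g) {
    w.random <- runif(nrow(g))             # random weights
    w.points <- g$npoints / sum(g$npoints) # weights by number of points
    w <- w.random * w.points
    w <- w / sum(w)                        # renormalize to sum to 1
    sum(w * g$spatial)
  }, numeric(1))
}

set.seed(1)
avg <- weighted_avg_once(toy)
```

Repeating this step n.ensemble times, fitting the chosen model on each set of averages, would give one p-value per replication.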

Value

A list of length 2 containing the overall p-value obtained using the Cauchy combination test (`pval`) and the seed used for reproducibility (`seed`). If the user provided a seed, then `seed` will match what was given. Otherwise, the function randomly selects a seed and this is returned.
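
As a rough illustration of how a Cauchy combination test turns many p-values into one, the sketch below implements the standard form of the statistic. The helper name `cauchy_combine` is hypothetical, and the equal default weights are an assumption; the function documented here may weight or handle edge cases differently.

```r
# Cauchy combination of p-values (illustrative helper, not the package's code).
# Each p-value is mapped to a Cauchy-distributed quantity, the quantities are
# averaged with weights that sum to 1, and the average is mapped back.
cauchy_combine <- function(pvals,
                           weights = rep(1 / length(pvals), length(pvals))) {
  stopifnot(length(pvals) == length(weights),
            all(pvals > 0 & pvals < 1))
  weights <- weights / sum(weights)
  stat <- sum(weights * tan((0.5 - pvals) * pi))
  0.5 - atan(stat) / pi
}

# Combine three hypothetical per-replication p-values
p.overall <- cauchy_combine(c(0.01, 0.20, 0.50))
```

With a single p-value the function returns that p-value, which is a quick sanity check on the transformation.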

Examples

# Load the pipe operator (%>%) used below
library(magrittr)

# Pick a radius to evaluate Ripley's K
r <- 30

# Save the image IDs
ids <- unique(simdata$id)

# Compute Ripley's K for tumor cells at r
K.vec <- numeric(length(ids))
npoints.vec <- numeric(length(ids))
for (i in seq_along(ids)) {
  # Save the ith image
  image.i <- simdata %>%
    dplyr::filter(id == ids[i]) %>%
    dplyr::select(x, y, type)

  # Convert to a point process object
  w <- spatstat.geom::convexhull.xy(image.i$x, image.i$y)
  image.i.subset <- image.i %>% dplyr::filter(type == "a")
  image.ppp <- spatstat.geom::as.ppp(image.i.subset, W = w)

  # Compute Kest on the grid 0, 1, ..., r
  Ki <- spatstat.explore::Kest(image.ppp, r = 0:r)

  # Calculate the number of points
  npoints.vec[i] <- spatstat.geom::npoints(image.ppp)

  # Save the isotropic-corrected estimate at radius r (the last grid value)
  K.vec[i] <- Ki$iso[r + 1]
}

data.subset <- simdata %>%
  dplyr::select(tidyselect::all_of(c("id", "PID", "out"))) %>%
  dplyr::distinct()

data.subset$spatial <- K.vec
data.subset$npoints <- npoints.vec

# Remove images whose K estimate is NA or NaN
data.subset <- data.subset %>% dplyr::filter(!is.na(spatial))

# Test the association between the spatial summary and the outcome
combo.weight.avg(data = data.subset, group = "PID", outcome = "out", model = "logistic")
#> $pval
#> [1] 0.02052924
#> 
#> $seed
#> [1] 147083781
#>