Ensemble aggregation of spatial summaries (e.g. Ripley's K) across images, using a combination of random weights and weights based on the number of points.

Usage

combo.weight.avg(
  data,
  group,
  outcome,
  cens = NULL,
  model = "survival",
  adjustments = NULL,
  n.ensemble = 1000,
  seed = NULL
)

Arguments

data

A data.frame or tibble containing patient IDs, image IDs, outcomes, and spatial summaries.

group

Grouping variable for images, e.g. patient IDs.

outcome

Column name for outcome in data.

cens

Column name for the event indicator in data. If the outcome is not a survival outcome, leave as NULL.

model

Model type. Options are "logistic" for logistic regression and "survival" for a Cox proportional hazards model.

adjustments

Names of columns in data to adjust for in each ensemble replication.

n.ensemble

Number of ensemble replications to run. Default is 1000.

seed

A seed for reproducible results. If left NULL, the function will generate a seed for you and return it.
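
The idea of one ensemble replication can be illustrated with a small, self-contained sketch. This is not the package's implementation: the helper `one_replication()`, the toy data, the exponential distribution for the random weights, and the rule of multiplying the random weight by the number of points are all assumptions made for illustration only.

```r
# Toy data: two patients with several images each (all values are made up)
toy <- data.frame(
  PID     = c("p1", "p1", "p1", "p2", "p2"),
  spatial = c(10, 20, 30, 5, 15),
  npoints = c(100, 50, 50, 10, 30)
)

# Hypothetical helper: one ensemble replication
one_replication <- function(data, group, rng_weights = NULL) {
  # Random positive weight per image (assumed: standard exponential draws)
  u <- if (is.null(rng_weights)) stats::rexp(nrow(data)) else rng_weights

  # Assumed combination rule: random weight times the number of points
  w <- u * data$npoints

  # Weighted average of the spatial summary within each group
  num <- tapply(w * data$spatial, data[[group]], sum)
  den <- tapply(w, data[[group]], sum)
  num / den
}

set.seed(1)
one_replication(toy, group = "PID")
```

Under these assumptions, each replication yields one aggregated spatial summary per patient, which could then be used as a covariate in the chosen model; repeating this `n.ensemble` times would give one p-value per replication.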

Value

A list of length 2 containing the overall p-value obtained using the Cauchy combination test (`pval`) and the seed used for reproducibility (`seed`). If the user provided a seed, then `seed` will match what was given. Otherwise, the function randomly selects a seed and this is returned.
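
For readers unfamiliar with the Cauchy combination test, the following minimal sketch shows how a vector of p-values can be combined into one. It assumes equal weights across p-values; the helper name `cauchy_combine()` is hypothetical, and the function's internal implementation may differ.

```r
# Hypothetical helper: equal-weight Cauchy combination of p-values
cauchy_combine <- function(pvals) {
  # The tangent transform is only defined for p-values strictly inside (0, 1)
  stopifnot(all(pvals > 0 & pvals < 1))

  # Map each p-value to a standard Cauchy variate and average them
  t_stat <- mean(tan((0.5 - pvals) * pi))

  # Upper-tail probability of the standard Cauchy distribution at t_stat
  0.5 - atan(t_stat) / pi
}

# Example: combine p-values from four hypothetical replications
cauchy_combine(c(0.01, 0.04, 0.20, 0.03))
```

A single p-value is returned unchanged by this transform, and the combined value tends to be driven by the smallest p-values in the set.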

Examples

# The pipe operator %>% used below is provided by magrittr
library(magrittr)

# Pick a radius to evaluate Ripley's K
r <- 30

# Save the image IDs
ids <- unique(simdata$id)

# Compute Ripley's K for tumor cells at r
K.vec <- numeric(length(ids))
npoints.vec <- numeric(length(ids))
for (i in seq_along(ids)) {
  # Save the ith image
  image.i <- simdata %>%
    dplyr::filter(id == ids[i]) %>%
    dplyr::select(x, y, type)

  # Convert to a point process object; the window is the convex hull of all cells
  w <- spatstat.geom::convexhull.xy(image.i$x, image.i$y)
  image.i.subset <- image.i %>% dplyr::filter(type == "a")
  image.ppp <- spatstat.geom::as.ppp(image.i.subset, W = w)

  # Compute Kest on the grid 0, 1, ..., r
  Ki <- spatstat.explore::Kest(image.ppp, r = 0:r)

  # Calculate the number of points
  npoints.vec[i] <- spatstat.geom::npoints(image.ppp)

  # Save the isotropic-corrected estimate at radius r (the last grid value)
  K.vec[i] <- Ki$iso[r + 1]
}

data.subset <- simdata %>%
  dplyr::select(tidyselect::all_of(c("id", "PID", "out"))) %>%
  dplyr::distinct()

data.subset$spatial <- K.vec
data.subset$npoints <- npoints.vec

# Remove NaNs
data.subset <- data.subset %>% dplyr::filter(!is.na(spatial))

# Test
combo.weight.avg(data = data.subset, group = "PID", outcome = "out", model = "logistic")
#> $pval
#> [1] 0.02052924
#> 
#> $seed
#> [1] 147083781
#>