[SOLVED] How to do Wilcox test function when reading data from dataframe, in R?

Issue

I’m trying to make this function work, but am failing.
What I need is a function that reads the names from a dataframe columns and uses them to perform a Wilcoxon test on each of those columns. "result" would be the main final product, a table with the genus names and their p-values on each row. I’ve added also a plotting feature for visualizing the values among groups for each column, that I would save naming them after the corresponding genus.

library("dplyr")
library("ggpubr")
library(PairedData)
library(tidyr)
         
    process <- function(data, genus){
          group_by(data,group) %>%summarise(
              count = n(),
              median = median(genus, na.rm = TRUE),
              IQR = IQR(genus, na.rm = TRUE)
            )
          # Subset data before and after treatment
          T0 <- subset(data,  group == "T0", genus,drop = TRUE)
          T2 <- subset(data,  group == "T2", genus,drop = TRUE)
          #Wilcoxon test for paired data, I want a table of names and corresponding p-values
          res <- wilcox.test(T0, T2, paired = TRUE)
          res$p.value
          result <- spread(genus,res$p.value)
          # Plot paired data, with title depending on the data and its p-value (this last one could be optional)
          pd <- paired(T0, T2)
          tiff(genus".tiff", width = 600, height = 400)
           plot(pd, type = "profile") + labs(title=print(data[,genus]", paired p-value="res[,p.value]) +theme_bw()
          dev.off()
      }
        
        l <- length(my_data)
        glist <- list(colnames(my_data[3:l])) #bacteria start at col 3
        wilcoxon <- process(data = my_data, genus = glist)

A reproducible dataset could be

my_data    
    Patient group   Subdoligranulum Agathobacter
    pt_10T0 T0  0.02    0.00 
    pt_10T2 T2  10.71   19.89 
    pt_15T0 T0  29.97   0.28 
    pt_15T2 T2  16.10   7.70 
    pt_20T0 T0  2.39    0.44 
    pt_20T2 T2  20.48   3.35 
    pt_32T0 T0  12.23   0.17 
    pt_32T2 T2  37.11   1.87 
    pt_36T0 T0  0.64    0.03 
    pt_36T2 T2  0.02    0.08 
    pt_39T0 T0  0.04    0.01 
    pt_39T2 T2  0.36    0.05 
    pt_3t0  T0  13.23   1.34 
    pt_3T2  T2  19.22   1.51 
    pt_9T0  T0  11.69   0.57 
    pt_9T2  T2  34.56   3.52 

I’m not very familiar with functions, and haven’t found yet a good tutorial on how to make them from a dataframe… so this is my best attempt, I hope some of you can make it work.
Thank you for the help!

Solution

Simply, return the needed value at end of processing. Below does not test the plot step (with unknown packages) but adjusted for proper R grammar:

proc_wilcox <- function(data, genus){
    # Subset data before and after treatment
    T0 <- data[[genus]][data$group == "T0"]
    T2 <- data[[genus]][data$group == "T2"]

    # Wilcoxon test for paired data
    res <- wilcox.test(T0, T2, paired = TRUE)

    # Plot paired data, with title depending on the data and its p-value
    # pd <- paired(T0, T2)
    # tiff(paste0(genus, ".tiff"), width = 600, height = 400)
    # plot(pd, type = "profile") + 
    #   labs(title=paste0(genus, " paired p-value= ", res$p.value)) + 
    #   theme_bw()
    # dev.off()

    return(res$p.value)
}

Then, call the method with an apply function such as sapply or slightly faster vapply designed to process across iterables and return same length.

# VECTOR OF RESULTS (USING sapply)
wilcoxon_results <- sapply(
  names(my_data)[3:ncol(my_data)], 
  function(col) proc_wilcox(my_data, col)
)

# VECTOR OF RESULTS (USING vapply)
wilcoxon_results <- vapply(
  names(my_data)[3:ncol(my_data)], 
  function(col) proc_wilcox(my_data, col),
  numeric(1)
)

wilcoxon_results
# Subdoligranulum    Agathobacter 
#       0.1484375       0.0078125 

wilcoxon_df <- data.frame(wilcoxon_results)
wilcoxon_df
#                 wilcoxon_results
# Subdoligranulum        0.1484375
# Agathobacter           0.0078125

Answered By – Parfait

Answer Checked By – Marilyn (BugsFixing Volunteer)

Leave a Reply

Your email address will not be published. Required fields are marked *