Description
Hello,
I am a beginner in bioinformatics and still learning the standard practices, so I may be missing something, but I wanted to report a behavior I observed when using lefser’s ldaFunction.
When the class variable is a factor with two levels (e.g., "CL_B" and "CL_A") instead of numeric values 0 and 1, the calculation of effect_size appears to fail.
In the code, effect_size is computed as:
effect_size <- abs(mean(LD[data[,"class"] == 1]) - mean(LD[data[,"class"] == 0]))
This seems to assume that the class values are literally 1 and 0.
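If they are not (for example, a factor or character vector with other labels), the comparison data[,"class"] == 1 returns FALSE for every row, so both means are taken over empty vectors and effect_size comes out as NaN instead of a real difference, which makes the LDA results invalid.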
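A minimal base-R sketch of this failure, using made-up scores and labels (the LD values and the class01 variable are assumptions for illustration, not lefser's data or code):

```r
# Hypothetical discriminant scores and a two-level factor, for illustration only
LD <- c(-1.2, -0.8, -1.0, 0.9, 1.1, 1.3)
class <- factor(c("CL_A", "CL_A", "CL_A", "CL_B", "CL_B", "CL_B"))

# Comparing a factor with 1 or 0 compares against the labels "1" and "0",
# so no element matches
any(class == 1)
any(class == 0)

# Both subsets are empty, and the mean of an empty numeric vector is NaN
effect_size <- abs(mean(LD[class == 1]) - mean(LD[class == 0]))
is.nan(effect_size)

# With 0/1 coding the same expression gives the intended difference in means
class01 <- as.integer(class) - 1L
effect_size_ok <- abs(mean(LD[class01 == 1]) - mean(LD[class01 == 0]))
effect_size_ok
```

If the downstream code replaces missing values with 0, as I suspect but have not confirmed, this would explain results that look plausible but differ from the numerically coded run.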
The main lefser function does not seem to automatically convert character/factor class labels into numeric values before passing them to ldaFunction. The class is converted to a factor but then added as-is to the dataframe, which leads to effect_size not being calculated as intended.
I could fix this locally by recoding the classes to numeric (0 and 1), but I wanted to check whether:
- this behavior is expected (and I missed it in the documentation), or
- it would make sense to update the function to handle factors directly or to issue a warning.
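As a rough sketch of what handling factors directly could look like, here is a small helper that maps any two-level class vector to 0/1 and warns when it recodes. The name recode_class01 is hypothetical and not part of lefser; it only illustrates the idea under the assumption that the first level should become 0.

```r
# Hypothetical helper (not part of lefser): map a two-level class vector to 0/1
recode_class01 <- function(x) {
  f <- droplevels(as.factor(x))
  if (nlevels(f) != 2L) {
    stop("class must have exactly two levels, found ", nlevels(f))
  }
  # Warn only when the labels are not already "0" and "1"
  if (!all(levels(f) %in% c("0", "1"))) {
    warning("recoding class levels: ", levels(f)[1], " -> 0, ",
            levels(f)[2], " -> 1")
  }
  as.integer(f) - 1L  # first level becomes 0, second becomes 1
}

# Example with labels like the ones in my data
cls <- factor(c("OCE22", "OCE22", "OCE23"))
recoded <- suppressWarnings(recode_class01(cls))
recoded
```

Which level maps to 0 follows the factor's level order here, so the reference group could be changed with relevel() before calling the helper.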
Thank you very much for your work on this package, and apologies if I am overlooking a standard workflow here.
str(relab_sub_t_df$class)
Factor w/ 2 levels "OCE22","OCE23": 1 1 1 1 1 1 1 1 1 1 ...
test_data <- relab_sub_t_df
test_data$class <- ifelse(test_data$class == "OCE22", 0, 1)
str(test_data$class)
num [1:24] 0 0 0 0 0 0 0 0 0 0 ...
A <- ldaFunction(test_data)
B <- ldaFunction(relab_sub_t_df)
mean( max(abs(A - B)) / abs(mean(c(A,B))) )
[1] 0.02602692