I am working in R and got the following warning message:
In predict.lm(y[[i]], C2[[i]]) :
prediction from a rank-deficient fit may be misleading
Is there any way to fix the issue “prediction from a rank-deficient fit may be misleading”? I have read a lot of threads about it, but all of them suggested installing something. Is that the right approach, or is there a better recommendation? Here is the relevant code:
# Fit regression model to each cluster
y <- list()
length(y) <- k
vars <- list()
length(vars) <- k
f <- list()
length(f) <- k
for (i in 1:k) {
vars[[i]] <- names(corc[[i]][corc[[i]]!= "1"])
f[[i]] <- as.formula(paste("Death ~", paste(vars[[i]], collapse= "+")))
y[[i]] <- lm(f[[i]], data=C1[[i]]) #training set
C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]])) #test set
}
The cause: A matrix that lacks full rank is called rank deficient. The trouble is that predict.lm can issue this warning even when the matrices you supply are full rank, because under the hood the fit discards what it considers redundant data, which leaves it rank deficient, and predict.lm then reports that with a warning. The warning also appears to act as a catch-all for other situations, such as having too many input features for too little data, where it is telling you that your predictions are fragile. So you can see it even when you predict from full-rank input matrices.
Solution: Ignore the warning if predict is returning good results. But if predict.lm is giving its opinion from a lack of perspective, you can suppress the warning on the predict step.
body(predict.lm) lets you inspect the predict function. In it you will find the check that raises this warning: it fires when the rank of your data matrix is less than the number of parameters you want to fit. One way to trigger it is to use collinear covariates.
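
The collinear case and the suppression step described above can be sketched as follows. This is only a minimal illustration: the data values are made up, the names data, newdata and fit are chosen to match the text, and the exact wording of the warning (and the conditions under which it fires) may differ between R versions.

```r
# Illustrative data, made up for this sketch: x3 is exactly 2 * x4,
# so the two covariates are collinear.
data <- data.frame(
  y  = c(1, 2, 3, 4, 5, 7),
  x1 = c(1, 1, 2, 3, 5, 4),
  x2 = c(3, 4, 5, 2, 1, 6),
  x3 = c(4, 2, 6, 0, 8, 2),
  x4 = c(2, 1, 3, 0, 4, 1)
)

# New observations to predict for (also made up).
newdata <- data.frame(x1 = c(3, 2), x2 = c(3, 2), x3 = c(3, 4), x4 = c(0, 0))

fit <- lm(y ~ ., data = data)

# Find the lines of predict.lm that mention rank deficiency.
# The exact wording differs between R versions.
check_lines <- grep("rank-deficient", deparse(body(predict.lm)), value = TRUE)
print(check_lines)

# Predicting from the collinear fit usually triggers the warning.
# tryCatch returns the warning object, or NULL if none was raised.
w <- tryCatch({ predict(fit, newdata); NULL }, warning = function(w) w)
if (!is.null(w)) message("Caught: ", conditionMessage(w))

# Silence warnings for this one call only. Note that this hides every
# warning raised inside predict(), not just the rank-deficiency one.
pred <- suppressWarnings(predict(fit, newdata))
print(pred)
```

Wrapping only the predict call in suppressWarnings keeps warnings from the rest of the script visible, which is usually preferable to changing the global warn option.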
In data, x3 and x4 point in the same direction: one is a multiple of the other. You can check this with length(fit$coefficients) > fit$rank, which is TRUE for a rank-deficient fit.
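
Going one step beyond the check above, the sketch below shows one possible way to find out which coefficient lm could not estimate and to refit without it, so that the fit is no longer rank deficient. The data are made up, and the helper names aliased, keep and fit_reduced are hypothetical. It assumes numeric covariates and a model with an intercept, so that coefficient names match column names.

```r
# Illustrative data, made up for this sketch: x3 is exactly 2 * x4.
data <- data.frame(
  y  = c(1, 2, 3, 4, 5, 7),
  x1 = c(1, 1, 2, 3, 5, 4),
  x2 = c(3, 4, 5, 2, 1, 6),
  x3 = c(4, 2, 6, 0, 8, 2),
  x4 = c(2, 1, 3, 0, 4, 1)
)

fit <- lm(y ~ ., data = data)

# TRUE when there are more coefficients than the rank of the fit.
rank_deficient <- length(fit$coefficients) > fit$rank

# lm reports coefficients it could not estimate as NA.
aliased <- names(which(is.na(coef(fit))))
print(aliased)

# Refit using only the estimable covariates
# (drop the intercept name and the aliased terms).
keep <- setdiff(names(coef(fit))[-1], aliased)
fit_reduced <- lm(reformulate(keep, response = "y"), data = data)

# The reduced fit has as many coefficients as its rank.
length(coef(fit_reduced)) == fit_reduced$rank
```

Whether dropping a covariate is appropriate depends on the analysis; the point here is only that removing the redundant term removes the rank deficiency that predict warns about.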
A second way to trigger it is to fit more parameters than the available data can support, that is, more coefficients than observations.
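
A hedged sketch of this second case: four made-up observations and a formula with all interactions of three covariates, which asks for eight coefficients. The names small and fit2 are hypothetical.

```r
# Four made-up observations with three covariates.
small <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(1, 1, 2, 3),
  x2 = c(3, 4, 5, 2),
  x3 = c(4, 2, 6, 1)
)

# x1 * x2 * x3 expands to the intercept, three main effects,
# three two-way interactions and one three-way interaction.
fit2 <- lm(y ~ x1 * x2 * x3, data = small)

n_params <- length(coef(fit2))   # coefficients requested
n_obs    <- nrow(small)          # observations available

# The rank cannot exceed the number of observations,
# so this fit is necessarily rank deficient.
n_params > fit2$rank

# Coefficients that could not be estimated are reported as NA.
sum(is.na(coef(fit2)))
```

Here none of the covariates is a multiple of another, so the deficiency comes purely from asking for more parameters than there are rows.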