I want to train a model on data stored in a pandas DataFrame.
I get the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')." when standardizing the data with scikit-learn's StandardScaler.
from sklearn.preprocessing import StandardScaler
# Training data (pandas.DataFrame type)
X = training_data()
# Standardization
sc = StandardScaler()
sc.fit(X)
Then I get this error message:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
What can I do about this ValueError? Is there a better approach?
The cause: This error occurs because the input data contains NaN or infinite values (or values too large to be represented as float64).
Solution: To avoid this error, remove or replace the NaN and infinite values in the input data before fitting the scaler.
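Before removing anything, it can help to see which columns are affected. The following is a minimal sketch, assuming X contains only numeric columns; the sample DataFrame is made up for illustration and stands in for the real training data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the training DataFrame X
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, np.inf, -np.inf],
})

# Number of NaN values in each column
nan_counts = X.isna().sum()
# Number of infinite values (+inf or -inf) in each column
inf_counts = np.isinf(X).sum()

print(nan_counts)
print(inf_counts)
```

Any column with a nonzero count is a candidate for cleaning.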
For instance, you can drop every column of X that contains at least one NaN by combining the following calls.
Each function’s description:
np.isnan(X)
: Get a boolean matrix that is True for NaN elements and False for all other elements
np.isnan(X).any()
: Get a list with True for columns containing NaN and False for other columns
X.columns[np.isnan(X).any()]
: Get the names of the columns containing NaN
X.drop('col', axis = 1)
: Remove the column named col from X
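Put together, the calls described above might look like the sketch below. It assumes X is an all-numeric DataFrame; the sample data and the name X_clean are illustrative only. Because np.isnan does not flag infinity, the sketch first converts infinite values to NaN so that a single check covers both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the training DataFrame X
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [1.0, np.inf, 5.0],
    "d": [4.0, 5.0, 6.0],
})

# Convert +inf / -inf to NaN so both problems are handled the same way
X_clean = X.replace([np.inf, -np.inf], np.nan)

# Names of the columns that contain at least one NaN
nan_columns = X_clean.columns[np.isnan(X_clean).any()]

# Drop those columns; the original X is left unchanged
X_clean = X_clean.drop(nan_columns, axis=1)

print(list(X_clean.columns))
```

X_clean can then be passed to sc.fit in place of X. Dropping whole columns discards data, so depending on the dataset it may be preferable to drop the affected rows with X.dropna() or to fill in the missing values instead.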