Working with missing data is a common task in data preprocessing. Although sometimes missing values signify a meaningful event in the data, they often represent unreliable or unusable data points. In either case, MATLAB® has many options for handling missing data.
The form that missing values take in MATLAB depends on the data type. For example, numeric data types such as double use NaN (not a number) to represent missing values.
x = [NaN 1 2 3 4];
You can also use the missing value to represent missing numeric data or missing data of other types, such as datetime, string, and categorical. MATLAB automatically converts the missing value to the data's native type.
xDouble = [missing 1 2 3 4]
xDouble = 1×5
   NaN     1     2     3     4
xDatetime = [missing datetime(2014,1:4,1)]
xDatetime = 1x5 datetime array
Columns 1 through 3
   NaT                    01-Jan-2014 00:00:00   01-Feb-2014 00:00:00
Columns 4 through 5
   01-Mar-2014 00:00:00   01-Apr-2014 00:00:00
xString = [missing "a" "b" "c" "d"]
xString = 1x5 string array
    <missing>    "a"    "b"    "c"    "d"
xCategorical = [missing categorical({'cat1' 'cat2' 'cat3' 'cat4'})]
xCategorical = 1x5 categorical array
     <undefined>      cat1      cat2      cat3      cat4
A data set might contain values that you want to treat as missing data, but that are not standard MATLAB missing values such as NaN. You can use the standardizeMissing function to convert those values to the standard missing value for that data type. For example, treat 4 as a missing double value in addition to NaN.
xStandard = standardizeMissing(xDouble,[4 NaN])
xStandard = 1×5
   NaN     1     2     3   NaN
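As a further sketch, standardizeMissing can take several indicator values in one call. The placeholder codes -99 and -1 and the variable names readings, cleaned, and numMissing are assumptions chosen for illustration; they do not come from the example above.

```matlab
% Hypothetical sensor readings in which -99 and -1 are placeholder codes
% meaning "no measurement" (illustrative values only).
readings = [12.5 -99 14.1 -1 13.8];

% Convert both placeholder codes to NaN, the standard missing value for double.
cleaned = standardizeMissing(readings,[-99 -1]);

% Count how many entries are now flagged as missing.
numMissing = nnz(ismissing(cleaned));
```

Standardizing placeholder codes first means that later steps only need to look for the standard missing value of the data type.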
Suppose you want to keep missing values as part of your data set but segregate them from the rest of the data. Several MATLAB functions enable you to control the placement of missing values before further processing. For example, use the 'MissingPlacement' option with the sort function to move NaNs to the end of the data.
xSort = sort(xStandard,'MissingPlacement','last')
xSort = 1×5
     1     2     3   NaN   NaN
Even if you do not explicitly create missing values in MATLAB, they can appear when importing existing data or computing with the data. If you are not aware of missing values in your data, subsequent computation or analysis can be misleading.
For example, if you unknowingly plot a vector containing a NaN value, the NaN does not appear because the plot function ignores it and plots the remaining points normally.
nanData = [1:9 NaN];
plot(1:10,nanData)
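However, if you compute the average of the data, the result is NaN. In this case, it is more helpful to know in advance that the data contains a NaN, and then choose to ignore or remove it before computing the average.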
meanData = mean(nanData)
meanData = NaN
One way to find NaNs in data is by using the isnan function, which returns a logical array indicating the location of any NaN value.
TF = isnan(nanData)
TF = 1x10 logical array
   0   0   0   0   0   0   0   0   0   1
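The logical array from isnan can feed other standard functions to check for, locate, count, or drop the NaN values. The following is a minimal sketch; the variable names hasNaN, nanIdx, numNaN, and cleanData are illustrative assumptions.

```matlab
% Recreate the example vector so that this snippet runs on its own.
nanData = [1:9 NaN];
TF = isnan(nanData);

hasNaN = any(TF);          % true if at least one NaN is present
nanIdx = find(TF);         % positions of the NaN values
numNaN = nnz(TF);          % number of NaN values
cleanData = nanData(~TF);  % copy of the data without the NaN values
```

Logical indexing with ~TF keeps only the entries that are not NaN, which is one way to remove them before computing a statistic.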
Similarly, the ismissing function returns the location of missing values in data for multiple data types.
TFdouble = ismissing(xDouble)
TFdouble = 1x5 logical array
   1   0   0   0   0
TFdatetime = ismissing(xDatetime)
TFdatetime = 1x5 logical array
   1   0   0   0   0
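Suppose you are working with a table or timetable made up of variables with multiple data types. You can find all of the missing values with a single call to ismissing, regardless of their type.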
xTable = table(xDouble',xDatetime',xString',xCategorical')
xTable = 5×4 table
    Var1            Var2              Var3          Var4
    ____    ____________________    _________    ___________
    NaN     NaT                     <missing>    <undefined>
      1     01-Jan-2014 00:00:00    "a"          cat1
      2     01-Feb-2014 00:00:00    "b"          cat2
      3     01-Mar-2014 00:00:00    "c"          cat3
      4     01-Apr-2014 00:00:00    "d"          cat4
TF = ismissing(xTable)
TF = 5x4 logical array
   1   1   1   1
   0   0   0   0
   0   0   0   0
   0   0   0   0
   0   0   0   0
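The logical array returned by ismissing can be summarized by variable or by row. The following is a minimal sketch on a small stand-in table; the table T, its variable names Value and Label, and the result names are assumptions made for illustration.

```matlab
% Small stand-in table with one missing entry in each variable.
T = table([NaN; 1; 2],["a"; missing; "c"], ...
    'VariableNames',{'Value','Label'});

TF = ismissing(T);

% Number of missing entries in each table variable (column-wise sum).
missingPerVariable = sum(TF,1);

% Logical column vector marking rows that contain at least one missing entry.
rowsWithMissing = any(TF,2);
```

A per-variable count shows quickly which variables need attention before you decide whether to fill or remove the missing entries.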
Missing values can represent unusable data for processing or analysis. Use fillmissing to replace missing values with another value, or use rmmissing to remove missing values altogether.
xFill = fillmissing(xStandard,'constant',0)
xFill = 1×5
     0     1     2     3     0
xRemove = rmmissing(xStandard)
xRemove = 1×3
     1     2     3
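Besides a constant, fillmissing can derive the replacement from neighboring values. The following is a minimal sketch using the 'previous' and 'linear' methods; the vector xGaps and the result names are assumptions made for illustration, with the gaps placed in the interior of the data so that both methods have neighbors to work from.

```matlab
% Hypothetical vector with two interior gaps.
xGaps = [1 NaN 3 4 NaN 6];

% Carry the previous nonmissing value forward into each gap.
xPrevious = fillmissing(xGaps,'previous');   % 1 1 3 4 4 6

% Fill each gap by linear interpolation between its neighbors.
xLinear = fillmissing(xGaps,'linear');       % 1 2 3 4 5 6
```

Which method is appropriate depends on the data: carrying a value forward suits step-like signals, while interpolation suits smoothly varying ones.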
Many MATLAB functions enable you to ignore missing values, without having to explicitly locate, fill, or remove them first. For example, if you compute the sum of a vector containing NaN values, the result is NaN. However, you can directly ignore NaNs in the sum by using the 'omitnan' option with the sum function.
sumNan = sum(xDouble)
sumNan = NaN
sumOmitnan = sum(xDouble,'omitnan')
sumOmitnan = 10
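The 'omitnan' option is also accepted by other summary functions, such as mean and median. The following is a minimal sketch that revisits the averaging problem with a vector of the form [1:9 NaN]; the result names are illustrative assumptions.

```matlab
% Vector with a single NaN at the end.
nanData = [1:9 NaN];

meanAll  = mean(nanData);              % NaN, because the NaN propagates
meanOmit = mean(nanData,'omitnan');    % 5, the average of 1 through 9
medOmit  = median(nanData,'omitnan');  % 5, the median of 1 through 9
```

Omitting NaN values at the point of computation leaves the original data unchanged, which helps when the missing entries still carry meaning elsewhere in the analysis.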
fillmissing | ismissing | missing | standardizeMissing