splitlabels
Find indices to split labels according to specified proportions
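Syntax
idxs = splitlabels(lblsrc,p)
idxs = splitlabels(lblsrc,p,'randomized')
idxs = splitlabels(___,Name,Value)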
Description
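Use this function when you are working on a machine learning or deep learning classification problem and want to split a dataset into training, testing, and validation sets that hold the same proportion of label values.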
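As a minimal sketch of that workflow, the snippet below splits a small, made-up label vector into training, validation, and testing index sets. The label values and the variable names (labels, idx, trainIdx, valIdx, testIdx) are illustrative assumptions; only the splitlabels and countlabels calls come from this page.

```matlab
% Hypothetical labels: 8 instances of "cat" and 4 of "dog".
labels = categorical([repmat("cat",8,1); repmat("dog",4,1)]);

% Request 50% of each category for training and 25% for validation.
% The remaining 25% of each category goes to the testing set.
idx = splitlabels(labels,[0.5 0.25]);
trainIdx = idx{1};
valIdx   = idx{2};
testIdx  = idx{3};

% Count the labels in each set to confirm that the proportions are kept.
trainCounts = countlabels(labels(trainIdx));
valCounts   = countlabels(labels(valIdx));
testCounts  = countlabels(labels(testIdx));
```

The fractions here were chosen so that every product with a category size is an integer, which keeps the example independent of how the function rounds.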
Examples
Split Vowels
Read William Shakespeare's sonnets with the fileread function. Extract all the vowels from the text and convert them to lowercase.
sonnets = fileread("sonnets.txt");
vowels = lower(sonnets(regexp(sonnets,"[AEIOUaeiou]")))';
Count the number of instances of each vowel.
cnts = countlabels(vowels)
cnts=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      4940     18.368
      e      9028     33.569
      i      4895     18.201
      o      5710     21.232
      u      2321     8.6302
Split the vowels into a training set containing 500 instances of each vowel, a validation set containing 300 instances of each vowel, and a testing set with the rest. All vowels are represented with equal weights in the first two sets but not in the third.
spltn = splitlabels(vowels,[500 300]);
for kj = 1:length(spltn)
    cntsn{kj} = countlabels(vowels(spltn{kj}));
end
cntsn{:}
ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a       500       20
      e       500       20
      i       500       20
      o       500       20
      u       500       20
ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a       300       20
      e       300       20
      i       300       20
      o       300       20
      u       300       20
ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      4140     18.083
      e      8228      35.94
      i      4095     17.887
      o      4910     21.447
      u      1521     6.6437
Split the vowels into a training set containing 50% of the instances, a validation set containing another 30%, and a testing set with the rest. All vowels are represented with the same weight across all three sets.
spltp = splitlabels(vowels,[0.5 0.3]);
for kj = 1:length(spltp)
    cntsp{kj} = countlabels(vowels(spltp{kj}));
end
cntsp{:}
ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      2470     18.367
      e      4514     33.566
      i      2448     18.203
      o      2855      21.23
      u      1161     8.6333
ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a      1482     18.371
      e      2708     33.569
      i      1468     18.198
      o      1713     21.235
      u       696     8.6277
ans=5×3 table
    Label    Count    Percent
    _____    _____    _______
      a       988     18.368
      e      1806     33.575
      i       979       18.2
      o      1142     21.231
      u       464     8.6261
Split Vowels and Consonants
Read William Shakespeare's sonnets with the fileread function. Remove all nonalphabetic characters from the text and convert to lowercase.
sonnets = fileread("sonnets.txt");
letters = lower(sonnets(regexp(sonnets,"[A-Za-z]")))';
Classify the letters as consonants or vowels and create a table with the results. Show the first few rows of the table.
type = repmat("consonant",size(letters));
type(regexp(letters',"[aeiou]")) = "vowel";
T = table(letters,type,'VariableNames',["Letter" "Type"]);
head(T)
ans=8×2 table
    Letter       Type
    ______    ___________
      t       "consonant"
      h       "consonant"
      e       "vowel"
      s       "consonant"
      o       "vowel"
      n       "consonant"
      n       "consonant"
      e       "vowel"
Display the number of instances of each category.
cnt = countlabels(T,'TableVariable',"Type")
cnt=2×3 table
      Type       Count    Percent
    _________    _____    _______
    consonant    46516    63.365
    vowel        26894    36.635
Split the table into two sets, one containing 60% of the consonants and vowels and the other containing 40%. Display the number of instances of each category.
splt = splitlabels(T,0.6,'TableVariable',"Type");
sixty = countlabels(T(splt{1},:),'TableVariable',"Type")
sixty=2×3 table
      Type       Count    Percent
    _________    _____    _______
    consonant    27910    63.366
    vowel        16136    36.634
forty = countlabels(T(splt{2},:),'TableVariable',"Type")
forty=2×3 table
      Type       Count    Percent
    _________    _____    _______
    consonant    18606    63.363
    vowel        10758    36.637
Split the table into two sets, one containing 60% of each particular letter and the other containing 40%. Exclude the letter y, which sometimes acts as a consonant and sometimes as a vowel. Display the number of instances of each category.
splt = splitlabels(T,0.6,'Exclude',"y");
sixti = countlabels(T(splt{1},:),'TableVariable',"Type")
sixti=2×3 table
      Type       Count    Percent
    _________    _____    _______
    consonant    26719    62.346
    vowel        16137    37.654
forti = countlabels(T(splt{2},:),'TableVariable',"Type")
forti=2×3 table
      Type       Count    Percent
    _________    _____    _______
    consonant    17813    62.349
    vowel        10757    37.651
Split the table into two sets of the same size. Include only the letters e and s. Randomize the sets.
halves = splitlabels(T,0.5,"randomized",'Include',["e" "s"]);
cnt = countlabels(T(halves{1},:))
cnt=2×3 table
    Letter    Count    Percent
    ______    _____    _______
      e       4514     64.385
      s       2497     35.615
Split Data in Datastore
Create a dataset that consists of 100 Gaussian random numbers. Label 40 of the numbers as A, 30 as B, and 30 as C. Store the data in a combined datastore containing two datastores. The first datastore has the data and the second datastore contains the labels.
dsData = arrayDatastore(randn(100,1));
dsLabels = arrayDatastore([repmat("A",40,1); repmat("B",30,1); repmat("C",30,1)]);
dsDataset = combine(dsData,dsLabels);
cnt = countlabels(dsDataset,"UnderlyingDatastoreIndex",2)
cnt=3×3 table
    Label    Count    Percent
    _____    _____    _______
      A       40        40
      B       30        30
      C       30        30
Split the dataset into two sets, one containing 60% of the numbers and the other containing the rest.
splitIndices = splitlabels(dsDataset,0.6,"UnderlyingDatastoreIndex",2);
dsDataset1 = subset(dsDataset,splitIndices{1});
cnt1 = countlabels(dsDataset1,"UnderlyingDatastoreIndex",2)
cnt1=3×3 table
    Label    Count    Percent
    _____    _____    _______
      A       24        40
      B       18        30
      C       18        30
dsDataset2 = subset(dsDataset,splitIndices{2});
cnt2 = countlabels(dsDataset2,"UnderlyingDatastoreIndex",2)
cnt2=3×3 table
    Label    Count    Percent
    _____    _____    _______
      A       16        40
      B       12        30
      C       12        30
Input Arguments
lblsrc — Input label source
categorical vector | string vector | logical vector | numeric vector | cell array | table | datastore | CombinedDatastore object
Input label source, specified as one of these:
A categorical vector.
A string vector or a cell array of character vectors.
A numeric vector or a cell array of numeric scalars.
A logical vector or a cell array of logical scalars.
A table with variables containing any of the previous data types.
A datastore whose readall function returns any of the previous data types.
A CombinedDatastore object containing an underlying datastore whose readall function returns any of the previous data types. In this case, you must specify the index of the underlying datastore that has the label values.
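lblsrc must contain labels that can be converted to a vector with a discrete set of categories.
Example: lblsrc = categorical(["B" "C" "A" "E" "B" "A" "A" "B" "C" "A"],["A" "B" "C" "D"]) creates the label source as a ten-sample categorical vector with four categories: A, B, C, and D.
Example: lblsrc = [0 7 2 5 11 17 15 7 7 11] creates the label source as a ten-sample numeric vector.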
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | table | cell | categorical
p — Proportions or numbers of labels
integer scalar | scalar in (0, 1) | vector of integers | vector of fractions
Proportions or numbers of labels, specified as an integer scalar, a scalar in the range (0, 1), a vector of integers, or a vector of fractions.
If p is a scalar, splitlabels finds two splitting index sets and returns a two-element cell array in idxs.
If p is an integer, the first element of idxs contains a vector of indices pointing to the first p values of each label category. The second element of idxs contains indices pointing to the remaining values of each label category.
If p is a value in the range (0, 1) and lblsrc has Ki elements in the ith category, the first element of idxs contains a vector of indices pointing to the first p × Ki values of each label category. The second element of idxs contains the indices of the remaining values of each label category.
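The scalar-fraction rule can be sketched by hand in base MATLAB. This is an illustrative reimplementation, not the toolbox code: the labels and variable names are made up, and rounding p × Ki with round is an assumption, since this page does not say how non-integer products are handled.

```matlab
% Illustrative labels (assumed): category "A" has 6 members, "B" has 4.
lbl = categorical(["A" "B" "A" "A" "B" "A" "B" "A" "A" "B"])';
p = 0.5;                          % fraction of each category for the first set

cats  = categories(lbl);
first = [];                       % indices for the first set
rest  = [];                       % indices for the second set
for k = 1:numel(cats)
    members = find(lbl == cats{k});          % positions of this category, in order
    nFirst  = round(p*numel(members));       % assumption: round p*Ki
    first = [first; members(1:nFirst)];      %#ok<AGROW>
    rest  = [rest;  members(nFirst+1:end)];  %#ok<AGROW>
end
idxs = {first, rest};             % two-element cell array of index vectors
```

Each category contributes its earliest members to the first set, so both sets keep the 6:4 ratio of the full vector.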
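If p is a vector with N elements of the form p1, p2, …, pN, splitlabels finds N + 1 splitting index sets and returns an (N + 1)-element cell array in idxs.
If p is a vector of integers, the first element of idxs is a vector of indices pointing to the first p1 values of each label category, the next element of idxs contains the next p2 values of each label category, and so on. The last element in idxs contains the remaining indices of each label category.
If p is a vector of fractions and lblsrc has Ki elements in the ith category, the first element of idxs is a vector of indices concatenating the first p1 × Ki values of each category, the next element of idxs contains the next p2 × Ki values of each label category, and so on. The last element in idxs contains the remaining indices of each label category.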
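A short sketch of the vector-of-integers case, using made-up labels: a two-element p should yield three index sets. The variable names nSets and sizes are illustrative. The set sizes are measured by indexing into the labels so that the code does not depend on the form of the returned indices.

```matlab
% Hypothetical labels: 5 instances of "A" and 4 of "B".
lbl = categorical([repmat("A",5,1); repmat("B",4,1)]);

% Two integers produce three index sets: 2 values per category,
% then 1 value per category, then whatever remains of each category.
idxs = splitlabels(lbl,[2 1]);

nSets = numel(idxs);
sizes = cellfun(@(ix) numel(lbl(ix)),idxs);   % number of labels in each set
```

With two categories, the first set holds 2 + 2 labels, the second 1 + 1, and the last set holds the 2 remaining A labels and the 1 remaining B label.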
Note
If p contains fractions, then the sum of its elements must not be greater than one.
If p contains numbers of label values, then the sum of its elements must not be greater than the smallest number of labels available for any of the label categories.
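The two constraints in the note can be checked before the split is requested. The following base MATLAB sketch uses made-up labels and requests. The names pCounts, pFracs, countsOK, and fracsOK are hypothetical, and the check is a convenience, not part of the function.

```matlab
% Hypothetical labels: 5 instances of "A" and 3 of "B".
lbl = categorical([repmat("A",5,1); repmat("B",3,1)]);

pCounts = [2 1];                  % assumed request given as numbers of labels
pFracs  = [0.5 0.3];              % assumed request given as fractions

smallest = min(countcats(lbl));   % size of the smallest label category

countsOK = sum(pCounts) <= smallest;   % counts must fit in the smallest category
fracsOK  = sum(pFracs)  <= 1;          % fractions must not sum to more than one
```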
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Name-Value Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'TableVariable',"AreaCode",'Exclude',["617" "508"] specifies that the function splits labels based on telephone area code and excludes numbers from Boston and Natick.
Include — Labels to include in the index sets
vector of label categories | cell array of label categories
Labels to include in the index sets, specified as a vector or cell array of label categories. The categories specified with this argument must be of the same type as the labels in lblsrc. Each category in the vector or cell array must match one of the label categories in lblsrc.
Exclude — Labels to exclude from the index sets
vector of label categories | cell array of label categories
Labels to exclude from the index sets, specified as a vector or cell array of label categories. The categories specified with this argument must be of the same type as the labels in lblsrc. Each category in the vector or cell array must match one of the label categories in lblsrc.
TableVariable — Table variable to read
first table variable (default) | character vector | string scalar
Table variable to read, specified as a character vector or string scalar. If this argument is not specified, then splitlabels uses the first table variable.
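UnderlyingDatastoreIndex — Underlying datastore index
integer scalar
Underlying datastore index, specified as an integer scalar. This argument applies when lblsrc is a CombinedDatastore object. splitlabels counts the labels in the datastore obtained using the UnderlyingDatastores property of lblsrc.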
Output Arguments
idxs — Splitting indices
cell array
Splitting indices, returned as a cell array.