You can create and save a datastore on a platform that loads and works seamlessly on a different platform by setting up the 'AlternateFileSystemRoots' property of the datastore. Use this property when:
You create a datastore on a local machine, and need to access and process the data on another machine (possibly running a different operating system).
You process your datastore with parallel and distributed computing involving different platforms, cloud or cluster machines.
This example demonstrates the use of the 'AlternateFileSystemRoots' property for TabularTextDatastore. However, you can use the same syntax for any of these datastores: SpreadsheetDatastore, ImageDatastore, ParquetDatastore, FileDatastore, KeyValueDatastore, and TallDatastore. To use the 'AlternateFileSystemRoots' functionality for custom datastores, see matlab.io.datastore.DsFileSet and Develop Custom Datastore.
Create a datastore on one file system that loads and works seamlessly on a different machine (possibly running a different operating system). For example, create a datastore on a Windows® machine, save it, and then load it on a Linux® machine.
First, before you create and save the datastore, identify the root paths for your data on the different platforms. The root paths will differ based on the machine or file system. For instance, if you have data on your local machine and a copy of the data on a cluster, then get the root paths for accessing the data:
"Z:\DataSet" for your local Windows machine.
"/nfs-bldg001/DataSet" for your Linux cluster.
Then, associate these root paths by using the 'AlternateFileSystemRoots' parameter of the datastore.
altRoots = ["Z:\DataSet","/nfs-bldg001/DataSet"];
ds = tabularTextDatastore('Z:\DataSet','AlternateFileSystemRoots',altRoots);
Examine the Files property of the datastore. In this instance, the Files property contains the location of your data as accessed by your Windows machine.
ds.Files
ans = 5×1 cell array
    {'Z:\DataSet\datafile01.csv'}
    {'Z:\DataSet\datafile02.csv'}
    {'Z:\DataSet\datafile03.csv'}
    {'Z:\DataSet\datafile04.csv'}
    {'Z:\DataSet\datafile05.csv'}
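If you do not have a mapped drive such as Z:\DataSet at hand, the same syntax can be tried on throwaway data. The following is a minimal sketch under stated assumptions: the temporary folder, the file names, and the variable names MyVar1 and MyVar2 are hypothetical, and the second root is simply a path that some other machine would use.

```matlab
% Build a small throwaway data set (hypothetical folder and file names).
dataRoot = tempname;            % unique folder name under the system temp directory
mkdir(dataRoot);
for k = 1:3
    T = table((1:4)', rand(4,1), 'VariableNames', {'MyVar1','MyVar2'});
    writetable(T, fullfile(dataRoot, sprintf('datafile%02d.csv', k)));
end

% Pair the real local root with an assumed root used by another machine.
altRoots = [string(dataRoot), "/nfs-bldg001/DataSet"];
ds = tabularTextDatastore(dataRoot, 'AlternateFileSystemRoots', altRoots);

% On the machine that created the datastore, Files lists the local paths.
disp(ds.Files)
```

Only the local root needs to resolve on the machine where you create the datastore; the other root comes into play when the datastore is loaded somewhere that the local root is not accessible.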
Save the datastore.
save ds_saved_on_Windows.mat ds
Load the datastore on a Linux platform and examine the Files property. Since the root path 'Z:\DataSet' is not accessible on the Linux cluster, at load time, the datastore function automatically updates the root paths based on the values specified in the 'AlternateFileSystemRoots' parameter. The Files property of the datastore now contains the updated root paths for your data on the Linux cluster.
load ds_saved_on_Windows.mat
ds.Files
ans = 5×1 cell array
    {'/nfs-bldg001/DataSet/datafile01.csv'}
    {'/nfs-bldg001/DataSet/datafile02.csv'}
    {'/nfs-bldg001/DataSet/datafile03.csv'}
    {'/nfs-bldg001/DataSet/datafile04.csv'}
    {'/nfs-bldg001/DataSet/datafile05.csv'}
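The effect of the root substitution can be pictured with plain string operations. This sketch is only an illustration of the mapping between the two outputs shown here, not the datastore's internal implementation; the variable names are hypothetical.

```matlab
% Illustration only: mimic the root substitution with string operations.
winRoot   = "Z:\DataSet";
linuxRoot = "/nfs-bldg001/DataSet";

% File list as seen from the Windows machine (5-by-1 string array).
winFiles = winRoot + "\datafile0" + (1:5)' + ".csv";

% Swap the root, then convert the remaining Windows file separators.
linuxFiles = replace(winFiles, winRoot, linuxRoot);
linuxFiles = replace(linuxFiles, "\", "/");

disp(linuxFiles)
```

Everything below the root stays the same, which is why the data must be laid out identically under each root path.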
To process your datastore with parallel and distributed computing that involves different platforms, cloud or cluster machines, you must predefine the 'AlternateFileSystemRoots' parameter. This example demonstrates how to create a datastore on your local machine, analyze a small portion of the data, and then use Parallel Computing Toolbox™ and MATLAB® Parallel Server™ to scale up the analysis to the entire dataset.
Create a datastore and assign a value to the 'AlternateFileSystemRoots' property. To set the value for the 'AlternateFileSystemRoots' property, identify the root paths for your data on the different platforms. The root paths differ based on the machine or file system. For example, identify the root paths for data access from your machine and your cluster:
"Z:\DataSet" from your local Windows machine.
"/nfs-bldg001/DataSet" from the MATLAB Parallel Server Linux cluster.
Then, associate these root paths using the 'AlternateFileSystemRoots' property.
altRoots = ["Z:\DataSet","/nfs-bldg001/DataSet"];
ds = tabularTextDatastore('Z:\DataSet','AlternateFileSystemRoots',altRoots);
Analyze a small portion of the data on your local machine. For instance, get a partitioned subset of the data, clean the data by removing any missing entries, and examine a plot of the variables.
tt = tall(partition(ds,100,1));
summary(tt);
% analyze your data
tt = rmmissing(tt);
plot(tt.MyVar1,tt.MyVar2)
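The partition call above returns one piece of the datastore out of a requested number of pieces. As a rough sketch of how the pieces relate to the whole, the following loops over every partition of a small throwaway datastore and cleans each piece in memory. The temporary folder, file names, and data values are assumptions made for illustration, and the sketch uses readall on in-memory tables rather than tall arrays.

```matlab
% Throwaway data: three small CSV files in a temporary folder (hypothetical names).
dataRoot = tempname;
mkdir(dataRoot);
for k = 1:3
    T = table((1:4)', [0.5; NaN; 1.5; 2.5], 'VariableNames', {'MyVar1','MyVar2'});
    writetable(T, fullfile(dataRoot, sprintf('datafile%02d.csv', k)));
end
ds = tabularTextDatastore(dataRoot);

% Split the datastore into numParts pieces and process each piece separately.
numParts  = 3;
totalRows = 0;
for idx = 1:numParts
    subds = partition(ds, numParts, idx);   % idx-th of numParts partitions
    part  = rmmissing(readall(subds));      % drop rows that contain missing values
    totalRows = totalRows + height(part);
end
disp(totalRows)
```

Taken together, the partitions cover every file in the datastore, so working on a single partition first is a cheap way to check an analysis before running it on the full dataset.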
Scale up your analysis to the entire dataset by using a MATLAB Parallel Server cluster (Linux cluster). For instance, start a worker pool using the cluster profile, and then perform analysis on the entire dataset by using parallel and distributed computing capabilities.
parpool('MyMjsProfile')
tt = tall(ds);
summary(tt);
% analyze your data
tt = rmmissing(tt);
plot(tt.MyVar1,tt.MyVar2)
See also: datastore | TabularTextDatastore | SpreadsheetDatastore | ImageDatastore | FileDatastore | KeyValueDatastore | TallDatastore