Tall Arrays and `mapreduce`

Analyze big data sets in parallel using MATLAB® tall arrays and datastores or `mapreduce`, on Spark™ and Hadoop® clusters and on parallel pools

You can use Parallel Computing Toolbox™ to evaluate tall-array expressions in parallel using a parallel pool on your desktop. Tall arrays let you run big data applications whose data does not fit in your machine's memory. You can also use Parallel Computing Toolbox to scale up tall-array processing by connecting to a parallel pool running on a MATLAB Parallel Server™ cluster. Alternatively, you can use a Spark-enabled Hadoop cluster running MATLAB Parallel Server. For more information, see Big Data Workflow Using Tall Arrays and Datastores.
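
As a minimal sketch of the desktop workflow described above, the following code evaluates a tall-array expression on a local parallel pool. It assumes the `airlinesmall.csv` sample file that ships with MATLAB and its `ArrDelay` variable; the variable names `ds`, `tt`, and `avgDelay` are illustrative only.

```matlab
% Open (or reuse) a parallel pool. With Parallel Computing Toolbox,
% tall-array calculations run on this pool by default.
pool = gcp;

% Create a datastore for the sample data set (assumed to be on the path)
% and read only the column of interest.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

% Convert the datastore to a tall table. No data is read yet.
tt = tall(ds);

% Build a deferred calculation, then trigger evaluation with gather.
avgDelay = mean(tt.ArrDelay, 'omitnan');
avgDelay = gather(avgDelay);
```

The same code is intended to run unchanged against a cluster once the execution environment is pointed at it, for example through `mapreducer`.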

Functions

Important functions

`tall`	Create tall array
`datastore`	Create datastore for large collections of data
`mapreduce`	Programming technique for analyzing data sets that do not fit in memory
`mapreducer`	Define parallel execution environment for mapreduce and tall arrays
`partition`	Partition a datastore
`numpartitions`	Number of datastore partitions

Classes

Important classes

`parallel.Pool`	Parallel pool of workers
`parallel.cluster.Hadoop`	Hadoop cluster for mapreducer, mapreduce and tall arrays
`parallel.cluster.Spark`	Spark cluster for mapreducer, mapreduce and tall arrays (since R2022b)

Examples and how-to

Big Data Workflow Using Tall Arrays and Datastores
Learn about typical workflows using tall arrays to analyze big data sets.
Use Tall Arrays on a Parallel Pool
Discover tall arrays in Parallel Computing Toolbox and MATLAB Parallel Server.
Process Big Data in the Cloud
This example shows how to access a large data set in the cloud and process it in a cloud cluster using MATLAB® capabilities for big data.
Use Parallel Computing to Optimize Big Data Set for Analysis
This example shows how to optimize data preprocessing for analysis using parallel computing. (since R2024a)
Use Tall Arrays on a Spark Cluster
Create and use tall tables on Spark clusters without changing your MATLAB code.
Run mapreduce on a Parallel Pool
Try mapreduce for advanced analysis of big data using Parallel Computing Toolbox.
Run mapreduce on a Hadoop Cluster
Learn about mapreduce for advanced big data analysis on a Hadoop cluster.
Partition a Datastore in Parallel
Use partition to split your datastore into smaller parts.
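
As a rough illustration of the last item, a datastore can be split with `numpartitions` and `partition` so that each `parfor` iteration reads its own part. This is a sketch only, assuming the `airlinesmall.csv` sample file that ships with MATLAB; the names `subds`, `partialSum`, and `partialCount` are illustrative.

```matlab
% Create a datastore for the sample data set (assumed to be on the path).
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

% Ask for a partition count suited to the current parallel pool.
pool = gcp;
n = numpartitions(ds, pool);

% Accumulate a sum and a count across partitions (parfor reduction variables).
partialSum = 0;
partialCount = 0;
parfor ii = 1:n
    subds = partition(ds, n, ii);      % the ii-th of n parts
    while hasdata(subds)
        t = read(subds);               % read one chunk as a table
        d = t.ArrDelay(~isnan(t.ArrDelay));
        partialSum = partialSum + sum(d);
        partialCount = partialCount + numel(d);
    end
end

avgDelay = partialSum / partialCount;
```

Compared with tall arrays, this approach gives explicit control over how the data is divided among workers, at the cost of writing the reduction by hand.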

Concepts

Run Code on Parallel Pools
Learn about starting and stopping parallel pools, pool size, and cluster selection.