
Dimension reduction for logical arrays

11 views (last 30 days)
André on 14 Nov 2014
Answered: Nihal on 11 Sep 2024 at 5:48
I have measurements of 4 devices at two different points of time. A measurement basically consists of an array of ones and zeros corresponding to a bit value at the corresponding location:
whos measurement1_dev1_time1
Name Size Bytes Class Attributes
measurement1_dev1_time1 4096x8 32768 logical
I assume that the changes in the measurements between time 1 and time 2 are unique to each device. However, since I am dealing with 32768 bits at different locations, it is quite hard to see whether there is some kind of dependency.
As every bit at location ``x`` can be regarded as one dimension of an observation, I thought of using PCA to reduce the number of dimensions.
Thus, for each of the 4 devices:
1. I randomly sample ``n`` measurements at points ``t1`` and ``t2`` separately
2. I prepare an array as input for ``pca()`` with ``m*n`` columns (``m`` < 32768; it's a subset of all the observed bits, as the original data might be too big for PCA) and 4 rows (one row for each device).
3. On this array ``A`` I calculate the PCA: ``[coeff, score, latent] = pca(zscore(A))``
4. Then I try to visualize it using ``biplot``: ``biplot(coeff(:,1:2), 'Scores', score(:,1:2))``
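For reference, here is a minimal sketch of the four steps above. The variable names are placeholders: I'm assuming the measurements are stored in a cell array ``measurements{d, t}`` holding the 4096x8 logical array for device ``d`` at time ``t``; adapt to your actual layout.

```matlab
% Sketch of the steps above (variable layout is an assumption).
rng(0);                          % reproducible sampling
m = 1024;                        % subset of the 32768 bit positions
bitIdx = randperm(32768, m);     % random bit locations to keep

nDev = 4;
A = zeros(nDev, 2*m);            % one row per device, t1 and t2 side by side
for d = 1:nDev
    b1 = measurements{d, 1}(:);  % flatten 4096x8 logical -> 32768x1
    b2 = measurements{d, 2}(:);
    A(d, :) = double([b1(bitIdx); b2(bitIdx)]');
end

% Drop columns that are constant across devices: zscore would divide by
% a zero standard deviation there and fill the matrix with NaN.
A = A(:, std(A) > 0);

[coeff, score, latent] = pca(zscore(A));
biplot(coeff(:, 1:2), 'Scores', score(:, 1:2));
```

Note that with only 4 rows (observations), ``pca`` can return at most 3 components, and any bit that happens to be identical on all devices carries no information for the decomposition.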
However, this gives me really strange results. Maybe PCA is not the right approach for this problem? I also modified the input data so that the PCA is not run on the logical bit array itself: instead, I created a vector holding the indices where there is a '1' in the original measurement array. This also produces strange results.
As I am completely new to PCA, I want to ask whether you see a flaw in the process, or whether PCA is simply not the right approach for my goal and I should look at other dimension reduction approaches or clustering algorithms instead.

Answers (1)

Nihal on 11 Sep 2024 at 5:48
Principal Component Analysis (PCA) can be a powerful tool for dimensionality reduction and visualization, but applying it correctly requires careful consideration of the data and the preprocessing steps. Let's break down your approach and identify potential issues and improvements.
Issues and Improvements
  1. Data Representation: PCA is typically used on continuous data. Applying PCA directly to binary data (logical arrays) might not yield meaningful results, because PCA captures linear variance structure and its usual interpretation assumes roughly continuous, approximately Gaussian data. Converting the binary data into a different form before applying PCA might be necessary.
  2. Data Transformation: Instead of using the logical array directly, you could convert the binary data into a continuous form. One common approach is to use the Hamming distance between the binary arrays as a measure of dissimilarity. This way, you can create a distance matrix and then embed it in a low-dimensional space.
  3. Sampling and Data Preparation: Ensure that the sampling and preparation of the data are done correctly. Each row in the matrix passed to PCA should represent an observation, and each column a feature. If you are combining data from different devices, make sure they are aligned appropriately.
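The distance-based idea in point 2 can be sketched as follows. This is a minimal illustration, not a drop-in solution: ``X`` is assumed to be an nDevices-by-nBits double matrix of 0/1 values (one row per device), and the embedding uses classical multidimensional scaling (``cmdscale``) on the pairwise Hamming distances rather than PCA itself.

```matlab
% Sketch of a Hamming-distance embedding (X is a placeholder name for an
% nDevices x nBits double matrix of 0/1 values, one row per device).
D = squareform(pdist(X, 'hamming'));   % pairwise Hamming distances
Y = cmdscale(D);                       % classical MDS embedding of D

% Plot the first two embedding coordinates, one point per device.
scatter(Y(:, 1), Y(:, 2), 'filled');
text(Y(:, 1), Y(:, 2), compose('  dev %d', (1:size(X, 1))'));
```

``pdist`` with the ``'hamming'`` metric and ``cmdscale`` both require the Statistics and Machine Learning Toolbox; with only a handful of devices, the embedding will have at most nDevices-1 meaningful dimensions.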
Hope it helps

Categories

Find more on Dimensionality Reduction and Feature Extraction in Help Center and File Exchange

