I need to extract specific data and sum them up in different variables

Laura Lennuyeux-Comnene on 28 May 2022
Commented: William Rose on 7 June 2022
I have a table of data: essentially two columns and about 8000 rows. Among those rows is information I need to collate and analyse later. The rows came from a long text data file. It looks like this:
Column 1 Column 2
BEGIN STUDY
Block 1
Ratio Known Outcome known
Ratio 30/70
green
red
red
green
Decision Bean count 4
I have about 40 participants, and each participant does four blocks (four different conditions: configurations of ratio known or unknown, outcome known or unknown), and each condition has four ratios. Needless to say, this 'Decision Bean count' can differ for each participant and across participants, conditions, and ratios, so the number of rows between 'Ratio' and 'Decision Bean count' also differs.
I had this crazy idea that I could do a while loop, but I have no idea how to phrase it: find the beginning of each participant/block/ratio, followed by: while decision bean count = false, row x + 1 (yes, you can laugh, this is how much of a novice I am).
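The row-by-row idea described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the sample rows are copied from the excerpt in the question, each row is treated as a single string, one "BEGIN STUDY" row is assumed to mark each new participant, and the names `rawLines` and `results` are made up for illustration. A plain `for` loop that remembers the current block, condition, and ratio does the job of the proposed while loop.

```matlab
% Illustrative rows modelled on the excerpt in the question. The real file
% layout, and how a new participant is marked, are assumptions.
rawLines = [ "BEGIN STUDY"
             "Block 1"
             "Ratio Known Outcome known"
             "Ratio 30/70"
             "green"
             "red"
             "red"
             "green"
             "Decision Bean count 4" ];

% State carried from row to row.
subject = 0; block = NaN; condition = ""; ratio = "";

% Empty table that receives one row per Decision Bean count found.
results = table('Size', [0 5], ...
    'VariableTypes', {'double','double','string','string','double'}, ...
    'VariableNames', {'Subject','Block','Condition','Ratio','DBC'});

for k = 1:numel(rawLines)
    txt = strtrim(rawLines(k));
    if startsWith(txt, "BEGIN STUDY")
        subject = subject + 1;          % assumed: one BEGIN STUDY per participant
    elseif startsWith(txt, "Block")
        block = str2double(extractAfter(txt, "Block"));
    elseif startsWith(txt, ["Ratio Known", "Ratio Unknown"], 'IgnoreCase', true)
        condition = txt;                % checked before the plain "Ratio" row
    elseif startsWith(txt, "Ratio")
        ratio = strtrim(extractAfter(txt, "Ratio"));
    elseif startsWith(txt, "Decision Bean count")
        dbc = str2double(extractAfter(txt, "Decision Bean count"));
        results(end+1, :) = {subject, block, condition, ratio, dbc};
    end                                 % colour rows (green/red) are skipped
end

disp(results)
```

Because the loop simply skips rows it does not recognise, it does not matter how many colour rows sit between a 'Ratio' row and its 'Decision Bean count' row.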
I need to extract the data (i.e. the outcome measure of interest which is the Decision Bean Count) for:
Each participant
Differentiating different conditions
Differentiating different ratios
At the end I need to have two matrices, with four columns each:
One with all the Decision Bean count scores, per participant, per condition
One with all the Decision Bean count scores, per participant, per ratio
I am a total beginner in MATLAB. The only thing I have managed to do is get MATLAB to create a table with my data. Whatever I try next, it just gives me the same table in the output ...
This feels very complex to me, but if it is crystal clear for anyone out there who could help, I would be so grateful
kind regards
Laura

Answers (1)

William Rose on 29 May 2022
I think I and others will be able to understand your problem better if you post a spreadsheet with columns for each of the different quantities. I expect that there will be columns for:
Subject number (1 to 40); "configuration of ratios known" (true or false); "outcome known" (true or false); ratio (four possible values). Every row should include a value for each of these columns, even if the value is repeated from the preceding row. In other words, there will be many rows with each Subject Number, and so on.
There will be more columns, but I am not certain what they will be, because I do not understand your experimental protocol. Maybe there is a column for color (red or green). Maybe there is a column for Bean Count, or Decision Bean Count, or both. Is the value in Decision Bean Count a number, or a True/False? Do you want to compute the value of Decision Bean Count based on the values in the preceding columns and rows? This is where I do not understand the problem.
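To make the suggested layout concrete, here is a small sketch of such a table in MATLAB. The values are invented purely to show the shape, the column names are assumptions, and it assumes that Decision Bean Count is a number.

```matlab
% Hypothetical rows, made up only to show the layout; not real data.
Subject      = [1; 1; 2; 2];
RatioKnown   = [true; true; false; false];
OutcomeKnown = [true; false; true; false];
Ratio        = ["30/70"; "40/60"; "30/70"; "40/60"];
DBC          = [4; 6; 3; 5];            % assumed numeric

% Every row carries all identifying values, even when they repeat.
T = table(Subject, RatioKnown, OutcomeKnown, Ratio, DBC);
disp(T)

% With this layout, picking out a subset is a one-line logical index.
sub = T(T.Subject == 1 & T.RatioKnown, :);
disp(sub)
```

Repeating the subject and condition values in every row looks redundant, but it is what lets a single logical index or grouping call select the rows of interest.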
15 comments
William Rose on 7 June 2022
When the script runs, it produces this output on the console. You should get a similar result.
>> processTextFileLLC
Input file: Jellybeans2a.txt, output file: Jellybeans2aResults.txt.
Number of subjects=33, blocks=132, trials=528, DBCs=528, Confs=528.
>>
The script produces a tab-delimited text file. Here are the first 5 lines and the last 4 lines of the results file created by the script.
You should get the same. You can open this text file with Excel or MATLAB or your favorite statistics program for analysis. For example, you might want to know if the mean DBC is different for different conditions. Or you might want to ask if Confidence is different for different ratios. This file has eight, not four, ratios, as I explained in an earlier post. If you want only four ratios, then we can make an adjustment in the script to make this happen. If you make this adjustment, then 40 red:60 green will be considered equivalent to 40 green:60 red, for analysis purposes.
"It may take me a while to work out what is going on, as I am rather slow at these things." Compared to people who have been programming in MATLAB or some other language for many years, maybe you are slow, or maybe not. It is hard reading other people's code. They had a strategy in mind when they wrote it, which is usually not obvious. It makes sense to them, and it works, but it can be hard to decipher. And even if you are slow compared to some others, you are trying to learn a programming language, which most investigators will not do.
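As a rough sketch of that kind of summary, and of the ratio adjustment, the snippet below works on a tiny made-up table rather than on the real results file. The column names (`Condition`, `RedPct`, `DBC`), the condition labels, and the coding of a ratio as "percent red" are all assumptions for illustration. The real file could be loaded with `readtable` and a tab delimiter instead.

```matlab
% Made-up values for illustration; column names and coding are assumptions.
Condition = categorical(["RK_OK"; "RK_OK"; "RU_OU"; "RU_OU"]);
RedPct    = [40; 60; 30; 70];           % percent red (assumed coding of ratio)
DBC       = [4; 6; 3; 9];
T = table(Condition, RedPct, DBC);

% Fold mirror-image ratios together: 40:60 and 60:40 both become 40.
T.FoldedRatio = min(T.RedPct, 100 - T.RedPct);

% Mean DBC per condition and per folded ratio.
byCondition = groupsummary(T, "Condition",   "mean", "DBC");
byRatio     = groupsummary(T, "FoldedRatio", "mean", "DBC");

disp(byCondition)
disp(byRatio)
```

Taking the smaller of the two percentages is one simple way to treat 40 red:60 green and 40 green:60 red as the same ratio; it halves the number of distinct ratio values.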
William Rose on 7 June 2022
What software do you want to use to do your statistical analysis of the data? Matlab, Excel, R, something else?
What questions do you want to ask and answer with the data you have collected? You and your advisor probably figured this out before you started collecting data. I suspect that you would like to know if DBC varies with condition, and if DBC varies with ratio. I suspect you would like to know if Confidence varies with condition and if Confidence varies with ratio. But maybe I am wrong.
Other thoughts: Again, you and your advisor probably have already figured out what follows. You have two pairs of conditions. Therefore you could analyze condition with a one-factor ANOVA, with four possible values for the one factor. Or you could do a two-factor ANOVA, with two possible values for each factor. There is also the question of whether to do standard ANOVA with an F-test, or do a non-parametric test. The Kruskal-Wallis test is a viable non-parametric alternative to the one-factor ANOVA. There is not a good non-parametric alternative to the two-factor ANOVA. (The Friedman test, which is non-parametric, is not an option, since you have repeated measures in each condition.) See this discussion and this discussion for more info on the lack of a non-parametric alternative to 2-factor ANOVA.
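If the analysis is done in MATLAB, the options above might look roughly like this. It is a sketch only: it needs the Statistics and Machine Learning Toolbox, the numbers are placeholders invented so the calls run, and the variable names are assumptions. These calls also treat every row as independent, so they do not account for the same participant appearing in every condition.

```matlab
% Placeholder numbers, invented only so the calls run; not study data.
DBC          = [4 5 6  5 6 7  3 4 5  6 7 8]';
RatioKnown   = [1 1 1  1 1 1  0 0 0  0 0 0]';
OutcomeKnown = [1 1 1  0 0 0  1 1 1  0 0 0]';

% One factor with four levels: combine the two flags into one code (0..3).
Condition = 2*RatioKnown + OutcomeKnown;

% One-factor tests on the four-level condition.
pKW = kruskalwallis(DBC, Condition, 'off');   % non-parametric
pA1 = anova1(DBC, Condition, 'off');          % parametric (F-test)

% Two-factor ANOVA with interaction; returns one p-value per term.
pA2 = anovan(DBC, {RatioKnown, OutcomeKnown}, ...
             'model', 'interaction', ...
             'varnames', {'RatioKnown', 'OutcomeKnown'}, ...
             'display', 'off');

fprintf('Kruskal-Wallis p = %.3f, one-way ANOVA p = %.3f\n', pKW, pA1);
fprintf('Two-way ANOVA p (RatioKnown, OutcomeKnown, interaction) = %s\n', ...
        mat2str(pA2', 3));
```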
