Sort according to specific string contained in file name

I have two lists of file names (including the whole path). Somewhere within each file name there is a subject ID, and both lists contain exactly the same 25 IDs because there are two sets of files from each study participant. I need to sort the two lists so that the IDs correspond in each row, i.e. I want something like
List A List B
010822_AB030391 240922_AB030391
130922_FS120387 050322_FS120387
but right now what I have is
List A List B
010822_AB030391 050322_FS120387
130922_FS120387 240922_AB030391
because the lists are just sorted according to the first character and so the IDs don't correspond.
I had several ideas but they all seem too complicated or don't work well, e.g. I tried to split the file names at the underscore, to sort alphabetically and then merge the parts again. I also tried isolating the ID from one list, looping through that isolated list and finding the corresponding entry in the second list that contains a specific ID. But I think there should be a more elegant way to do this and I'd be happy to hear any tips! Right now both lists are character arrays, but maybe they should be a struct or something more easily manipulated.

2 comments

Are the IDs after the underscore (_)?
Yes exactly, however there are some more underscores in the path before the file name, e.g.
'D:\DATA\XY\Project_XY\\label_sPR12345_AB67890-0011.nii '
The number of characters before the ID is always the same, as is the number of delimiters before it.


Accepted answer

Karim on 24 June 2022
Edited: Karim on 24 June 2022
You can try splitting the entries of each list, sorting by the ID part, and then using the resulting indices to reorder the original lists:
ListA = ["010822_AB030391";
"130922_FS120387"];
ListB = ["050322_FS120387";
"240922_AB030391"];
% temporarily split the strings to get at the ID
tmpListA = split(ListA,"_");
tmpListB = split(ListB,"_");
% sort list A
[ListA_sort, orderA] = sort(tmpListA(:,2));
% find the corresponding order for list B
[~,orderB] = ismember(ListA_sort,tmpListB(:,2));
% order the original list
ListA = ListA(orderA)
ListA = 2×1 string array
    "010822_AB030391"
    "130922_FS120387"
ListB = ListB(orderB)
ListB = 2×1 string array
    "240922_AB030391"
    "050322_FS120387"
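If list A should keep its current order, a variation on the same idea is to reorder only list B and to check first that the IDs really pair up. This is a minimal sketch for the short example names only: it assumes the ID is everything after the first underscore, and the variable names idA and idB are made up for illustration.

```matlab
ListA = ["010822_AB030391"; "130922_FS120387"];
ListB = ["050322_FS120387"; "240922_AB030391"];

% Assumption: the ID is the text after the first underscore
% (true for these short names, not for the full paths).
idA = extractAfter(ListA, "_");
idB = extractAfter(ListB, "_");

% Fail early if the lists do not pair up one-to-one.
assert(numel(unique(idA)) == numel(idA), 'Duplicate ID in ListA.');
assert(numel(idA) == numel(idB) && isempty(setxor(idA, idB)), ...
    'The two lists do not contain the same IDs.');

% For each ID in list A, find its position in list B.
[~, orderB] = ismember(idA, idB);
ListB = ListB(orderB);
```

Without such a check, an ID that is missing from list B would make ismember return 0 for that entry, and the indexing in the last line would fail with an error about the index itself rather than about the data.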

3 comments

TL on 24 June 2022
Edited: TL on 24 June 2022
That's (almost) perfect, thanks so much! The only problem is that there are several underscores before the ID and also some characters after the ID:
'D:\DATA\XY\Project_XY\\label_sPR12345_AB67890-0011.nii '
and I can't figure out how to easily extract just the ID (e.g. "extract between the fourth underscore and the minus sign"). I'm doing this in many steps now, which is really cumbersome, but I assume there is no other way.
You can use the same idea, applied in a couple of steps. See below with some random data.
With the final list you can use the same procedure as in the answer.
MyFile = [ "D:\DATA\XY\Project_XY\\label_sPR12345_AB67890-0011.nii";
"D:\DATA\XY\Project_XY\\label_sPR40922_AB03091-0011.nii";
"D:\DATA\XY\Project_XY\\label_sPR30922_FS12038-0011.nii"];
tmpListA = split(MyFile,"\")
tmpListA = 3×6 string array
    "D:"    "DATA"    "XY"    "Project_XY"    ""    "label_sPR12345_AB67890-0011.nii"
    "D:"    "DATA"    "XY"    "Project_XY"    ""    "label_sPR40922_AB03091-0011.nii"
    "D:"    "DATA"    "XY"    "Project_XY"    ""    "label_sPR30922_FS12038-0011.nii"
% pick the last column of the first tmp list
tmpListA = split(tmpListA(:,end) ,"_")
tmpListA = 3×3 string array
    "label"    "sPR12345"    "AB67890-0011.nii"
    "label"    "sPR40922"    "AB03091-0011.nii"
    "label"    "sPR30922"    "FS12038-0011.nii"
% again pick the last column
tmpListA = split(tmpListA(:,end) ,"-")
tmpListA = 3×2 string array
    "AB67890"    "0011.nii"
    "AB03091"    "0011.nii"
    "FS12038"    "0011.nii"
% finally, keep only the first column
tmpListA = tmpListA(:,1)
tmpListA = 3×1 string array
    "AB67890"
    "AB03091"
    "FS12038"
Works perfectly, thank you so much! This will be very useful long term, this issue keeps coming up in my project
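The chain of split calls above can also be collapsed into a single pattern match. The following is only a sketch under one assumption taken from the example paths: the ID is the run of characters that directly follows an underscore and directly precedes a "-". The helper name getID is made up for illustration.

```matlab
MyFile = ["D:\DATA\XY\Project_XY\\label_sPR12345_AB67890-0011.nii";
          "D:\DATA\XY\Project_XY\\label_sPR40922_AB03091-0011.nii";
          "D:\DATA\XY\Project_XY\\label_sPR30922_FS12038-0011.nii"];

% Hypothetical helper: return the text that sits between an underscore
% and the following "-" (lookbehind and lookahead keep both out of the match).
% cellstr/string are used so the result is always a string array.
getID = @(f) string(regexp(cellstr(f), '(?<=_)[^_\\-]+(?=-)', 'match', 'once'));

ids = getID(MyFile)
```

Unlike split on a whole string array, this does not require every path to contain the same number of delimiters, so it may be a little more forgiving if the folder depth ever varies.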


More answers (1)

S = ["D:\DATA\XY\Project_XY\label_sPR12345_AB67890-0011.nii";
"D:\DATA\XY\Project_XY\label_sPR40922_AB03091-0011.nii";
"D:\DATA\XY\Project_XY\label_sPR30922_FS12038-0011.nii"];
[~,X] = sort(regexp(S,'[A-Z]+\d+\-','match','once'));
S = S(X)
S = 3×1 string array
    "D:\DATA\XY\Project_XY\label_sPR40922_AB03091-0011.nii"
    "D:\DATA\XY\Project_XY\label_sPR12345_AB67890-0011.nii"
    "D:\DATA\XY\Project_XY\label_sPR30922_FS12038-0011.nii"
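To line up two lists of full paths, as in the original question, the regular-expression extraction can be combined with the ismember step from the accepted answer. This is a sketch, not a tested solution for the real data: list B below is invented purely for illustration, and the pattern assumes each ID consists of capital letters followed by digits and is directly followed by a "-".

```matlab
% List A: the paths used above. List B: made-up paths for illustration only.
A = ["D:\DATA\XY\Project_XY\label_sPR12345_AB67890-0011.nii";
     "D:\DATA\XY\Project_XY\label_sPR40922_AB03091-0011.nii";
     "D:\DATA\XY\Project_XY\label_sPR30922_FS12038-0011.nii"];
B = ["D:\DATA\XY\Project_XY\label_sPR50322_FS12038-0022.nii";
     "D:\DATA\XY\Project_XY\label_sPR24092_AB67890-0022.nii";
     "D:\DATA\XY\Project_XY\label_sPR11111_AB03091-0022.nii"];

% Capital letters followed by digits, immediately before a "-".
% The lookahead keeps the "-" itself out of the extracted ID.
expr = '[A-Z]+\d+(?=-)';
idA = string(regexp(cellstr(A), expr, 'match', 'once'));
idB = string(regexp(cellstr(B), expr, 'match', 'once'));

% Sort list A by ID, then look up each sorted ID in list B.
[idA, orderA] = sort(idA);
[found, orderB] = ismember(idA, idB);
assert(all(found), 'Every ID in list A needs a partner in list B.');

A = A(orderA);
B = B(orderB);
paired = [A, B]   % one row per subject
```

Each row of paired then holds the two files belonging to the same subject ID.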


Version

R2021a
