Merging/Superimposing two tables without using loops

16 vues (au cours des 30 derniers jours)
Nikhil Tawakley
Nikhil Tawakley le 16 Mar 2017
Commenté : Nikhil Tawakley le 10 Avr 2017
I have two very large (200000x200) tables, Table_1 & Table_2. They have columns/variables that store different data types- strings, doubles, integers etc.
I want to create a third table, Table_3, that superimposes Table 2 over Table 1 if Table 1 has a missing value. i.e.
if Table_1(row,column)<>{''},
Table_3(row,column)=Table_1(row,column)
else
Table_3(row,column)=Table_2(row,column)
For example,
Table 1 Table 2 Table 3
Record Name Age Record Name Age Record Name Age
1 John 10 1 John 1 John 10
2 Daniela 2 Daniela 5 2 Daniela 5
3 Austin 30 3 Austin 30 3 Austin 30
4 45 4 Mark 45 4 Mark 45
I want to do this without using for loops because the time to traverse them is very very slow.
To consider:
  • Is there a way to do this using the fillmissing() function? This would be the most natural approach however, Table_3=fillmissing(Table_1,'Constant',Table_2) doesn't work. (The 2nd argument (Table_2) cannot be multidimensional.)
  • If this were a simple matrix, Table_3(Table_1=='')=Table_2 would've worked; but I can't seem to do this with tables.
Thanks, NT
  1 commentaire
Nikhil Tawakley
Nikhil Tawakley le 17 Mar 2017
Thanks Peter- it's "like" a join but not quite- one typically does a join if there are extra variables to capture; these tables are identical in structure/size. Data in Table_1 is just more reliable than the other (but is comparatively sparsely populated).
I wan't to "superimpose" the two tables such that data from Table 1 is given priority over Table 2.
There could be situation where Mark's age in Table_1 is 85- in that case I would like the output to read:
Table 1 Table 2 Table 3
Record Name Age Record Name Age Record Name Age
4 85 4 Mark 45 4 Mark 85
There's gotta be a way to do this! Thanks again for looking into this.

Connectez-vous pour commenter.

Réponse acceptée

Peter Perkins
Peter Perkins le 16 Mar 2017
This isn't exactly a join, but it's close. It's not entirely clear to me what you want to do, for example, Mark is missing from reconrd 4 in table 1, but Age == 45 is not. However, in tables 2 and 3, where Mark is present, it's the same Age: 45. Is that always true? If so, I would think you could delete records from table 1 that had missing values, then outerjoin with table 2, then repeat for table 3.
Otherwise, maybe you're looking for some kind of setdiff operation.
  4 commentaires
Peter Perkins
Peter Perkins le 20 Mar 2017
Modifié(e) : Peter Perkins le 20 Mar 2017
OK, I've been misinterpreting your example, and I didn't realize that "table 3" was you expected output. So after re-reading, I see what you want.
You say, "no loops", but I think you mean "no loops over rows." Here's a version that loops over variables, making use of ismissing. With only 200 vars, I don't think that will be an issue. I'm gonna assume that the two tables contain the same record numbers, if that's not true you would have to do some kind of join before this:
>> t1 = table( ...
[1;2;3;4],{'John';'Daniela';'Austin';''},[10;NaN;30;45], ...
'VariableNames',{'Record' 'Name' 'Age'})
t1 =
4×3 table
Record Name Age
______ _________ ___
1 'John' 10
2 'Daniela' NaN
3 'Austin' 30
4 '' 45
>> t2 = table( ...
[1;2;3;4],{'John';'Daniela';'Austin';'Mark'},[NaN;5;30;45], ...
'VariableNames',{'Record' 'Name' 'Age'})
t2 =
4×3 table
Record Name Age
______ _________ ___
1 'John' NaN
2 'Daniela' 5
3 'Austin' 30
4 'Mark' 45
>> isempty(setxor(t1.Record,t2.Record)) % check the assumption
ans =
logical
1
>> t1 = sortrows(t1,'Record'); % just in case
>> t2 = sortrows(t2,'Record');
>> t3 = t1;
>> missingValues = ismissing(t1);
>> for varIndex = 2:width(t1) % index by var number, not name
i = missingValues(:,varIndex);
t3.(varIndex)(i) = t2.(varIndex)(i);
end
>> t3
t3 =
4×3 table
Record Name Age
______ _________ ___
1 'John' 10
2 'Daniela' 5
3 'Austin' 30
4 'Mark' 45
Nikhil Tawakley
Nikhil Tawakley le 10 Avr 2017
Thanks for the detailed response, Peter; this was very helpful. I ended up "solving" the issue by looping over variables. It's still slow but fortunately it's acceptable :)

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur Matrix Indexing dans Help Center et File Exchange

Tags

Produits

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by