Synchronizing time series, but first join?

3 vues (au cours des 30 derniers jours)
Nathan McIntosh
Nathan McIntosh le 24 Jan 2022
Réponse apportée : Chetan le 27 Sep 2023
clc; clear;
Here we have two simple tables
left_time = seconds([1;5;10]);
left_vals = ["a";"b";"c"];
left = timetable(left_time, left_vals)
left = 3×1 timetable
left_time left_vals _________ _________ 1 sec "a" 5 sec "b" 10 sec "c"
right_time = seconds([1;2;3;6;7]);
right_vals = [1;2;3;6;7];
right = timetable(right_time, right_vals)
right = 5×1 timetable
right_time right_vals __________ __________ 1 sec 1 2 sec 2 3 sec 3 6 sec 6 7 sec 7
And we combine them into a single table. The type of synchronization I need to perform is sometimes called an "as-of" join/merge. See these sources for how other table/dataframe libraries perform this:
The idea is that for each row in left, we find the row in right that is equal to or less than that time, and join it.
joined = synchronize(left, right, "first","previous")
joined = 3×2 timetable
left_time left_vals right_vals _________ _________ __________ 1 sec "a" 1 5 sec "b" 3 10 sec "c" 7
This is great! It's pretty fast, and a one liner.
But, my primary use case is slightly harder. Here using an example from the pandas documentation
left_time = seconds([23;23;30;41;48;49;72;75]);
ticker = ["GOOG";"MSFT";"MSFT";"MSFT";"GOOG";"AAPL";"GOOG";"MSFT";];
bid = [720.50; 51.95; 51.97; 51.99; 720.50; 97.99; 720.50; 52.01];
ask = [720.93; 51.96; 51.98; 52.00; 720.93; 98.01; 720.88; 52.03];
quotes = timetable(left_time, ticker, bid, ask)
quotes = 8×3 timetable
left_time ticker bid ask _________ ______ _____ ______ 23 sec "GOOG" 720.5 720.93 23 sec "MSFT" 51.95 51.96 30 sec "MSFT" 51.97 51.98 41 sec "MSFT" 51.99 52 48 sec "GOOG" 720.5 720.93 49 sec "AAPL" 97.99 98.01 72 sec "GOOG" 720.5 720.88 75 sec "MSFT" 52.01 52.03
right_time = seconds([23;38;48;48;48]);
ticker = ["MSFT";"MSFT";"GOOG";"GOOG";"AAPL"];
price = [51.95; 51.95; 720.77; 720.92; 98.0];
quantity = [75; 155; 100; 100; 100];
trades = timetable(right_time, ticker, price, quantity)
trades = 5×3 timetable
right_time ticker price quantity __________ ______ ______ ________ 23 sec "MSFT" 51.95 75 38 sec "MSFT" 51.95 155 48 sec "GOOG" 720.77 100 48 sec "GOOG" 720.92 100 48 sec "AAPL" 98 100
When we join here, we want to make sure we only match up MSFT rows with MSFT rows, and GOOG rows with other GOOG rows, etc.
In pandas, I would write this: pd.merge_asof(trades, quotes, on="time", by="ticker") which would produce the following
time ticker price quantity bid ask
23 MSFT 51.95 75 51.95 51.96
38 MSFT 51.95 155 51.97 51.98
48 GOOG 720.77 100 720.50 720.93
48 GOOG 720.92 100 720.50 720.93
48 AAPL 98.00 100 NaN NaN
How can I do this Matlab? I've played around with the various ways join() can combine the tables, but haven't yet found a way to only synchronize various chunks of each with eachother.
In the example above, I am only "grouping by" one column. In my actual work, I am grouping by multiple columns.

Réponses (1)

Chetan
Chetan le 27 Sep 2023
I understand that you're facing issues while joining two tables and synchronizing the time series. To address this, you have a couple of options:
  1. You can use the outerjoin function in MATLAB. This function allows you to perform an outer join operation on the tables. You can find more information about outerjoin” in MathWorks documentation: https://www.mathworks.com/help/matlab/ref/table.outerjoin.html
  2. Alternatively, you can write your custom logic to find the nearest preceding time and implement the "as-of" join. This involves iterating over the rows of one table and finding the corresponding value in the other table based on the nearest preceding time.
Here is the sample code for this:
clc; clear;
% Define the quotes table
left_time = seconds([23;23;30;41;48;49;72;75]);
ticker = ["GOOG";"MSFT";"MSFT";"MSFT";"GOOG";"AAPL";"GOOG";"MSFT"];
bid = [720.50; 51.95; 51.97; 51.99; 720.50; 97.99; 720.50; 52.01];
ask = [720.93; 51.96; 51.98; 52.00; 720.93; 98.01; 720.88; 52.03];
quotes = timetable(left_time, ticker, bid, ask);
% Define the trades table
right_time = seconds([23;38;48;48;48]);
ticker = ["MSFT";"MSFT";"GOOG";"GOOG";"AAPL"];
price = [51.95; 51.95; 720.77; 720.92; 98.0];
quantity = [75; 155; 100; 100; 100];
trades = timetable(right_time, ticker, price, quantity);
% Sort the quotes table by time
quotes = sortrows(quotes, 'left_time');
% Perform the "as-of" join based on nearest preceding time
matched = trades;
matched.bid = NaN(height(trades), 1);
matched.ask = NaN(height(trades), 1);
for i = 1:height(trades)
idx = find(quotes.left_time <= trades.right_time(i), 1, 'last');
if ~isempty(idx)
matched.bid(i) = quotes.bid(idx);
matched.ask(i) = quotes.ask(idx);
end
end
% Display the matched table
disp(matched);
I hope these suggestions help you resolve the issue you are facing

Catégories

En savoir plus sur MATLAB dans Help Center et File Exchange

Produits


Version

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by