identify key variable for joining tables

I have two large tables (562000*2500) imported from a sql database with non-identical headers. I know these tables have 1 or 2 idetical column values which I want to use for joining the two tables. How is it possible to identify the potential key column for joining them?

11 commentaires

Geoff Hayes
Geoff Hayes le 23 Mai 2019
Johnny - how many columns/fields in each table? (Are you really saying there are 2500?)
Johnny Birch
Johnny Birch le 23 Mai 2019
Yes 2500 columns and > 500.000 rows.
Geoff Hayes
Geoff Hayes le 23 Mai 2019
What are the data types of the columns that are common to both tables? Do you know if they represent the unique identifiers for the rows in the table?
Johnny Birch
Johnny Birch le 23 Mai 2019
the common datatype is strings. I am not sure what you mean by the second question, sorry
Geoff Hayes
Geoff Hayes le 23 Mai 2019
Don't rows/records in tables usually have unique identifiers (the keys)? It's been a while since I've done any database development. But since the common data type is string, then this might not be a key. Are the strings unique for each record/row, or can multiple records have the same string?
Guillaume
Guillaume le 23 Mai 2019
So you have no a-priori knowledge which of the table columns may be the shared key?
And they're not the primary keys of the tables in the db (assuming your tables have primary keys) or indexed keys (which could reduce the search space)?
As Geoff points out primary keys are typically numerical integers. For strings, you'd use a hash to speed up search.
I've never heard of qxl databases.
Johnny Birch
Johnny Birch le 23 Mai 2019
You are right Guillaume. I have no a-priori knowledge about how the two tables are connected, except that they are. And I know they share one or two common column. They may be numerical, I am unsure since these two are large tables. It is from a sql database and I have no experience with sql.
Johnny Birch
Johnny Birch le 23 Mai 2019
@Geoff. I may be wrong. The common column may not be a string. I have not looked into all the values as the tables a quite large.
Adam Danz
Adam Danz le 23 Mai 2019
So many of these questions (and more) could be clarified by including a mini example that represents your data.
Guillaume
Guillaume le 23 Mai 2019
With no a priori knowledge and no way to narrow it down, this sounds like a monumental task, if not impossible. In theory, you'd have to do up to 562000 ^ 3123750 comparisons to find the two columns with the most collisions.
If the two tables come from the same database, is there something in the database schema that could tell you the common column, i.e is there foreign keys?
Geoff Hayes
Geoff Hayes le 23 Mai 2019
Johnny - how do you even know that there are common columns between the two tables? What has led you to believe that?

Connectez-vous pour commenter.

Réponses (0)

Catégories

Produits

Version

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by