I am working with different files full of data for 30 years (1 year per file). Each of these files contains a list of variables along with a corresponding ID for that variable. The problem is, some years have the same variable names but a different ID. I am getting past this by matching "by name".
Within each of these variables there are perhaps 10 or 100 different "values" that have the same problem (this is gov't data). They have a name or description that's the same for every year but contain a different ID. But now I can't simply match by name since I have to report a change in ID over time (e.g., to help a researcher). Therefore I end up with stuff like this:
For a value with the name "sleet and hail":
year::ID
1975::5
1976::5
1977::6
1978::5
Ideally, I would collate this data into a simplified version using year ranges, like so:
1975-1976::5
1977::6
1978::5
What is this called, and is there an algorithm made for doing this? I do have one that works for me, although it is in VB and pretty messy.
Within each of these variables there are perhaps 10 or 100 different "values" that have the same problem (this is gov't data). They have a name or description that's the same for every year but contain a different ID. But now I can't simply match by name since I have to report a change in ID over time (e.g., to help a researcher). Therefore I end up with stuff like this:
For a value with the name "sleet and hail":
year::ID
1975::5
1976::5
1977::6
1978::5
Ideally, I would collate this data into a simplified version using year ranges, like so:
1975-1976::5
1977::6
1978::5
What is this called, and is there an algorithm made for doing this? I do have one that works for me, although it is in VB and pretty messy.
