I am sure I am missing something here. I have a complex data flow that has to do lookups from 3 different tables, so to make it easier to understand I split each complex lookup into a separate Query transform and then rejoin the data in a final Query transform for output. When I do that, however, it starts to duplicate the data hundreds of times.
In this data flow it works fine: it brings 100 records over and puts 100 records in each of the output tables. I checked all of the tables and each has only 100 records.
In this data flow it keeps running. I killed the job to stop it, and by then it was at 20 million records. Am I supposed to do anything in the last Query transform to make sure it doesn't duplicate data?
Note that I made sure the table was empty before I ran the job, though that should not matter as it is a template table. I also made sure that the flat file was set to return only 100 records.
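To illustrate the kind of blow-up I am seeing, here is a minimal Python sketch of one possible explanation. It assumes, and this is only a guess rather than something I have confirmed in the data flow, that the last Query transform rejoins the three branches without a join condition, so every row is paired with every other row. The names `branch_a`, `branch_b`, `branch_c` and `row_id` are hypothetical stand-ins, not objects from my job.

```python
from itertools import product

ROWS = 100  # same size as the flat file limit described above

# Hypothetical stand-ins for the three lookup branches, each carrying
# the same key column (row_id) plus one looked-up value.
branch_a = [{"row_id": i, "lookup_a": f"a{i}"} for i in range(ROWS)]
branch_b = [{"row_id": i, "lookup_b": f"b{i}"} for i in range(ROWS)]
branch_c = [{"row_id": i, "lookup_c": f"c{i}"} for i in range(ROWS)]


def cross_join_count(a, b, c):
    """Count the rows produced when the branches are joined with no condition."""
    # product() pairs every row of a with every row of b and of c.
    return sum(1 for _ in product(a, b, c))


def join_on_key(a, b, c, key="row_id"):
    """Rejoin the branches on a shared key, one output row per input row."""
    b_by_key = {row[key]: row for row in b}
    c_by_key = {row[key]: row for row in c}
    return [
        {**row, **b_by_key[row[key]], **c_by_key[row[key]]}
        for row in a
        if row[key] in b_by_key and row[key] in c_by_key
    ]


if __name__ == "__main__":
    print(cross_join_count(branch_a, branch_b, branch_c))  # 100 * 100 * 100
    print(len(join_on_key(branch_a, branch_b, branch_c)))  # one row per key
```

With three 100-row inputs and no join condition the row count is 100 × 100 × 100 = 1,000,000, and a fourth input would multiply that by 100 again. If that is what is happening, the job I killed at 20 million records may simply not have finished yet. Joining on a shared key keeps the output at 100 rows.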