What is Join Rank?
You can use join rank to control the order in which sources (tables or files) are joined in a dataflow. The highest ranked source is accessed first to construct the join.
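The ordering rule can be pictured with a small Python model. This is not Data Services code; it is a hypothetical sketch. The `Source` class, the `access_order` function and the table names are all illustrative. It sorts sources from highest to lowest join rank, so the highest-ranked source comes first.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for a data flow source (table or file)."""
    name: str
    join_rank: int


def access_order(sources):
    """Return sources ordered from highest to lowest join rank.

    sorted() is stable even with reverse=True, so sources with equal rank
    keep their input order. That tie-breaking is a property of this sketch,
    not documented Data Services behavior.
    """
    return sorted(sources, key=lambda s: s.join_rank, reverse=True)


# Hypothetical sources with illustrative rank values.
sources = [
    Source("CUSTOMER", 10),
    Source("SALES_ORDERS", 50),
    Source("REGION", 0),
]
order = [s.name for s in access_order(sources)]
print(order)  # ['SALES_ORDERS', 'CUSTOMER', 'REGION']
```

In this model, SALES_ORDERS has the highest rank, so it is accessed first.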
Best Practices for Join Ranks:
- Define the join rank in the Query editor.
- For an inner join between two tables, in the Query editor assign a higher join rank value to the larger table and, if possible, cache the smaller table.
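The practice above can be illustrated with a conceptual Python model. This is an assumption-laden sketch, not a description of how the Data Services engine implements joins. The function `inner_join_cached` and the sample rows are hypothetical.

- The higher-ranked, larger table is streamed once as the driving source.
- The smaller table is cached in memory, modeled here as a dictionary keyed on the join column.

Because the cache holds every row of the cached table, its memory footprint grows with that table's size. That is one plausible reason to cache the smaller table rather than the larger one.

```python
from collections import defaultdict


def inner_join_cached(outer_rows, inner_rows, key):
    """Inner-join two lists of dict rows on `key` (illustrative model only).

    outer_rows: the higher-ranked (larger) source, streamed once.
    inner_rows: the lower-ranked (smaller) source, cached in memory first.
    """
    # Cache the smaller source: join-key value -> list of matching rows.
    cache = defaultdict(list)
    for row in inner_rows:
        cache[row[key]].append(row)

    # Stream the larger source once and probe the cache for each row.
    # dict.get does not insert missing keys into the defaultdict.
    result = []
    for outer in outer_rows:
        for inner in cache.get(outer[key], []):
            # On a column-name clash, the outer row's value wins.
            result.append({**inner, **outer})
    return result


# Hypothetical data: many orders (larger) joined to few customers (smaller).
orders = [
    {"cust_id": 1, "amount": 100},
    {"cust_id": 2, "amount": 250},
    {"cust_id": 3, "amount": 75},  # no matching customer, dropped by inner join
    {"cust_id": 1, "amount": 40},
]
customers = [
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Globex"},
]
joined = inner_join_cached(orders, customers, "cust_id")
print(len(joined))  # 3
```

In this model, the larger table is read once and the smaller one is read once into the cache. The join never re-scans either source.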
Performance Improvement:
Controlling join order can often have a large effect on join performance. Join ordering is relevant only in cases where the Data Services engine performs the join. In cases where the code is pushed down to the database, the database server determines how a join is performed.
When Should Join Rank Be Used?
Consider join rank when the data flow is not fully pushed down to the database and the sources contain a large number of records. In such cases the Data Services engine performs the join, so join rank plays an important role in performance optimization. The Data Services Optimizer considers join rank and uses the source with the highest join rank as the left source.
You can print a trace message to the Monitor log file which allows you to see the order in which the Data Services Optimizer performs the joins. This information may help you to identify ways to improve the performance. To add the trace, select Optimized Data Flow in the Trace tab of the "Execution Properties" dialog.
This article will continue soon with a real-time example of join rank.