Channel: SCN : All Content - Data Services and Data Quality

Performant extraction of most recent data from a history table


You have a table that contains multiple timestamped records for a given primary key:

 

Key  Att  Timestamp
03   747  2012.11.11 04:17:30
01   ABC  2014.09.30 17:45:54
02   UVW  2014.04.16 17:45:23
01   DEF  2014.08.17 16:16:27
02   XYZ  2014.08.25 18:15:45
01   JKL  2012.04.30 04:00:00
03   777  2014.07.15 12:45:12
01   GHI  2013.06.08 23:11:26
03   737  2010.12.06 06:43:52

 

The required output is the most recent record for every key value:

 

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
02   XYZ  2014.08.25 18:15:45
03   777  2014.07.15 12:45:12

 

 

 

Solution #1: Use the gen_row_num_by_group function

 

Build a dataflow as such:

[Image: df1.png]

In the first query transform, sort the input stream by Key ascending and Timestamp descending. The sort will be pushed down to the underlying database, which is often good for performance.

 

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
01   DEF  2014.08.17 16:16:27
01   GHI  2013.06.08 23:11:26
01   JKL  2012.04.30 04:00:00
02   XYZ  2014.08.25 18:15:45
02   UVW  2014.04.16 17:45:23
03   777  2014.07.15 12:45:12
03   747  2012.11.11 04:17:30
03   737  2010.12.06 06:43:52

 

In the second query transform, add a column Seqno and map it to gen_row_num_by_group(Key).

 

Key  Att  Timestamp            Seqno
01   ABC  2014.09.30 17:45:54  1
01   DEF  2014.08.17 16:16:27  2
01   GHI  2013.06.08 23:11:26  3
01   JKL  2012.04.30 04:00:00  4
02   XYZ  2014.08.25 18:15:45  1
02   UVW  2014.04.16 17:45:23  2
03   777  2014.07.15 12:45:12  1
03   747  2012.11.11 04:17:30  2
03   737  2010.12.06 06:43:52  3

 

In the third query transform, add a where-clause Seqno = 1 (and don’t map the Seqno column).

 

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
02   XYZ  2014.08.25 18:15:45
03   777  2014.07.15 12:45:12
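Outside Data Services, the three-step logic of this solution (sort, number the rows within each group, keep the first) can be sketched in plain Python. This is my own illustration of the algorithm on the sample rows above, not code that DS generates:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ("03", "747", "2012.11.11 04:17:30"),
    ("01", "ABC", "2014.09.30 17:45:54"),
    ("02", "UVW", "2014.04.16 17:45:23"),
    ("01", "DEF", "2014.08.17 16:16:27"),
    ("02", "XYZ", "2014.08.25 18:15:45"),
    ("01", "JKL", "2012.04.30 04:00:00"),
    ("03", "777", "2014.07.15 12:45:12"),
    ("01", "GHI", "2013.06.08 23:11:26"),
    ("03", "737", "2010.12.06 06:43:52"),
]

# Query 1: sort by Key ascending, Timestamp descending.
# The yyyy.mm.dd hh:mm:ss strings sort chronologically as text,
# and Python's sort is stable, so two passes give the combined order.
rows.sort(key=itemgetter(2), reverse=True)   # Timestamp descending
rows.sort(key=itemgetter(0))                 # then Key ascending

# Query 2: the equivalent of gen_row_num_by_group(Key):
# restart a running number at 1 for every new Key value.
numbered = []
for key, group in groupby(rows, key=itemgetter(0)):
    for seqno, row in enumerate(group, start=1):
        numbered.append(row + (seqno,))

# Query 3: where-clause Seqno = 1, and drop the Seqno column.
latest = [row[:3] for row in numbered if row[3] == 1]
print(latest)
```

Running the sketch yields exactly the three rows of the output table above, one per key.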

 

 

 

Solution #2: Use a join

 

Suppose we’re talking Big Data here: there are millions of records in the source table. On HANA, obviously. Although the sort is pushed down to the database, the built-in function is not. Therefore every single record has to be pulled into DS memory and eventually written back to the database.

 

Now consider this approach:

[Image: df2.png]

 

The first query transform selects only two columns from the source table: Key and Timestamp. Define a group-by on Key and set the mapping for Timestamp to max(Timestamp).

 

Key  Timestamp
01   2014.09.30 17:45:54
02   2014.08.25 18:15:45
03   2014.07.15 12:45:12

 

In the second query transform, (inner) join this result to the source table on Key and Timestamp, and map all columns from the source table to the output.

 

Key  Att  Timestamp
01   ABC  2014.09.30 17:45:54
02   XYZ  2014.08.25 18:15:45
03   777  2014.07.15 12:45:12

 

If you uncheck bulk loading of the target table, you’ll notice that the full SQL (read and write) is pushed down to the underlying database. And your job will run so much faster!
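The group-by-max plus inner-join logic can likewise be sketched in plain Python (my own illustration on the sample rows; in the actual job this work happens inside the database, not in DS memory):

```python
rows = [
    ("03", "747", "2012.11.11 04:17:30"),
    ("01", "ABC", "2014.09.30 17:45:54"),
    ("02", "UVW", "2014.04.16 17:45:23"),
    ("01", "DEF", "2014.08.17 16:16:27"),
    ("02", "XYZ", "2014.08.25 18:15:45"),
    ("01", "JKL", "2012.04.30 04:00:00"),
    ("03", "777", "2014.07.15 12:45:12"),
    ("01", "GHI", "2013.06.08 23:11:26"),
    ("03", "737", "2010.12.06 06:43:52"),
]

# First transform: group by Key, Timestamp mapped to max(Timestamp).
# (The yyyy.mm.dd hh:mm:ss strings sort chronologically as text.)
max_ts = {}
for key, att, ts in rows:
    if key not in max_ts or ts > max_ts[key]:
        max_ts[key] = ts

# Second transform: inner join back to the full source table
# on (Key, Timestamp), keeping all source columns.
latest = [(key, att, ts) for key, att, ts in rows if ts == max_ts[key]]
print(sorted(latest))
```

Only one full pass over the source plus a cheap lookup per row is needed, which mirrors why the pushed-down aggregate-and-join SQL beats pulling every record into DS memory.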

 

Note: This second approach produces correct results only if there are no duplicate most recent timestamps within a given primary key. If two records for the same key share the maximum timestamp, the join returns both of them.
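A tiny Python sketch of that edge case (hypothetical data, my own illustration) shows the join returning two rows for one key:

```python
# Two records for key "01" share the same most recent timestamp.
rows = [
    ("01", "ABC", "2014.09.30 17:45:54"),
    ("01", "DEF", "2014.09.30 17:45:54"),  # duplicate max timestamp
    ("02", "XYZ", "2014.08.25 18:15:45"),
]

# Same group-by-max plus join logic as in Solution #2.
max_ts = {}
for key, att, ts in rows:
    if key not in max_ts or ts > max_ts[key]:
        max_ts[key] = ts

latest = [r for r in rows if r[2] == max_ts[r[0]]]
print(latest)  # key "01" appears twice: both ABC and DEF survive the join
```

Solution #1 does not have this problem: the row numbering always picks exactly one record per key (whichever tying row the sort happens to put first).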

