Hi,
Both steps remove duplicate rows and keep only one occurrence of each unique row.
For Unique Rows, the input stream must be sorted on the compared fields, because the step only compares each row with the one immediately before it; on unsorted input, only consecutive duplicates are removed.
The Unique Rows (HashSet) step tracks duplicates in memory and does not require a sorted input to process duplicate rows.
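
To make the difference concrete, here is a minimal sketch in plain Python. It is not the actual Kettle implementation; the function names unique_consecutive and unique_hashset and the sample rows are made up for illustration, and each row is assumed to be a simple hashable value.

```python
from typing import Hashable, Iterable, List


def unique_consecutive(rows: Iterable[Hashable]) -> List[Hashable]:
    """Mimics the idea behind Unique Rows: compare each row only with the previous one."""
    result = []
    previous = object()  # sentinel that is never equal to a real row
    for row in rows:
        if row != previous:
            result.append(row)
        previous = row
    return result


def unique_hashset(rows: Iterable[Hashable]) -> List[Hashable]:
    """Mimics the idea behind Unique Rows (HashSet): remember every row seen so far."""
    seen = set()  # grows with the number of distinct rows, so it costs memory
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical sample data with non-adjacent duplicates
rows = ["A", "B", "A", "A", "C", "B"]

unsorted_result = unique_consecutive(rows)        # ['A', 'B', 'A', 'C', 'B']
sorted_result = unique_consecutive(sorted(rows))  # ['A', 'B', 'C']
hashset_result = unique_hashset(rows)             # ['A', 'B', 'C']
```

On the unsorted input, the consecutive comparison lets the non-adjacent "A" and "B" duplicates through, while sorting first or using the set-based approach removes them. The trade-off is that the set-based approach has to hold every distinct row in memory.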