Advanced Options
XSLT Filter
This filter applies an XSLT transformation to XML text.
Merge files

The merge files filter is used to join files together. The content from each file is simply appended to the content from the previous file, with no extra characters. For example, if you joined files A, B and C, the merge files filter would produce output like this:
[file A][file B][file C]
This filter can be used for:
- Joining together binary files that have been split
- Joining together the processed results from large numbers of smaller files, e.g. extracting email addresses or URLs from a large number of web pages, then using the merge files filter to join the results so that a sort and remove duplicates can be performed.
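The append-only behaviour described above can be illustrated with a minimal Python sketch. This is not TextPipe's implementation; the helper name merge_streams is hypothetical, and in-memory streams stand in for real files.

```python
import io
import shutil

def merge_streams(sources, dest):
    """Append each source to dest in order, adding no separator bytes."""
    for src in sources:
        shutil.copyfileobj(src, dest)

# In-memory stand-ins for files A, B and C.
parts = [io.BytesIO(b"[file A]"), io.BytesIO(b"[file B]"), io.BytesIO(b"[file C]")]
merged = io.BytesIO()
merge_streams(parts, merged)
print(merged.getvalue())  # b'[file A][file B][file C]'
```

Because the data is copied as bytes, the same approach works for both split binary files and text results.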
Count duplicate lines

The Count Duplicate Lines filter outputs a count of the number of times a line has been repeated (only if there are 2 or more occurrences). The file should be sorted prior to this filter so that duplicate lines are placed next to each other. A single copy of each duplicated line is output, with a count of the number of duplicates at the start of each line. From a set of duplicate lines, the line that gets output is the last duplicate line of the set, unless the set starts on the first line of the file, in which case the first line gets output (when Ignore Case is checked, the duplicate lines can be different).
- Ignore case
If ignore case is checked, lines do not need to be cased identically to be considered duplicates. Two otherwise identical lines, one in upper case and one in lower case, would be considered duplicates and counted together by this filter. If ignore case is unchecked, the lines must be identical to be considered duplicates. The case checking routines are ANSI aware, so their behaviour may change depending on your locale.
- Start column
The comparison can also ignore leading characters if desired, by setting the start column higher than 1. This can be used to skip line numbers at the start of each line, which helps when finding duplicates that are not adjacent. To skip line numbers, set the Start Column to 6 (or so), and set the length to 4096, or a length greater than your maximum line length.
- Length
The comparison can also ignore trailing characters if desired, by setting the length to less than the length of the line.
- Include counts of 1
Normally this filter only outputs lines with counts of 2 or more (i.e. they are duplicates). When this box is checked, lines that occur only once are also output, with a count of 1.
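As a rough illustration of how these options interact, here is a Python sketch of counting adjacent duplicates. It is an assumption-laden model, not TextPipe's code: the function and parameter names are hypothetical, the "count, space, line" output format is assumed, and str.lower() stands in for the locale-aware case routines.

```python
from itertools import groupby

def count_duplicate_lines(lines, ignore_case=False, start_column=1,
                          length=4096, include_counts_of_1=False):
    """Count runs of adjacent lines that compare equal (input should be sorted)."""
    def key(line):
        # Compare only `length` characters starting at the 1-based start column.
        part = line[start_column - 1:start_column - 1 + length]
        return part.lower() if ignore_case else part

    out = []
    for index, (_, group) in enumerate(groupby(lines, key=key)):
        group = list(group)
        if len(group) >= 2 or include_counts_of_1:
            # Last line of the set, except for a set starting on the first line.
            chosen = group[0] if index == 0 else group[-1]
            out.append(f"{len(group)} {chosen}")
    return out

lines = ["apple", "Apple", "banana", "cherry", "cherry"]
print(count_duplicate_lines(lines, ignore_case=True))
# ['2 apple', '2 cherry']
```

With start_column=6, lines such as "0001 x" and "0002 x" compare equal because the leading line numbers are skipped.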
Move columns

TextPipe will move columns to a new position on the line. The new position is specified assuming that the moved columns have been removed from the line.
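The rule that the new position is measured after the moved columns have been removed can be sketched as follows. This is only an illustration; move_columns and its parameters are hypothetical names, and 1-based column numbering is assumed.

```python
def move_columns(line, start, width, new_pos):
    """Move `width` columns beginning at 1-based `start` to 1-based `new_pos`.

    `new_pos` is interpreted against the line with the moved columns removed.
    """
    s = start - 1
    moved = line[s:s + width]
    rest = line[:s] + line[s + width:]   # the line without the moved columns
    i = new_pos - 1
    return rest[:i] + moved + rest[i:]

print(move_columns("ABCDEFGH", 1, 2, 3))  # CDABEFGH
```

Here "AB" is removed first, leaving "CDEFGH", and position 3 is then counted in that shortened line.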
Copy fields (CSV, Tab, Pipe etc)
TextPipe will copy CSV-delimited fields to a new position on the line. TextPipe ensures that all the delimiters on the line are correctly maintained, both at the end of the line and where the copied fields are inserted.
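A minimal sketch of copying delimited fields while keeping delimiters and quoting intact, using Python's csv module. The function name and parameters are hypothetical, and it is an assumption here that the new position refers to a 1-based field position in the original line.

```python
import csv
import io

def copy_fields(line, first, count, new_pos, delimiter=","):
    """Copy `count` fields starting at 1-based field `first` to field `new_pos`."""
    row = next(csv.reader([line], delimiter=delimiter))
    copied = row[first - 1:first - 1 + count]
    i = new_pos - 1
    row[i:i] = copied                    # insert copies; the originals stay put
    buf = io.StringIO()
    # The writer re-adds delimiters and quotes fields that contain them.
    csv.writer(buf, delimiter=delimiter, lineterminator="").writerow(row)
    return buf.getvalue()

print(copy_fields('a,"b,c",d', 2, 1, 4))  # a,"b,c",d,"b,c"
```

Parsing and re-writing the row, rather than splicing text, is what keeps the delimiters correct at the insertion point and at the end of the line.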
Repeat file's contents
The file's contents are repeated the specified number of times. This filter needs to keep a copy of the entire file in memory so it can output it a given number of times. This filter is useful for generating test data sets of a given size.
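The memory requirement follows from having to read the whole input before writing it out more than once. A small sketch under that assumption (repeat_contents is a hypothetical helper, and in-memory streams stand in for files):

```python
import io

def repeat_contents(source, dest, times):
    """Write the full contents of source to dest `times` times."""
    data = source.read()        # the entire input is held in memory
    for _ in range(times):
        dest.write(data)

out = io.BytesIO()
repeat_contents(io.BytesIO(b"id,name\n"), out, 3)
print(out.getvalue())  # b'id,name\nid,name\nid,name\n'
```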
Randomize lines
This filter puts lines into random order. This is useful when a random sample of data is required for statistical purposes - just follow this filter with a head/tail of file filter.
The lines output will differ from one run to the next. The order in which the lines are chosen is determined by a pseudo-random number generator.
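The "randomize, then take the head" approach to sampling can be sketched in Python as below. The function name and the optional seed parameter are illustrative assumptions; nothing here implies that TextPipe exposes a seed.

```python
import random

def random_sample_lines(lines, n, seed=None):
    """Shuffle the lines, then keep the first n (randomize + head of file)."""
    rng = random.Random(seed)   # seed=None gives a different order each run
    shuffled = list(lines)
    rng.shuffle(shuffled)
    return shuffled[:n]

lines = [f"line {i}" for i in range(100)]
sample = random_sample_lines(lines, 10)
print(len(sample))  # 10
```

Every line is equally likely to land in the first n positions, which is why the head of the shuffled output serves as a random sample.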
Reverse line order
Each line is output reversed from left to right. This can be useful to:
- extract domain names from web site log files - use this filter to reverse each line, use an extract matches filter of [\w\d]+\.[\w\d]+ to extract each domain name, then reverse each line again.
- reverse lines of Hebrew text that are in mirror script
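The reverse, extract, reverse recipe for domain names can be sketched as follows. It assumes each log line ends with a host name, so that the first match in the reversed line is the last two labels of that host; domain_from_line is a hypothetical helper name.

```python
import re

# Same pattern as the extract matches filter above.
DOMAIN = re.compile(r"[\w\d]+\.[\w\d]+")

def domain_from_line(line):
    """Reverse the line, take the first match, then reverse the match back."""
    reversed_line = line[::-1]
    match = DOMAIN.search(reversed_line)
    return match.group(0)[::-1] if match else None

print(domain_from_line("10.0.0.1 www.example.com"))  # example.com
```

Reversing first makes the pattern match from the end of the line, so "www.example.com" yields "example.com" rather than "www.example".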