Advanced Features
Task Bar Menu
The task bar shows a small TextPipe icon while TextPipe is running.

Right-click the icon to open a menu of options. The options below the separator line are filters that you can run.

Typically, the filters process text that is on the clipboard. This lets you copy text from your editor or word processor to the clipboard, run a TextPipe filter from the task bar icon, and then paste the converted text back into your application.
Logging
The filter options pane can be reached by selecting the topmost node in the filter tree. In the registered version it allows you to set logging profiles. Logging allows you to track when a filter was run, who ran it, how long it ran for, which files it processed or skipped, the input size and output size of each file and the changes made to it by each filter. It also keeps track of any errors that occurred during a filter job.
There are three levels of log entry - Info, Warning and Error. Info lines are reported in the Results tab of the Status Window, and Warnings and Errors are reported in the Errors tab. All three are written to the log file. Logged errors modify TextPipe's exit code.

What is Data Mining?
Data mining, or text mining, is the processing of a source of information in order to extract data from it. For example:
- Process a web site to extract product catalog and cost information. This can then be used to compare prices between different suppliers.
- Process a web site to extract email addresses or web URLs.
- Harvest the data on a web site for your own purposes.
The extracted data is usually structured so that it can be easily loaded into a database for further analysis.
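As a rough illustration of the email-extraction case above, the following Python sketch pulls email-like strings out of a block of text. It is not TextPipe's implementation; the pattern EMAIL_RE and the function extract_emails are hypothetical names, and the pattern is deliberately simpler than full email address syntax.

```python
import re

# Assumption: a deliberately simple pattern. Real address syntax is more
# permissive, so treat this as a starting point rather than a validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return unique email-like strings in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()          # compare case-insensitively
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result

sample = ('<a href="mailto:sales@example.com">Sales</a> or '
          'support@example.org, SALES@example.com')
print(extract_emails(sample))
```

The result is a plain list with one entry per address, which is easy to write out one per line for loading into a database.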
How can TextPipe help?
TextPipe can be used to generate an extract from any text data source, including web sites. TextPipe can also be used to perform data cleansing or additional processing, for example:
- add a header record (e.g. provide column titles for .CSV files)
- remove unwanted data
- replace specific text
- convert line endings to DOS, Unix or Mac format
- expand tabs
- fix capitalization
- convert from EBCDIC to ASCII
- remove multiple whitespace
- remove columns, lines or fields
- remove duplicate records
- sort
- extract email addresses from specific fields
- discard records matching a pattern
- and much more
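To make a few of these operations concrete, here is a minimal Python sketch that combines four of them: adding a header record, converting line endings, removing multiple whitespace and removing duplicate records. It only illustrates what the operations do and is not how TextPipe performs them; the function name cleanse and its parameters are assumptions made for this example.

```python
import re

def cleanse(text, header=None, newline="\r\n"):
    """Illustrative cleansing pass over a block of text.

    Assumptions for this sketch: blank lines are dropped, and duplicates
    are detected after whitespace has been collapsed.
    """
    # Split on DOS (\r\n), Mac (\r) and Unix (\n) line endings.
    lines = re.split(r"\r\n|\r|\n", text)
    seen = set()
    out = [] if header is None else [header]   # optional header record
    for line in lines:
        # Collapse runs of spaces/tabs and trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if not line or line in seen:           # skip blanks and duplicates
            continue
        seen.add(line)
        out.append(line)
    # Write every record with the requested line ending.
    return newline.join(out) + newline

raw = "a,  b\r\nc,d\ra,  b\n"
print(repr(cleanse(raw, header="x,y", newline="\n")))
```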
Optimizing Performance
Memory
If you're sorting large files, give TextPipe as much memory as possible. Close EVERY unnecessary application.
Once TextPipe starts sorting, try not to start any new programs, because TextPipe's 'memory full' benchmark will then be incorrect. TextPipe assumes that you will give it as much memory as possible, and that the available memory will not decrease while it is performing a sort.
Disk I/O and virus scanners
The slowest operation that TextPipe performs is reading from and writing to disk. You can improve performance by making sure that all files being processed are stored on local disks rather than on network servers. You can also increase speed by an order of magnitude by using a RAM drive (a disk held in memory), although naturally this won't help if the files you process are very large.
- Disable any virus scanning while TextPipe is running.
- TextPipe utilizes specific Windows API calls to enhance the speed of reading data files.
Temporary files
TextPipe doesn't use temporary files, except when sorting, where they are unavoidable. In all other cases the only file it writes is the output file itself. While the output is being written it uses a file name like TXPxxx.tmp, and once the file is completely written out it is renamed to the actual output filename.
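The write-then-rename approach described above can be sketched in Python as follows. This is an illustration of the general technique only, not TextPipe's code; the function name write_output_atomically is hypothetical, and the TXP prefix is borrowed from the file name pattern mentioned above.

```python
import os
import tempfile

def write_output_atomically(path, chunks):
    """Write chunks to a temporary file, then rename it to `path`."""
    folder = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same folder as the final output so
    # the rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(prefix="TXP", suffix=".tmp", dir=folder)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # Only a completely written file ever appears under the real name.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The benefit of this design is that other programs never see a half-written output file under its final name.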
If you have enough memory, the entire sort is performed in memory for speed. Every 10000 lines, TextPipe checks whether less than 16 MB of physical memory (not virtual memory!) is available. If so, it writes the sorted results so far to a temporary file and then continues. If the Output Filter is set to File Output or Single File Output, any temporary files are written to the same folder as the output file. If the Output Filter is set to Clipboard Output, any temporary files are written to the current folder. All temporary files are removed as soon as possible as the sort progresses.
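The general idea of sorting in memory and spilling sorted runs to temporary files can be sketched in Python as below. This is a simplified illustration, not TextPipe's algorithm: it spills after a fixed number of lines (the hypothetical run_size parameter) instead of checking free physical memory, and it assumes every input line ends with a newline.

```python
import heapq
import os
import tempfile

def external_sort(lines, run_size=10000, temp_dir=None):
    """Yield the input lines in sorted order, spilling sorted runs to disk."""
    run_paths = []
    buffer = []

    def spill():
        # Sort what is in memory and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".tmp", dir=temp_dir)
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.writelines(buffer)
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line)
        if len(buffer) >= run_size:   # stand-in for a low-memory check
            spill()

    buffer.sort()
    if not run_paths:
        # Everything fitted in memory: no temporary files were needed.
        yield from buffer
        return

    files = [open(p, encoding="utf-8", newline="") for p in run_paths]
    try:
        # Merge the on-disk runs with whatever is still in memory.
        yield from heapq.merge(buffer, *files)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)              # remove temporary files when done

data = ["pear\n", "apple\n", "fig\n", "kiwi\n", "date\n"]
print(list(external_sort(data, run_size=2)))
```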
Pattern matching
You can improve matching performance by an order of magnitude by limiting what wildcards like .* can match, which allows patterns to fail earlier. If you can replace .* (match any character 0 or more times) with [^>]* (match any character except '>' 0 or more times) or [^>]{0,200} (match any character except '>' up to 200 times), then your patterns will match, or fail to match, far more quickly.
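The difference in how far each wildcard can reach is easy to see in a small Python example. It uses Python's re module, not TextPipe's pattern matcher, and the sample HTML is made up for illustration; it shows what each pattern is allowed to consume, and does not measure speed.

```python
import re

html = '<a href="x">one</a> <a href="y">two</a>'

# .* may consume everything up to the LAST '>' on the line, so the engine
# scans to the end of the line and then backtracks.
greedy = re.compile(r'<a .*>')

# [^>]* stops at the first '>', so each attempt is bounded by the tag itself.
bounded = re.compile(r'<a [^>]*>')

# Same idea with an explicit upper limit on the number of characters.
capped = re.compile(r'<a [^>]{0,200}>')

print(greedy.findall(html))
print(bounded.findall(html))
print(capped.findall(html))
```

Besides limiting the work the matcher has to do, the restricted forms match each tag separately, which is usually what was intended in the first place.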
Null characters
Clipboard Limitations
The Windows clipboard cannot be used to process text that contains null characters (ASCII code 0). This is because the clipboard contents are defined as null terminated, so operations on them may halt prematurely at the first null. Note that this is a Windows limitation, not a TextPipe limitation. Binary data should be placed in a file for TextPipe to process.
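The effect of null termination can be demonstrated with a short Python sketch that uses the standard ctypes module. It does not touch the clipboard; it only shows how a consumer that treats a buffer as a null-terminated string loses everything after the first null. The function name as_null_terminated is hypothetical.

```python
import ctypes

def as_null_terminated(data: bytes) -> bytes:
    """Return what a reader of a null-terminated C string would see."""
    buf = ctypes.create_string_buffer(data)
    # .value stops at the first null byte; .raw would return the whole buffer.
    return buf.value

text = b"first part\x00second part"
print(as_null_terminated(text))
```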
Regular expression limitations
While TextPipe's Perl pattern matcher does allow null characters to be searched for, TextPipe's egrep implementation does not.