Grepper
Grepper is a small command-line program that is a useful complement to POSIX grep.
It is meant for delimited files (any delimiter) from which you want to extract entire rows, while restricting the search to a specific column.
- You supply a file containing all the keys that will be searched for in the datafile. The program will only do a single pass of the datafile.
- You specify which column to use for the search.
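The two points above can be sketched in C++ as follows. This is a minimal illustration and not the code in grepper.cpp: the function names (extract_column, load_keys, filter_rows) are hypothetical, and the sketch assumes one key per line and exact, case-sensitive equality between a key and the text of the chosen column. The real program may match differently.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Copy the 1-indexed column `col` of `line` into `out`, splitting on `delim`.
// Returns false when `col` is 0 or the line has fewer than `col` fields.
bool extract_column(const std::string& line, char delim, std::size_t col,
                    std::string& out) {
    if (col == 0) return false;
    std::size_t start = 0;
    for (std::size_t i = 1; i < col; ++i) {
        std::size_t pos = line.find(delim, start);
        if (pos == std::string::npos) return false;  // too few fields
        start = pos + 1;
    }
    std::size_t end = line.find(delim, start);
    out = (end == std::string::npos) ? line.substr(start)
                                     : line.substr(start, end - start);
    return true;
}

// Read the key file (assumed: one key per line) into a hash set.
std::unordered_set<std::string> load_keys(std::istream& in) {
    std::unordered_set<std::string> keys;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) keys.insert(line);
    }
    return keys;
}

// Single pass over the data: keep every row whose column `col` equals a key.
std::vector<std::string> filter_rows(std::istream& data,
                                     const std::unordered_set<std::string>& keys,
                                     char delim, std::size_t col) {
    std::vector<std::string> kept;
    std::string line, field;
    while (std::getline(data, line)) {
        if (extract_column(line, delim, col, field) && keys.count(field) > 0) {
            kept.push_back(line);
        }
    }
    return kept;
}
```

Holding the keys in a hash set is what makes a single pass sufficient in this sketch: each row costs one column extraction and one lookup, regardless of how many keys there are.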
The program supports the following options, which make it behave somewhat like grep:
- -i ignore case (upper-case vs lower-case letters)
- -w search for whole words
- -v print the complement (rows that do not match)
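As an illustration of how -i and -v could be layered on top of a key lookup, here is a small C++ sketch. It is an assumption about one reasonable design, not the implementation in grepper.cpp; the names lowercase, MatchOptions, prepare_keys and row_selected are hypothetical. The -w option is left out because this page does not specify how word boundaries are defined inside a column.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// Lowercase a string byte by byte (sufficient for ASCII; used to emulate -i).
std::string lowercase(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical option holder for this sketch.
struct MatchOptions {
    bool ignore_case = false;  // corresponds to -i
    bool invert = false;       // corresponds to -v
};

// Normalise the keys once, so every later lookup is still one hash probe.
std::unordered_set<std::string> prepare_keys(
        const std::unordered_set<std::string>& keys, const MatchOptions& opt) {
    if (!opt.ignore_case) return keys;
    std::unordered_set<std::string> lowered;
    for (const std::string& k : keys) lowered.insert(lowercase(k));
    return lowered;
}

// Decide whether a row is printed, given the text of its search column.
// `has_field` is false when the row has fewer columns than requested.
bool row_selected(bool has_field, const std::string& field,
                  const std::unordered_set<std::string>& prepared,
                  const MatchOptions& opt) {
    bool hit = has_field &&
               prepared.count(opt.ignore_case ? lowercase(field) : field) > 0;
    return opt.invert ? !hit : hit;
}
```

In this sketch a row that lacks the requested column never counts as a hit, so it is printed when the match is inverted. That mirrors example 3 on this page, where the five-column header line appears in the -v output.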
Brief overview
usage: grepper [OPTION] -k keyfile datafile.gz
usage: gunzip -c datafile.gz | grepper [OPTION] -k keyfile
options:
 -c [int]: which column to use for grepping (1 indexed)
 -d [char] delimitor for the datafile
 -w search for whole words (similar to grep -w option)
 -v complement grep (similar to grep -v option)
 -i ignore case (similar to grep -i option)
Installation
The program is a single C++ file that can be downloaded and compiled as follows (it links against zlib):
wget http://popgen.dk/software/download/grepper.cpp
g++ grepper.cpp -O3 -o grepper -lz
Examples
Assuming the keys file:
"Setosa"
"Virginica"
And the datafile:
"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
"1" 5.1 3.5 1.4 0.2 "setosa"
"2" 4.9 3 1.4 0.2 "Setosa"
"3" 4.7 3.2 1.3 0.2 "setosa"
"4" 4.6 3.1 1.5 0.2 "setosa"
"5" 5 3.6 1.4 0.2 "setosa"
"6" 5.4 3.9 1.7 0.4 "setosa"
"7" 4.6 3.4 1.4 0.3 "setosa"
"8" 5 3.4 1.5 0.2 "setosa"
"9" 4.4 2.9 1.4 0.2 "setosa"
"101" 6.3 3.3 6 2.5 "virginica"
"102" 5.8 2.7 5.1 1.9 "virginica"
"103" 7.1 3 5.9 2.1 "virginica"
"104" 6.3 2.9 5.6 1.8 "virginica"
"105" 6.5 3 5.8 2.2 "virginica"
"106" 7.6 3 6.6 2.1 "virginica"
"107" 4.9 2.5 4.5 1.7 "virginica"
"108" 7.3 2.9 6.3 1.8 "virginica"
"109" 6.7 2.5 5.8 1.8 "Virginica"
"110" 7.2 3.6 6.1 2.5 "virginica"
"51" 7 3.2 4.7 1.4 "versicolor"
"52" 6.4 3.2 4.5 1.5 "versicolor"
"53" 6.9 3.1 4.9 1.5 "versicolor"
"54" 5.5 2.3 4 1.3 "versicolor"
"55" 6.5 2.8 4.6 1.5 "versicolor"
"56" 5.7 2.8 4.5 1.3 "versicolor"
"57" 6.3 3.3 4.7 1.6 "versicolor"
"58" 4.9 2.4 3.3 1 "versicolor"
"59" 6.6 2.9 4.6 1.3 "versicolor"
"60" 5.2 2.7 3.9 1.4 "versicolor"
1. The datafile is delimited by single spaces and we want to grep in column 6:
./grepper -k key datafile -d ' ' -c 6
"2" 4.9 3 1.4 0.2 "Setosa"
"109" 6.7 2.5 5.8 1.8 "Virginica"
2. If we don't care about case, add -i:
./grepper -k key datafile -d ' ' -c 6 -i
"1" 5.1 3.5 1.4 0.2 "setosa"
"2" 4.9 3 1.4 0.2 "Setosa"
"3" 4.7 3.2 1.3 0.2 "setosa"
"4" 4.6 3.1 1.5 0.2 "setosa"
"5" 5 3.6 1.4 0.2 "setosa"
"6" 5.4 3.9 1.7 0.4 "setosa"
"7" 4.6 3.4 1.4 0.3 "setosa"
"8" 5 3.4 1.5 0.2 "setosa"
"9" 4.4 2.9 1.4 0.2 "setosa"
"101" 6.3 3.3 6 2.5 "virginica"
"102" 5.8 2.7 5.1 1.9 "virginica"
"103" 7.1 3 5.9 2.1 "virginica"
"104" 6.3 2.9 5.6 1.8 "virginica"
"105" 6.5 3 5.8 2.2 "virginica"
"106" 7.6 3 6.6 2.1 "virginica"
"107" 4.9 2.5 4.5 1.7 "virginica"
"108" 7.3 2.9 6.3 1.8 "virginica"
"109" 6.7 2.5 5.8 1.8 "Virginica"
"110" 7.2 3.6 6.1 2.5 "virginica"
3. Say we want the complement of example 1; add -v:
./grepper -k key datafile -d ' ' -c 6 -v
"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
"1" 5.1 3.5 1.4 0.2 "setosa"
"3" 4.7 3.2 1.3 0.2 "setosa"
"4" 4.6 3.1 1.5 0.2 "setosa"
"5" 5 3.6 1.4 0.2 "setosa"
"6" 5.4 3.9 1.7 0.4 "setosa"
"7" 4.6 3.4 1.4 0.3 "setosa"
"8" 5 3.4 1.5 0.2 "setosa"
"9" 4.4 2.9 1.4 0.2 "setosa"
"101" 6.3 3.3 6 2.5 "virginica"
"102" 5.8 2.7 5.1 1.9 "virginica"
"103" 7.1 3 5.9 2.1 "virginica"
"104" 6.3 2.9 5.6 1.8 "virginica"
"105" 6.5 3 5.8 2.2 "virginica"
"106" 7.6 3 6.6 2.1 "virginica"
"107" 4.9 2.5 4.5 1.7 "virginica"
"108" 7.3 2.9 6.3 1.8 "virginica"
"110" 7.2 3.6 6.1 2.5 "virginica"
"51" 7 3.2 4.7 1.4 "versicolor"
"52" 6.4 3.2 4.5 1.5 "versicolor"
"53" 6.9 3.1 4.9 1.5 "versicolor"
"54" 5.5 2.3 4 1.3 "versicolor"
"55" 6.5 2.8 4.6 1.5 "versicolor"
"56" 5.7 2.8 4.5 1.3 "versicolor"
"57" 6.3 3.3 4.7 1.6 "versicolor"
"58" 4.9 2.4 3.3 1 "versicolor"
"59" 6.6 2.9 4.6 1.3 "versicolor"
"60" 5.2 2.7 3.9 1.4 "versicolor"