Manipulating text
cat
Means concatenate
Can be used to print contents of a file
cat file
Can also be used to concatenate files together like
cat file1 file2 #puts the content of file2 after the ones from file1 into stdout
cat file1 file2 > newFile #Same as above but saves to a new file (overwrite)
cat file1 file2 >> newFile #Same as above but appends to a file
cat > file #All lines typed will be written in the file until Ctrl+D
cat >> file #Same as above but with append
cat > file << EOF
text
EOF
# Allows to signal a file ending without Ctrl+D
There is an inverse command
tacwhich prints the lines in reverse order, but has the same functionalities
echo
Simply displays a string Can be used to put strings into:
- stdout
- a new file with
> - an existing file with
>>
Using the -e flag, allows for characters like newline (\n) and tab(\t) in the string.
It is widely used to print environment variables (value)
Example:
echo $USERNAME # Print the value of the env variable
echo hello > newFile #Puts hello into newFile (overwrites)
echo hello >> file # Appends hello to file
Manipulating Large files
- Using
lessis more efficient than an editor (does not try to load everything into memory) headallows to see only the first n linestailallows to see only the last n lines (with-fwill continuously monitor)
Viewing compressed files
There are utilities that work directly on compressed files.
For gzip compressed files zcat, zless, zgrep, zdiff There are also for the other formats bzcat, bzless xzcat, xzless
sed
-
Stream editor
-
Modifies the content of an input stream (stdin, file, etc)
-
Data from input is moved to a working space, there all operations/modifications are made and moved to the stdoutor output stream (file, etc.)
-
When used with the
-eflag, more than one editing command can be passed -
The
-fflag allows to pass a scriptfile with sed commands
Substitution
One big use case is substituting strings
sed s/pattern/replace_string/ file # Substitutes first occurrence of pattern in every line
sed s/pattern/replace_string/g file #All occurrences of pattern in every line
sed 1,2s/pattern/replace_string/g file #All occurrences in a range of lines 1-3
sed -i s/pattern/replace_string/g file #Rewrites the file with the changes (Not recommended)
Delimiting character can be chosen!
awk
- Used to extract and print information from files
- It works well with fields (containing a single piece of data, essentially a column) and records (a collection of fields, essentially a line in a file).
- Also has the
-fflag to provide a scriptfile with commands - When dealing with simple fields:
awk '{ print $0 }' file # Prints the whole file awk -F: '{ print $1 $7 }' file #Sets the field separator to : and prints the first and seventh fieldsFile Manipulating
sort
- Allows to sort the lines of a file (based on a key)
- By default sorts alphabetically
Examples:
sort file #Sorts alphabetically with the first character on each line
cat file1 file2 | sort #Sorts the contents of two files and prints
sort -r file #Sorts in reverse
sort -k 3 file #Sorts by the 3rd field in each line (not the beginning)
sort -u file #SOrts and gets rid of repeated lines (same as sort | uniq)
uniq
- Simplifies files by eliminating repeated lines that are consecutive
- Normally used with
sortfirst, because of consecutive sort -udoes both in one command
paste
- Used to join files “horizontally”
- It is based on delimiters
- Default delimiter is
\tbut can be changed with the-dflag - Tee
-sflag allow for serial manner (That is a line per file, separated by delimiters)
Examples:
This files
Colombia
Argentina
Suiza
Alemania
Italia
Australia
Bogota
Buenos Aires
Berna
Berlin
Roma
Canberra
give out the following:
$ paste file1 file2 -d ":"
Colombia:Bogota
Argentina:Buenos Aires
Suiza:Berna
Alemania:Berlin
Italia:Roma
Australia:Canberra
$ paste file1 file2 -d ":" -s
Colombia:Argentina:Suiza:Alemania:Italia:Australia
Bogota:Buenos Aires:Berna:Berlin:Roma:Canberra
join
- Enhanced version of
paste - Joins two files that have a common field
split
- Used to split large file into smaller ones
- By default breaks it into files with 1000 lines (changed with the
-lflag) - By default creates file with
- Prefix: Default is
x, can be set withsplit file some_prefix - Sufix: Default is
aa,ab, etc. With-dcan be numeric
- Prefix: Default is
- Can also split by size (
-b 16will result in pieces of 16 bytes) or amount of files (-n)
grep
- Scan files with matching regex
- Returns lines that match by default
Some examples:
grep "^some_pattern$" file
grep -v "^some_pattern$" file #Returns line that do not match
grep -C 3 "^some_pattern$" file #Returns 3 lines(before and after) of context and the one that matches
grep -e "^some_pattern$" -e "some_other" file #Multiple patterns
stringscan be used to extract strings from binaries as well