Bash Command Line
Basics
ls -F: list files and directories, appending a marker to each name (/ for directories, * for executables)
ls -F -a == ls -Fa: list all files and directories, including hidden ones (their names start with .)
cd: go to the home directory of the user
cd /: go to the top Unix directory (root)
cd -: go back to the previous directory
man: manual page
- q: quit
- b: previous page
- space: next page
ls --help
cd c + Tab Tab: show all possible completions that start with 'c'
Creating things
mkdir
nano: create and edit a text file
cat: show file content
rm -r: recursively remove directories and everything inside them
rmdir: remove empty directories (fails if a directory still contains files)
ls -l: long listing with file attributes (permissions, owner, size, date)
mv
- mv chapter1/draft.txt chapter1/backup.txt ./ (move files into the current directory)
- also renames files
cp
Compress and extract files
tar cvf data.tar data-shell/
- cvf: c creates an archive, v is verbose (lists the files as they are processed), f gives the name of the archive file (here data.tar, which receives everything in data-shell/)
- xvf: x extracts (unpacks) the archive
- cvfz: create the archive and compress it with gzip
- tar cvfz data.tar.gz data-shell/
- xvfz: decompress the gzip layer and unpack the tar archive
gzip
- compresses a single file to '.gz'
unzip
- extracts a zip file
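As a minimal sketch of the tar workflow above, the following creates a small directory, archives it with gzip compression, lists the archive, and unpacks it elsewhere. The names demo-data, notes.txt and restored are made up for illustration, and the v flag is left out only to keep the output short.

```bash
# Work in a throwaway directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data to archive
mkdir demo-data
echo "hello" > demo-data/notes.txt

# c = create, z = gzip, f = name of the archive file
tar czf demo-data.tar.gz demo-data/

# t = list the archive contents without extracting
listing=$(tar tzf demo-data.tar.gz)
echo "$listing"

# x = extract; -C unpacks into another directory
mkdir restored
tar xzf demo-data.tar.gz -C restored
restored_text=$(cat restored/demo-data/notes.txt)
echo "$restored_text"

# Clean up
cd "$OLDPWD"
rm -rf "$workdir"
```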
Ctrl + L: clear the screen
history: show previously entered commands
tar is by far the most widely used archiving tool on UNIX-like systems. Since it was originally designed for sequential write/read on magnetic tapes, it does not index data for random access to its contents. A number of 3rd-party tools can add indexing to tar. However, there is a modern alternative to tar called DAR (stands for Disk ARchiver) that has some nice features:
- each DAR archive includes an index for fast file list/restore,
- DAR supports full / differential / incremental backup,
- DAR has built-in compression on a file-by-file basis to make it more resilient against data corruption and to avoid compressing already compressed files such as video,
- DAR supports strong encryption,
- DAR can detect corruption in both headers and saved data and recover with minimal data loss,
and so on. Learning DAR is not part of this course. In the future, if you want to know more about working with DAR, please watch our DAR webinar.
File Transfer
Securely transfer files between a remote system and the local system.
scp: copies files over SSH, so it needs SSH authentication (a password or an SSH key such as RSA)
sftp
scp is useful, but what if we don't know the exact location of what we want to transfer? Or perhaps we're simply not sure which files we want to transfer yet. sftp is an interactive way of downloading and uploading files. Let's connect to a cluster with sftp:
    [local]$ sftp userXXX@cassiopeia.c3.ca
This will start what appears to be a shell with the prompt sftp>. However, we only have access to a limited number of commands. We can see which commands are available with help:
    sftp> help
Notice the presence of multiple commands that make mention of local and remote. We are actually browsing two filesystems at once, with two working directories!
    sftp> pwd # show our remote working directory
And we can recursively put/get files by just adding -r. Note that the directory needs to be present beforehand:
    sftp> mkdir content
To quit, type exit or bye.
Exercise: Using one of the above methods, try transferring files to and from the cluster. For example, you can download bfiles.tar.gz to your laptop. Which method do you like best?
Note on Windows:
- Transferring files from a Windows system to a Unix system (Mac, Linux, BSD, Solaris, etc.) can cause problems. Windows encodes line endings slightly differently than Unix and adds an extra character to every line.
- On a Unix system, every line in a file ends with a \n (newline). On Windows, every line in a file ends with \r\n (carriage return + newline). This sometimes causes problems.
- You can check whether a file has Windows line endings with cat -A filename. A file with Windows line endings will have ^M$ at the end of every line. A file with Unix line endings will have $ at the end of every line.
- Though most modern programming languages and software handle this correctly, in some rare instances you may run into an issue. The solution is to convert the file from Windows to Unix format with the dos2unix filename command. Conversely, to convert back to Windows format, you can run unix2dos filename.
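To make the extra character concrete, here is a small sketch that builds a file with Windows line endings and strips the carriage returns. It uses tr -d '\r' as a stand-in for dos2unix (an assumption, chosen because dos2unix is not installed everywhere); the file names are hypothetical.

```bash
workdir=$(mktemp -d)

# Create a two-line file with Windows (CRLF) line endings
printf 'first\r\nsecond\r\n' > "$workdir/windows.txt"

# Delete every carriage return to get Unix (LF) line endings
tr -d '\r' < "$workdir/windows.txt" > "$workdir/unix.txt"

# Compare sizes: the Unix copy is one byte shorter per line
windows_bytes=$(wc -c < "$workdir/windows.txt" | tr -d ' ')
unix_bytes=$(wc -c < "$workdir/unix.txt" | tr -d ' ')
echo "windows: $windows_bytes bytes, unix: $unix_bytes bytes"

rm -rf "$workdir"
```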
Note on syncing: there is also a command rsync for syncing two directories. It is super useful, especially for work in progress. For example, you can use it to download all the latest PNG images from your working directory on the cluster.
Tapping the power of Unix
Wildcards, redirection to files and pipes
ls p*
ls *th*
wc: word count
- wc ethane.pdb
- 12 84 622 ethane.pdb
- number of lines, number of words, number of characters
- the file is also 622 bytes
- wc -l *.pdb
- wc -l *.pdb > list.txt
- writes the output into the file (standard output redirection to a file)
sort -n list.txt > sort.txt
- -n numerically
head -3 sort.txt
- print first 3 lines of the file
tail -3 sort.txt
- print last 3 lines of the file
Constructing complex commands with Unix pipes
For example,
wc -l *.pdb > list.txt
sort -n list.txt > sort.txt
head -1 sort.txt
These three lines can be combined into a single command:
wc -l *.pdb | sort -n | head -1
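The pipeline can be tried on throwaway data. In this sketch the three .pdb files are hypothetical stand-ins with 3, 1 and 2 lines. Note that wc -l with several files also prints a total line, which sorts last.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the .pdb files: 3, 1 and 2 lines long
printf 'a\nb\nc\n' > long.pdb
printf 'a\n' > short.pdb
printf 'a\nb\n' > medium.pdb

# Count lines, sort numerically, keep the smallest
shortest=$(wc -l *.pdb | sort -n | head -1)
echo "$shortest"

# The last sorted line is the total, so the longest file is second to last
longest=$(wc -l *.pdb | sort -n | tail -2 | head -1)
echo "$longest"

cd "$OLDPWD"
rm -rf "$workdir"
```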
Aliases
Aliases are one-line shortcuts/abbreviations that avoid typing a longer command, e.g.
    $ alias ls='ls -aFh'
    $ alias cedar='ssh -Y cedar.computecanada.ca'
Now, instead of typing ssh -Y cedar.computecanada.ca, you can simply type cedar. To see all your defined aliases, type alias. To remove, e.g., the alias cedar, type unalias cedar.
You may want to put all your alias definitions into the file ~/.bashrc, which is run every time you start a new local or remote shell.
Bash Loops
echo: print its arguments
To print the value of a variable, we need echo $variable
A simple example of a for loop:
    for file in *.dat
    do
      echo $file
    done
Or we could write this simple example in one line by using semicolons:
    for file in *.dat; do echo $file; done
A collection (a list of items) is required after in.
We can create collections in several ways, for example:
    echo {1..10}
Commands of this kind create and output collections such as
1 2 3 4 5 6 7 8 9 10
1 2 5
a b c
Note that a collection is a list of separate words, not a single string.
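The three outputs listed above can be reproduced with brace expansion. This is a sketch of one plausible set of commands (the original second and third commands are not shown, so {1,2,5} and {a..c} are assumptions). Brace expansion is a Bash feature and may not work in a plain sh.

```bash
# A numeric range
echo {1..10}

# An explicit comma-separated list
echo {1,2,5}

# A character range
echo {a..c}

# A collection can feed a for loop directly
for i in {1..3}; do
  echo "item $i"
done
```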
substrings
${variable:0:3}: the first 3 characters of the string variable
Example: extract characters from strings
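A short sketch of substring extraction, assuming a made-up value for variable. The offset counts from zero, and this syntax is specific to Bash and similar shells.

```bash
variable="abcdefgh"

# ${variable:offset:length} counts the offset from zero
first_three=${variable:0:3}
echo "$first_three"

# Omitting the length takes everything from the offset onwards
rest=${variable:3}
echo "$rest"
```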
Exercise 1
Writing settings into .bashrc is better.
.bashrc_profile
> vs. >>
- > will overwrite the contents of the file
- >> will append the content to the end of the file
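The difference is easy to see on a scratch file. In this sketch, log.txt is a hypothetical file name inside a temporary directory.

```bash
workdir=$(mktemp -d)
logfile="$workdir/log.txt"

echo "first"  > "$logfile"    # creates the file
echo "second" > "$logfile"    # overwrites it: only "second" remains
echo "third"  >> "$logfile"   # appends: the file now has two lines

contents=$(cat "$logfile")
echo "$contents"

rm -rf "$workdir"
```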
diff: compare files and folders
touch
touch a{1..100}.txt: create 100 empty files named a1.txt to a100.txt
echo {a..z}{1..2}: prints a1 a2 b1 b2 ...
echo a{1..3}.{txt,py}
ps
shows the processes started from the current terminal
ps aux: show the processes of all users
kill
kill PID
kill -9 PID: sends SIGKILL, the strongest signal, which a process cannot catch or ignore
uniq: filter out adjacent duplicate lines
rsync
rsync -Pva --inplace user120@cassiopeia.c3.ca:thesis/ .
Using Unix pipes, write a one-line command to show the name of the longest *.pdb file (by the number of lines). Paste your answer here.
wc -l *pdb | sort -n | tail -2 | head -1
PS1="\u@\h \w> ": changes the prompt
This setting applies only to the current shell.
[user144@login1 ~]$ echo tmp/data-shell/molecules/*
tmp/data-shell/molecules/a.txt tmp/data-shell/molecules/cubane.pdb tmp/data-shell/molecules/ethane.pdb tmp/data-shell/molecules/list.txt tmp/data-shell/molecules/methane.pdb tmp/data-shell/molecules/octane.pdb tmp/data-shell/molecules/pentane.pdb tmp/data-shell/molecules/propane.pdb tmp/data-shell/molecules/sort.txt
for i in hello 1 2 * bye; do echo $i; done
This command prints hello, 1, 2, the names of all files and directories in the current directory, and bye, one per line.
    var="sun"
Redirection
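By default, both standard output and standard error go to the terminal.
mkdirr tmp 2> error.txt
Standard error (here the "command not found" message) goes into the file.
mkdirr tmp 1>error.txt
Only standard output goes into the file, so the error message still appears on the terminal.
mkdirr tmp &> error.txt
Both standard output and standard error go into the file, and nothing appears on the terminal.
/dev/null is a 'black hole': anything redirected there is discarded.
mkdir tmp ; cd tmp
Runs the second command no matter how the first command ends.
mkdirr tmp && cd tmp
Runs the second command only when the first command succeeds.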
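The behaviour of each redirection can be checked by looking at which files end up non-empty. This sketch uses ls on a missing path (the hypothetical name no-such-file) as the failing command, and spells &> as the portable > file 2>&1. The temporary directory is left in place so the files can be inspected.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Listing a missing path produces only an error message
ls no-such-file 2> error.txt              # stderr goes to error.txt
ls no-such-file 1> out.txt 2> /dev/null   # stdout goes to out.txt, stderr is discarded
ls no-such-file > both.txt 2>&1           # both streams go to both.txt (same effect as &>)

# -s is true when a file exists and is not empty
[ -s error.txt ] && echo "error.txt has the message"
[ -s out.txt ] || echo "out.txt is empty"

# && runs the second command only when the first succeeds
ls no-such-file 2> /dev/null && echo "not printed"

# ; runs the second command regardless
ls no-such-file 2> /dev/null ; echo "printed anyway"
```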
    myvar="hello"
${myvar/l/L} replaces the first l with L
${myvar//l/L} replaces all l with L
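A quick sketch of both substitution forms on the same value. The variable names first and all are only for illustration, and the syntax is Bash-specific.

```bash
myvar="hello"

# One slash: replace only the first match
first=${myvar/l/L}
echo "$first"

# Two slashes: replace every match
all=${myvar//l/L}
echo "$all"
```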
    touch hello "hello there" "hi there" "good morning, everyone"
Question 20
Write a loop that concatenates all .pdb files in data-shell/molecules subdirectory into one file called allmolecules.txt, prepending each fragment with the name of the corresponding .pdb file, and separating different files with an empty line. Run the loop, make sure it works, bring it up with the "up arrow" key and paste in here.
    for file in *.pdb
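The body of the answer is not preserved here, so the following is only one possible sketch, not necessarily the loop originally submitted. It runs on two hypothetical stand-in files rather than the real data-shell/molecules directory.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the files in data-shell/molecules
printf 'ATOM 1\n' > ethane.pdb
printf 'ATOM 2\n' > methane.pdb

# Start from an empty output file so reruns do not accumulate
: > allmolecules.txt

for file in *.pdb
do
  echo "$file" >> allmolecules.txt   # file name as a header
  cat "$file" >> allmolecules.txt    # the file contents
  echo >> allmolecules.txt           # empty line as a separator
done

cat allmolecules.txt
line_count=$(wc -l < allmolecules.txt | tr -d ' ')
first_line=$(head -1 allmolecules.txt)

cd "$OLDPWD"
rm -rf "$workdir"
```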
Create a loop that writes into 10 files chapter01.md, chapter02.md, ..., chapter10.md. Each file should contain chapter-specific lines, e.g. chapter05.md will contain exactly these lines:
    for i in {01..10}
It is safer to put variable expansions in quotes.
Scripts and functions
Shell scripts
Shell scripts conventionally use the .sh extension.
The shebang (#! on the first line of a script) tells the system which interpreter to run the script with.
Run:
1) bash process.sh
2) make the script executable
File attributes
rwx rwx rwx: the first set refers to ==the owner of the file (i.e., the user)==; the second set refers to ==the group that owns the file==; the third set refers to ==everybody else on the system==.
    chmod u+x process.sh
This adds execute permission for the user (owner) to the file.
Then we run
    ./process.sh
Note: you cannot run the file by typing just process.sh, because the current directory is not in the PATH.
    # This is the shebang, which tells the system where to find the interpreter.
When you run the above code, give the input as
    ./process.sh A B C D E
Example
    for molecule in $@
You could run:
    ./process.sh *.pdb
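Putting the pieces together, here is a sketch of a complete process.sh that loops over its arguments. The body of the script is an assumption (only the first line of the original loop is shown above), and it assumes /bin/bash exists on the system. Quoting "$@" keeps arguments that contain spaces intact.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Write a small script; the quoted 'EOF' keeps $@ and $molecule literal
cat > process.sh << 'EOF'
#!/bin/bash
# Print one line per argument passed to the script
for molecule in "$@"
do
  echo "processing $molecule"
done
EOF

# Give the owner execute permission, then run the script by path
chmod u+x process.sh
output=$(./process.sh ethane.pdb "two words.pdb")
echo "$output"

cd "$OLDPWD"
rm -rf "$workdir"
```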
Variables
    myvar=3
When using export, the variable is inherited by child processes such as scripts started from this shell, so it can be accessed inside them. Without export, the variable is available only in the current shell, not inside the scripts it runs.
printenv or env will print all the environment variables.
To remove a variable, use
    unset myvar1
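The effect of export can be observed by asking a child shell what it sees. This is a minimal sketch; myvar2 and the helper variables are hypothetical names, and sh -c stands in for running a separate script.

```bash
# A plain shell variable is not visible to child processes
myvar1="local only"
child_sees_plain=$(sh -c 'echo "${myvar1:-unset}"')
echo "child sees: $child_sees_plain"

# An exported variable is copied into the environment of child processes
export myvar2="exported"
child_sees_exported=$(sh -c 'echo "${myvar2:-unset}"')
echo "child sees: $child_sees_exported"

# unset removes a variable from the current shell
unset myvar2
after_unset=${myvar2:-unset}
echo "after unset: $after_unset"
```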
$HOME variable: the home directory
$PATH variable: the list of directories where the shell looks for executables
$PWD variable: stores the current directory
$PS1 variable: stores the format of the prompt
For example, if $PS1 is [\u@\h \W]\$, the prompt looks like [user144@login1 ~]$
Running which ls shows where the ls executable is located.
Functions
Functions are similar to scripts, except that we reference a function by its name. Therefore, once defined, a function can be run in any directory, whereas running a script in another directory requires its path.
A convenient place to put all your function definitions is the ~/.bashrc file, which is run every time you start a new shell (local or remote).
    # define a function
$@ expands to the values of the arguments
$# expands to the number of arguments
We could also write the function in a .sh file:
    function greeting(){
After that we need to run ==source function.sh== to load the definition into the shell. Every time you change the definition of the function, you have to execute source again.
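As an illustration of $1, $# and $@ inside a function, here is a sketch of a greeting function. Its body is hypothetical, since the original definition is not shown. The name() { … } form used here is equivalent to function name(){ … } in Bash.

```bash
# Define a function; the body runs only when the function is called
greeting() {
  echo "Hello, $1!"
  echo "Number of arguments: $#"
  echo "All arguments:" "$@"
}

# Call it like any other command
output=$(greeting Alice Bob)
echo "$output"
```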
A more complicated example of a function
$RANDOM generates a random integer.
    function combine(){
.bashrc: you can put function definitions into this file. The .bashrc is loaded whenever the shell starts, so you don't need to source your functions every time.
Grep and find
Searching inside files with grep
    # partly matching
grep is case-sensitive by default
- -i: make grep case-insensitive
- -n: print line numbers
- -v: print all the lines that do NOT match
- more flags: see man grep
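The flags above can be compared on a three-line sample. The text in this sketch is hypothetical sample data, and -c (count matching lines) is an extra flag used here only to make the result easy to check.

```bash
workdir=$(mktemp -d)
sample="$workdir/haiku.txt"

# Hypothetical sample text
printf 'The Tao that is seen\nIs not the true Tao\nuntil you bring fresh toner\n' > "$sample"

# Case-sensitive by default: "The" matches only line 1; -n shows the line number
grep -n "The" "$sample"

# -i ignores case: "the" now matches lines 1 and 2
insensitive_count=$(grep -i -c "the" "$sample")
echo "lines matching 'the' in any case: $insensitive_count"

# -v inverts the match: lines without "Tao"
inverted=$(grep -v "Tao" "$sample")
echo "$inverted"

rm -rf "$workdir"
```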
Finding files with find
    # search for the file named 'haiku.txt' within the current directory
Combining find and grep
    # aggregate the result
==xargs==
The xargs command in UNIX is a command line utility for building an execution pipeline from standard input. Whilst tools like grep can accept standard input as a parameter, many other tools cannot. Using xargs allows tools like echo, rm and mkdir to accept standard input as arguments.
For more, refer to https://shapeshed.com/unix-xargs/
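Here is a sketch of find feeding xargs on a small made-up directory tree (all file and directory names are hypothetical). xargs collects the paths printed by find and passes them as arguments to a single wc or grep call.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical directory tree with two .txt files and one .dat file
mkdir -p writing/haiku data
printf 'one\ntwo\n' > writing/haiku/haiku.txt
printf 'one\n' > writing/notes.txt
printf '1 2 3\n' > data/values.dat

# find prints matching paths, one per line
find . -name "*.txt"

# xargs turns those lines into arguments: one wc -l call for all files
total_lines=$(find . -name "*.txt" | xargs wc -l | tail -1)
echo "$total_lines"

# grep -l lists only the names of files that contain a match
matches=$(find . -name "*.txt" | xargs grep -l "two")
echo "$matches"

cd "$OLDPWD"
rm -rf "$workdir"
```

File names containing spaces would be split by plain xargs; find -print0 combined with xargs -0 is the usual way around that.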
Text Manipulation
Goals: learn sed and tr
tr stands for translate: it translates or deletes characters
sed: stream editor for filtering and transforming text
The GNU and BSD versions of sed have different options and arguments.
Using address ranges: addresses let you target specific parts of a text stream. You can specify a single line or a range of lines.
    # Using sed to convert all 'invisible' to 'supervisible'
For the sed command s/regexp/replacement/: attempt to match regexp (a regular expression) against the pattern space. If successful, replace the portion matched with replacement. The replacement may contain the special character & to refer to the portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
==The character after the s is the delimiter. Pick one you like. As long as it's not in the string you are looking for, anything goes.== And remember that you need three delimiters. If you get an "Unterminated `s' command" error, it's because you are missing one of them.
For example:
    sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
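The same substitution reads more easily with a different delimiter. This sketch compares the escaped form with one using | and also shows & in the replacement. The sample strings are made up for illustration.

```bash
old_path="/usr/local/bin/tool"

# With / as the delimiter, every slash in the pattern must be escaped
escaped=$(echo "$old_path" | sed 's/\/usr\/local\/bin/\/common\/bin/')

# With | as the delimiter, the same substitution needs no escaping
readable=$(echo "$old_path" | sed 's|/usr/local/bin|/common/bin|')

echo "$escaped"
echo "$readable"

# & in the replacement stands for the matched text
wrapped=$(echo "invisible ink" | sed 's/invisible/[&]/')
echo "$wrapped"
```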
==`…` is the same as \$(…). The command inside is executed and its output is substituted in place. It's better to use \$(…).==
    # Delete all punctuation marks
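Deleting punctuation can be done with tr and a character class. This is a sketch on a made-up sentence, not necessarily the command originally used here.

```bash
text="Hello, World! It's 2024."

# -d deletes every character in the given set; [:punct:] is the punctuation class
no_punct=$(echo "$text" | tr -d '[:punct:]')
echo "$no_punct"

# Two sets translate character by character: lowercase to uppercase here
upper=$(echo "$text" | tr '[:lower:]' '[:upper:]')
echo "$upper"
```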
For more, refer to:
- https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/
- https://www.geeksforgeeks.org/sed-command-linux-set-2/
Column-based Text processing with awk
awk is a scripting language for processing text column by column.
    awk '{print $1}' haiku.txt
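A few more awk patterns in the same spirit, sketched on a hypothetical two-column table (name and size): selecting a column, filtering rows by a condition, and summing a column.

```bash
workdir=$(mktemp -d)
table="$workdir/sizes.txt"

# Hypothetical two-column data: name and size
printf 'ethane 12\nmethane 9\noctane 30\n' > "$table"

# $1 is the first whitespace-separated column
names=$(awk '{print $1}' "$table")
echo "$names"

# A condition before the braces selects rows; here, sizes above 10
big=$(awk '$2 > 10 {print $1}' "$table")
echo "$big"

# Variables persist across lines; END runs after the last line
total=$(awk '{sum += $2} END {print sum}' "$table")
echo "$total"

rm -rf "$workdir"
```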
Fuzzy finder
Fuzzy finder fzf is a third-party tool, not installed by default. With basic usage, it does interactive processing of standard input. At a more advanced level (not covered in the video below), it provides key bindings and fuzzy completion.
Fuzzy finder is an interactive search tool. We can also pipe its results into other commands.
    # Example 1
Exercise - Day 2
ls -l $(which gcc)
grep ATOM $(find . -name "*.pdb")
Question 8: Write a function archive() to replace directories with their gzipped archives.
    function archive(){
Question 9
Write a one-line command that finds the 5 largest files in the current directory and prints only their names and file sizes in the human-readable format (indicating bytes, kB, MB, GB, …) in decreasing file-size order. Hint: use find, xargs, and awk.
    find . -type f | xargs ls -lSh | awk '{print $5 " " $9}' | head -5
Appendix 1 Regular Expressions
Regular expressions can be used in almost every programming language.
Basic Regular expressions
Symbol | Description |
---|---|
==^== | matches the start of the string |
==$== | matches the end of the string |
. | matches any single character |
==\\== | escapes special characters |
() | groups regular expressions |
* | matches the preceding character zero or more times |
    cat sample | grep ^a # search for lines that start with 'a'
Interval Regular Expressions
Expression | Descriptions |
---|---|
{n} | Matches the preceding character appearing exactly n times |
{n,m} | Matches the preceding character appearing at least n times but no more than m times |
{n,} | Matches the preceding character only when it appears n times or more |
    cat sample | grep -E 'p{2}' # -E --> extended regular expressions
Extended Regular Expressions
Expression | Description |
---|---|
\+ | Matches one or more occurrences of the previous character (written + with grep -E) |
\\? | Matches zero or one occurrence of the previous character (written ? with grep -E) |
    cat sample | grep "a\+t"
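The anchors and quantifiers from the tables above can be checked against a tiny word list. The contents of sample in this sketch are hypothetical. All patterns use grep -E, so +, ? and {n} are written without backslashes.

```bash
workdir=$(mktemp -d)
sample="$workdir/sample"

# Hypothetical word list, one word per line
printf 'apple\nbat\naat\nat\nt\n' > "$sample"

# ^a: lines starting with 'a' (apple, aat, at)
starts_with_a=$(grep -c '^a' "$sample")

# p{2}: two consecutive 'p' characters (apple)
double_p=$(grep -E 'p{2}' "$sample")

# a+t: one or more 'a' followed by 't' (bat, aat, at)
one_or_more=$(grep -E -c 'a+t' "$sample")

# ^a?t: an optional 'a', then 't', at the start of the line (at, t)
zero_or_one=$(grep -E -c '^a?t' "$sample")

echo "$starts_with_a $double_p $one_or_more $zero_or_one"

rm -rf "$workdir"
```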
Brace (curly bracket) expansion
The syntax for brace expansion is either a sequence or a comma-separated list of items inside curly braces "{}".
    {aa,bb,cc,dd} # ==> aa bb cc dd
Shorthand Characters
Character | Description |
---|---|
\s | matches whitespace (a space, a tab or a line break) |
\d | matches digits, same as [0-9] |
\w | matches word characters (A-Z, a-z, 0-9) and _ |
\S | opposite of \s |
\D | opposite of \d |
\W | opposite of \w |
Word Boundaries
Character | Description |
---|---|
\\< | matches the beginning of a ==word== |
\\> | matches the end of a ==word== |
\\b | matches either the beginning or the end of a ==word==, so it can replace \\< or \\> |
    grep "e\>" sample # locate words with 'e' at the end
Anchor
Character | Description |
---|---|
^ | matches the beginning of a ==line== |
$ | matches the end of a ==line== |
References
Q&A
1. Single quotes, double quotes, brackets and backticks
==`…` is the same as \$(…). The command inside is executed and its output is substituted in place. It's better to use \$(…).==
Single quotes do not interpolate anything, but double quotes do (variables, backticks, certain escapes). When you enclose characters or a variable in single quotes, they represent their literal value. Besides, a single quote cannot appear inside a single-quoted string.
Example:
    num=3
    echo '$num'
    # prints $num
    echo "$num"
    # prints 3
To output a literal $ inside double quotes, escape it with \.
If you put a space between the string values, they are treated as separate values and printed separately.
    printf '%s\n' "Ubuntu""Centos"
    # prints UbuntuCentos
    printf '%s\n' "Ubuntu" "Centos"
    # prints Ubuntu and Centos on separate lines
References:
https://linuxhint.com/bash_escape_quotes/