Chapter 4 PLINK - Software for genomic analyses
Learning outcomes: At the end of this chapter, you will be able to run one of the most popular programs in genomics, PLINK.
You are almost ready to work with genomic data! The last step before we take a deep dive into the world of genomics is to download a program that can process everything (or at least most things) at this level.
A quick question: What is the most efficient way to process any kind of genomic data?
Answer: With a computer.
This might seem like an obvious answer, but let me explain. The days when you opened a data file and made a graph from two highlighted columns in MS Excel are officially gone. The sheer size of the data makes this impossible, or at best highly inefficient, not to mention the errors you might introduce with each manual edit. You might rightfully ask: "But Gábor, you just told us to download some text editors in the Basic software chapter to look at the data!" Yes, I suggested some good text editors, but for looking at the data, not for changing anything in it: basically, just to check the format before further processing. (Note: Later on, once you know what you are doing, you can occasionally break the "No manual edits!" rule, at your own risk.)
For processing and a wide variety of analyses, my firm suggestion is the PLINK program (written in all caps). It is an easy-to-use program that is widely used in the genomics community, especially for single nucleotide polymorphism (SNP) data. I will discuss the practical details of SNP data in the Genotype files in practice chapter.
For now, all you need is the program itself, and to make sure it works. You can do this in a few steps:
1) Download PLINK from the binary downloads section of the official website. Go for the newest stable version, and pick the build for the operating system you want to use it on: a Windows executable will not run on a Mac.
2) Unpack the zip file and copy only the plink.exe file (just plink on Linux and Mac) to the directory where you intend to do your analyses. Copying everything does no harm, but it makes an unnecessary mess in your analysis folder. No installation is needed either: you simply run the program as it is, with specific parameters.
3) Navigate to the analysis directory via the command prompt (called Terminal on Mac, and shell, terminal, or console on Linux). The process is a little different in each operating system, but you will surely find the appropriate way. I am going to assume Windows 10 here.
- Open the command prompt by typing cmd into the search bar and hitting Enter.
- It opens on the system drive by default. If your analysis directory is on another drive, switch to that drive first. For example, mine is on the D drive, so I type d: (i.e. the drive letter and a colon) and hit Enter.
- Navigate to the analysis directory with the cd (change directory) command, followed by the name of the directory. Pro tip: pressing the Tab key auto-completes the folder name. Try it out, it is really handy!
- Run the plink.exe program you copied there in step 2 by simply typing: plink
If your command prompt prints out a message like the one above, starting with the PLINK version number, you are good to go! Congratulations!
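If you later prefer to call PLINK from a Python script instead of typing commands, the command prompt steps above (go to the analysis folder, then run plink) translate roughly into the minimal sketch below. The helper names `plink_name` and `run_plink` are hypothetical, and the sketch assumes the executable is called plink.exe on Windows and plink on Linux and Mac, copied into the analysis folder as in step 2.

```python
import platform
import subprocess
from pathlib import Path


def plink_name() -> str:
    """Return the file name of the PLINK executable on this operating system."""
    # Assumption: Windows builds ship plink.exe, Linux/macOS builds ship plink.
    return "plink.exe" if platform.system() == "Windows" else "plink"


def run_plink(analysis_dir: str, *args: str) -> subprocess.CompletedProcess:
    """Run the PLINK copy in analysis_dir, like typing plink after cd-ing there."""
    folder = Path(analysis_dir).resolve()
    exe = folder / plink_name()
    if not exe.is_file():
        # The script equivalent of a missing or misplaced executable.
        raise FileNotFoundError(f"{exe.name} not found in {folder}")
    # Use the full path so exactly this copy runs, and set cwd so that any
    # output files PLINK writes end up in the analysis folder.
    return subprocess.run([str(exe), *args], cwd=folder,
                          capture_output=True, text=True)
```

For example, `result = run_plink(r"D:\analysis")` followed by `print(result.stdout, result.stderr)` would mirror the Windows steps; the folder name `D:\analysis` is made up for illustration.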
At this point, PLINK does not do anything, because you have not given it any data. We will add some data, and much more, in the following steps.
Note to Linux and Mac users: You might need to run PLINK in your terminal as ./plink (the ./ tells the shell to look for the program in the current directory).
Note 2 to Linux and Mac users: Before the first run of the program, you might need to make the plink file executable, e.g. with chmod +x plink.
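For the second note for Linux and Mac users, a minimal Python sketch of what "making the file executable" means is shown below: it switches on the execute permission bits, roughly like chmod +x. The function names are hypothetical, and the sketch is only meaningful on Linux and macOS (on Windows, file permissions work differently and plink.exe needs no such step).

```python
import stat
from pathlib import Path


def make_executable(path: str) -> None:
    """Add execute permission for owner, group and others (roughly chmod +x)."""
    p = Path(path)
    mode = p.stat().st_mode
    # Keep the existing permission bits and switch on the three execute bits.
    p.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_marked_executable(path: str) -> bool:
    """Return True if the owner's execute bit is set on the file."""
    return bool(Path(path).stat().st_mode & stat.S_IXUSR)
```

Run from your analysis folder, `make_executable("plink")` has the same effect as typing chmod +x plink in the terminal, assuming default permissions on the file.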
4.1 Exercise
It is a very useful approach to generate errors intentionally, so you see how the program reacts. This way, if you make an unintentional error later, you will recognize the message and can react quicker. So this time:
Exercise 1) When trying to start PLINK, intentionally mistype its name, e.g. as pliiiink
Exercise 2) Delete the PLINK executable file from the folder and try to run it as described in the last bullet point of step 3 above
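The same two mistakes also show up if you start programs from a Python script, only as exceptions instead of command prompt messages. The minimal sketch below, built around a hypothetical helper `try_to_start`, turns the typical outcomes into plain words; it assumes a standard Python 3 installation and no program actually named pliiiink on your system.

```python
import subprocess


def try_to_start(command: str) -> str:
    """Try to start a program and describe the outcome in plain words."""
    try:
        result = subprocess.run([command], capture_output=True, text=True)
    except FileNotFoundError:
        # No program with this name was found: a typo such as "pliiiink",
        # or an executable that was deleted from the folder.
        return f"{command}: program not found"
    except PermissionError:
        # The file exists but may not be started, e.g. plink on Linux/macOS
        # before it was made executable.
        return f"{command}: found, but not executable"
    return f"{command}: started, exit code {result.returncode}"
```

For instance, `try_to_start("pliiiink")` returns "pliiiink: program not found", which is the script counterpart of Exercise 1.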
You will find the solutions and explanations to these exercises, with a bit of bonus content, on the accompanying YouTube channel.
Direct link: https://www.youtube.com/watch?v=4VL4z71Ht70