19 August 2017
on course website
Big Data Management and Analysis in UNIX
With extremely large datasets increasingly available, scientists and analysts now need powerful supercomputers or computer clusters to store, manage and analyse their data. These servers typically run UNIX, and using them effectively requires some programming skills and an understanding of the relevant software packages.
Our course introduces you to the UNIX command-line environment and teaches you how to manage large datasets using text-processing utilities such as sed and awk. It covers the basics of shell scripting (if/else statements, loops, etc.), which you can use to automate your data analysis, for example with the UNIX version of the freely available statistics program R. It also familiarizes you with Git as a version control tool. Finally, you learn how to present your data and results in customized plots and figures.
Dr. Aysu Okbay, Richard K. Linnér
• You understand the UNIX philosophy and environment: files, processes, pipes, filters and basic utilities.
• You are familiar with login and logout procedures, including remote login using SSH, and setting, protecting and changing passwords.
• You can transfer files between systems with SFTP, SCP and rsync.
• You can manipulate text files with sed, awk, cut, paste, cat, etc.
• You can edit text using the vi editor.
• You are familiar with automating tasks through shell scripts.
• You are familiar with version control using Git.
• You can work with R from the UNIX command line.
• You can create customized plots in R.
45 contact hours
Tuition fee: €1000. Included in the tuition fee are:
• Airport pick-up service
• Orientation programme
• Course excursions
• On-site support
• 24/7 emergency assistance
• Transcript of records after completion of the course
Discounts and scholarships:
• Early bird discount of €150
• €250 discount for students from partner universities
• 10 scholarships available, each covering the full tuition fee of one course
• Combine 2 courses: €100 discount
• Combine 3 courses: €200 discount