
Creator: University of California San Diego
Category: Software > Computer Software > Educational Software
Topic: Data Analysis, Data Science
Tag: Automate, Data, files, Linux, tools
Availability: In stock
Price: USD 49.00
Many fields, such as data science, bioinformatics, and Linux systems administration, require the manipulation of textual data. Tasks include extracting fields or records meeting certain conditions from structured data (e.g., comma-separated files), combining content from multiple files, applying systematic changes to all lines of a document, sorting or randomizing data, and splitting larger files into smaller files.
While these operations could be done by hand, they tend to be time-consuming, tedious, and, worst of all, error-prone.
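As a taste of the kinds of tasks described above, here is a minimal sketch using standard Linux utilities. The file name `scores.csv` and its contents are hypothetical, invented for illustration; they are not part of the course materials.

```shell
# Create a small comma-separated file of name,score records (illustrative data).
printf 'alice,90\nbob,75\ncarol,82\n' > scores.csv

# Extract the second field (the score) from each record.
cut -d, -f2 scores.csv

# Sort the records numerically by the score field, highest first.
sort -t, -k2 -nr scores.csv
```

Here `cut -d,` and `sort -t,` both treat the comma as the field delimiter, while `-k2 -nr` tells `sort` to compare the second field numerically in reverse order.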
In this course we systematically explore the text processing tools found in Linux and Linux-like environments that enable you to simplify and automate these tasks. We’ll begin with the simplest utilities, covering the features of head, tail, paste, nl, sort, shuf, split, tr and cut. We’ll then move on to the tools grep, awk and sed, which provide much more powerful capabilities for searching and manipulation. We conclude with an introduction to regular expressions (regexes) and explain how they can be used to specify richer and more complex patterns. Regex topics will include quantifiers, wildcards, anchors, character classes, grouping and alternation, along with advanced concepts such as word boundaries, lazy and greedy matching, and regex flavors.
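To illustrate a few of the concepts named above, here is a brief sketch of grep and sed with simple regexes. The log file `app.log` and its lines are hypothetical examples, not course content.

```shell
# Create a small log file (illustrative data).
printf 'ERROR: disk full\nINFO: ok\nERROR: timeout\n' > app.log

# Anchors: ^ matches the start of a line, so this prints only ERROR lines.
grep '^ERROR' app.log

# Grouping and alternation with extended regexes (-E):
# match lines starting with either severity word.
grep -E '^(ERROR|INFO)' app.log

# sed applies a systematic change to every line:
# replace a leading "ERROR" with the tag "[E]".
sed 's/^ERROR/[E]/' app.log
```

The same anchor, grouping, and alternation syntax carries over to awk and other regex-aware tools, though the course also covers how details differ between regex flavors.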