Where to Begin with Computer Science Theory for Self-Taught Programmers

Happy Halloween! For the spookiest day of the year we thought we’d discuss what many digital humanists find to be a scary topic: computer science theory. Most DHers are formally trained humanists and self-taught programmers. This usually means a greater focus on learning programming languages, or perhaps a very specific content management system (CMS), for a project. In this as-I-need-it approach, people will often skip the theory or learn it without realizing how important it is. This is by no means a complete list of theoretical ideas, nor is it a comprehensive overview of these concepts. I created it from the concepts I encountered most often in my CS courses on data structures and algorithms, to help self-taught programmers understand the theoretical constructs.

Propositional Logic

Propositional logic combines and/or/not operators and if/then statements to build statements that are either true or false. These statements are typically used as conditions in loops or if statements to trigger an event when a specific condition is met. For example, if an item has caramel AND chocolate AND peanuts, then it is a Snickers bar. A Snickers bar meets all three conditions, so the statement is true; a Twix bar meets only two of the three, so the statement is false and a Twix bar is not a Snickers bar. Working through this kind of logic is typically done with truth tables, where you assess what the result of a statement will be when its component parts (usually represented as Ps and Qs) are true or false in different combinations.
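The candy bar example can be sketched in a few lines of Python. This is only an illustration: the function names `is_snickers` and `truth_table` are made up for this post, and real candy classification would need more than three ingredients.

```python
from itertools import product


def is_snickers(has_caramel: bool, has_chocolate: bool, has_peanuts: bool) -> bool:
    """Return True only when all three conditions hold (logical AND)."""
    return has_caramel and has_chocolate and has_peanuts


def truth_table():
    """List every True/False combination of the three inputs with its result."""
    rows = []
    for caramel, chocolate, peanuts in product([True, False], repeat=3):
        result = is_snickers(caramel, chocolate, peanuts)
        rows.append((caramel, chocolate, peanuts, result))
    return rows


if __name__ == "__main__":
    # A Twix has caramel and chocolate but no peanuts, so the statement is false.
    print(is_snickers(True, True, False))
    for row in truth_table():
        print(row)
```

Of the eight rows in the table, only the one where all three inputs are true produces a true result, which is exactly what AND means.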

Regular Expressions

A regular expression is a pattern that describes a set of character strings to find. This is the logic behind things like search-and-replace tools in word processors and input validation in forms, such as checking an email address. Regular expressions make string validation in programs a lot easier when the input is more complicated than a single acceptable answer (and even a single answer can be difficult, because a user could enter ‘true’ with any number of capitalization and punctuation variations).

For example, if we were trying to validate an email address entered by a user, we could use a regular expression like this: `[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}` Read left to right, this expression looks for one or more characters that are a letter A to Z, a digit 0 to 9, a period, underscore, percent sign, plus sign, or dash; THEN an at sign; THEN one or more characters that are a letter A to Z, a digit 0 to 9, a period, or a dash; THEN a period followed by at least two letters A to Z. As written, the pattern only lists uppercase letters, so it is normally used with case-insensitive matching turned on. It takes a little while to understand the syntax, and cheat sheets like [this one](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet) from Mozilla are very helpful.
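Here is a minimal sketch of how that pattern might be used in Python with the standard `re` module. The names `EMAIL_PATTERN` and `looks_like_email` are just for illustration, and the sketch assumes case-insensitive matching, since the pattern itself only lists uppercase letters. A pattern this simple checks the general shape of an address, not whether the address really exists.

```python
import re

# The pattern from the text, compiled so that lowercase letters also match.
EMAIL_PATTERN = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}", re.IGNORECASE)


def looks_like_email(text: str) -> bool:
    """Return True when the whole string has the general shape of an email address."""
    return EMAIL_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["anna@example.edu", "not-an-email", "anna@example"]:
        print(candidate, looks_like_email(candidate))
```

Using `fullmatch` rather than `search` means the entire input has to fit the pattern, so extra text before or after the address is rejected.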

Traveling Salesman Problem

The Traveling Salesman Problem is a classic optimization puzzle about finding the shortest route that visits every point in a given set. Imagine you are a salesman who has to visit a specific number of houses in a neighborhood and return to your store as efficiently as possible. You want to visit each house only once and walk the shortest total distance while doing so. There are many variations on this problem that add different requirements to the salesman’s travels. The logic behind this puzzle is important for thinking about efficient code, and closely related route-finding problems are behind things like routes on map apps.
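To make the puzzle concrete, here is a brute-force sketch in Python that simply tries every possible visiting order and keeps the shortest. The function name `shortest_tour` and the idea of describing houses as (x, y) coordinates with straight-line distances are assumptions made for this illustration; this is not how map apps compute routes.

```python
from itertools import permutations
from math import dist


def shortest_tour(store, houses):
    """Try every visiting order and return (best_route, best_length).

    The route starts and ends at the store and visits each house once.
    """
    best_route, best_length = None, float("inf")
    for order in permutations(houses):
        route = [store, *order, store]
        # Add up the straight-line distance of each leg of the walk.
        length = sum(dist(a, b) for a, b in zip(route, route[1:]))
        if length < best_length:
            best_route, best_length = route, length
    return best_route, best_length


if __name__ == "__main__":
    route, length = shortest_tour((0, 0), [(0, 1), (1, 1), (1, 0)])
    print(route, length)
```

Trying every order works for a handful of houses, but the number of orders grows factorially (10 houses already means 3,628,800 routes), which is why this problem is a standard example when talking about efficiency.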

Trees

A tree is an abstract data structure that expresses hierarchy, like a tree’s root and branch structure. Like their natural counterparts, data structure trees have roots and leaves and come in many different varieties. One of the most common types is the red-black tree. Trees are typically used to keep items like letters or numbers in sorted order: each new item is placed by comparing its size or position with items already on the tree. They are also an abstraction for a method of data storage.
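As a rough sketch of the comparing-and-placing idea, here is a plain binary search tree in Python. It is deliberately simpler than a red-black tree (it does no rebalancing), and the names `Node`, `insert`, and `in_order` are chosen just for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], value: int) -> Node:
    """Place a value by comparing it with existing nodes: smaller goes left, otherwise right."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root


def in_order(root: Optional[Node]) -> list:
    """Walk left subtree, then the node, then right subtree, giving sorted values."""
    if root is None:
        return []
    return in_order(root.left) + [root.value] + in_order(root.right)


if __name__ == "__main__":
    root = None
    for number in [8, 3, 10, 1, 6]:
        root = insert(root, number)
    print(in_order(root))
```

The first value inserted becomes the root, and reading the tree back in order returns the values sorted, even though they were inserted in a jumbled order.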

Binary, Hexadecimal, and Octal

The numerical system most commonly used in everyday life is the decimal, or base 10, system. It uses the digits 0 to 9. In computer science there are a few other common numerical systems. Binary is the base 2 system used by computers to store and communicate information. Numbers are calculated based on the position of each 1 or 0 in a sequence, counting positions from the right and starting at 0. The 1s and 0s act more as on and off switches than as factors to multiply by: a 1 in a position means add 2 to the power of that position number, and a 0 means add nothing.

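| Power of 2 | 2³ | 2² | 2¹ | 2⁰ |
| --- | --- | --- | --- | --- |
| Binary digit | 1 | 1 | 0 | 1 |
| Value | 8 | 4 | 0 | 1 |

8 + 4 + 1 = 13

Table showing how the number 13 is calculated in binary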

Octal is a base 8 system that uses the digits 0 to 7. Larger numbers are calculated based on the position of each digit in a sequence. Each digit is multiplied by 8 to the power of whatever position it is in, then all the results are added together. In many programming languages, octal numbers are preceded by a zero and a lowercase o.

| Power of 8 | 8¹ | 8⁰ |
| --- | --- | --- |
| Octal digit | 1 | 5 |
| Value | 8 | 5 |

8 + 5 = 13

Table showing how the number 13 is calculated in octal

Hexadecimal, or hex, is a base 16 system that goes from 0 to F (the letters A through F stand for the decimal numbers 10 to 15). Creating larger numbers also relies on multiplying the digit in a specific position by 16 to the power of the position. In most programming languages, hexadecimal numbers are preceded by a zero and a lowercase x. This system is most commonly seen in color codes for design.
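The positional arithmetic is the same in all three systems, which a short Python sketch can show. The helper name `from_digits` is made up for this post; Python's built-in `int`, `bin`, `oct`, and `hex` already do these conversions, so the helper exists only to spell out the steps.

```python
def from_digits(digits: str, base: int) -> int:
    """Convert a string of digits in the given base to a decimal integer.

    Each digit is multiplied by the base raised to its position,
    counting positions from the right and starting at 0.
    """
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit, base) * base ** position
    return total


if __name__ == "__main__":
    print(from_digits("1101", 2))   # binary
    print(from_digits("15", 8))     # octal
    print(from_digits("D", 16))     # hexadecimal
    # Python also accepts the prefixed forms directly as number literals.
    print(0b1101, 0o15, 0xD)
    # And the built-ins convert a decimal number back to each notation.
    print(bin(13), oct(13), hex(13))
```

All three inputs describe the same number, 13, written in different bases.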

Command Line

The command line isn’t really a theoretical concept in computer science, but it is an important skill that self-taught programmers often skip because it seems very intimidating. Essentially, you are using a text-based interface to interact directly with your computer. You can do things that you would normally use a graphical user interface (GUI) for, such as finding files, but some tasks have no GUI and must be completed on the command line, like installing new programming languages. Having a cheat sheet for your operating system is very helpful. Some of the most common commands you’ll want to learn are how to change directories, print your working directory (where you are in the tree of files), list the contents of a directory, and use git (a tool for keeping a history of file versions). A tip for navigating files on the command line: name your folders and files without spaces or other special characters, because those characters have to be escaped when typed on the command line.
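To see why spaces in file names are a nuisance, here is a small Python sketch (rather than shell commands, to stay in one language for this post). It uses `shlex.quote` from the standard library, which shows how a name would need to be written to be safe on a POSIX-style shell such as those on macOS and Linux; Windows shells follow different rules. The function name `escape_for_shell` and the file names are made up for the example.

```python
import shlex


def escape_for_shell(name: str) -> str:
    """Return the name as it would need to be typed on a POSIX shell."""
    return shlex.quote(name)


if __name__ == "__main__":
    # A name with a space has to be wrapped in quotes to count as one item.
    print(escape_for_shell("my notes.txt"))
    # A name without spaces or special characters can be typed as-is.
    print(escape_for_shell("my_notes.txt"))
```

Without the quoting, a shell would read `my notes.txt` as two separate items, `my` and `notes.txt`, which is why underscores or dashes in file names save a lot of typing.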

Hopefully this list didn’t give you a fright and you can enjoy your Halloween night!

Two jack'o'lantern in the dark.
From David Menidrey on Unsplash

Anna Kroon is a second year in the Digital Humanities MA program. She is the Digital Systems Graduate Assistant and a Graduate Reference Assistant at the Loyola University libraries. Her research interests include: text encoding, adaptation, book history, and archival access.
