Notes

Git & GitHub


Arguably one of the best things you can do before starting a PhD is invest time in learning how to properly use version control. With version control, you can track, save, and revert changes to any kind of project. There are several options available, but I’m partial to Git & GitHub. Even if you never touch a piece of code, version control is very helpful.

I found a lot of information about Git & GitHub confusing. The documentation is written for software engineers and people who are immersed in writing code. I am not that person and a lot of the information isn’t relevant to my situation. This guide is written specifically for PhDs. I doubt I am leveraging these tools’ full functionality, but this is a good place to start if you’ve never heard of version control before or were overwhelmed by the documentation.

Why Use Version Control?

Version control is important because projects rarely progress in a straight line. Regardless of whether you’re writing a paper or writing code, things change. Version control allows you to track those changes and keep notes on the logic behind why you did the things you did.

If you’re writing a paper, you might sometimes find yourself deleting paragraphs that you later need. With version control, you can save these paragraphs and revert to earlier writing if necessary.

With code, you can add or remove features without worrying about damaging existing, working code. I did not understand the full functionality of version control until I started working on cartography. A lot of cartography is trial and error (or at least it has been for me) and I got tired of having files named “Code_final” and “Code_final_final” or worse: “Really_final_version.” Version control lets me branch off sections of my code and add or remove functionality without damaging my main code.

Definitions:

Repo/Repository: These are like directories or folders on your computer. It’s where all the information about a project is stored.

Git: is a version control system that allows you to track and revert changes incrementally. This means that as you make changes on a project, you can describe the changes and save them. If you make a change that you don’t like or that broke something, you can revert to a previous version. Git also allows you to create branches, and GitHub adds forks, both of which are helpful for trying out changes without affecting the main code.

  • Branches: These are like tree branches; they allow you to test out features and code without worrying about affecting the overall stability of your repo. The main branch (sometimes called master) is where you push all your final edits. Let’s say you want to try creating a map with an info box using Leaflet & Shiny. You may not want to mess up your entire Leaflet map while you work with Shiny. Branching will allow you to safely work with Shiny and if you like the changes, you can merge the branch with main. If you don’t, you can continue working on the branch until you’re happy or you can discard it.
  • Forks: are similar to branches but are mostly used for copying other people’s repos and making changes. Say you find a cool project repo, but you want to modify it for your own use. You would fork the repo. A copy would show up in your GitHub account and you can make the changes you want without affecting the original repo.

  • GitHub: is a cloud-based Git service. It allows you to access your Git repositories from anywhere. You can use Git without GitHub, but you can’t use GitHub without Git.

    I might use Git & GitHub interchangeably which drives software engineers bonkers.

    Accessing Repositories

    There are three main ways of using Git and GitHub:
    1) GitHub Website
    2) GitHub Desktop
    3) Command line

    Purists will tell you that you should always use the command line while others will tell you it doesn’t matter. Honestly, I use all three ways based solely on whatever I’m feeling in the moment. I’m going to focus on command line usage because it will help illuminate what’s happening behind the scenes when you use either the website or the Desktop app.

    I’ve also written about using GitHub Desktop and the GitHub website. The information across all three is the same. My suggestion is to choose whichever one you’re most comfortable with and ignore anyone who makes you feel bad because you prefer to use an app over using the command line. There are more important things to worry about.

    Keeping Organized

    Repos are just directories. If you’re even mildly organized, you probably create a new folder on your hard drive for each project. If you don’t, I highly recommend it because it does help keep things neat and orderly. When you use either Git or GitHub, your repos will be stored in their own folders. This can get unwieldy very quickly. I like to store all my repos in one folder located in My Documents on my hard drive. I just created a folder called GitHub and anytime I initialize or clone a repo, I make sure I do so from this folder. Here is what my GitHub folder looks like:

    [Screenshot: Repo Options]

    As you can see, it’s located in the Documents folder on my PC. Each folder you see listed here is a repo in my GitHub and I don’t save anything else to this folder except initialized or cloned repos.

    Consoles & Terminals

    To use Git from the command line you can either use the Command Prompt or PowerShell on Windows, Terminal on Mac, or a console emulator like Cmder or Cygwin. When you download Git it will also install Git CMD which does the same thing as any other console or terminal app. They all operate in the same way and do the same things, so pick whichever makes you happy.

    Before getting into the important commands, the first thing you’ll want to do is navigate to the folder you’re keeping your repos in. For me it’s the GitHub folder I created in my Documents above. There are two ways you can do this.

      1) Either right click in the folder and select Git Bash Here. If you're on Windows 11 it will be under "Show more options." Also, if you're on Windows, you can type cmd in the File Explorer address bar and hit enter and it will open a command prompt in that folder.

      2) Or you can just open a console / terminal from the start menu or app folder and then navigate to where you are storing your repos using cd:
     cd Documents\GitHub
     

    cd stands for change directory. It’s followed by the path to the folder you want to open.

    Your computer should allow tab complete. This just means you can start typing Doc, hit the tab key on your keyboard and it should autocomplete. Sometimes you’ll need to go up a directory or two. Say, for example, I’m in the GitHub folder, but I want to go to the Documents folder using the command line. To do so, I would put

     cd ..
     

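    A single dot stands for the folder I’m currently in. Now that I’m in Documents, if I were to put only one dot followed by a folder name:

     cd .\GitHub

    I could navigate to that folder inside the Documents folder.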

    These are just helpful navigation options so you don’t always have to put the entire path to a folder.
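
The navigation commands above can be tried safely in a scratch folder. This is only a minimal sketch, assuming Git Bash or a Mac terminal, so it uses forward slashes rather than the backslashes shown above. The practice folders are hypothetical and are created in a temporary location so nothing real is touched.

```shell
# Build a throwaway practice area (hypothetical layout, not your real Documents folder).
PRACTICE="$(mktemp -d)"
mkdir -p "$PRACTICE/Documents/GitHub" "$PRACTICE/Documents/Papers"

cd "$PRACTICE/Documents/GitHub"   # go straight to the repo folder
pwd                               # print the folder you are in now

cd ..                             # two dots: go up one level, into Documents
pwd

cd ./Papers                       # one dot: start from the current folder
pwd

cd ../GitHub                      # up one level, then down into GitHub again
HERE="$(basename "$PWD")"
echo "Now in: $HERE"
```

pwd is not a Git command; it just prints the current folder, which is handy when you lose track of where you are.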

    Once you’re in the folder where you want to keep the repo, you’re ready to use Git for version control.

    Important Commands

    The Git documentation is full of commands, options, flags, and various other usage information. I’d say a solid 99% of it, I’ve never used. I’m only going to go over the commands I use on a daily basis because they’re all you really need to get started. I’m going to cover what each term means, then add basic workflows at the end.

    Note: Anything in [brackets] is where you would enter in information. You would omit the brackets. For example, git init using-github would initialize a new repo called using-github.

  • git init [project-directory-name]: You only need to initialize a repo once. Each repo in your GitHub has to have a unique-to-you name. That is, you can't have two repos named Project. Good repo names are short and descriptive. If I were to initialize a repo for this project I would name it using-github because that is what this guide is about.

  • git clone [project-url]: If you have previously initialized a repo (either on the website or through GitHub Desktop) and want to add it to your current hard drive you will need to clone (copy) it. This is useful if you use a desktop and a laptop. I often initialize repos on my desktop using git init, push them to GitHub, and then clone them to my laptop using git clone. You can also use git clone to clone repos created by other people. If you find a cool project on GitHub.com that you want to modify for your own use, you can clone it to your hard drive using git clone [project-url].
    The [project-url] can be found on the repo website. For example, if you wanted to clone my election guide repo, you would navigate to the repo's page here. On the right hand side is a green button that says Code. You'd select it then copy the URL from the drop down. The URL you copied would go in place of [project-url] above.

  • After you initialize or clone a repo, you can work on your project. You would do work like any other time. Just navigate to the folder where your repo is located and create or modify any files you need to. When you’re done working for the day, you’re ready to stage and commit your changes.
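
As a rough sketch of what git init does, the commands below create a repo in a throwaway folder and look inside it. The temporary folder is an assumption made so your real GitHub folder stays untouched. git status is not covered in this guide, but it is a standard Git command that reports what has changed.

```shell
# Work in a throwaway folder (an assumption, so nothing real is affected).
cd "$(mktemp -d)"

git init using-github     # creates the folder using-github with a hidden .git folder inside it
cd using-github           # step into the new repo before running other git commands

ls -a                     # the hidden .git folder is where Git keeps the project history
git status                # shows the current branch and whether there is anything to commit
```

Everything Git knows about the project lives in that .git folder, which is why the repo is "just a directory" with some extra bookkeeping.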

    There are four steps you’ll always follow when working with a repo:

    1. Adding (creating or changing files in the repo folder)
    2. Staging (git add)
    3. Committing (git commit)
    4. Pushing (git push)

    First you add or change a file, then you stage it, commit it, and push it.

  • git add [filename or directory name]: You'll use git add to stage individual files or directories. You just add the path to the file or directory after git add then hit enter. Once staged, you'll add a message, then commit.
  • git add -A will stage every change in the repo, including new, modified, and deleted files.
  • git commit -m "A useful message here": Before you can push your files, you have to commit them. Committing just records a change on the local hard drive. It's like taking a snapshot of a project in its current state. Messages allow you to describe the change and its justification. Git won't let you commit without adding a message so here is a good guide on writing good commit messages.
  • git push origin [branch-name]: Usually, [branch-name] will be main. Sometimes it will be a different branch. origin is Git's default name for the remote repo, usually the copy on GitHub. This way you don't have to constantly refer to its URL. Pushing sends your commits to GitHub. You can then access those changes on the website or on another computer by pulling the changes to the machine.
  • git pull origin [branch-name]: Again, [branch-name] will usually be main. Pull brings changes you pushed to GitHub onto the local machine. Think of this as syncing changes. It's especially useful when using more than one computer.
  • If you do work from multiple computers, always be sure to pull changes before you start working in the repo. It prevents a lot of headaches.
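
The stage, commit, push, and pull cycle above can be sketched end to end without touching GitHub. This is a minimal sketch under stated assumptions: a local "bare" repo stands in for GitHub so it runs offline, the desktop and laptop folders stand in for two computers, and the name, email, and file are placeholders.

```shell
# A local bare repo plays the role of GitHub here (an assumption for this sketch).
WORK="$(mktemp -d)"
git init --bare "$WORK/hub.git"

# "Desktop": clone the empty repo, then add, stage, commit, and push.
git clone "$WORK/hub.git" "$WORK/desktop"
cd "$WORK/desktop"
git config user.name "Example Student"            # placeholder identity, this repo only
git config user.email "student@example.com"
echo "First draft of chapter one." > chapter1.txt
git add chapter1.txt                              # stage one file
git commit -m "Add first draft of chapter one"    # snapshot on this machine
git branch -M main                                # make sure the branch is called main
git push origin main                              # send the commit to the stand-in remote

# "Laptop": clone once, asking for main explicitly because the stand-in has no default branch set.
git clone --branch main "$WORK/hub.git" "$WORK/laptop"

# More work on the desktop, pushed as before.
cd "$WORK/desktop"
echo "A second paragraph." >> chapter1.txt
git add -A                                        # stage everything that changed
git commit -m "Add second paragraph to chapter one"
git push origin main

# Back on the laptop: pull before starting to work.
cd "$WORK/laptop"
git pull origin main
cat chapter1.txt                                  # now shows both lines
```

With a real GitHub repo you would skip the bare repo and the --branch flag, and clone using the URL from the green Code button instead.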

  • One of the best things about Git & GitHub is that they allow branching and forking. Branching is the most useful because it creates a temporary space for you to work without threatening the integrity of the main project.

  • git checkout -b [branch-name] will allow you to create and switch to a new branch.
  • git branch [branch-name] is how you create a branch without switching to it.
  • git checkout [branch-name] allows you to switch to an existing branch.
  • If you like what you did in a branch and want to merge it with main so that you can keep the updated version of the project you’ll need to switch to the main branch and then merge.

    
        git checkout main
        git merge [branch name]
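
To make the branch and merge steps concrete, here is a small sketch that follows the Leaflet and Shiny example from the definitions. The repo name, file name, file contents, and identity are all placeholders invented for illustration, and the repo lives in a temporary folder.

```shell
# Throwaway repo with one commit on main (all names are placeholders).
cd "$(mktemp -d)"
git init map-project
cd map-project
git config user.name "Example Student"
git config user.email "student@example.com"
echo "leaflet map code" > map.R
git add map.R
git commit -m "Add basic leaflet map"
git branch -M main                      # make sure the first branch is called main

# Try the Shiny info box on its own branch.
git checkout -b shiny-info-box          # create the branch and switch to it
echo "shiny info box code" >> map.R
git add map.R
git commit -m "Add Shiny info box to the map"

# main is untouched until the branch is merged.
git checkout main
cat map.R                               # still only the leaflet line
git merge shiny-info-box                # bring the branch's commit into main
cat map.R                               # now includes the info box line
```

The first cat shows why branching is safe: switching back to main puts the file back the way main last saw it.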
    

    If you don’t like what you did in a branch and want to delete it entirely, here’s how:

    
        ## to delete the branch on your local machine
        ## (use -D instead of -d if the branch was never merged)
        git branch -d localBranchName
    
        ## to delete it from the remote repo
        git push origin --delete remoteBranchName
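
One wrinkle worth seeing in action: Git refuses to delete a branch with a lowercase -d if its work was never merged. The sketch below, using a throwaway repo with made-up branch and file names, shows the refusal and then the capital -D that discards the branch anyway.

```shell
# Throwaway repo with one commit on main (all names are placeholders).
cd "$(mktemp -d)"
git init scratch-repo
cd scratch-repo
git config user.name "Example Student"
git config user.email "student@example.com"
echo "working code" > analysis.R
git add analysis.R
git commit -m "Add working analysis"
git branch -M main

# An experiment that never gets merged.
git checkout -b failed-experiment
echo "an idea that did not work" >> analysis.R
git add analysis.R
git commit -m "Try an idea"
git checkout main

# -d refuses because the branch holds a commit that main does not have.
git branch -d failed-experiment || echo "Git refused: the branch is not fully merged"

# -D (capital D) deletes it anyway, discarding the experiment.
git branch -D failed-experiment
git branch                              # lists the remaining branches
```

The refusal is a safety net, so it is worth reading the message before reaching for -D.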
    

    Workflows

    These are basic workflows that you can use.

    Initializing a repo:

    
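        git init [repo name]
        cd [repo name]
        ## do the work you need to
        git add -A
        git commit -m "a useful message"
        ## first push only: name the branch main and point origin at an empty repo you created on GitHub
        git branch -M main
        git remote add origin [project-url]
        git push origin main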
    

    Cloning an existing repo:

    
        git clone https://github.com/liz-muehlmann/Election_Guides.git
        cd Election_Guides
        ## do the work you need to
        git add -A
        git commit -m "a useful message"
        git push origin main
    

    Create a branch and merge it with main:

    
        git checkout -b [branch name]
        ## work on the branch until you are happy
        git add -A
        git commit -m "a useful message"
        git checkout main
        git merge [branch name]
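
One more workflow worth sketching is getting back a paragraph you deleted, since that is the promise made at the start of this guide. The commands git log, git show, and git checkout with a file path are standard Git but are not covered above, so treat this as one possible approach rather than the only one. The repo, file, and identity are placeholders in a temporary folder.

```shell
# Throwaway repo holding a short "paper" (all names are placeholders).
cd "$(mktemp -d)"
git init paper-draft
cd paper-draft
git config user.name "Example Student"
git config user.email "student@example.com"

printf 'Introduction\nA paragraph I will regret deleting.\n' > paper.txt
git add paper.txt
git commit -m "Draft introduction"

printf 'Introduction\n' > paper.txt     # the paragraph gets cut
git add paper.txt
git commit -m "Cut second paragraph"

git log --oneline                       # list the snapshots, newest first
git show HEAD~1:paper.txt               # print the file as it was one commit ago

git checkout HEAD~1 -- paper.txt        # bring back that older version of the file
git commit -m "Restore second paragraph"
cat paper.txt
```

HEAD~1 means "one commit before the current one". Restoring the file this way adds a new commit, so the history still records that the paragraph was cut and later brought back.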
    

    Conclusion

    Those are really the only commands you need to know to use Git & GitHub. I can’t stress enough how important version control is when programming - especially when working on cartography projects. One problem you’ll inevitably run into is file size. GitHub will warn you if your file is over 50MB and it will reject your push if any of your files are over 100MB.
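
A quick way to spot files that might trip those limits before you commit is to search the folder by size. This sketch uses find, which is an ordinary shell command available in Git Bash and Mac terminals rather than part of Git. The two files are made up for the demonstration and the 50 MB threshold simply mirrors the warning level mentioned above.

```shell
# Throwaway folder with one small and one large file (both invented for the demo).
cd "$(mktemp -d)"
echo "small notes" > notes.txt
dd if=/dev/zero of=big-dataset.bin bs=1048576 count=51 2>/dev/null   # writes 51 MB of zeros

# List files larger than 50 MB, skipping Git's own .git folder.
LARGE="$(find . -type f -size +50M -not -path './.git/*')"
echo "$LARGE"
```

Run the find line from the top folder of a repo; anything it prints is worth handling before you stage and commit.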

    There are two ways around GitHub’s file limits which I go over in my post about DVC (Data Version Control). If you’re only using Git & GitHub to version control your writing, you don’t really need to worry about large file sizes. However, once you start working with datasets the file size limit gets in the way quickly.

    If you’re uncomfortable with the command line, GitHub has a desktop application that you can use which is very user friendly. You can learn about GitHub Desktop and the GitHub Website through my other posts.

    Liz | 17 Sep 2022

    tags: github, tutorial, version control