How to Use Git & GitHub: Version Control for Code

This course is a part of the Additional Resources section of my Data Science Master's.


Course Description

From the course website:

"Effective use of version control is an important and useful skill for any developer working on long-lived (or even medium-lived) projects, especially if more than one developer is involved. This course, built with input from GitHub, will introduce the basics of using version control by focusing on a particular version control system called Git and a collaboration platform called GitHub."


Why Take This Course?

From the course website:

"Git is used by many tech companies, and a public GitHub profile serves as a great portfolio for any developer. But more than that, you’ll establish an efficient programming workflow that allows you to:

  • Keep track of multiple versions of a file
  • Track bugs by reverting to previous working versions of a file
  • Seamlessly collaborate with other developers on a project

The use of tools like Git and GitHub is essential for collaborating with other developers in most professional environments."


My Review

Review in development.

Udacity Data Analyst Nanodegree

This program is part of the Data Science Core section of my Data Science Master's.


Program Description

From the program website:

"We built this program with expert analysts and scientists at leading technology companies to ensure you master the exact skills necessary to build a career in data science.

Learn to clean up messy data, uncover patterns and insights, make predictions using machine learning, and clearly communicate critical findings."


Why Take This Nanodegree?

From the program website:

"The Data Analyst Nanodegree program is specifically designed to prepare you for a career in Data Science. As a Data Analyst, you are responsible for obtaining, analyzing, and effectively reporting on data insights ranging from business metrics to user behavior and product performance. This program’s curriculum was developed with leading industry partners to ensure students master the most cutting-edge curriculum. Graduates will emerge fully prepared for this amazing career."


My Review

Here is my review on Medium.

Here are the individual courses contained within the Nanodegree:

  • Intro to Inferential Statistics
  • Intro to Descriptive Statistics
  • Intro to Data Analysis
  • Data Wrangling
  • SQL for Data Analysis
  • MongoDB for Data Analysis
  • Data Analysis with R
  • Intro to Machine Learning
  • Data Visualization and D3.js
  • A/B Testing

Listed below (in alphabetical order) are the software, libraries, frameworks, etc. that are covered in the Nanodegree:

  • Anaconda
  • d3.js
  • dimple.js
  • dplyr
  • ggplot2
  • JSON
  • knitr
  • matplotlib
  • MongoDB
  • NumPy
  • pandas
  • Python
  • IPython Notebook
  • R
  • RStudio
  • scikit-learn
  • SQL
  • XML

Differential Equations

I took this course as a sophomore in my chemical engineering program. This course is similar to this Udacity course, but with a bit of extra detail. Officially called "Ordinary Differential Equations", the course description is as follows:


Course Description

This course is an introduction to solving ordinary differential equations. Topics include:

  • First-order differential equations.
  • Linear differential equations with constant coefficients.
  • Laplace transforms.
  • Systems of linear equations.
  • Examples involving the use of ordinary differential equations in mechanical systems.

How Is It Useful?

Differential equations are not frequently used in data science or in computer science generally; however, they are useful for modeling in economics, finance, and biology.[1] Check out this IPython notebook that outlines how differential equations relate to a very common data science tool, Markov chains.

Multivariable Calculus

I took this course as a freshman in my chemical engineering program. Officially called "Calculus II", the course description is as follows:


Course Description

  • More integration techniques.
  • Numerical integration, improper integrals.
  • Curves, speed, velocity.
  • Functions of several variables, partial derivatives, differentials, error estimates, gradient, maxima and minima.
  • Sequences, series, power series.
  • Taylor polynomial approximations.
  • Double and triple integrals, polar and cylindrical coordinates.
  • Applications to mass, center of mass, moment, etc.

How Is It Useful?

Math is one of the key building blocks of data science. Calculus is essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics.[1] Multivariate calculus is especially applicable in machine learning.[2]

Calculus

I took this course as a freshman in my chemical engineering program. This course is very similar to this Coursera course. Officially called "Calculus I", the course description is as follows:


Course Description

  • Functions, limits, derivatives.
  • Optimization, rate problems, exponentials, logarithms, inverse trigonometric functions.
  • Exponential growth as an example of a differential equation.
  • Fundamental Theorem of Calculus, Riemann integral.
  • Applications to problems involving areas, volumes, mass, charge, work, etc.
  • Integration techniques.

How Is It Useful?

Math is one of the key building blocks of data science. Calculus is essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics.[1]

Statistics II

This course is a follow-up course to an introductory course in probability and statistics. Officially called "Strategies for Process Investigations", I took this course as a senior in my chemical engineering program. The course description is as follows (chemical engineering things italicized):


Course Description

This course is designed to give you a more comprehensive understanding of how models are estimated from data, and how experimental programs can be designed to make the resulting data as informative as possible. The focus of the course is largely on empirical models, i.e., models that are estimated from data. However, the techniques for estimating parameters, making decisions about parameters, and planning experiments also apply equally to fundamental or first-principles models.

The objectives of the course are:

  • To provide you with a strong background for developing empirical models between process variables through model building, including multiple linear regression with emphasis on evaluation and interpretation of the resulting model.
  • To provide you with basic techniques for the initial screening of process variables including 2-level, complete and fractional, factorial designs, and higher-order experimental designs.

How Is It Useful?

Statistics is one of the pillars of data science. Check out this Quora page for a great answer to the question "How do data scientists use statistics?"

Statistics I

I took this intro to statistics course as a sophomore in my chemical engineering program. This course is essentially this Udacity course, but with a chemical engineering flair. Officially called "Analysis of Process Data", the course description is as follows (chemical engineering things italicized):


Course Description

Statistical methods for analyzing and interpreting process data are discussed, with special emphasis on techniques for continuous improvement of process operations. Topics include:

  • Graphical and numerical summaries
  • Principles of valid inference
  • Probability distributions for discrete and continuous data
  • An introduction to linear regression analysis
  • Role of data in assessing process operation
  • Identifying major problems
  • Process capability
  • Comparing process performance to target values
  • Comparing performances of two processes
  • Control charts

How Is It Useful?

Statistics is one of the pillars of data science. Check out this Quora page for a great answer to the question "How do data scientists use statistics?"

Linear Algebra

I took this course as a freshman in my chemical engineering program. This course is similar to this Udacity course, but with a bit of extra detail and minus the Python. Officially called "Introduction to Linear Algebra", the course description is as follows:


Course Description

  • Vectors, dot and cross products, lines and planes, projections.
  • Vectors in n-space.
  • Systems of linear equations.
  • Matrix algebra and linear transformations, inverses.
  • Spaces and subspaces.
  • Linear independence, basis and coordinates, dimension, rank.
  • Determinants, Cramer's Rule.
  • Eigenvectors, eigenvalues and diagonalization with applications.
  • Orthonormal bases and symmetric matrices.

How Is It Useful?

Math is one of the key building blocks of data science. Linear algebra is essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics.[1] Check out this Quora page for a great summary of the use cases of linear algebra in data science.

MIT Mathematics for Computer Science

This course is a part of the Bridging Module for my Data Science Master's.


Course Description

From the course website:

"This course covers elementary discrete mathematics for computer science and engineering. It emphasizes mathematical definitions and proofs as well as applicable methods. Topics include:

  • Formal logic notation, proof methods.
  • Induction, well-ordering.
  • Sets, relations.
  • Elementary graph theory.
  • Integer congruences.
  • Asymptotic notation and growth of functions.
  • Permutations and combinations, counting principles.
  • Discrete probability.
  • Further selected topics may also be covered, such as recursive definition and structural induction, state machines and invariants, recurrences, and generating functions."

Why Take This Course?

A course in discrete mathematics is a requirement for the majority of undergraduate computer science programs. Completing this course, along with the other two courses in my bridging module, means I will have completed a standard first-year computer science curriculum, plus the full mathematical and statistical core.


My Review

I just completed this course! A full review will be posted soon.

Harvard CS50: Intro to Computer Science

Harvard CS50: Introduction to Computer Science

This course is a part of the Bridging Module for my Data Science Master's.

1.  Course Description
2.  Why Take This Course?
3.  My Review
       •   Course Overview
       •   Timeline
       •   Is It Worth the Price?
       •   Learning C
       •   How Challenging is It?
       •   Addressing the PHP Haters
       •   Closing Thoughts


Course Description

From the course website:

"This is CS50x, Harvard University's introduction to the intellectual enterprises of computer science and the art of programming for majors and non-majors alike, with or without prior programming experience. An entry-level course taught by David J. Malan, CS50x teaches students how to think algorithmically and solve problems efficiently. Topics include:

  • Abstraction
  • Algorithms
  • Data structures
  • Encapsulation
  • Resource management
  • Security
  • Software engineering
  • Web development

Languages include C, PHP (edit: Python replaces PHP in Fall 2016), and JavaScript plus SQL, CSS, and HTML. Problem sets inspired by real-world domains of biology, cryptography, finance, forensics, and gaming."


Why Take This Course?

An introductory-level course in computer science is a requirement for virtually all undergraduate computer science programs. Completing this course, along with the other two courses in my bridging module, means I will have completed a standard first-year computer science curriculum, plus the full mathematical and statistical core.


My Review

April 12, 2016

View this entry on Medium

High praise for Harvard’s online introduction to computer science course is not difficult to find. "It’s a cultural touchstone, a lifestyle, a spectacle," says The Harvard Crimson. YouTube CEO Susan Wojcicki says CS50 changed her life. CS50 receives near-perfect scores across the board on CourseTalk, Class Central, and edX. Here are my thoughts:

Course Overview

CS50 is a true, comprehensive introduction to computer science. The course is taught by the vivacious David Malan and hosted on edX. There are 13 weeks of instruction with 8 mandatory problem sets (psets) and a final project:

  • Week 0: Binary. ASCII. Algorithms. Pseudocode. Source code. Compiler. Object code. Scratch. Statements. Boolean expressions. Conditions. Loops. Variables. Functions. Arrays. Threads. Events.
  • Week 1: Linux. C. Compiling. Libraries. Types. Standard output. pset1
  • Week 2: Casting. Imprecision. Switches. Scope. Strings. Arrays. Cryptography. pset2
  • Week 3: Command-line arguments. Searching. Sorting. Bubble sort. Selection sort. Insertion sort. O. Ω. Θ. Recursion. Merge sort. pset3
  • Week 4: Stack. Debugging. File I/O. Hexadecimal. Strings. Pointers. Dynamic memory allocation. pset4
  • Week 5: Heap. Buffer overflow. Linked lists. Hash tables. Tries. Trees. Stacks. Queues.
  • Week 6: TCP/IP. HTTP. pset5
  • Week 7: HTML. CSS. PHP (edit: Python replaces PHP in Fall 2016). pset6
  • Week 8: MVC. SQL. pset7
  • Week 9: JavaScript. Ajax. pset8
  • Week 10: Security. Artificial intelligence.
  • Week 11: Artificial intelligence, continued.
  • Week 12: Exciting conclusion. (Spoiler alert: montages, CS50 Family Feud, cake!) Final Project

There are two lectures per week. Each lecture is 50ish minutes long. Each week has a series of shorter videos as well:

  • Walkthroughs: 1–3 minute videos of David Malan walking you through the lecture’s sample code at a slower pace
  • Section: 5–30 minute videos of a Harvard teaching fellow explaining lecture concepts in depth
  • Shorts: 5–15 minute videos of other CS50 staff members explaining lecture concepts with additional (and sometimes quirky) examples

Problem sets are “programming assignments that challenge you to apply concepts to problems inspired by real-world domains.” They are marked by an automated grading system. Your overall mark for a problem set is the fraction of tests that your code passes (1.0 = 100%).
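To make that scoring rule concrete, here is a minimal Python sketch. It is not CS50's actual grader: the function name `pset_score` and the list of pass/fail results are hypothetical, and the only thing taken from the course is the rule that the mark is the fraction of tests passed.

```python
def pset_score(test_results):
    """Return the fraction of passed tests (1.0 means every test passed).

    test_results is a hypothetical list of booleans, one per automated test.
    """
    if not test_results:
        raise ValueError("no test results to score")
    return sum(test_results) / len(test_results)


# Hypothetical submission: 9 of 10 tests pass.
score = pset_score([True] * 9 + [False])
print(f"{score:.1f}")  # 0.9
```

A submission that passes every test scores 1.0, which is the 100% mentioned above.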

Timeline

There is no to-the-hour estimate from Harvard. They do state problem sets take 10–20 hours to complete. With 8 problem sets, ~5 hours of video content per week, and a final project, a ballpark estimate would be somewhere just north of 200 hours. It took me 200 hours and five minutes, as tracked by Toggl.
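Here is a quick Python sketch of that ballpark. The numbers are the ones quoted in this review; multiplying the ~5 hours of weekly video by all 13 weeks is my own assumption, and the final project is left out because there is no hour estimate for it.

```python
# Rough CS50 time estimate from the figures quoted in this review.
# The final project is excluded: no hour estimate is given for it.
WEEKS = 13                 # weeks of instruction
PSETS = 8                  # mandatory problem sets
PSET_HOURS = (10, 20)      # stated range of hours per problem set
VIDEO_HOURS_PER_WEEK = 5   # approximate; assumed to apply to every week


def estimate_hours():
    """Return (low, high) total hours, excluding the final project."""
    video = WEEKS * VIDEO_HOURS_PER_WEEK
    low = PSETS * PSET_HOURS[0] + video
    high = PSETS * PSET_HOURS[1] + video
    return low, high


low, high = estimate_hours()
print(f"{low} to {high} hours before the final project")
# 145 to 225 hours before the final project
```

My 200 hours sits toward the upper end of that range once the final project is added on top.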

I spent like a day and a half figuring out how web hosting works and editing this stupid video for my final project, neither of which is required, so 185 hours or so is probably a more reflective number.

Is It Worth The Price?

The course is free, so yes. You have access to all of the materials and grading at no cost. Probably the best undergraduate computer science course in the world is available to anybody who has an Internet connection. That’s still so cool.

You can pay for a verified certificate, which currently costs $90, if you’d like it for personal or professional reasons.

Learning C

The first 6 weeks of Harvard CS50 are spent in C, a language notorious for its complexity. Seems like a curious choice, no? Even though I likely won’t use C much going forward, I am very glad it was the language of choice for this reason:

The advantage to knowing C is that you have a very good idea of how a computer works. Not just how your programming model works, but how memory’s laid out, and suchlike. Knowing C also lets you appreciate how much less work you have to do in a higher level language … and the cost involved in working in that higher level language.
— Frank Shearar on Programmers Stack Exchange

The CS101 equivalents from Udacity and MIT (via edX) both use Python. Nick Parlante’s Stanford course uses C, but the difficulty level of that course isn’t on par with the aforementioned three.

How Challenging is It?

It’s tough, but it’s good tough. At no point did I feel lost. The course is well-structured and there are tons of additional resources. Basically every social media platform you can think of has an official CS50 community — I found Reddit and Stack Exchange the most useful.

The CS50 staff say that the pointers section of the course is the hardest and they are right. You intimately deal with pointers and allocating memory in pset5 and pset6. Mental gymnastics are required. Finally figuring out pointers is probably the thing I am most proud of coming out of the course.

The 10–20 hours for each problem set is accurate. Your code won’t work and you will get frustrated (especially because half of the course is in C), but frustration is good. Frustration conquered is learning.

Addressing the PHP Haters

Edit (August 2016): Fall 2016’s edition of CS50 replaces PHP with Python, another high-level programming language, which nullifies the following concern.

I almost didn’t enroll in CS50 after reading this popular Quora answer regarding CS50 teaching web development using PHP:

Trust me. You DO NOT want to spend your time learning PHP in 2014.
— Anubhav Sinha on Quora

That may be true, but it’s not particularly relevant to the decision to take this course. There is only one week of lecture content and two problem sets that use PHP. You do not dedicate a significant amount of time to it. The educational function of PHP is to demonstrate how higher-level programming languages are useful.

One of the reasons why they do use PHP in CS50 is because it is heavily inspired by C. By week 7, students have a solid C foundation. Picking up PHP at that point isn’t difficult because of the syntactical similarities.

Bottom line: do not let two weeks of PHP deter you from this one-of-a-kind learning experience.

Closing Thoughts

I’m having a hard time describing CS50 without sounding hyperbolic. The course was just so damn good. This piece in The Harvard Crimson is ridiculously dramatic, but it’s so true. The content is engaging. David Malan is too good at his job. The production value is absurd. Honestly, go check out a lecture for the production value alone. Fall 2016’s edition will be even crazier: they’re shooting lectures in 6K and VR.

I have now completed CS50, Stanford CS101, and half of Udacity CS101, and without question Harvard’s introduction to computer science is my favourite. A few weeks post-graduation, I’ve already had some legitimate nostalgia. It was an experience.

Rating: ★★★★★

Udacity Intro to Programming Nanodegree

This Nanodegree is a part of the Bridging Module for my Data Science Master's.

1.  Program Description
2.  Why Take This Nanodegree?
3.  My Review
       •   Program Overview
       •   Does the Program Accomplish Its Goals?
       •   Timeline
       •   Is It Worth the Price?
       •   Upbeat, Welcoming Learning Environment
       •   What I Didn't Like
       •   Closing Thoughts


Program Description

From the program website:

"This introductory Nanodegree program teaches you the foundational skills all programmers use, whether they program mobile apps, create web pages, or analyze data. It is ideal for beginners who want to learn new skills, make informed choices about career goals, and set themselves up for success in career-track Nanodegree programs."


Why Take This Nanodegree?

An introductory-level course in programming is a requirement for virtually all undergraduate computer science programs. Completing this program, along with the other two courses in my bridging module, means I will have completed a standard first-year computer science curriculum, plus the full mathematical and statistical core.


My Review

April 12, 2016

View this entry on Medium

The Udacity Intro to Programming Nanodegree was the first course on my radar when I decided I was going to teach myself programming. I had taken Udacity’s CS101 standalone course previously and had a very positive experience. The Nanodegree was getting consistently positive reviews. The content breadth and depth appeared to be just right for what I was looking for in an introductory program. It seemed like the perfect fit.

Program Overview

This Nanodegree, like all Nanodegrees, is a curation of Udacity’s best courses laid out in a logical order. My edition of the Intro to Programming Nanodegree was broken into five mandatory stages, with a project bookending each stage:

  • Stage 0 (optional): A bit of HTML (project)
  • Stage 1: More HTML and CSS (project)
  • Stage 2: Python and general programming concepts (strings, variables, loops, etc.) (project)
  • Stage 3: Classes, functions, abstraction, and working with pre-existing code (project)
  • Stage 4: Back-end programming, databases, and template engines (project)
  • Stage 5: A bit of JavaScript, APIs, recursion and parallel computing, solving big programming problems, and responsive web design (project)

Instead of hour-long lectures, Udacity breaks their courses down into a bunch of mini-lessons. The vast majority of videos are under five minutes and are followed by multiple-choice and code quizzes. Projects are graded by Udacity Project Reviewers.

Note: Stages 4 and 5 were revamped in February 2016. In Stage 4, you now learn a bit about popular tech roles (front-end/back-end/mobile programming, data analysis) and in Stage 5 you do a deeper dive into one of those roles.

Other note: you can view the code that generated the above-linked projects in this GitHub repository.

Does The Program Accomplish Its Goals?

One of the first videos (this one isn’t available to the public) you watch in the program references four stages of expertise: ignorance, awareness, ability, and fluency. The program aims to get students to the awareness and ability stages.

A quote from the curriculum director, Andy Brown, in that video (I'm assuming Udacity is okay with me sharing this paywalled clip in this context):

You’re going to see a lot of new topics and new ideas in this Nanodegree. Because you’re going to see so many, it means we’re just not going to be able to focus on acquiring fluency. Fluency generally takes a really long time to acquire, in even one thing.

So instead, the goal is going to be to get to ability in the most important topics that are fundamental throughout programming. In addition, you’re going to gain awareness of many topics. Through this combination of awareness and ability, you’re going to have the skills to do amazing things, as well as the understanding of what you need to do next depending on your own personal goals.
— Andy Brown, Lead Instructor at Udacity

Prior to enrolling, I knew next to nothing about programming. I am now confident that I can actually do things with my newfound ability. I was also made aware of potential programming paths, was given a chance to decide if they interest me, and was shown how I can go about pursuing them. For me, the Nanodegree accomplished its goals.

Timeline

Udacity says it takes students 190 hours on average to complete the program. It took me 189 hours over 2.5 months, as timed by Toggl. You can definitely complete the program faster than that if you rush through the projects a little bit. I dedicated a fair amount of time trying to make my projects more aesthetically pleasing than they needed to be to receive a passing grade.

Is It Worth The Price?

This is subjective. At $200/month with a 50% refund if you complete the program within 12 months, my bill ended up being $300.
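For anyone wondering how 2.5 months turns into $300, here is a minimal Python sketch of the arithmetic. The fee and refund rate are the ones stated above; billing a partial month as a full month is my assumption about how the subscription works, and the function name `net_cost` is just for illustration.

```python
import math

MONTHLY_FEE = 200    # dollars per month, as stated in this review
REFUND_RATE = 0.5    # 50% refund for finishing within 12 months
REFUND_WINDOW = 12   # months


def net_cost(months_enrolled):
    """Return the net cost in dollars after any completion refund.

    Assumption: a partial month is billed as a full month.
    """
    billed_months = math.ceil(months_enrolled)
    paid = billed_months * MONTHLY_FEE
    refund = paid * REFUND_RATE if months_enrolled <= REFUND_WINDOW else 0
    return paid - refund


print(net_cost(2.5))  # 300.0
```

Under that assumption, 2.5 months is billed as 3 months ($600), and the 50% refund brings it down to $300.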

You can take the courses for free, but you won’t have access to the curated content, the forums, project reviews, one-on-one coaching, or the certificate. On the usefulness of these features:

  • Curated content: The curated content kept the program concise and was nice to have, but there is a free workaround.
  • Forums: I used the forums a lot, especially when I ran into bugs in my projects. They were often more useful than Stack Overflow (and other Google-able forums) because students often seemed to run into the same bugs. Udacity instructors were also quick on the trigger in responding to posts, often within a few hours.
  • Project reviews: Very useful. Reviews would often come in a few hours after project submission. The correction of mistakes plus the reassurance that you are doing good work is an important part of the learning process. Reviewers also suggest ways in which to improve your code, even if it’s “just” stylistic improvements.
  • One-on-one coaching: I didn’t use the one-on-one coaching for this Nanodegree, but I probably will for more complex ones.
  • Certificate: The certificate is nice to have, but probably not that useful.

The answer to this question will vary depending on the goals of the student.

Upbeat, Welcoming Learning Environment

The general positivity of the program’s instructors is very noticeable and it is awesome. The company’s upbeat, welcoming learning environment is a major reason why I am sticking with Udacity as my main source of online education.

From left to right: Jessica Uelmen, Kunal Chawla, Miriam Swords Kalk, Cameron Pittman, Steve Huffman, Dave Evans, Andy Brown.

Udacity CEO Sebastian Thrun’s industry connections are evident as well through cameos from Google co-founder Sergey Brin, Reddit co-founder Steve Huffman, Google Director of Research Peter Norvig, Junior (Stanford’s self-driving car), and Sebastian himself.

What I Didn't Like

Honestly, not much. The office hours were meh. These were 45–60 minute pre-recorded webcasts led by a few Udacity coaches that occurred a handful of times each stage. There were often a few key insights in each webcast, but it was difficult to determine which parts I could safely skip. I ended up wasting a decent amount of time on things I didn’t need to watch.

Closing Thoughts

This program is a great fit for people who know they would like to pursue programming, but are unsure of which discipline to target.

One thing to note: this Nanodegree teaches you programming, not computer science. There is a difference that beginners may not be acutely aware of before they start the program.

Probably the most important thing you leave this Nanodegree with is the ability to think like a programmer. The instructors repeatedly come back to the five methods of thinking in the screenshot below:

The terms are a bit jargon-y, but all they mean is that you are able to solve problems in a structured and efficient manner. This skill is important in programming, as well as in other areas of life.

Rating: ★★★★★