David Venturi

March 26, 2016

How to Use Git & GitHub: Version Control for Code

This course is a part of the Additional Resources section of my Data Science Master's.

Before we begin

Hey, it’s David. I wrote this overview back in 2016. Since then, I’ve become a professional data analyst and created courses for multiple industry-leading online education companies.

Do you want to become a data analyst without spending 4 years and $41,762 to go to university? Follow my latest 27-day curriculum and learn alongside other aspiring data pros.

Okay, back to the overview.

Course Description

From the course website:

"Effective use of version control is an important and useful skill for any developer working on long-lived (or even medium-lived) projects, especially if more than one developer is involved. This course, built with input from GitHub, will introduce the basics of using version control by focusing on a particular version control system called Git and a collaboration platform called GitHub."

Why Take This Course?

From the course website:

"Git is used by many tech companies, and a public GitHub profile serves as a great portfolio for any developer. But more than that, you’ll establish an efficient programming workflow that allows you to:

Keep track of multiple versions of a file
Track bugs by reverting to previous working versions of a file
Seamlessly collaborate with other developers on a project

The use of tools like Git and GitHub is essential for collaborating with other developers in most professional environments."
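
To make the first two bullet points concrete, here is a toy Python sketch of a version store that keeps every snapshot of a file and can restore an earlier, working one. It is not how Git is implemented. The only Git-specific detail is the blob ID, which Git computes as the SHA-1 of a `blob <size>` header, a NUL byte, and the file contents. The `ToyRepo` class and its `commit`/`checkout` methods are hypothetical names for illustration only.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash file contents the way Git names a blob: SHA-1 over b'blob <size>', a NUL byte, then the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Hypothetical single-file version store, far simpler than Git itself."""

    def __init__(self):
        self.objects = {}  # blob ID -> file contents (identical contents stored once)
        self.history = []  # (message, blob ID) pairs, oldest first

    def commit(self, content: bytes, message: str) -> str:
        blob_id = git_blob_id(content)
        self.objects[blob_id] = content
        self.history.append((message, blob_id))
        return blob_id

    def checkout(self, index: int) -> bytes:
        """Return the file exactly as it was at commit number `index` (0 = first)."""
        _, blob_id = self.history[index]
        return self.objects[blob_id]


repo = ToyRepo()
repo.commit(b"print('hello')\n", "Working version")
repo.commit(b"print('hello'\n", "Refactor (accidentally breaks the code)")
restored = repo.checkout(0)  # go back to the last known-good version
print(restored.decode(), end="")
```

In real Git you would use commands such as `git commit` and `git checkout` rather than anything like this class; the sketch only shows the idea of keeping full snapshots keyed by a content hash so that any earlier version can be recovered.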

David Venturi

January 30, 2016

Udacity Data Analyst Nanodegree

100% Complete

This program is part of the Data Science Core section of my Data Science Master's.

Before we begin

Hey, it’s David. I wrote this review back in 2016. Since then, I’ve become a professional data analyst and created courses for multiple industry-leading online education companies.

Do you want to become a data analyst without spending 4 years and $41,762 to go to university? Follow my latest 27-day curriculum and learn alongside other aspiring data pros.

Okay, back to the review.

Program Description

From the program website:

"We built this program with expert analysts and scientists at leading technology companies to ensure you master the exact skills necessary to build a career in data science.

Learn to clean up messy data, uncover patterns and insights, make predictions using machine learning, and clearly communicate critical findings."

Why Take This Nanodegree?

From the program website:

"The Data Analyst Nanodegree program is specifically designed to prepare you for a career in Data Science. As a Data Analyst, you are responsible for obtaining, analyzing, and effectively reporting on data insights ranging from business metrics to user behavior and product performance. This program’s curriculum was developed with leading industry partners to ensure students master the most cutting-edge curriculum. Graduates will emerge fully prepared for this amazing career."

My Review

Here is my review on Medium.

Here are the individual courses contained within the Nanodegree:

Intro to Inferential Statistics
Intro to Descriptive Statistics
Intro to Data Analysis
Data Wrangling
SQL for Data Analysis
MongoDB for Data Analysis
Data Analysis with R
Intro to Machine Learning
Data Visualization and D3.js
A/B Testing

Listed below (in alphabetical order) are the software, libraries, frameworks, etc. that are covered in the Nanodegree:

Anaconda
D3.js
dimple.js
dplyr
ggplot2
IPython Notebook
JSON
knitr
matplotlib
MongoDB
NumPy
pandas
Python
R
RStudio
scikit-learn
SQL
XML

David Venturi

January 30, 2016

Differential Equations

I took this course as a sophomore in my chemical engineering program. This course is similar to this Udacity course, but with a bit of extra detail. Officially called "Ordinary Differential Equations", the course description is as follows:

Course Description

This course is an introduction to solving ordinary differential equations. Topics include:

First-order differential equations.
Linear differential equations with constant coefficients.
Laplace transforms.
Systems of linear equations.
Examples involving the use of ordinary differential equations in mechanical systems.

How Is It Useful?

Differential equations are not used frequently in data science or in computer science generally; however, they are useful for modeling problems in economics, finance, and biology.¹ Check out this IPython notebook that outlines how differential equations relate to a very common data science tool, Markov chains.
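
One concrete link, assuming a continuous-time Markov chain (not necessarily the setup the notebook uses): the probabilities of being in each state obey a system of linear differential equations with constant coefficients, the Kolmogorov forward equations. The sketch below integrates them with Euler's method for a hypothetical two-state chain with made-up switching rates `a` and `b`, then checks the result against the closed-form solution you would derive by hand in a course like this one.

```python
import math


def euler_two_state(a, b, p0_start, t_end, steps):
    """Integrate the forward equations of a two-state continuous-time Markov chain.

    a: rate of jumping from state 0 to state 1
    b: rate of jumping from state 1 to state 0
    p0_start: probability of being in state 0 at time 0
    """
    dt = t_end / steps
    p0, p1 = p0_start, 1.0 - p0_start
    for _ in range(steps):
        d0 = -a * p0 + b * p1  # dp0/dt: flow out of state 0 plus flow back in
        d1 = a * p0 - b * p1   # dp1/dt is the mirror image, so p0 + p1 stays 1
        p0, p1 = p0 + dt * d0, p1 + dt * d1
    return p0, p1


def exact_p0(a, b, p0_start, t):
    """Closed-form solution of dp0/dt = b - (a + b) * p0."""
    steady_state = b / (a + b)
    return steady_state + (p0_start - steady_state) * math.exp(-(a + b) * t)


a, b = 0.3, 0.1  # made-up switching rates
approx_p0, approx_p1 = euler_two_state(a, b, p0_start=1.0, t_end=5.0, steps=50_000)
print(approx_p0, exact_p0(a, b, 1.0, 5.0))  # both about 0.3515
```

As t grows, the probability settles at b / (a + b), the chain's stationary distribution, which is the same quantity you would get from the discrete-time Markov chain view.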

David Venturi

January 30, 2016

Multivariable Calculus

I took this course as a freshman in my chemical engineering program. Officially called "Calculus II", the course description is as follows:

Course Description

More integration techniques.
Numerical integration, improper integrals.
Curves, speed, velocity.
Functions of several variables, partial derivatives, differentials, error estimates, gradient, maxima and minima.
Sequences, series, power series.
Taylor polynomial approximations.
Double and triple integrals, polar and cylindrical coordinates.
Applications to mass, center of mass, moment, etc.

How Is It Useful?

Math is one of the key building blocks of data science. Calculus is essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics.¹ Multivariate calculus is especially applicable in machine learning.²
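
A minimal sketch of where partial derivatives show up in machine learning: gradient descent, which repeatedly steps against the gradient to minimize a function. The function `f`, its minimum at (3, -1), the starting point, and the learning rate are arbitrary choices for illustration; real models minimize a loss over many more variables.

```python
def f(x, y):
    """Arbitrary bowl-shaped example function with its minimum at (3, -1)."""
    return (x - 3) ** 2 + 2 * (y + 1) ** 2


def grad_f(x, y):
    """Vector of partial derivatives (df/dx, df/dy)."""
    return 2 * (x - 3), 4 * (y + 1)


def gradient_descent(start, learning_rate=0.1, steps=200):
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Step downhill: move against the gradient, scaled by the learning rate
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


x_min, y_min = gradient_descent((0.0, 0.0))
print(x_min, y_min)  # very close to 3 and -1
```

If the learning rate is too large (for this function, anything above 0.5 makes the y updates overshoot and diverge), the method fails, which is why the learning rate is treated as a tuned parameter.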

David Venturi

January 30, 2016

Calculus

I took this course as a freshman in my chemical engineering program. This course is very similar to this Coursera course. Officially called "Calculus I", the course description is as follows:

Course Description

Functions, limits, derivatives.
Optimization, rate problems, exponentials, logarithms, inverse trigonometric functions.
Exponential growth as an example of a differential equation.
Fundamental Theorem of Calculus, Riemann integral.
Applications to problems involving areas, volumes, mass, charge, work, etc.
Integration techniques.
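
The link between the Riemann integral and the Fundamental Theorem of Calculus can be checked numerically. This sketch approximates the integral of x² over [0, 2] with Riemann sums and compares it with F(2) - F(0) from the antiderivative F(x) = x³/3. The helper `riemann_sum` and its `rule` parameter are hypothetical names written for this example.

```python
def riemann_sum(f, a, b, n, rule="left"):
    """Approximate the integral of f over [a, b] with n equal-width rectangles."""
    width = (b - a) / n
    offset = {"left": 0.0, "midpoint": 0.5, "right": 1.0}[rule]
    return sum(f(a + (i + offset) * width) for i in range(n)) * width


def F(x):
    """An antiderivative of f(x) = x**2."""
    return x ** 3 / 3


exact = F(2.0) - F(0.0)  # Fundamental Theorem of Calculus: 8/3
for n in (10, 100, 1000):
    approx = riemann_sum(lambda x: x ** 2, 0.0, 2.0, n)
    print(n, approx, abs(approx - exact))  # the error shrinks as n grows
```

Switching to `rule="midpoint"` makes the error shrink much faster (roughly with 1/n² instead of 1/n), which is the same idea behind the numerical integration methods covered in Calculus II.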

How Is It Useful?

Math is one of the key building blocks of data science. Calculus is essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics.¹

David Venturi

January 30, 2016

Statistics II

This course is a follow-up course to an introductory course in probability and statistics. Officially called "Strategies for Process Investigations", I took this course as a senior in my chemical engineering program. The course description is as follows (chemical engineering things italicized):

Course Description

This course is designed to give you a more comprehensive understanding of how models are estimated from data, and how experimental programs can be designed to make the resulting data as informative as possible. The focus of the course is largely on empirical models, i.e., models that are estimated from data. However, the techniques for estimating parameters, making decisions about parameters, and planning experiments also apply equally to fundamental or first-principles models.

The objectives of the course are:

To provide you with a strong background for developing empirical models between process variables through model building, including multiple linear regression with emphasis on evaluation and interpretation of the resulting model.
To provide you with basic techniques for the initial screening of process variables including 2-level, complete and fractional, factorial designs, and higher-order experimental designs.
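
As a small illustration of the second objective: a 2-level full factorial design runs every combination of low (-1) and high (+1) settings, and each effect is the difference between the average responses at the two levels. The two factors and their responses below are hypothetical numbers, not data from the course.

```python
from itertools import product

# Every combination of two factors at low (-1) and high (+1) levels: a 2^2 design
runs = list(product([-1, 1], repeat=2))
# Hypothetical measured response for each run
responses = {(-1, -1): 60.0, (-1, 1): 64.0, (1, -1): 72.0, (1, 1): 84.0}


def average_difference(sign_of_run):
    """Mean response where sign_of_run(run) == +1 minus mean response where it is -1."""
    plus = [responses[r] for r in runs if sign_of_run(r) == 1]
    minus = [responses[r] for r in runs if sign_of_run(r) == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


effect_1 = average_difference(lambda r: r[0])            # main effect of factor 1
effect_2 = average_difference(lambda r: r[1])            # main effect of factor 2
interaction = average_difference(lambda r: r[0] * r[1])  # factor 1 x factor 2
print(effect_1, effect_2, interaction)  # 16.0 8.0 4.0
```

With this ±1 coding, the least-squares regression coefficient for each term equals half of its effect, which is one place where factorial designs and multiple linear regression meet.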

How Is It Useful?

Statistics is one of the pillars of data science. Check out this Quora page for a great answer to the question "How do data scientists use statistics?"

David Venturi

January 30, 2016

Statistics I

I took this intro to statistics course as a sophomore in my chemical engineering program. This course is essentially this Udacity course, but with a chemical engineering flair. Officially called "Analysis of Process Data", the course description is as follows (chemical engineering things italicized):

Course Description

Statistical methods for analyzing and interpreting process data are discussed, with special emphasis on techniques for continuous improvement of process operations. Topics include:

Graphical and numerical summaries
Principles of valid inference
Probability distributions for discrete and continuous data
An introduction to linear regression analysis
Role of data in assessing process operation
Identifying major problems
Process capability
Comparing process performance to target values
Comparing performances of two processes
Control charts
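
Control charts flag measurements that fall outside limits set from in-control data. This is a simplified Shewhart-style sketch with three-sigma limits around the mean. As an assumption for brevity, it estimates sigma with the sample standard deviation, whereas individuals charts in practice usually estimate it from the average moving range. All readings are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower limit, centre line, upper limit) at mean +/- 3 standard deviations."""
    centre = mean(baseline)
    sigma = stdev(baseline)  # simplification: sample standard deviation
    return centre - 3 * sigma, centre, centre + 3 * sigma


def out_of_control(points, lower, upper):
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]


baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9]  # hypothetical in-control readings
lcl, cl, ucl = control_limits(baseline)
new_readings = [10.0, 10.3, 11.2, 9.7]
print(round(lcl, 3), round(cl, 3), round(ucl, 3))
print(out_of_control(new_readings, lcl, ucl))  # [2]: only 11.2 is outside the limits
```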

How Is It Useful?

Statistics is one of the pillars of data science. Check out this Quora page for a great answer to the question "How do data scientists use statistics?"

David Venturi

January 30, 2016

Linear Algebra

I took this course as a freshman in my chemical engineering program. This course is similar to this Udacity course, but with a bit of extra detail and minus the Python. Officially called "Introduction to Linear Algebra", the course description is as follows:

Course Description

Vectors, dot and cross products, lines and planes, projections.
Vectors in n-space.
Systems of linear equations.
Matrix algebra and linear transformations, inverses.
Spaces and subspaces.
Linear independence, basis and coordinates, dimension, rank.
Determinants, Cramer's Rule.
Eigenvectors, eigenvalues and diagonalization with applications.
Orthonormal bases and symmetric matrices.
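
Eigenvectors and eigenvalues show up in data science in techniques such as principal component analysis. A minimal way to see one computed is power iteration, sketched below for a hypothetical 2×2 symmetric matrix whose eigenvalues are 3 and 1. It assumes the matrix has a single eigenvalue of largest magnitude and that the starting vector is not orthogonal to that eigenvalue's eigenvector.

```python
import math


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def power_iteration(m, iterations=100):
    """Approximate the dominant eigenvalue and a unit eigenvector of m."""
    v = [1.0] + [0.0] * (len(m) - 1)  # arbitrary nonzero starting vector
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize so the numbers stay bounded
    # Rayleigh quotient v^T M v (v has unit length) estimates the eigenvalue
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


m = [[2.0, 1.0], [1.0, 2.0]]
value, vector = power_iteration(m)
print(round(value, 6), [round(x, 6) for x in vector])  # 3.0 [0.707107, 0.707107]
```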

How Is It Useful?

Math is one of the key building blocks of data science. Linear algebra is essential for more advanced topics in data science such as machine learning, algorithms, and advanced statistics.¹ Check out this Quora page for a great summary of the use cases of linear algebra in data science.

David Venturi

January 29, 2016

MIT Mathematics for Computer Science

100% Complete

This course is a part of the Bridging Module for my Data Science Master's.

Before we begin

Hey, it’s David. I wrote this overview back in 2016. Since then, I’ve become a professional data analyst and created courses for multiple industry-leading online education companies.

Do you want to become a data analyst without spending 4 years and $41,762 to go to university? Follow my latest 27-day curriculum and learn alongside other aspiring data pros.

Okay, back to the overview.

Course Description

From the course website:

"This course covers elementary discrete mathematics for computer science and engineering. It emphasizes mathematical definitions and proofs as well as applicable methods. Topics include:

Formal logic notation, proof methods.
Induction, well-ordering.
Sets, relations.
Elementary graph theory.
Integer congruences.
Asymptotic notation and growth of functions.
Permutations and combinations, counting principles.
Discrete probability.
Further selected topics may also be covered, such as recursive definition and structural induction, state machines and invariants, recurrences, and generating functions."
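
Integer congruences and the growth of functions meet in modular exponentiation. This square-and-multiply sketch computes base^exponent mod m with a number of multiplications proportional to log(exponent) rather than to the exponent itself, and the last line checks Fermat's little theorem (a^(p-1) ≡ 1 mod p when p is prime and does not divide a). The name `mod_pow` is hypothetical; Python's built-in three-argument `pow` does the same job.

```python
def mod_pow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:  # lowest bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus  # base^(2^k) mod modulus for the next bit
        exponent >>= 1
    return result


# Agrees with Python's built-in three-argument pow
print(mod_pow(7, 10**18, 13) == pow(7, 10**18, 13))  # True
# Fermat's little theorem with p = 13: every a from 1 to 12 gives 1
print([mod_pow(a, 12, 13) for a in range(1, 13)])
```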

Why Take This Course?

A course in discrete mathematics is a requirement for the majority of undergraduate computer science programs. Completing this course, along with the other two courses in my bridging module, means I will have completed a standard first-year computer science curriculum, plus the full mathematical and statistical core.

David Venturi

January 29, 2016

Harvard CS50: Intro to Computer Science

100% Complete

This course is a part of the Bridging Module for my Data Science Master's.

1.  Course Description
2.  Why Take This Course?
3.  My Review
       •   Course Overview
       •   Timeline
       •   Is It Worth the Price?
       •   Learning C
       •   How Challenging is It?
       •   Addressing the PHP Haters
       •   Closing Thoughts

Before we begin

Hey, it’s David. I wrote this review back in 2016. Since then, I’ve become a professional data analyst and created courses for multiple industry-leading online education companies.

Do you want to become a data analyst without spending 4 years and $41,762 to go to university? Follow my latest 27-day curriculum and learn alongside other aspiring data pros. My top programming course recommendation for 2023 is in there, too.

Okay, back to the review.

Course Description

From the course website:

"This is CS50x, Harvard University's introduction to the intellectual enterprises of computer science and the art of programming for majors and non-majors alike, with or without prior programming experience. An entry-level course taught by David J. Malan, CS50x teaches students how to think algorithmically and solve problems efficiently. Topics include:

Abstraction
Algorithms
Data structures
Encapsulation
Resource management
Security
Software engineering
Web development

Languages include C, PHP (edit: Python replaces PHP in Fall 2016), and JavaScript plus SQL, CSS, and HTML. Problem sets inspired by real-world domains of biology, cryptography, finance, forensics, and gaming."

Why Take This Course?

An introductory-level course in computer science is a requirement for virtually all undergraduate computer science programs. Completing this course, along with the other two courses in my bridging module, means I will have completed a standard first-year computer science curriculum, plus the full mathematical and statistical core.

My Review

April 12, 2016

View this entry on Medium

High praise for Harvard’s online introduction to computer science course is not difficult to find. "It’s a cultural touchstone, a lifestyle, a spectacle," says The Harvard Crimson. YouTube CEO Susan Wojcicki says CS50 changed her life. CS50 receives near-perfect scores across the board on CourseTalk, Class Central, and edX. Here are my thoughts.

Course Overview

CS50 is a true, comprehensive introduction to computer science. The course is taught by the vivacious David Malan and hosted on edX. There are 13 weeks of instruction with 8 mandatory problem sets (psets) and a final project:

Week 0: Binary. ASCII. Algorithms. Pseudocode. Source code. Compiler. Object code. Scratch. Statements. Boolean expressions. Conditions. Loops. Variables. Functions. Arrays. Threads. Events.
Week 1: Linux. C. Compiling. Libraries. Types. Standard output. pset1
Week 2: Casting. Imprecision. Switches. Scope. Strings. Arrays. Cryptography. pset2
Week 3: Command-line arguments. Searching. Sorting. Bubble sort. Selection sort. Insertion sort. O. Ω. Θ. Recursion. Merge sort. pset3
Week 4: Stack. Debugging. File I/O. Hexadecimal. Strings. Pointers. Dynamic memory allocation. pset4
Week 5: Heap. Buffer overflow. Linked lists. Hash tables. Tries. Trees. Stacks. Queues.
Week 6: TCP/IP. HTTP. pset5
Week 7: HTML. CSS. PHP (edit: Python replaces PHP in Fall 2016). pset6
Week 8: MVC. SQL. pset7
Week 9: JavaScript. Ajax. pset8
Week 10: Security. Artificial intelligence.
Week 11: Artificial intelligence, continued.
Week 12: Exciting conclusion. (Spoiler alert: montages, CS50 Family Feud, cake!) Final Project

There are two lectures per week. Each lecture is 50ish minutes long. Each week has a series of shorter videos as well:

Walkthroughs: 1–3 minute videos of David Malan walking you through the lecture’s sample code at a slower pace
Section: 5–30 minute videos of a Harvard teaching fellow explaining lecture concepts in depth
Shorts: 5–15 minute videos of other CS50 staff members explaining lecture concepts with additional (and sometimes quirky) examples

Problem sets are “programming assignments that challenge you to apply concepts to problems inspired by real-world domains.” They are marked by an automated grading system. Your overall mark for a problem set is the fraction of tests that your code passes (1.0 = 100%).
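
Week 3's sorting algorithms give a feel for the material. CS50 implements them in C; the sketch below is merge sort written in Python instead, to keep the examples in these notes in one language. It splits the list in half, sorts each half recursively, and merges the two sorted halves, for O(n log n) comparisons overall.

```python
def merge_sort(items):
    """Return a new sorted list using merge sort."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])   # recursively sort each half
    right = merge_sort(items[middle:])

    # Merge: repeatedly take the smaller front element of the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these still has elements left
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```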

Timeline

Harvard doesn’t give a precise estimate of total hours, but they do state that each problem set takes 10–20 hours to complete. With 8 problem sets, ~5 hours of video content per week, and a final project, a ballpark estimate would be somewhere just north of 200 hours. It took me 200 hours and five minutes, as tracked by Toggl.

I spent about a day and a half figuring out how web hosting works and editing this stupid video for my final project, neither of which was required, so 185 hours or so is probably a more representative number.

Is It Worth The Price?

The course is free, so yes. You have access to all of the materials and grading at no cost. Probably the best undergraduate computer science course in the world is available to anybody who has an Internet connection. That’s still so cool.

You can pay for a verified certificate, which currently costs $90, if you’d like it for personal or professional reasons.

Learning C

The first 6 weeks of Harvard CS50 are spent in C, a language notorious for its complexity. Seems like a curious choice, no? Even though I likely won’t use C much going forward, I am very glad it was the language of choice for this reason:

“The advantage to knowing C is that you have a very good idea of how a computer works. Not just how your programming model works, but how memory’s laid out, and suchlike. Knowing C also lets you appreciate how much less work you have to do in a higher level language … and the cost involved in working in that higher level language.”

— Frank Shearar on Programmers Stack Exchange

The CS101 equivalents from Udacity and MIT (via edX) both use Python. Nick Parlante’s Stanford course uses C, but the difficulty level of that course isn’t on par with the aforementioned three.

How Challenging is It?

It’s tough, but it’s good tough. At no point did I feel lost. The course is well-structured and there are tons of additional resources. Basically every social media platform you can think of has an official CS50 community — I found Reddit and Stack Exchange the most useful.

The CS50 staff say that the pointers section of the course is the hardest and they are right. You intimately deal with pointers and allocating memory in pset5 and pset6. Mental gymnastics are required. Finally figuring out pointers is probably the thing I am most proud of coming out of the course.

This pset REALLY made you understand how pointers and allocating memory work. Learned a ton about linked lists and hash tables as well.
— David Venturi (@venturidb) February 16, 2016

The 10–20 hours for each problem set is accurate. Your code won’t work and you will get frustrated (especially because half of the course is in C), but frustration is good. Frustration conquered is learning.
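
The linked lists and hash tables from that pset fit together as a chained hash table: an array of buckets, each holding a linked list of entries. This Python sketch mimics the C structure with a `Node` class standing in for a struct with a `next` pointer. The class names and the djb2-style hash function are assumptions for illustration, not the pset's required design, and Python's garbage collector hides the manual memory management that makes the C version hard.

```python
class Node:
    """One link in a bucket's chain (stands in for a C struct with a `next` pointer)."""

    def __init__(self, word, next_node=None):
        self.word = word
        self.next = next_node


class ChainedHashSet:
    """A set of words stored in a fixed number of buckets, each a singly linked list."""

    def __init__(self, bucket_count=1024):
        self.buckets = [None] * bucket_count

    def _hash(self, word):
        # djb2-style string hash, reduced to a valid bucket index
        h = 5381
        for ch in word:
            h = h * 33 + ord(ch)
        return h % len(self.buckets)

    def contains(self, word):
        node = self.buckets[self._hash(word)]
        while node is not None:  # walk the chain, as you would follow pointers in C
            if node.word == word:
                return True
            node = node.next
        return False

    def add(self, word):
        if self.contains(word):
            return
        index = self._hash(word)
        # Insert at the head of the chain: O(1), no need to walk to the tail
        self.buckets[index] = Node(word, self.buckets[index])


words = ChainedHashSet(bucket_count=8)  # few buckets, so some chains hold several words
for w in ["cat", "dog", "act", "tac", "bird"]:
    words.add(w)
print(words.contains("act"), words.contains("cow"))  # True False
```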

Addressing the PHP Haters

Edit (August 2016): Fall 2016’s edition of CS50 replaces PHP with Python, another high-level programming language, which nullifies the following concern.

I almost didn’t enroll in CS50 after reading this popular Quora answer regarding CS50 teaching web development using PHP:

“Trust me. You DO NOT want to spend your time learning PHP in 2014.”

— Anubhav Sinha on Quora

That may be true, but it’s not particularly relevant to the decision to take this course. There is only one week of lecture content and two problem sets that use PHP. You do not dedicate a significant amount of time to it. The educational function of PHP is to demonstrate how higher-level programming languages are useful.

One reason CS50 uses PHP is that its syntax is heavily inspired by C. By week 7, students have a solid C foundation, so picking up PHP at that point isn’t difficult.

Bottom line: do not let two weeks of PHP deter you from this one-of-a-kind learning experience.

Closing Thoughts

I’m having a hard time describing CS50 without sounding hyperbolic. The course was just so damn good. This piece in The Harvard Crimson is ridiculously dramatic, but it’s so true. The content is engaging. David Malan is too good at his job. The production value is absurd. Honestly, go check out a lecture for the production value alone. Fall 2016’s edition will be even crazier: they’re shooting lectures in 6K and VR.

I have now completed CS50, Stanford CS101, and half of Udacity CS101, and without question Harvard’s introduction to computer science is my favourite. A few weeks post-graduation, I’ve already had some legitimate nostalgia. It was an experience.

Rating: ★★★★★

David Venturi

January 29, 2016

Udacity Intro to Programming Nanodegree

100% Complete

This Nanodegree is a part of the Bridging Module for my Data Science Master's.

1.  Program Description
2.  Why Take This Nanodegree?
3.  My Review
       •   Program Overview
       •   Does the Program Accomplish Its Goals?
       •   Timeline
       •   Is It Worth the Price?
       •   Upbeat, Welcoming Learning Environment
       •   What I Didn't Like
       •   Closing Thoughts

Before we begin

Hey, it’s David. I wrote this review back in 2016. Since then, I’ve become a professional data analyst and created courses for multiple industry-leading online education companies.

Okay, back to the review.

Program Description

From the program website:

"This introductory Nanodegree program teaches you the foundational skills all programmers use, whether they program mobile apps, create web pages, or analyze data. It is ideal for beginners who want to learn new skills, make informed choices about career goals, and set themselves up for success in career-track Nanodegree programs."

Why Take This Nanodegree?

An introductory-level course in programming is a requirement for virtually all undergraduate computer science programs. Completing this program, along with the other two courses in my bridging module, means I will have completed a standard first-year computer science curriculum, plus the full mathematical and statistical core.

My Review

April 12, 2016

View this entry on Medium

The Udacity Intro to Programming Nanodegree was the first course on my radar when I decided I was going to teach myself programming. I had taken Udacity’s CS101 standalone course previously and had a very positive experience. The Nanodegree was getting consistently positive reviews. The content breadth and depth appeared to be just right for what I was looking for in an introductory program. It seemed like the perfect fit.

Program Overview

This Nanodegree, like all Nanodegrees, is a curation of Udacity’s best courses laid out in a logical order. My edition of the Intro to Programming Nanodegree was broken into five mandatory stages, with a project bookending each stage:

Stage 0 (optional): A bit of HTML (project)
Stage 1: More HTML and CSS (project)
Stage 2: Python and general programming concepts (strings, variables, loops, etc.) (project)
Stage 3: Classes, functions, abstraction, and working with pre-existing code (project)
Stage 4: Back-end programming, databases, and template engines (project)
Stage 5: A bit of JavaScript, APIs, recursion and parallel computing, solving big programming problems, and responsive web design (project)

Instead of hour-long lectures, Udacity breaks its courses down into a bunch of mini-lessons. The vast majority of videos are under five minutes and are followed by multiple-choice and code quizzes. Projects are graded by Udacity Project Reviewers.

Note: Stages 4 and 5 were revamped in February 2016. In Stage 4, you now learn a bit about popular tech roles (front-end/back-end/mobile programming, data analysis) and in Stage 5 you do a deeper dive into one of those roles.

Other note: you can view the code that generated the above-linked projects in this GitHub repository.

Does The Program Accomplish Its Goals?

One of the first videos you watch in the program (it isn’t available to the public) references four stages of expertise: ignorance, awareness, ability, and fluency. The program aims to get students to the awareness and ability stages.

A quote from the curriculum director, Andy Brown, in that video (I'm assuming Udacity is okay with me sharing this paywalled clip in this context):

“You’re going to see a lot of new topics and new ideas in this Nanodegree. Because you’re going to see so many, it means we’re just not going to be able to focus on acquiring fluency. Fluency generally takes a really long time to acquire, in even one thing.

So instead, the goal is going to be to get to ability in the most important topics that are fundamental throughout programming. In addition, you’re going to gain awareness of many topics. Through this combination of awareness and ability, you’re going to have the skills to do amazing things, as well as the understanding of what you need to do next depending on your own personal goals.”

— Andy Brown, Lead Instructor at Udacity

Prior to enrolling, I knew next to nothing about programming. I am now confident that I can actually do things with my newfound ability. I was also made aware of potential programming paths, was given a chance to decide if they interest me, and was shown how I can go about pursuing them. For me, the Nanodegree accomplished its goals.

Timeline

Udacity says it takes students 190 hours on average to complete the program. It took me 189 hours over 2.5 months, as timed by Toggl. You can definitely complete the program faster than that if you rush through the projects a little bit. I spent a fair amount of time trying to make my projects more aesthetically pleasing than they needed to be to receive a passing grade.

Is It Worth The Price?

This is subjective. At $200/month with a 50% refund if you complete the program within 12 months, my bill ended up being $300.

You can take the courses for free, but you won’t have access to the curated content, the forums, project reviews, one-on-one coaching, and the certificate. On the usefulness of these features:

Curated content: The curated content kept the program concise and was nice to have, but there is a free workaround.
Forums: I used the forums a lot, especially when I ran into bugs in my projects. They were often more useful than Stack Overflow (and other Google-able forums) because students often seemed to run into the same bugs. Udacity instructors were also quick on the trigger in responding to posts, often within a few hours.
Project reviews: Very useful. Reviews would often come in a few hours after project submission. The correction of mistakes plus the reassurance that you are doing good work is an important part of the learning process. Reviewers also suggest ways in which to improve your code, even if it’s “just” stylistic improvements.
One-on-one coaching: I didn’t use the one-on-one coaching for this Nanodegree, but I probably will for more complex ones.
Certificate: The certificate is nice to have, but probably not that useful.

The answer to this question will vary depending on the goals of the student.

Upbeat, Welcoming Learning Environment

The general positivity of the program’s instructors is very noticeable and it is awesome. The company’s upbeat, welcoming learning environment is a major reason why I am sticking with Udacity as my main source of online education.

From left to right: Jessica Uelmen, Kunal Chawla, Miriam Swords Kalk, Cameron Pittman, Steve Huffman, Dave Evans, Andy Brown.

Udacity CEO Sebastian Thrun’s industry connections are evident as well through cameos from Google co-founder Sergey Brin, Reddit co-founder Steve Huffman, Google Director of Research Peter Norvig, Junior (Stanford’s self-driving car), and Sebastian himself.

What I Didn't Like

Honestly, not much. The office hours were meh. These were 45–60 minute pre-recorded webcasts led by a few Udacity coaches that occurred a handful of times each stage. There were often a few key insights in each webcast; however, it was difficult to determine which parts I could safely skip. I ended up wasting a decent amount of time on things I didn’t need to watch.

Closing Thoughts

This program is a great fit for people who know they would like to pursue programming, but are unsure of which discipline to target.

One thing to note: this Nanodegree teaches you programming, not computer science. There is a difference that beginners may not be acutely aware of before they start the program.

Probably the most important thing you leave this Nanodegree with is the ability to think like a programmer. The instructors repeatedly come back to the five methods of thinking in the screenshot below:

The terms are a bit jargon-y, but all they mean is that you are able to solve problems in a structured and efficient manner. This skill is important in programming, as well as in other areas of life.

Rating: ★★★★★