Monday, June 11, 2012

Critical skills for new computer science graduates

So, you've decided to go to college for computer science. Good for you. I'm here to tell you that most of the time, you won't care about most of what you've been taught. In fact, what you need to know might not be obvious at all. In many cases, it's the secondary tools, not the abstract knowledge, that matter most, and that's a view so rarely communicated that many of the graduates I see have never heard it.

Here's what you need to learn (much of which you probably won't unless you seek out the knowledge on your own) before you graduate and start looking for work. When I interview you, I expect you to know these fundamentals as a starting point, and I ask questions that build past these basics. Unless everything on this list is second nature to you, you probably won't be able to deal with the kinds of things that I'm going to want you to figure out in an interview at all.


Version Control

Most college professors that I've run into don't care about how you store your work. They just care that you "submit" it, and submission is usually a very simple process (e.g. email, form submission, etc.). In the real world, your team will usually share its work through a version control repository using something like Subversion or git, so you're going to want to come up to speed. Knowing how to check a file in and get it back out is a good start, but the world has changed and you need to keep up. At a minimum, I'd say the new graduate needs to be able to do all of the following in a modern, distributed VCS like git or Mercurial:

  • Create an initial repository
  • Check out / clone an existing repository
  • Create a branch
  • Keep work up-to-date on a branch while work continues on the mainline
  • Merge topic-branch work with the mainline
  • Read the logs and learn the tools for searching for specific changes

Much of the knowledge gained will be specific to the tool you use, but at least you'll know the basic feel of the way these things are done in modern tools, and that will allow you to learn quickly in new environments.
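
To make that list concrete, here is a rough sketch of the whole sequence in git. It is wrapped in a small Python script only so that it can be run end to end in a throwaway directory; the directory names, file names, commit messages and the "topic" branch are all made up for illustration, and it assumes git is installed and on your PATH. Other tools (and other teams) will do some of these steps differently, for example rebasing instead of merging.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its trimmed output."""
    cmd = ["git",
           "-c", "user.name=Example Student",          # hypothetical identity
           "-c", "user.email=student@example.invalid",
           *args]
    done = subprocess.run(cmd, cwd=repo, check=True,
                          capture_output=True, text=True)
    return done.stdout.strip()


def topic_branch_walkthrough(workdir):
    workdir = Path(workdir)

    # Create an initial repository with one commit on the mainline.
    origin = workdir / "origin"
    origin.mkdir()
    git(origin, "init")
    mainline = git(origin, "symbolic-ref", "--short", "HEAD")  # master or main
    (origin / "notes.txt").write_text("first line\n")
    git(origin, "add", "notes.txt")
    git(origin, "commit", "-m", "Initial commit")

    # Clone the existing repository and create a branch for new work.
    clone = workdir / "clone"
    git(workdir, "clone", str(origin), str(clone))
    git(clone, "checkout", "-b", "topic")
    (clone / "topic.txt").write_text("topic work\n")
    git(clone, "add", "topic.txt")
    git(clone, "commit", "-m", "Add topic work")

    # Meanwhile, work continues on the mainline.
    (origin / "notes.txt").write_text("first line\nsecond line\n")
    git(origin, "commit", "-am", "Extend notes")

    # Keep the branch up to date with the mainline.
    git(clone, "fetch", "origin")
    git(clone, "merge", "--no-edit", f"origin/{mainline}")

    # Merge the topic-branch work back into the mainline.
    git(clone, "checkout", mainline)
    git(clone, "merge", "--no-edit", f"origin/{mainline}")
    git(clone, "merge", "--no-edit", "topic")

    # Read the logs, searching commit messages for a specific change.
    return git(clone, "log", "--oneline", "--grep=Add topic")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(topic_branch_walkthrough(scratch))
```

Each git(...) call maps directly onto a command you would type at the shell, so the script doubles as a checklist for practicing by hand.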

Scripting

One of the more common traits I see in graduating students these days is that many of them know a spectrum of general-purpose languages, but few know how to do quick-and-dirty work. You should, at a minimum, know a Unix-like system (MacOS or Linux, for example) and how to use its shell to write simple scripts. Beyond that, I'd recommend some familiarity with Perl command-line usage. Something like

  find work -type f | xargs -L 5 -- perl -i.bak -ple 's/one/two/g'

should be trivial for you. This may seem useless, but when you need to scrounge a filesystem, find a handful of files and transform them quickly, you will thank me.

For higher-level scripting I'd suggest Python or Ruby at this point. One of those should at least be in your toolbox, even if you work in a different language routinely.
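
For comparison, here is roughly what the find/perl one-liner above looks like as a small Python script. Treat it as a sketch rather than a drop-in replacement: the function name replace_in_tree is made up, it assumes every file under the directory is text in your default encoding, and unlike perl -i.bak it only writes a backup for files that actually change.

```python
import re
import shutil
from pathlib import Path


def replace_in_tree(root, pattern, replacement):
    """Apply a regex substitution to every file under `root`, in place.

    A copy of each modified file is kept next to it with a .bak suffix.
    Returns the names of the files that were changed.
    """
    changed = []
    # sorted() takes a snapshot first, so the .bak files created below
    # are not visited during the walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix == ".bak":
            continue
        original = path.read_text()
        updated = re.sub(pattern, replacement, original)
        if updated != original:
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(updated)
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Same job as the one-liner: s/one/two/g on everything under ./work
    print(replace_in_tree("work", r"one", "two"))
```

It is a lot more typing than the one-liner, which is rather the point of knowing both.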

The Web

There isn't an industry that hasn't been impacted by the Web at this point. It's not just content generators and ecommerce shops that have to know the Web inside and out anymore. RESTful APIs and other HTTP-based communications mean that everyone really needs to get familiar with the tools of programming the Web. I'd say that if you have a passing understanding of the HTTP, URI and MIME specifications, along with the general flavor of an API over the Web using something like REST or SOAP, then you should be in good shape.
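
If those three specifications feel abstract, a tiny round trip shows where each one turns up. The sketch below uses only the Python standard library to serve one made-up resource, /widgets/1, on the local machine and then fetch it; the resource, its fields and the handler name are invented for illustration, and a real service would sit behind a proper framework and server.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class WidgetHandler(BaseHTTPRequestHandler):
    """Serves a single hypothetical resource as JSON."""

    def do_GET(self):
        # URI: the path component identifies the resource.
        if urlsplit(self.path).path == "/widgets/1":
            body = json.dumps({"id": 1, "name": "sprocket"}).encode("utf-8")
            self.send_response(200)
            # MIME: the Content-Type header says how to read the body.
            self.send_header("Content-Type", "application/json")
        else:
            body = b"not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_widget():
    """Start the server on a free local port, GET the widget, shut down."""
    server = HTTPServer(("127.0.0.1", 0), WidgetHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # HTTP: a method, a path, headers, then a status and a body.
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/widgets/1", headers={"Accept": "application/json"})
        response = conn.getresponse()
        result = (response.status,
                  response.getheader("Content-Type"),
                  json.loads(response.read()))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_widget())
```

Nothing here is specific to Python. The same method, path, status code and Content-Type header show up whatever language is on either end of the wire.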

Client-side Web work is a large, but relatively well-defined niche. If you're going there, then there are tools you need to know how to use (pretty much everything that's part of HTML5), but if you're not, then you can probably get by with a passing familiarity with HTML of any flavor and maybe a little JavaScript.

Server-side Web work is a fragmented mess. If you plan to work on the server side of the Web, then in general, you should at least have been exposed to some sort of modern Web framework like Rails, Django or similar. If you don't plan to be in that space, then it's probably enough to be aware of the general shape of at least one framework, even if you've never done anything substantial in one.

Databases

Knowing a particular database isn't that interesting. Knowing SQL cold is probably non-negotiable. You should at least be able to construct complex joins. You should also know how to use SQL from your language of choice and understand the difference between binding parameters and interpolating values into the query string.
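
Here is a minimal sketch of that binding distinction, using Python's built-in sqlite3 module and an in-memory database. The authors and books tables, their contents and both function names are invented for the example; the only point is how the value reaches the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'O''Brien'), (2, 'Knuth');
    INSERT INTO books VALUES
        (1, 1, 'First'),
        (2, 2, 'TAOCP'),
        (3, 2, 'Concrete Mathematics');
""")

JOIN = ("SELECT b.title FROM books AS b "
        "JOIN authors AS a ON a.id = b.author_id ")


def titles_bound(conn, name):
    """Binding: the value travels separately from the SQL text."""
    rows = conn.execute(JOIN + "WHERE a.name = ? ORDER BY b.title", (name,))
    return [title for (title,) in rows]


def titles_interpolated(conn, name):
    """String interpolation: the value becomes part of the SQL text.

    Shown only to illustrate the problem. Do not do this.
    """
    rows = conn.execute(JOIN + f"WHERE a.name = '{name}' ORDER BY b.title")
    return [title for (title,) in rows]


hostile = "x' OR '1'='1"

print(titles_bound(conn, "O'Brien"))       # ['First']
print(titles_bound(conn, hostile))         # []
print(titles_interpolated(conn, hostile))  # every title in the table
```

With binding, the apostrophe in O'Brien is just data and the hostile string matches nothing. With interpolation, the same hostile string rewrites the WHERE clause and the query returns every row.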

For your own productivity, I suggest knowing a lightweight, user-instantiable database like sqlite, though that's really an optional item and easily picked up later.

VMs

Virtual machines are an essential tool for the modern programmer. Being able to slap a working system on top of whatever junk you've been handed can increase your productivity several-fold. VirtualBox, KVM, VMware and others make up the core offerings in this space. Learn them. Use them. Love them.

Editors

Start by picking up emacs and vim skills on a Linux or MacOS system. Yes, I said and. You should know how to use both of the prevalent project-level editors for the system because you might find yourself in a situation where one or the other will make your life radically easier. Make it a point to find out how to do each new thing you learn in one, in the other. Understand their strengths and weaknesses.

On Windows and MacOS I recommend getting familiar with Eclipse as well, even if you don't plan to do any work in Java. It's an important tool and well worth learning. A great way to dive into Eclipse is to write yourself an Android app. This was how I taught myself.

There are plenty of other textual editors out there, but those three are an excellent place to start.

Operating Systems

You've heard me mention MacOS and Linux quite a bit, but Windows very little. There's a reason for that. Windows is dying as a platform. These days, most of the important tools for it were developed elsewhere (Chrome, Firefox, Photoshop, etc.), and all of Microsoft's attempts to re-cast the system as relevant to new market niches (such as handsets) have failed miserably. I recommend that you know how to use a Windows box and do simple things like install a VM on it to do real work, but if that's all you know about Windows, you're probably better off.

Cryptography

A deep understanding of cryptography isn't required of most programmers, but you can't reasonably get away without knowing how the basics work. You should understand what hashing is in a crypto context as well as the differences between secret-key and public-key (AKA symmetric and asymmetric) crypto. You should understand that SHA1 has been demonstrated to be weak and you should understand it deeply enough to know that, in most contexts, you don't care. You should be able to carry on a cogent conversation about the value of longer pass-phrases and where and when multi-factor authentication is key (pun intended). If you can't write ElGamal from memory, that's OK. You certainly don't need to be a crypto expert in order to write code in most programming jobs.
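
As a quick illustration of what hashing means in this context, here is a small sketch using Python's standard hashlib module. SHA-256 and the sample inputs are just my choices for the example; the properties to notice are the fixed-length output, the repeatability, and how much the digest changes when the input barely does.

```python
import hashlib


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of `message` as a hex string."""
    return hashlib.sha256(message).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex(b"correct horse battery staple")
second = sha256_hex(b"correct horse battery stapl3")  # one character changed

print(len(first), len(second))               # always 64 hex characters
print(first == sha256_hex(b"correct horse battery staple"))  # True
print(differing_bits(first, second), "of 256 bits differ")
```

A plain hash like this is fine for checking that data hasn't changed. Storing passwords is a different problem and calls for a deliberately slow, salted scheme, which is one of the things worth reading up on.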

Security

At the very least, you should understand why and when security is important to your coding. You should understand what a buffer overflow is and what a NAT-based firewall does and does not protect you from. These are the fundamentals that you need in order to evaluate the security of your own applications and to understand when your code needs to be security-aware.

2 comments:

  1. Wow this is a great essay! I agree on a number of fronts, but just to touch on a couple of points:

    * Editors - Interesting that you suggest learning BOTH emacs and Vim; this runs afoul of the conventional "learn one editor - learn it WELL" wisdom, but I think it's very smart advice in a number of ways. Sure, they're very different tools that are capable of doing much the same thing, but becoming comfortable with each will really give you new insight into solving certain kinds of problems.

    * Scripting - I'm curious about whether you think the cognitive load inherent in learning Perl even enough to be capable of effectively creating simple one liners is worth it as compared to learning some of the stock tools like sed or awk for similar tasks?

    * Operating Systems - Couldn't agree more. I'd also advise against thinking you can get away with working in Windows and using tools like Cygwin etc. They're great for what they are, but really don't teach the fundamentals the way you should learn them. Mostly people seem to choose Windows because some game or other won't run in MacOS / Linux. Just dual boot and be done if you must have a particular game :)

  2. Feoh, my focus here was on flexibility. If you learn Perl, you'll never be scared by a regular expression again. If you learn vi, you'll have to get used to modal editing. These are skills that will carry you over into other environments as they emerge, even if you choose not to use them again.

    As for Perl presenting a cognitive load... I'm not sure there's any context in which that's a bad thing. If it's hard to get work done, that's one problem, but if you just have to think about it more... I have no particular problem with that. Also, exposing yourself to Perl means you expose yourself to a massive arsenal of external tools in CPAN, which is an experience everyone should have (especially language designers... I find it shocking that a module distribution and management system is considered an afterthought for any language these days).
