Aaron's Essays: software


Saturday, December 21, 2013

Solving the Wikipedia problem

Today I was reading an AMA discussion on reddit from a Wikipedia admin. Much of the conversation centered around either deletionism (editors and admins on Wikipedia who want to delete lots of pages that they don't consider important) or the creeping vandalism of special interests. I've long thought that both problems could be solved by creating a new service to replace Wikipedia, so let me put my thoughts down in writing.

Something new every day: Bourne Shell variables

I've worked with the Unix Operating System and its variants since the late 1980s. I've worked with the Bourne Again Shell (bash) since the early 1990s. And yet today I learned something new about variable expansion. In the startup scripts for a source code indexing system called OpenGrok, I found this gem:

somecommand ${PROG:+-c} ${PROG}

Now, I know that ${FOO-bar} will be replaced with the value of $FOO if it is currently set or "bar" if it's not. That much I learned many years ago, but this usage of "+" was new to me. After some testing, I found that ":+" substitutes the following text if and only if the variable is set and non-empty; otherwise it substitutes nothing. (Without the colon, "+" only tests whether the variable is set, even if it is set to an empty string.) Thus if $PROG were set to "foo", the above text would execute:

somecommand -c foo

But if $PROG were not set, then somecommand would be run with no arguments at all. Very slick!

How I managed to go over 20 years without learning that, I'm unsure (then again, perhaps I've learned and forgotten it...)
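
For anyone who wants to repeat the experiment, here is a minimal sketch of the kind of test I mean. The count_args function is a made-up stand-in for "somecommand" (it is not part of OpenGrok or anything else); it only reports what arguments it was given.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for "somecommand": it just reports
# how many arguments it received and what they were.
count_args() {
    printf '%s arg(s):' "$#"
    for arg in "$@"; do printf ' [%s]' "$arg"; done
    printf '\n'
}

unset PROG
count_args ${PROG:+-c} ${PROG}    # PROG unset: no arguments at all

PROG=foo
count_args ${PROG:+-c} ${PROG}    # PROG=foo: receives -c and foo

PROG=
count_args ${PROG:+-c} ${PROG}    # colon form: an empty PROG counts as unset
count_args ${PROG+-c} ${PROG}     # no colon: set-but-empty still yields -c
```

The expansions are deliberately left unquoted, as in the OpenGrok script, so that an empty result disappears from the argument list entirely instead of being passed as an empty string.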

Friday, October 8, 2010

Google's WEBP image format and World of Warcraft

I got my hands on Google's new image file format today and started testing it on some images I had lying around. I specifically wanted to see how it did with bad JPEGs of rendered scenes. Hard edges and re-processed artifacts are something that JPEG traditionally handles very poorly, so there's some real, practical benefit to having a new format that can do these things well.

I took 9.5MB of input screenshots from my library and got 3.9MB of images out at 75% quality. Now, 75% in JPEG is pretty poor, but these don't look bad at all. At least, they look no worse overall than the input (WoW compresses its screenshots pretty heavily).

More interestingly, however, here are some visual comparisons. All of these images are PNG format, converted either from the original JPEG input or from the converted WEBP at the given quality setting.

Original screenshot (cropped region saved as PNG from original JPEG). Full file size: 541KB.
This is the 75% quality version of the same image as WEBP. Full file size: 257KB.
And this is the same WEBP conversion again, this time at 85%. Full file size: 347KB.
A second original image, again cropped and saved as PNG from a JPEG original screenshot. Full file size: 189KB. (re-compressed at 75% quality as JPEG, it was 132KB).
A 75% quality WEBP conversion. Full file size: 72KB.
An 85% quality WEBP conversion. Full file size: 109KB.

The lessons I learned from this are:

WEBP does have some visual loss, even at 85%, when it comes to highly saturated color, especially red.
The file size drop is quite dramatic, even over a re-compressed JPEG at a lower quality.
Overall, the look of the WEBP files is impressively smooth.
I noticed (not in the samples, above) that splotchy regions of similar color, but varying value were sometimes flattened to the point that information was clearly being thrown away. At 75% quality this was a substantial change, but at 85% quality, it was hardly noticeable.

I really look forward to this format being available in the majority of browsers, so I can safely start using it!

Thursday, June 10, 2010

Are your passwords safe in MD5 or SHA-1 formats?

I've read, over and over again, various questions and seemingly authoritative statements about the security of various hashing algorithms. I've gotten kind of tired of reading misinformation, so here's some detail that you can trust.

US-CERT of the U.S. Department of Homeland Security said MD5 "should be considered cryptographically broken and unsuitable for further use."
The document that this was stated in is titled, "Vulnerability Note VU#836068: MD5 vulnerable to collision attacks"
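A collision attack is where an attacker crafts two different inputs that yield the same result when hashed. The scenario that matters for passwords is closely related: an attacker, given access to a hashed password (or other plain text), crafts a password that yields the same result when hashed (strictly speaking, this is a preimage attack). Thus to a password authentication system, the crafted input seems to be the correct password.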
If your password hashing scheme does not use a salt, none of this is interesting to you, as you have little or no security to speak of given an attacker (internal or external) who has access to your hashed passwords.

OK, so what does any of this mean to you? Is MD5 secure? Well, not really. It is possible, with a moderate hardware investment and access to the hashed password, to generate a "skeleton key." No one can "crack" the original password in a reasonable amount of time, as far as I know or have read, but access to an equivalent password solves many problems for an attacker, even if they can't then take that password and use it against other services (since those services would not be using the same "salt," which prevents the same password from hashing the same way on two different sites or services).
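
To make the point about salts concrete, here is a rough sketch. It is an illustration of the idea only, not a recommended password storage scheme: salted_hash is a name I made up, the salts and password are placeholders, and I'm assuming the GNU coreutils sha1sum command is available.

```shell
#!/bin/sh
# salted_hash is a hypothetical helper for illustration only.
# $1 = salt, $2 = password; prints the hex SHA-1 of the salt followed by the
# password. sha1sum is the GNU coreutils command.
salted_hash() {
    printf '%s%s' "$1" "$2" | sha1sum | cut -d' ' -f1
}

# The same password stored by two services with different (made-up) salts.
site_a=$(salted_hash 'salt-for-site-a' 'hunter2')
site_b=$(salted_hash 'salt-for-site-b' 'hunter2')

if [ "$site_a" != "$site_b" ]; then
    echo "same password, different stored hashes"
fi
```

Because the two stored values differ, a value that works against one service's hash tells the attacker nothing directly about what to send to the other.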

The question you have to ask yourself is this: why are you hashing passwords? Is it to protect them, should someone gain access to your systems from the outside? Is it to protect them from those who have access to the data store? In these cases, MD5 is at best a weak protection, but it is significantly better than some of the alternatives (DES, etc.) which are breakable in practically no time.

But MD5 is used in many places besides password hashing. Should we stop using it there? Probably not.

For example many backup and data validation tools use MD5 to make sure that data has not been modified (either to initiate a backup/copy or to safeguard against accidental local change). These purposes are still served just as well now as they were when MD5 was introduced, and the fact that MD5 has been proven to have possible collision attacks does not really impact the data integrity aspect of the algorithm. Of course, there are cases where MD5 will identify a block as unchanged when it has, in fact, changed. This is true for all hashing algorithms, but the reason that MD5 was initially considered acceptable for this purpose was that the chance of that happening without malicious intent is astronomically small (that malicious intent was not believed to be as much of a factor then is not interesting to us, now). It would be a bit like dropping a penny down into one of those boxes with water where the goal is to land it on a small platform, and just as you dropped it, an earthquake struck, causing the penny to bounce off the platform, jump back up through the slot and blind you. Just as I don't recommend avoiding such games because of the risk of blindness, I don't think you need to stay away from hashing algorithms (including MD5) in order to avoid missing a data update. If you think someone might be waiting for you to drop the penny so they can set off some dynamite, then you have a different kind of problem, and MD5 might not be the best choice (e.g. if you're performing MD5 checksums in order to verify that a system's software has not been compromised).

Now, that changes as your risk profile changes. There are times, I believe, where it makes sense to take extra precautions. For example, if you're making constant backups of large amounts of rapidly changing data whose integrity in original and backup form has a high risk associated (e.g. medical data), then I might use two hashing algorithms to perform the verification. MD5 might be a fine choice for one of them, but I'd use SHA-1 or something similar on top of it. It's still astoundingly unlikely to be an issue, but there's a time and a place for being stupidly extra-certain and if you can afford the extra CPU cycles, why not compute two hashes while you're looking at the data?
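
As a sketch of what "two hashes" might look like in practice, assuming the GNU coreutils md5sum and sha1sum commands are installed (the function name is my own invention, not part of any backup tool):

```shell
#!/bin/sh
# same_under_both_hashes is a hypothetical helper name, not part of any tool.
# It succeeds only if two files match under BOTH MD5 and SHA-1.
# md5sum and sha1sum are the GNU coreutils commands; reading from stdin keeps
# the file name out of their output so the two results compare cleanly.
same_under_both_hashes() {
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] &&
    [ "$(sha1sum < "$1")" = "$(sha1sum < "$2")" ]
}

# Example:
#   same_under_both_hashes original.dat backup.dat && echo "backup verified"
```

This simple version reads each file once per algorithm. A real backup tool would more likely feed both hash functions from a single pass over the data, which is where the "while you're looking at the data" savings come from.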

What about SHA-1? Hasn't that been broken too? No, SHA-1 has known weaknesses which will likely yield security-impacting attacks in the future, but as of now, these weaknesses have yet to be translated into actual attack vectors. It's certainly worth staying on top of, and keeping a flexible hashing scheme (à la the OpenBSD/LDAP schema) in your application in order to upgrade to SHA-3 when it becomes available and has been thoroughly tested, but for now SHA-1 is an excellent choice for anything short of military/state-secret sorts of crypto-hashing needs.

Bruce Schneier, who is recognized around the world as an authority on cryptographic security, had this to say about the news regarding SHA-1:

They can find collisions in SHA-1 in 2⁶⁹ calculations, about 2,000 times faster than brute force. Right now, that is just on the far edge of feasibility with current technology.
Jon Callas, PGP's CTO, put it best: "It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off." That's basically what I said last August.

It's time for us all to migrate away from SHA-1.
Most of the hash functions we have, and all the ones in widespread use, are based on the general principles of MD4. Clearly we've learned a lot about hash functions in the past decade, and I think we can start applying that knowledge to create something even more secure.
Hash functions are the least-well-understood cryptographic primitive, and hashing techniques are much less developed than encryption techniques. Regularly there are surprising cryptographic results in hashing ... we still have a lot to learn about hashing.

Sunday, May 23, 2010

Writing a Perl 6 URI module

I wanted to write a parser of some sort using Perl 6's spiffy parser language otherwise known as "rules". This is the super-extended regular expression syntax that Perl 6's own parser is written in, and it's not just powerful, it's easy to use. In fact, it's so easy to use that almost all of my time writing a URI parser module was spent on other aspects of the code than the parser itself.

First off, some background. Perl 6 has a URI module already. However, it relies on a number of Perl built-in character classes to match things like digits and alphanumerics. In reality, the RFCs that define URIs are very precise, and there are different specifications depending on what you need. So, I decided to re-write the module with a pluggable parser so that you could give a regular, modern URI and have it parse correctly, but you could also ask for special "IRI" parsing on an internationalized URI and the right thing would happen there. I even went so far as to bring in an older version of the specification as a legacy mode.

The current state of the Perl 6 parser and runtime, called Rakudo, is actually fairly solid for a pre-release implementation of such a complex language spec. There are some gaping holes, but they were all relatively easy to work around. Some of these included overly aggressive list-flattening, some operators that were broken at the time I wrote this code, and the big one: named rules only work as a stand-alone grammar with a specific entry-point called TOP.

I worked around all of these issues and have, so far, been able to parse basic URIs according to RFC 3986. Here's a sample of what a Perl grammar for URIs looks like:

    token URI {
        # Subrule names here follow the rule names in RFC 3986.
        <scheme> ':' <hier_part> [ '?' <query> ]? [ '#' <fragment> ]?
    }

Here you can see most of the basics: "token" introduces a single expression within the grammar. It calls out to other tokens by enclosing their names in angle-brackets. Literal sequences are enclosed in single-quotes and sub-expressions can be enclosed in square-brackets with regular expression-like repetition counts such as ? for 0 or 1 matches.

In order to have a pluggable interface, I needed a class capable of providing me with two things for each grammar: the grammar itself and a set of routines which would tell me how to find the resulting URI elements in the match data. For this I defined an interface using Perl 6's roles:

  role URI::Specification {
      method parser() { ... }
      method scheme_path() { ... }
      # ... other _path methods here...
  }

Those ellipses are literal. They cause the methods to be required for any class composed with this role, but do not define any functionality themselves.

Each parser is then defined as:

  class URI::rfc3896 does URI::Specification {
      grammar URI::rfc3896::spec {
          token TOP {  }
          # RFC definition of URI goes here.
      }
      method parser() { return ::URI::rfc3896::spec }
      method scheme_path() {
          gather do { take  }
      }
      # And so on ...
  }

That's it. The only really funky bit here is the gather/take code in the scheme_path. That's the way Perl 6 defines a coroutine-like interface. The paths define how we traverse the match object to find match results. So, for example, the "scheme" (the "http" in "http://www.example.com/") can only be matched in the URI rule's scheme sub-rule. Some URI elements, however, such as authority (the host name and port, and possibly username as well) can be matched multiple ways, so these routines might return multiple lists of subrule names to traverse. I would have simply returned a list of lists, but Perl 6's parameter passing is very complex and currently some of the specification is not yet implemented. Right now, this manifests as overly aggressive list flattening when returning them from a subroutine or method.

This is why I used coroutines to return each of the sub-lists, one call at a time.

I'll continue to post new updates as my URI module nears readiness. For now, it's just awaiting some love on the other parsers, and I think it'll be ready to go.

Tuesday, April 6, 2010

Enforced bad passwords

Long ago, I got sick of sites that restrict me to bad passwords, but today I came across another and it has pushed me to yet again explain why it is that you should never restrict passwords without deeply compelling cause.

Today's offender was Boston Coach, the luxury livery service that was founded by Boston's Fidelity Investments (the legend states that one day, Fidelity's owner, Ned Johnson, wanted a cab and couldn't get one; the next day he had his own fleet of black sedans with smartly dressed drivers, trained to treat their passengers like royalty). Anyway, I wanted to hire a Coach to take my mother and me to a concert for her birthday (I won't say what birthday; you're welcome, Mom). I had to sign up for an account on their Web site. They committed a few sins in the process:

Unless you really need a pseudonym, don't ask the user to create one. Instead, use the email address for logins.
Never restrict passwords unless you are technologically constrained to do so, and if you are, file a security bug with whatever braindead software it was that forced you to (or consider just dumping it).
Test your UI with and without JavaScript support. This sounds silly, but there are plenty of environments where people aren't allowed to enable unsafe browser features.
Never give the user an error without explaining what it is that they did to get it. Two examples came up, here: the password security policy and the number of occupants per car.

To get back to the heart of my concern: the password. I use PasswordSafe, a program originally written by renowned security expert, Bruce Schneier. It can happily generate a very random, long password. For example:

K1/}"jUCF/byp6( : $1$0ZV5xOu3$iTgccli1bBSykSJxcOrfi.

Notice that this password would be nearly impossible to memorize, but because PasswordSafe stores it in an encrypted file, I just have to remember one, easier to remember password to access all of them. I tried to enter this very password for my account, but don't bother trying to use it... it was rejected. The confusing bit was that the UI informed me that I had not met "minimum password requirements." Wow, if that password doesn't measure up to Boston Coach's minimum requirements, they must spend all day, every day servicing lost password requests!

In reality, what they'd done is refuse to accept any password with punctuation (resulting in 31 possible characters being removed from all possible passwords on a typical US keyboard). This is a tragic thing to do to your password security, and a company founded by Fidelity Investments should know better.
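
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes a randomly generated 15-character password (the length of my example above) and a full alphabet of 94 printable, non-space ASCII characters; both the helper name and the 94 figure are my assumptions, and the 31 is the count of removed characters from above.

```shell
#!/bin/sh
# entropy_bits is a hypothetical helper: bits of entropy in a random password
# of LENGTH characters drawn uniformly from an alphabet of SIZE characters,
# i.e. LENGTH * log2(SIZE). Usage: entropy_bits SIZE LENGTH
entropy_bits() {
    LC_ALL=C awk -v size="$1" -v len="$2" \
        'BEGIN { printf "%.1f\n", len * log(size) / log(2) }'
}

entropy_bits 94 15    # assumed full set of 94 printable characters
entropy_bits 63 15    # the same set minus the 31 punctuation characters
```

Under those assumptions the first figure comes out at about 98 bits and the second at about 90, so banning punctuation throws away more than eight bits of strength from a password of the same length.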

Anyway, they should fix their broken software and anyone else that uses such terrible requirements for passwords should get on it ASAP.

Friday, March 26, 2010

Safe browsing and virus removal

Sometimes you have to use Windows. There might be a game you like that only runs there or you might need a Windows-only program for work. Whatever it is that draws you to Windows, you know going in that you run the risk of your system becoming compromised (becoming a "zombie," getting a key-logger or any number of other harmful scenarios). OS X is starting to feel the heat of increased market share as well, in case Mac fans thought they were somehow immune. In 2009 and now in 2010, Mac OS X + Safari did quite poorly in a challenge to compromise browsers. My brother just recently got some sort of malware that caused him to spam the family with bogus links, and I put together this overview of what to do in response. In case it's useful to others, here you go:

Preventative:

1) Always use Firefox to browse the web (Safari and Chrome are getting there, but currently don't have the suite of helpful and stable plugins that Firefox does) http://www.mozilla.com/en-US/firefox/personal.html
2) Always use the noscript plugin for Firefox http://noscript.net/ and add exceptions with care
3) If you're going to visit a site that might be questionable, use the "Tools -> Start Private Browsing" feature

Doing anything less is roughly equivalent to going on a sex tour of the third world without condoms. That's not a pretty metaphor, but neither is having your system infected with every bot this side of Robbie from Forbidden Planet.

As for cleaning your existing system... it's hard. The best and safest way is to back up your data and then use the re-install/recovery disks that came with the computer. If you want a less drastic approach (that isn't as guaranteed to work), then I suggest one of these resources:

AntiVir removal tool -- Avira, makers of my favorite free antivirus tool
McAfee Virus Removal Tools -- McAfee (about $90)
Symantec Removal Tools -- Symantec (free?)

I suggest figuring out what you have first. AntiVir, McAfee or Symantec can be used to do a full scan, and should turn this up. If not, try a malware removal tool like Spybot Search and Destroy (but be careful if you do a Web search for it... don't click on ads, and make sure you spelled it correctly).

To keep yourself safe in the future, make sure you have an up-to-date virus scanning tool (AntiVir has a free version that pops up a single ad per day, for their product only, asking you to buy the full version, and there are paid programs from Symantec and McAfee). Also, make sure that you run the latest version of your browser (Firefox will auto-update with security fixes, but you should upgrade to the latest major version at least once every 6 months). Don't use IE, but if you really must, make sure it's updated to the very latest version. Microsoft's track record for keeping old browsers secure isn't very good.

Beyond that, consider doing everything that isn't Windows-specific in a virtual machine. You can get an easy-to-use virtual machine manager at http://www.virtualbox.org/wiki/Downloads and then download the install image for Ubuntu Linux and load it up in the virtual machine. This allows you to do things that would otherwise be unsafe in Windows within a safer environment. It's cumbersome, but the security return on your investment is well worth it.

Monday, November 2, 2009

Blackra1n, Cydia and SBSettings

I've just re-jailbroken my iPhone (mostly to fix a problem introduced by upgrading to 3.1 without first uninstalling the jailbreak - never do that), and I ran into an SBSettings problem. It installs fine, but once it's done installing, it doesn't do anything. I swipe across the top of the screen and nothing happens.

If you just used blackra1n to jailbreak, and SBSettings won't work for you, here's what you do (props to this forum for the fix):

Re-install Cydia from the blackra1n interface
Re-install dpkg (this may show up as "Debian Package Manager" or similar) from Cydia
Re-install SBSettings from Cydia
Re-boot the phone

This worked for me, and got my SBSettings install working flawlessly.

Thursday, October 29, 2009

Wikipedia Needs More Trivia

The "... in popular culture" sections and articles on Wikipedia used to be called "Trivia," but were so overwhelmingly lists of pop-culture references that the convention was altered. These days there are articles like Zombies in popular culture which typically start off well, but quickly descend into cobbled-together lists of movies, books, comics and video games that may or may not relate to, or represent the state of the genre being discussed. At one time, I worked on the Lovecraftian horror article, and while I tried to make it an article about the genre and its evolution over time, it was continuously inundated by well-meaning editors who would add lists of genre works to the article.

To combat this, Wikipedia really should embrace both styles, and integrate them more deeply. I'd love to see the template mechanism enhanced so that trivia could be encapsulated as template-like objects.

Perl 5, Perl 6, Moose, Rakudo and the Future of Perl

I made this comment today on my Google Reader feed with respect to Perl 5, and I thought it bore mentioning here since it's rather a longer topic than a blurb. The context was an announcement of a new project aimed at giving Perl 5 faster Meta-Object Protocol (MOP) support à la Ruby and Perl 6 (and to a somewhat lesser extent, Python, Lisp, and many others).

I'll ignore the back-and-forth. Here's the bottom line: Perl 5 is a dead-end platform. Not because of Perl 6, but because it's a deep legacy of parts that have outgrown their use. It's not extensible in ways that people want to use it today (e.g. for Moose and MOPs in general), and to bolt on those features is painful in the extreme, and guaranteed to be orders of magnitude slower than it should be. I'm not saying Perl 6 is The Way. I'm saying that a re-design of Perl 5 is required to extend it into relevance with respect to today's programming environment. If that's called Perl 5.50 or if it's called Perl 6, it doesn't matter. Perl 5 is and should remain a powerful legacy language whose contributions to the state of the art were significant in their day but whose limitations are the right reason to use it only where required today.

As you can see, I'm a fan of what Perl 5 has done. I think it brought some concepts "into the fold" of language design that were downright heretical at the time (grammar ambiguity can yield developer ease, for example). It also promoted regular expressions from library features and the domain of special-purpose languages such as sed or AWK to full-fledged peers in the design of a language.

However, Perl 5 is old and tired. To this day, it lacks such basic features as named subroutine parameters, and it still has some deep confusion over types and naming that stems from its original idea that some types were so external that users should manipulate the namespace directly to access them. Why are these archaic features still in the language? Not because the Perl community lacks the skills to fix them, but because the language carries with it a heavy burden of legacy code. Breaking CPAN is such a taboo that maintainers have been forced into resignation by the frustration that wells up in the community when it or its evil twin, DarkPAN, is threatened by the most trivial changes.

So, what are we to do? Moose and its ilk attempt to subvert the legacy code and inject new concepts by force. Sadly, that battle is one of attrition and diminishing returns. Ultimately speed and complexity are major factors in maintaining such a beast. Perl 6 (along with its currently hot implementation, Rakudo) is another path. Re-designed from the ground up, Perl rises from its own ashes with features freshly cast in modern terms. In many ways this is the right solution. Perl 6 is such a radical departure from the traditional design of popular languages that it is almost guaranteed to produce some of the same positive impact that Perl 5 brought us.

However, Perl 6 is fundamentally a new language. There's nothing wrong with implementing a new language that's aimed at the Perl community, but there's a middle ground that I think we need as well. Here are what I see as the design requirements of a new Perl that is still Perl:

Provide basic functionality that Perl programmers want without cumbersome add-ons to implement them (named parameters, MOP, etc.)
Maintain syntactic compatibility with Perl 5 as much as possible to ease transition
Marginalize, deprecate or simply remove features which have hindered the task of implementing secondary implementations of Perl 5 (e.g. on Parrot)

Given a reasonable balance of those conflicting goals, a "Middle Perl" could be a powerful tool for transition to Perl 6 while also providing existing Perl programs a way to grow and adapt to modern needs. One immediately obvious counter-argument is that this will slow Perl 6 development. That seems reasonable, but on the one hand, Perl 6 has been in development for 10 years. Slowing it down isn't really possible. It is possible to steal its thunder and prevent its developer base from expanding at a critical time, so there's some valid concern there, but I think that can be mitigated by making Middle Perl an explicit stepping stone project that aims at ultimate adoption of Perl 6. On the other hand, there's the fact that developers are already working on this project, but only in module-space. They're implementing dozens of modules to extend Perl 5. If those efforts were re-directed to implementing a Middle Perl, then there would be no need for anyone currently working on Perl 6 or considering doing so to change course.

Many times I've heard Perl 6 compared to Python 3. I always point out that this is an unfair comparison because Perl 6 is a radical re-design of the language which requires blazing a new trail through the entire topic of language design and the integration of disparate programming paradigms. What I've suggested here, however, should absolutely be compared to Python 3. A re-evaluation of some core concepts; a re-write of the code; but fundamentally the same language.

Wednesday, July 22, 2009

Getting Google Gears To Work With Firefox 3.5.1

Google Gears works fine with Firefox's most recent security update, 3.5.1, but you actually have to convince it of that as of the current release of Gears (0.5.29.0).

In order to "convince" Gears that it's allowed to run, you need to first install it from the Gears site. This will not run, but it will install the files. Then, find the file called "gears.xpt" under your Firefox user settings directory. On my Linux system, this is under my ".mozilla/firefox" directory a few levels down under some very funky directory names with lots of letters and numbers... that's OK, just find the file using whatever your OS uses to search for files.

Once you find that file, look in the directory one level above it for another file called "install.rdf". Load this file up in the text editor of your choice (notepad, vi, emacs, whatever), and look for the line that sets "maxVersion" to "3.5".

Got it? OK, here's the crazy complicated part... add ".1" to the end of "3.5". That's it. Save the file and restart Firefox. You should be able to bring up the Tools -> Add-ons dialog and verify that Gears is installed.
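
If you'd rather script the edit than open an editor, something like the following should do it. This is a sketch under assumptions: bump_max_version is a name I made up, and it assumes the version is stored in install.rdf as an element along the lines of <em:maxVersion>3.5</em:maxVersion>. If your copy looks different, just edit it by hand as described above.

```shell
#!/bin/sh
# bump_max_version is a hypothetical helper. It assumes the version appears in
# install.rdf as an element such as <em:maxVersion>3.5</em:maxVersion>.
bump_max_version() {
    rdf=$1
    cp "$rdf" "$rdf.bak" || return 1    # keep a backup of the original file
    # Rewrite "maxVersion>3.5</" as "maxVersion>3.5.1</" and nothing else.
    sed 's|\(maxVersion>\)3\.5\(</\)|\13.5.1\2|' "$rdf.bak" > "$rdf"
}

# Example (Linux profile location, as in the text above):
#   xpt=$(find "$HOME/.mozilla/firefox" -name gears.xpt | head -n 1)
#   bump_max_version "$(dirname "$(dirname "$xpt")")/install.rdf"
```

The function leaves an install.rdf.bak next to the file, so you can put things back if Firefox complains.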

Tuesday, June 30, 2009

Windows 7 Release-Day Shipping

I'm not a fan of Microsoft Windows, but sadly I play video games that are best suited to running under Windows, and my laptop's Blu-Ray drivers don't play well with Linux, so I'm stuck running Windows in two places. Vista has been a chore, and from what I hear from my Windows-savvy friends, Windows 7 really improves on Vista. Call it "the apology," if you will (side note: I'm still waiting for my apology for Highlander II, which I saw on opening night with stratospheric expectations...)

So, now Amazon is offering something pretty sweet. You can pre-order Windows 7 from them and have it delivered to you on release day. They did this with World of Warcraft: Wrath of the Lich King as well, and I did get that delivered on release day, so I'm signing up for the same deal with Windows 7. The really nice bonus is that, being a Prime member, I get this option for free. No delivery charges!

Thursday, May 21, 2009

CHDK for PowerShot Cameras

A sample of what CHDK can do in a darkened room.
From CHDK

Last night I tried out CHDK for the first time. This is a third-party software suite for the Canon PowerShot line of cameras. It allows all sorts of interesting things that the PowerShots don't allow by default. For example, the picture to the right was taken in a nearly pitch-dark room with an 80-second exposure. The PowerShot can only do 15 seconds out of the box.

But there's far more. CHDK includes tools for multiple histogram modes, "zebra" flashing of blown-out regions, and even a scripting language in which you can write your own code to control the camera! For anyone who wants to be able to carry around a small camera that's capable of taking complex shots, this is the tool that will let you do it.

Monday, May 18, 2009

A Licensing Primer For Gamers

For those who aren't long-time open source software developers, especially gamers who write mods or addons (see my previous article) I'd like to provide a quick Q&A to answer some of the questions (and misconceptions) I've seen come up recently. I want to emphasize that I am not a lawyer, but these topics are fairly general, and I believe I'm fairly safe in providing a basic intro. You should read the licenses you apply to your software before you do so, of course.

Question 1: "Are open source and public domain the same thing?"

Answer: Not at all. Public domain works are works whose copyright has expired or been revoked, or that were not eligible for copyright protection in the first place. Open source software, on the other hand, is copyrighted, and that copyright has one or more owners who retain the rights with respect to that software. Just because you've allowed others to copy your work doesn't mean that you don't retain control of it. For example, you're the only one who can offer that work under other licensing terms or transfer the copyright to another party.

Public domain licensing (see below) is not open source licensing, though it has some properties in common. That's important to remember.

Question 2: "But don't you give up your rights when you license something as open source?"

Answer: All licenses (regardless of how proprietary they are or are not) represent a compromise between the copyright owner and the licensee, but the copyright status of a work doesn't change because it was licensed under open source terms. It only means that you've allowed re-distribution and modification of the work.

Question 3: "Can I change the licensing terms of my open source software?"

Answer: You can if you are the sole copyright holder. However, if your software is made up of code that you wrote and code that others wrote, unless they signed ownership of their changes over to you, you do not have sole ownership of the final result, and any change of license must either be allowed by the licensing terms under which they made their changes or must be agreed to by them.

Question 4: "Can someone modify my program / mod / addon and redistribute it without my name?"

Answer: Obviously, this depends on the terms of the license, but in the case of all of the open source licenses that I know of, no. One exception is public domain software. There's some controversy as to what constitutes public domain software and if you can, in fact, release your own code to the public domain, but as so many developers claim that their code is PD it should be noted that PD software has no copyright, and thus can be claimed by and controlled by anyone.

Question 5: "Why would I want to open source something I worked so hard on?"

Answer: That's a question no one can fully answer for you, but there are some advantages: it makes it easier and more attractive for others to contribute to your software; if you ever stop maintaining your software, others can pick up the ball; having an open source project under your belt is a nice resume item; and mostly, it just helps to build a sense of community around your software.

Wednesday, April 29, 2009

Git, BitKeeper and the Power of Open Source

Update April 2012: The comparison page that I reference now just mentions "other SCM", but a side-bar continues to compare their product only to non-distributed, circa 1980s and 1990s offerings.

Back in the mists of 2002, debate ran hot in the Linux development community. The debate was over a proprietary source code management (SCM) tool called BitKeeper that was used as the primary SCM for the Linux Kernel. When a dispute with the vendor resulted in a schism between the Linux developer community and BitKeeper in 2005, the tool was dropped in favor of a replacement written by Linus Torvalds over a one-month period. To understand the importance of this achievement, understand that BitKeeper was written by eight developers over the course of three years, and McVoy, its primary architect and original developer, estimated that it would cost $12 million to do it again in an ordinary, non-startup company.

Instead, Torvalds sat down behind his keyboard and set out to replace it. How successful was he? If you look at BitKeeper's comparison page with other SCM tools today, you'll notice that it compares itself to many other tools (and makes quite a few rather large errors along the way), but none of them is git. In fact, none of the tools on the list is one of the next-generation tools that have followed in git's wake, such as Bazaar or Mercurial. Why? Well, git is simpler, easier and better. It also happens to be radically faster. There's no point in comparing yourself with such a tool in public, since it's only going to make you look bad to say that the free tool is radically better.

McVoy also made the claim that a replacement for BitKeeper wouldn't be possible because it was too hard and programmers capable of doing the work wouldn't do it for free. Why is this? Well, it comes down to graph theory and its application to text revisions. Recognizing text differences is hard enough, but to extend that to maintaining a directed acyclic graph of revision histories and branches in a distributed way... well that's downright hard. Sure, it's hard, but then so is writing a POSIX-compatible kernel. The fact that there are now three excellent options out there for distributed source code management that excel at doing just what McVoy said would be impossible to reproduce should go a long way to demonstrating that free and open source software development is one of the most powerful new paradigms of engineering to come along since the invention of the functional specification.
