Tech.Coursera

building for education. one byte at a time.

Why we’re open sourcing a project before we write the code

Today I’m proud to announce that we’ve open sourced Courier! A Scala library for–

Wait, hold on, let me back up a bit.

The thing is, we’re just starting development on Courier now. There’s nothing for you to download. Nothing for you to try out. No testimonials or success stories.

So why announce it?

We’d like to try a bit of an experiment here at Coursera. Instead of building a project internally and waiting until we think it’s fully polished to open source it, we’re going to “throw it over the wall” before we’ve even gotten going on the coding.

Why open source from the outset?

While it is less common for companies to open source projects from the outset, it is a standard approach taken by many popular open source projects.

A few reasons that more companies should consider taking the same approach:

Start the learning process for running an open source project earlier.

Running an open source software project is a learning process. And while we have a rough understanding of what we want to build and what we will use it for, we stand to learn a great deal from external developers, given the diversity of engineering problems they are working to solve, their unique perspectives and their passion for great software.

Deeper level of involvement.

Once a project is polished and in use, it’s more difficult to apply feedback from external developers. By opening the project up for feedback earlier, feedback and contributions from external developers can be more easily incorporated into the initial development.

Less total work than open sourcing internal software already in widespread use.

Extracting an internal project so that it can be open sourced often requires untangling the code from internal dependencies, repackaging it, separating out any company-specific business logic and configuration settings, reorganizing the documentation, and, often, writing a bunch more documentation!

Once the software is in widespread use internally, doing all this can be disruptive and can even require painful migrations of internal software. While going through this effort to open source an internal project is often worthwhile, if we instead open source a project from the start, much of it can be avoided entirely.

The project we’re open sourcing

Courier is a modest-sized project. It will generate idiomatic Scala data bindings from schemas.

For example, a schema for a simple record with a single field would look like:

{
  "type": "record",
  "name": "Fortune",
  "namespace": "org.coursera.fortune",
  "doc": "A fortune.",
  "fields": [
    {
      "name": "message",
      "type": "string"
    }
  ]
}

Courier will generate a Scala class that will look roughly like:

case class Fortune(message: String)

An instance of this class will serialize as:

{
  "message": "Today is your lucky day!"
}
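
To make the round trip from class to JSON concrete, here is a minimal hand-written sketch. It is not Courier's generated code (none exists yet); the `toJson` helper and its `escape` function are hypothetical stand-ins for whatever codec the generator eventually emits, and the escaping only covers the common cases.

```scala
// Hypothetical sketch only: Courier's real generated code has not been written yet.
final case class Fortune(message: String)

object Fortune {
  // Minimal JSON string escaping for this sketch (quotes, backslashes,
  // and a few common control characters). A real codec would cover more.
  private def escape(s: String): String =
    s.flatMap {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case '\r' => "\\r"
      case '\t' => "\\t"
      case c    => c.toString
    }

  // Hand-rolled stand-in for a generated JSON codec (assumed name).
  def toJson(fortune: Fortune): String =
    s"""{"message": "${escape(fortune.message)}"}"""
}
```

Calling `Fortune.toJson(Fortune("Today is your lucky day!"))` yields the same JSON object as above, on a single line.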

Courier will be based on Pegasus, a schema language and library engineered at LinkedIn. We prefer the Pegasus schema language because it offers a rich type system that aligns well with ADTs and maps cleanly to type-safe languages like Scala. The Pegasus schema language is nearly identical to the Avro schema language, but improves on it in a few important ways, such as adding direct support for “optional” fields.
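
To illustrate the “optional” point: in Avro, an optional field is typically written as a union with null, whereas a Pegasus field can be flagged directly with "optional": true. In Scala, the natural target for such a field is `Option`. The sketch below is our guess at what a binding could look like, not a committed design; the `author` field, the `SignedFortune` class and the `describe` helper are all hypothetical.

```scala
// Hypothetical: assumes the schema gains a second field declared as
//   { "name": "author", "type": "string", "optional": true }
// and that the generator maps optional fields to Option with a None default.
final case class SignedFortune(message: String, author: Option[String] = None)

object SignedFortuneExample {
  // Pattern matching forces callers to handle the absent case explicitly.
  def describe(f: SignedFortune): String = f.author match {
    case Some(name) => s"${f.message} (${name})"
    case None       => f.message
  }
}
```

Because absence is encoded in the type, there is no null to check for, and the compiler can warn about a match that forgets the `None` case.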

The implementation of the generated classes will contain everything they need to be serialized to JSON, Avro binary, or any other compatible data formats that we write codecs for.

Our schemas will serve both as documentation and as the source that we generate language bindings from, not just for Scala, but for all languages where we need bindings, such as Swift for iOS, and Java for Android.

So far, we’ve prototyped the idea, written up some design notes, and set up a project skeleton on GitHub, but we have yet to start coding.

Are there any risks to open sourcing too early?

Our main criteria for open sourcing Courier were:

  • Make sure we had sufficient engineering time to dedicate to it
  • Organize our project summary and design notes so that external developers have enough material to understand the purpose and scope of the project

With both of these in place, we believe the benefits of open sourcing far outweigh any risks.

What’s next?

We’ll start writing code in the next day or two. And we expect to have Courier ready for early adoption in about a month. If Courier is something that sounds useful to you, we’d love to get your feedback!

Have a look around our GitHub project, check out our design notes, and let us know what you think on the discussion forum!