Saturday, November 10, 2007

Domain languages

When I told my intern that I was moving to Maps from Gmail, she smiled and said - well, of course, you want to code in C++ rather than in Java/JavaScript which is what the majority of Gmail is written in.

She was not the first to remark on that. Somehow I exude an aura of disdain for managed languages.

In reality, I use Java and C# quite a bit, and often in an environment where one would never use a language one hates - at home. I have a system that downloads podcasts (mostly NPR radio shows) which is written in C#. Another program that I wrote periodically records radio shows, mostly those not available for download, such as This American Life (yes, I do know it's available on iTunes; I don't use iTunes). It is also written in C#. A family of programs that I use to index web sites and download the parts of them that match specific criteria is written in Java. And I am working on a bunch of arcade games that will be available off my web site - those are written in JavaScript.

Truth be told, nothing that I wrote for pleasure in the last 2-3 years is written in C++.

I do think that languages have their domains of applicability. I doubt many people will argue with that. What may be controversial is where the boundaries of these domains are.

For the purposes of this article I will combine Java and C# into one entity called 'managed languages'. This may offend the purist, but this is a very convenient short cut for the purposes of this article.

Anyway, let's start with what managed languages are NOT good for. I don't think they are good for desktop applications. There are 3 problems.

First, the resources that they require to run, specifically, the startup time and memory footprint. If you have ever run Office Communicator, for example, or ATI control center, you know what I mean. On my (super-powerful) laptop, the latter takes 15 seconds to start. On my (older) desktop, the former made the system unusable for the first MINUTE or so after reboot while it loaded.

As for memory, it is impossible to create a .NET application that does something more than printing a line of text and uses less than 50MB of RAM. Just check the memory consumption of the aforementioned Communicator. And it's not just .NET - my Java indexer is running right now on one of my computers and takes 51MB of RAM. The program is barely 1000 lines of code, and is not super memory intensive - all it does is load text from the web and do a regex match looking for a specific pattern.

Here's a bigger (but much better tuned) program - Eclipse. 100MB and 40 seconds to start on my laptop. By comparison, Visual Studio Professional 2005, a much bigger system, on the same computer - 60MB and 10 seconds to start, with a C++ project loaded.

While it's fine for one program to be written in managed code, imagine an environment where there are, say, 10, 20, or 30 managed applications and services running simultaneously. How about 10 minutes for such a system to boot? And 2GB of RAM just to do nothing? There actually is an example of just such a system - the "Origami" Ultra-Mobile PC from Samsung. This thing takes ~5 minutes to boot, of which about 30 seconds is Windows, and the rest is the shell implemented on the .NET Framework.

Second, deployment issues. When you are distributing your managed app, you have to distribute the .NET Framework and/or the Java runtime with it. This adds to installation time. It may require reboots. It may conflict with the beta version of the same runtime that has been installed on the user's computer with some other app. Finally, it leaves the distinct taste of "crap" being installed on the user's computer. How many installation programs that deploy runtime environments clean them up after the program is uninstalled? How many runtime uninstallation programs clean up after themselves?

Finally, the UI of managed programs does not behave quite like native UI. This is especially true for Java, which looks and behaves like it was written for Unix, but the differences rear their ugly head in WinForms as well, although that environment has made a lot of progress. The differences are subtle - behavior of focus, tabs, z-order - but the net result is that I can tell, and I am sure many users can, too. Except they might not know what the cause is.

Another area where managed languages should not be used is education. Most schools have now switched their curriculums to Java, with disastrous results for the industry. I have been interviewing new grads for the last 10 years, and the vast majority of the people coming out of contemporary CS programs do not understand pointers, do not know how to manage memory, and have no concept of implementation efficiency. Here's a recent example - an interviewee created an ArrayList to store elements for a quicksort partitioning loop.
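For contrast, here is a minimal sketch of a partitioning loop that needs no extra container at all: it rearranges the elements by swapping them inside the array being sorted. The class and method names are made up for illustration, and the Lomuto partition scheme used here is just one of several in-place schemes; I am not claiming it is the one the interview called for.

```java
import java.util.Arrays;

public class InPlaceQuicksort {
    // Lomuto partition: rearranges a[lo..hi] around the pivot a[hi] by
    // swapping elements inside the array; returns the pivot's final index.
    public static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo; // a[lo..i-1] holds the elements smaller than the pivot
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi); // put the pivot between the two parts
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one element: already sorted
        }
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    public static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

The only extra storage is a handful of local ints per call; nothing is allocated on the heap inside the loop, which is the whole point of partitioning in place.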

Of course all these concepts are required in any language, including managed ones, where you can never ignore memory management issues - only postpone them to the "tune" stage of your project, which will become progressively longer the longer you ignore them during development.

And students coming out of school during the mid to late 90s had all these concepts - perhaps not quite on par with industry veterans, but they had enough to get started.

Another reason why Java and C# should never be taught in school is that they are manifestly "trade" languages - they are designed to make the craft part of programming easy. This is unlike C, which exposes the user to details of computer architecture, or Scheme/LISP/ML, which teach advanced concepts of computer science. Joel has a wonderful article on this here:

Where do managed languages succeed? If you examine the problems I listed above, the answer becomes obvious - wherever the environment is fully controlled by the developer (i.e. deployment does not matter), wherever it serves a single purpose (so there are not multiple applications that run concurrently, and startup time and resources do not matter), and wherever there is no UI. Which is to say, on a server.

Here you really reap the benefit of easier development such as automatic memory management, rich runtimes, more agile development schedules, and fewer bugs without paying the costs (except maybe you have to deploy more servers).

Another area of course is single-purpose applications ("scripts") that do batch processing and rely heavily on existing OS/apps infrastructure. For example, a task that runs at 3am and downloads podcasts. These programs are mostly produced by technical enthusiasts for the purpose of automating routine tasks. Here managed languages are great because of the richness of their runtime, which allows a developer to complete big tasks with relatively little effort.

C# is especially good because of the COM interop layer, which is superbly done - it makes access to all the Windows and Office infrastructure much easier than from unmanaged languages. In effect, the whole of Windows and Office becomes its runtime. To illustrate this concept, here's a C# program that will read you a Project Gutenberg book aloud:

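    using System;
    using System.Collections.Generic;
    using System.Text;
    using SpeechLib;  // COM interop for the Microsoft Speech Object Library

    namespace speak {
        class Program {
            static void Main(string[] args) {
                if ((args.Length != 2)
                    || ((!args[0].Equals("say"))
                        && (!args[0].Equals("read")))) {
                    System.Console.WriteLine("Usage: speak "
                        + "{say \"Sentence\" | read file_name}");
                    return;
                }
                SpVoice voice = new SpVoice();
                // Assumption: the Speak calls are missing from this listing;
                // they are filled in with the default (synchronous) flags.
                if (args[0].Equals("say")) {
                    voice.Speak(args[1], SpeechVoiceSpeakFlags.SVSFDefault);
                } else {
                    try {
                        // Read the file line by line and speak each line.
                        System.IO.StreamReader reader = new
                            System.IO.StreamReader(args[1]);
                        while (reader.Peek() >= 0) {
                            string s = reader.ReadLine();
                            voice.Speak(s, SpeechVoiceSpeakFlags.SVSFDefault);
                        }
                        reader.Close();
                    } catch (Exception e) {
                        System.Console.WriteLine("Error "
                            + e.ToString());
                    }
                }
            }
        }
    }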

In other words, there are no good or bad languages, there are languages that are good or bad for a particular task.
