
Tuesday, April 03, 2012

Brian's Ten Rules for Writing Cross Platform 'C' Code

Source: here

Introduction:

I've had a lot of success in my 20-year software engineering career with developing cross-platform 'C' and 'C++' code. Most recently, at Backblaze we develop an online backup product where a small desktop component (running on either Windows or Macintosh) encrypts and then transmits users' files across the internet to our datacenters (running Linux) in San Francisco, California. We use the same 'C' and 'C++' libraries on Windows, Mac, and Linux interchangeably. I estimate that supporting all three platforms slows down software development by about 5 percent overall. However, I run into other developers or software managers who mistakenly think cross-platform code is difficult, or might double or triple the development schedule. This misconception is based on their bad experiences with badly run porting efforts. So this article quickly outlines the 10 simple rules I live by to achieve efficient cross-platform code development.


The Target Platforms: 1) Microsoft Windows, 2) Apple Macintosh, and 3) Linux.

The concepts listed here apply to all platforms, but the three most popular platforms on earth right now are Microsoft Windows ("Windows" for short), Apple Macintosh ("Mac" for short), and Linux. At Backblaze, we deliver the user-installed desktop component of our system to desktops running Windows and Mac, and our datacenter all runs Linux. We use the same 'C' and 'C++' libraries on all three platforms interchangeably by following the 10 simple rules below.

One thing I'd like to make clear is that I always believe in using the most popular (most highly supported) compiler and environment on each platform, so on Windows that's Microsoft Visual Studio, on Apple Macintosh that is Xcode, and on Linux it is GCC. It wouldn't be worth it to write cross platform code if you had to use a non-standard tool on a particular platform. Luckily you can always use the standard tools and it works flawlessly.


Why Take the Extra Time and Effort to Implement Cross-Platform?

Money! :-) At Backblaze we run Linux in our datacenter because it's free (among other reasons), and every penny we save in datacenter costs is a penny earned for Backblaze. At the same time, 90+ percent of the world's desktops run Windows, and to sell into that market we need to offer a Windows product. Finally, virtually all of the remaining desktops run Apple Macintosh, and a 10 percent increase in revenue is the difference between a "slightly profitable business" and a "massively profitable business" for Backblaze.

Another reason to implement cross-platform is that it raises the overall quality of the code. The compilers on each platform differ slightly, and can provide excellent warnings and hints on a section of code that "compiled without error" on another platform but would crash in some cases. The debugger and run-times also differ on each platform, so sometimes a problem that is stumping the programmer in Microsoft Visual Studio on Windows will show its root cause easily and quickly in Xcode on the Macintosh, or vice versa. You also get the benefit of the tools available on all of the target platforms, so if gprof helps the programmer debug a performance issue, it is a quick compiler flag away on Linux, but not readily available on Windows or Xcode.

If the above paragraph makes it sound like you have to become massively proficient in all development environments, let me just say we can usually teach a Windows centric programmer to navigate and build on the Mac or Linux in less than an hour (or vice versa for a Macintosh centric programmer). There isn't any horrendous learning curve here, programmers just check out the source tree on the other platforms and select the "Build All" menu item, or type "make" at the top level. Most of the build problems that occur are immediately obvious and fixed in seconds, like an undefined symbol that is CLEARLY platform specific and just needs a quick source code tweak to work on all platforms.


So, on to the 10 rules that make cross-platform development this straightforward:

Rule #1: Simultaneously Develop - Don't "Port" it Later, and DO NOT OUTSOURCE the Effort!!

When an engineer designs a new feature or implements a bug fix, he or she must consider all the target platforms from the beginning, and get it working on all platforms before they consider the feature "done". I estimate that simultaneously developing 'C' code across our three platforms lengthens the development by less than 5 percent overall. But if you developed a Windows application for a full year then tried to "port" it to the Mac it might come close to doubling the development time.

To be clear, by "simultaneously" I mean the design takes all target platforms into account before coding even starts, but then the 'C' code is composed and compiled on one platform first (pick any one platform, which ever is the programmer's favorite or has better tools for the task at hand). Then within a few hours of finishing the code on the first platform it is compiled, tested, touched up, and then finished on all the other platforms by the original software engineer.

There are several reasons it is so much more expensive to "Port A Year Later". First of all, while a programmer works on a feature he or she remembers all the design criteria and corner cases. By simultaneously getting the feature working on two platforms, the design is done ONCE and the learning curve is climbed ONCE. If that same programmer came back a year later to "port" the feature, the programmer must re-acquaint themselves with the source code. Certain issues or corner cases are forgotten about, then rediscovered during QA.

But the primary reason it is expensive to "Port A Year Later" is that programmers take short cuts, or simply through ignorance or lack of self control don't worry about the other platforms. A concrete example is over-use of the Windows (in)famous registry. The Windows registry is essentially an API to store name-value pairs in a well known location on the file system. It's a perfectly fine system, but it's approximately the same as saving name-value pairs in an XML file to a well known location on the file system. If the original programmer does not care about cross platform and makes the arbitrary decision to write values into the registry, then a year later the "port" must re-implement and test that code from scratch as XML files on the other platforms that do not have a registry. However, if the same original programmer thinks about all target platforms from the beginning and chooses to write an XML file, it will work on all platforms without any changes.
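As a minimal sketch of that choice (the file name, tag names, and SaveSettings() helper are made up for illustration), writing name-value pairs to an XML file uses nothing but standard 'C' calls and therefore runs unchanged on Windows, Mac, and Linux:

#include <cstdio>

// Sketch: instead of calling RegSetValueEx() on Windows, write the same
// name-value pairs to an XML file in a well known location.  This compiles
// and runs on Windows, Mac, and Linux without any #ifdefs.
void SaveSettings(const char *settingsPath, int pauseBackup, const char *excludeDir)
{
    FILE *f = fopen(settingsPath, "w");
    if (f == NULL) {
        return;   // real code would report the error (and XML-escape the values)
    }
    fprintf(f, "<bzsettings>\n");
    fprintf(f, "    <pause_backup value=\"%d\"/>\n", pauseBackup);
    fprintf(f, "    <exclude_dir path=\"%s\"/>\n", excludeDir);
    fprintf(f, "</bzsettings>\n");
    fclose(f);
}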

Finally, the WORST thing you can possibly do is outsource the port to another company or organization. The most efficient person to work on any section of code is the original programmer, and the most efficient way to handle any (small) cross-platform issues is right inline in the code. By outsourcing the port you must deal with communication issues, code merges a year later and the resulting code destabilization, misaligned organizational goals, misaligned schedules (the original team is charging ahead while the port team is trying to stabilize their port), etc. Outsourcing any coding task is almost always a mistake; outsourcing a platform port is a guaranteed disaster.


Rule #2: Factor Out the GUI into Non-Reusable code - then Develop a Cross-Platform Library for the Underlying Logic

Some engineers think "Cross-Platform" means "least common denominator programs" or possibly "a bad port that doesn't embrace the beauty of my favorite platform". Not true! You should NEVER sacrifice a single bit of quality or platform-specific beauty! What we're shooting for is the maximum re-use of code WITHOUT sacrificing any of the end user experience. Toward that end, the least reusable code in most software programs is the GUI: specifically the buttons, menus, popup dialogs or slide-down panes, etc. On Windows the GUI is probably in a ".rc" Windows-specific resource file laid out by the Visual Studio dialog editor. On the Mac the GUI is typically stored in an Apple-specific ".xib" laid out by Apple's "Interface Builder". It's important to embrace the local tools for editing the GUI and just admit these parts are going to be re-implemented completely from scratch, possibly even with some layout changes. Luckily, these are also the EASIEST parts of most applications, done by dragging and dropping buttons in a GUI builder.

But other than the GUI drawing and layout step that does not share any code, much of the underlying logic CAN be shared. Take Backblaze as a concrete example. Both the Mac and PC have a button that says "Pause Backup" which is intended to temporarily pause any backup that is occurring. On the Mac the "Pause Backup" button lives in an Apple "Pref Pane" under System Preferences. On Windows the button lives in a dialog launched from the Microsoft System Tray. But on BOTH PLATFORMS they call the same line of code -> BzUiUtil::PauseBackup(); The BzUiUtil class and all of the functions in it are shared between the two implementations because those really can be cross platform, both platforms want to stop all the same processes and functions and pause the backup.
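A small sketch of the pattern (the handler names here are hypothetical; only BzUiUtil::PauseBackup() comes from the example above): the GUI layers stay thin and platform-specific, while both call into the same shared function.

namespace BzUiUtil {
    void PauseBackup();    // shared, cross-platform: pauses any running backup
}

#ifdef _WIN32
// Windows: invoked by the button handler in the dialog launched from the System Tray
void OnPauseBackupClicked()
{
    BzUiUtil::PauseBackup();
}
#endif

#ifdef __APPLE__
// Mac: invoked by the action wired to the button in the System Preferences pref pane
void PrefPanePauseBackupClicked()
{
    BzUiUtil::PauseBackup();
}
#endif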

Furthermore, if at all possible you should factor the GUI code all the way out into its own stand-alone process. In Backblaze's case, the GUI process (called "bzbui") reads and writes a few XML files out to the filesystem instructing the rest of the system how to behave, for example which directories to exclude from backup. There is a completely separate process called "bztransmit" that has no GUI (and therefore can be much more cross-platform) which reads the XML configuration files left by the "bzbui" GUI process and knows not to transmit the directories excluded from backup. It turns out having a process with no GUI is a really good feature for a low-level service such as online backup, because it allows the backups to continue when the user is logged out (and therefore no GUI is authorized to run).

Finally, notice that this design is really EASY (and solid, and has additional benefits) but only if you think about cross platform from the very beginning of a project. It is much more difficult if we ignore the cross platform design issues at first and allow the code to be peppered randomly with platform specific GUI code.


Rule #3: Use Standard 'C' types, not Platform Specific Types

This seems painfully obvious, but it's one of the most common mistakes that lead to more and more code that is hard to fix later. Let's take a concrete example: Windows offers an additional type not specified in the 'C' language called DWORD which is defined by Microsoft as "typedef unsigned long DWORD". It seems really obvious that using the original 'C' type of "unsigned long" is superior in every way AND it is cross platform, but it's a common mistake for programmers embedded in the Microsoft world to use this platform specific type instead of the standard type.

The reason a programmer might make this mistake is that the return values from a particular operating system call might be in a platform specific type, and the programmer then goes about doing further calculations in general purpose source code using this type. Going further, the programmer might even declare a few more variables of this platform specific type to be consistent. But instead, if the programmer immediately switches over to platform neutral, standard 'C' variables as soon as possible then the code easily stays cross platform.

The most important place to apply this rule is in cross platform library code. You can't even call into a function that takes a DWORD argument on a Macintosh, so the argument should be passed in as an unsigned long.
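As a hedged illustration of the habit (GetTickCount() is a real Windows call that returns a DWORD; the wrapper name is made up): capture the platform-specific type at the OS boundary, then switch immediately to a standard 'C' type for everything downstream.

#ifdef _WIN32
#include <windows.h>

// The DWORD stays confined to this one function; all callers see a standard type.
unsigned long MillisecondsSinceBoot()
{
    DWORD ticks = GetTickCount();     // platform-specific type at the OS boundary
    return (unsigned long) ticks;     // standard 'C' type everywhere else
}
#endif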


Rule #4: Use Only Built In #ifdef Compiler Flags, Do Not Invent Your Own

If you do need to implement something platform specific, wrap it in the 100 percent STANDARD #ifdefs. Do not invent your own and then additionally turn them off and on in your own Makefiles or build scripts.

One good example is that Visual Studio has an #ifdef compiler flag called "_WIN32" that is absolutely, 100 percent present and defined all the time in all 'C' code compiled on Windows. There is NO VALID REASON to have your own version of this!! As long as you use the syntax as follows:

#ifdef _WIN32
// Microsoft Windows Specific Calls here
#endif

then the build will always be "correct", regardless of which Makefiles or Visual Studio you use, and regardless of whether an engineer copies this section of code to another source file, etc. Again, this rule seems obvious, but many (most?) free libraries available on the internet insist you set many, MANY compiler flags correctly, and if you are just borrowing some of their code it can cause porting difficulties.
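The same idea extends to the other targets: these macros are predefined by the standard compilers themselves, with no Makefile setup required (_WIN32 by Visual Studio, __APPLE__ by the Mac compilers, and __linux__ by GCC on Linux). A minimal sketch:

#ifdef _WIN32
    // Microsoft Windows specific calls here
#elif defined(__APPLE__)
    // Apple Macintosh specific calls here
#elif defined(__linux__)
    // Linux specific calls here
#endif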


Rule #5: Develop a Simple Set of Re-useable, Cross-Platform "Base" Libraries to Hide Per-Platform Code

Let's consider a concrete example at Backblaze: the "BzFile::FileSizeInBytes(const char *fileName)" call, which returns how many bytes are contained in a file. On a Windows system this is implemented with the Windows-specific GetFileAttributesEx(), on Linux it is implemented as an lstat64(), and on the Macintosh it uses the Mac-specific call getattrlist(). So *INSIDE* that particular function is a big #ifdef where the implementations don't share any code. But now callers all over the Backblaze system can call this one function and know it will be very fast, efficient, and accurate, and that it is sure to work flawlessly on all the platforms.

In practice, it takes just a very short amount of time to wrap common calls, and then they are used HUNDREDS of times throughout the cross-platform code to great advantage. So you must commit to building up this small, simple set of functionality yourself, and this will slow down initial development just a little bit. Trust me, you'll see it's worth it later.
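Here is a simplified sketch of what such a wrapper might look like. It is not the actual Backblaze implementation (which uses getattrlist() on the Mac and converts UTF-8 file names on Windows, see Rule #6 below), but it shows the shape of the single shared function with the big #ifdef inside:

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#endif

// Simplified sketch of a cross-platform "file size" wrapper.  Returns -1 on error.
int64_t FileSizeInBytes(const char *fileName)
{
#ifdef _WIN32
    WIN32_FILE_ATTRIBUTE_DATA fileAttr;
    if (!GetFileAttributesExA(fileName, GetFileExInfoStandard, &fileAttr)) {
        return -1;
    }
    return ((int64_t) fileAttr.nFileSizeHigh << 32) | fileAttr.nFileSizeLow;
#else
    struct stat fileInfo;                    // lstat() covers both Mac and Linux in this sketch
    if (lstat(fileName, &fileInfo) != 0) {
        return -1;
    }
    return (int64_t) fileInfo.st_size;
#endif
}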


Rule #6: Use Unicode (specifically UTF-8) for All APIs

I don't want this to become a tutorial on Unicode, so I'll just sum it up for you: Use Unicode, it is absolutely 100 percent supported on all computers and all applications on earth now, and specifically the encoding of Unicode called UTF-8 is "the right answer". Windows XP, Windows Vista, Macintosh OS X, Linux, Java, C#, all major web browsers, all major email programs like Microsoft Outlook, Outlook Express, or Gmail, everything, everywhere, all the time support UTF-8. There's no debate. This web page that you are reading is written in UTF-8, and your web browser is displaying it perfectly, isn't it?

To give Microsoft credit, they went Unicode before most other OS manufacturers did, when Windows NT was released in 1993 (so Microsoft has been Unicode for more than 15 years now). However, Microsoft (being early) chose the non-fatal but unfortunate path of using UTF-16 for their 'C' APIs. They have corrected that mistake in their Java and C# APIs and use UTF-8, but since this article is all about 'C' it deserves a quick mention here. For those not that acquainted with Unicode, just understand that UTF-8 and UTF-16 are two different encodings of the same identical underlying string, and you can translate any string encoded in one form into the other form in 1 line of code, very quickly and efficiently, with no outside information; it's a simple algorithmic translation that loses no data.

SOOOO... let's take our example from "Rule #5" above: BzFile::FileSizeInBytes(const char *fileName). The "fileName" is actually UTF-8 so that we can support file names in all languages such as Japanese (example: C:\tmp\子犬.txt). One of the nice properties of UTF-8 is that it is backward compatible with old US-ASCII, while fully supporting all international languages such as Japanese. The Macintosh file system API calls and the Linux file system API calls already take UTF-8, so their implementations are trivial. But so that the rest of our system can all speak UTF-8, and so we can write cross-platform code calling BzFile::FileSizeInBytes(const char *fileName), on Windows the implementation must do a conversion step as follows:

int BzFile::FileSizeInBytes(const char *fileName)
{
#ifdef _WIN32
    wchar_t utf16fileNameForMicrosoft[1024];   // an array of wchar_t is UTF-16 in Microsoft land
    WIN32_FILE_ATTRIBUTE_DATA fileAttr;        // holds the result of the call below
    ConvertUtf8toUtf16(fileName, utf16fileNameForMicrosoft);   // convert from Utf8 to Microsoft land!!
    GetFileAttributesEx(utf16fileNameForMicrosoft, GetFileExInfoStandard, &fileAttr);
    return (fileAttr.nFileSizeLow);
#endif
}

The above code is just approximate: it suffers from buffer over-run potential and doesn't handle files larger than 2 GB in size, but you get the idea. The most important thing to realize here is that as long as you take UTF-8 into account BEFORE STARTING YOUR ENTIRE PROJECT, supporting multiple platforms (and also international characters) will be trivial. However, if you write the code assuming Microsoft's unfortunate UTF-16 or (heaven forbid) US-ASCII and then wait a year to try to port to the Macintosh and Linux, it will be a nightmare.
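For completeness, here is a hypothetical sketch of what a ConvertUtf8toUtf16() helper could look like on Windows, built on the standard MultiByteToWideChar() API (the helper name and the 1024-character assumption simply mirror the example above; real code should pass the destination size explicitly):

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert a NUL-terminated UTF-8 string into UTF-16 for
// the Microsoft "wide" APIs.  Assumes the destination holds at least 1024 wchar_t,
// matching the buffer in the example above.
void ConvertUtf8toUtf16(const char *utf8, wchar_t *utf16)
{
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, utf16, 1024);
}
#endif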


Rule #7: Don't Use 3rd Party "Application Frameworks" or "Runtime Environments" to make your code "Cross-Platform"

Third party libraries are great - for NEW FUNCTIONALITY you need. For example, I can't recommend OpenSSL highly enough, it's free, redistributable, implements incredibly secure, fast encryption, and we just could not have done better at Backblaze. But this is different than STARTING your project with some library that doesn't add any functionality other than it is supposed to make your programming efforts "Cross-Platform". The whole premise of the article you are reading is that 'C' and 'C++' are THEMSELVES cross platform, you do not need a library or "Application Framework" to make 'C' cross platform. This goes double for the GUI layer (see "Rule #2" above). If you use a so called "cross platform" GUI layer you will probably end up with an ugly and barely functioning GUI.

There are many of these Application Frameworks that will claim to save you time, but in the end they will just limit your application. The learning curve of these will eclipse any benefit they give you. Here are just a few examples: Qt (by TrollTech), ZooLib, GLUI/GLUT, CPLAT, GTK+, JAPI, etc.

In the worst examples, these application frameworks will bloat your application with enormous extra libraries and "runtime environments" and actually cause additional compatibility problems: if their runtimes do not install or update correctly, then YOUR application will no longer install or run correctly on one of the platforms.

The astute reader might notice this almost conflicts with "Rule #5: Develop Your Own Base Libraries", but the key is that you really should develop your OWN cross-platform base set of libraries, not try to use somebody else's. While this might seem to go against the concept of code re-use, the fact is it takes just a SMALL amount of time to wrap the few calls you will actually need yourself. And for this small penalty, you get full control over extending your own set of cross-platform base libraries. It's worth doing just to de-mystify the whole thing. It will show you just how easy it really is.


Rule #8: The Raw Source Always Builds on All Platforms -> there isn't a "Script" to Transmogrify it to Compile

The important concept here is that the same exact foo.cpp and foo.h file can be checked out and always build on Windows, Macintosh, and Linux. I am mystified why this isn't universally understood and embraced, but if you compile OpenSSL there is a complex dance where you run a PERL script on the source code, THEN you build it. Don't get me wrong, I really appreciate that OpenSSL has amazing functionality, it is free, and it can be built on all these platforms with a moderate amount of effort. I just don't understand why the OpenSSL authors don't PRE-RUN the PERL script on the source code BEFORE they check it into the tree so that it can be compiled directly!

The whole point of this article is how to write software in a cross platform manner. I believe that the above PERL scripts allow the coders to make platform specific mistakes and have the PERL script hide those mistakes. Just write the source correctly from the beginning and you do not need the PERL script.


Rule #9: All Programmers Must Compile on All Platforms

It only takes about 10 minutes to teach any entry-level programmer how to check out and build on a new platform they have never used before. You can write the 4 or 5 steps of instructions on the back of a business card and tape it to their monitor. If this entry-level programmer wrote his or her new cross-platform code according to the rules in this article, it will build flawlessly (or be fixable in a few minutes with very basic changes). THIS IS NOT SOME HUGE BURDEN. And ALL programmers must be responsible for keeping their code building on ALL platforms, or the whole cross-platform effort won't go smoothly.

With a small programming team (maybe less than 20 programmers), it is just fine to use the source code control system (like Subversion or CVS) as the synchronization point between the multi-platform builds. By this I mean that once a new feature is implemented and tested on one platform (let's say Windows), the programmer checks it into Subversion, then IMMEDIATELY checks it out on both Macintosh and Linux and compiles it. This means that the build can be broken for one or two minutes while any small oversights are corrected. You might jump to the conclusion that the other programmers will commonly notice this 1 or 2 minute window once a day, but you would be dead wrong. In the last 15 years of using this system in teams ranging from 10 programmers to 50 programmers, I can only remember two incidents where I happened to "catch" the tree in a broken state due to this "cross platform issue window" and it only cost me 5 minutes of productivity to figure out I was just unlucky in timing. During that same 15 years there were probably HUNDREDS of broken builds I discovered that had nothing to do with cross-platform issues at all, or had specifically to do with a stubborn or incompetent programmer that refused to obey this "Rule #9" and compile the code themselves on all platforms. This last point leads me to our final rule below....


Rule #10: Fire The Lazy, Incompetent, or Bad-Attitude Programmers Who Can't Follow These Rules

Sometimes in a cross platform world the build can be broken on one platform, and that's Ok as long as it doesn't happen often. Your very best programmer can get distracted at the critical moment they were checking in code and forget to test the compile on the other platforms. But when one of your programmers is CONSISTENTLY breaking the build, day after day, and always on the same platform, it's time to fire that programmer. Once an organization has stated their goals to develop and deliver cross platform, it is simply unprofessional to monkey up the system by ignoring these rules.

The offending programmer is either patently incompetent or criminally lazy, and either way it is far more than just "grounds for termination"; I claim it is a moral obligation to fire that programmer. That programmer is giving decent, hard-working programmers a bad name, and it reflects badly on our honorable profession to allow them to continue to produce bad code.


Conclusion: Cross Platform is Easy, but You Must Start from the First Line of Code!

If you don't take anything else away from this article, the most important point is that if you EVER want your software project to run on multiple platforms, then you must START your software project from the very first line of code as a cross-platform project. And if you just maintain the cross-platform philosophy the whole way along, the rest of these rules will come about naturally just as they did for us at Backblaze.

1 Comment:

Blogger Chrno said...

Perhaps using pure C together with GLib would be a viable solution to develop native cross-platform applications? Does anyone have experience on this?
