Intro
Coding for some 20-odd years, the last 8 in the industry [1], having mated or fought (or both), in order of appearance, with 6502A assembler, BASIC, Turbo Pascal, C, C++, Perl, Ruby, Python and JScript, and making acquaintance with SQL, C#, Java and PostScript [2], while residing in Oric Nova 64, PC 286 with turbo button, Atari 512stfm, HP-UX (hp-ux), AIX (as in no pager in man), Solaris (slowaris), Windows, Linux, FreeBSD and Tru64 [3], getting visited by NCR, Sinix, OpenVMS and Netware, and having a strictly professional relationship with SAPDB/MAXDB/SAPDB [4], Sybase (Adaptive Server), Informix (IDS), Lotus Domino Server, Oracle, Microsoft SQL, Exchange servers and VMware ESX, my 5 cents ...
[1] Elvis has left the building since.
[2] Everything you can make fractals with deserves a programming language title. Except perhaps for a xerox.
[3] Too bad hp bought + sidelined it. You can pull almost any piece of hardware out of it while the OS is running - except for the 0th processor. Plus, points for looking like a black refrigerator.
[4] Real branding changes.
New rules, in the Bill Maher style:
Debate does not always have two sides. At least one side is wrong.
I've recently stumbled upon old C vs. Pascal forums and could not resist reading. Ignoring the fact that we know what happened since (C went on to become the preferred operating system and systems programming language, while Pascal declined into obscurity), one could not have guessed who was right judging solely by the information presented. Arguments, criticisms and the eloquence of the proponents seemed equally spread among the sides. Could it have happened the other way around - plain simple luck combined with the known inability to predict the future? No, C got it right (better than saying Pascal got it wrong), as simple as that. In programming, as in anything else, you may be clever but outright wrong; in programming, as in anything else, getting deep insight by listening to other people's pros and cons is next to impossible. Q.E.D. by anecdotal evidence.
Functionality => bugs. Often functionality <= bugs
lint, Understand C (a static code analysis product), UML, Rose, MISRA, Coverity, SAL, automated tests, source control [1] - surely by now we must be getting reasonably bug-free code? Memory management in C/C++ is error prone, so use GC and virtual machines, smart pointers and whatnot; memory management issues declined, new issues arose.
[1] Actually, a great way to find the root cause of problems - like backtracking your steps in the snow using binary search. Or use cleartool annotate.
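The binary search through history can be sketched roughly like this. It is a minimal sketch, assuming revisions are plain integers and that the bug, once introduced, stays in every later revision; first_bad_revision, rev_is_bad_fn and broken_since are hypothetical names, and the callback stands in for "check out, build and test this revision".

```c
#include <assert.h>

/* Hypothetical callback: non-zero if the bug is present in revision rev. */
typedef int (*rev_is_bad_fn) (int rev, void *ctx);

/*
 * Return the first broken revision in (good, bad], assuming revision `good`
 * works, revision `bad` is broken, and every revision after the first broken
 * one is broken as well.
 */
int first_bad_revision (int good, int bad, rev_is_bad_fn is_bad, void *ctx)
{
    assert (good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad (mid, ctx))
            bad = mid;   /* bug already present: look earlier */
        else
            good = mid;  /* still fine: look later */
    }
    return bad;
}

/* Stand-in for a real build-and-test step: broken from revision *ctx onwards. */
int broken_since (int rev, void *ctx)
{
    return rev >= *(const int *) ctx;
}
```

With a hundred revisions between the last known good and the first known bad one, this needs about seven build-and-test rounds instead of a hundred.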
There must be some correlation between consumers of the language and the language in question.
An easy-to-use RAD tool may attract average programmers - since it does not require top expertise, thus the net effect is leveraged. On the other side, a language that delivers top performance but is demanding (as assembler is) may attract better programmers, but in smaller numbers, maybe too small to raise the language above a certain niche. Then there should be a point in between these extremes where the first derivative is zero. I guess as the average engineering profile shifted - as it did in the older industries - so did the language of choice. As the saying goes, the programming language is the mirror of the society.
Nothing is cross-platform unless there is only one platform
Find Java on Netware, or find .NET on anything but m$ (Mono, yes). Even a decent C++ on NCR. Although Perl usually compiles even on CD players, plus C exists everywhere. Given the way the market works, there will be only one platform.
.NET is not a serious programming platform yet
Count, w/o consulting Google, 5 commercial programs written in it. Then use Google and admit how many of these 2 you knew before? What m$ commercial product is written in some .NET language? BTW, this is not to say that C# is a bad language (a bit cluttered, isn't it?) - the platform is limited to Windows, for one, and it lets somebody else - a framework - handle your optimization needs: good if you are below or at average, bad if you aspire to be above the competition. Have you noticed how all database engines use their own allocation mechanisms? Heck, Oracle is a release away from being a full-blown operating system in itself - it does not use (or need to use) the filesystem, memory management, synchronization primitives or scheduler.
Java is not a serious programming platform
Count, w/o consulting Google, 5 commercial programs written in it. Then use Google and admit how many of these 3 you knew before? A GUI can be written faster than with MFC, but then again, "a good dove may be faster" than the programmer using MFC. Except for a programmer using X. It is not cross-platform, unless we restrict the meaning of multi-platform to Windows, Solaris, HP-UX, AIX and Linux. Use Qt.
Assessing languages only by their syntax/semantics/performance qualities may be flawed
In order for your program to run, other programs, like the compiler, linker, virtual machine etc., must also be written, ported, maintained etc. It would be interesting to compare the total expenses for maintaining/improving Java, .NET, C++ etc. Sun getting sold to Oracle was unrelated to Sun investing in Java - but I don't believe their Java business brought them a lot of income either. Example regarding compiler complexity: C++ templates are not trivial to implement, thus, for a long time, all but trivial template usage cases didn't work on AIX/HP-UX and lesser unices. Example regarding run-time environment complexity: I remember the case when a customer had an issue with a Java module interfacing with Oracle - it turned out there were 3 different VM versions on the same host (64-bit AIX 5.3) - two 32-bit and one 64-bit version. Your own code may be bug-free, partly thanks to the JVM, but the JVM, as any program does, has its own issues the user cannot fix, only report and hope for a quick response. The point is, it is not your program only that should count in any analysis - it is also everything else that enables that program to run. Deployment is not auto-magical. System libraries, being used by, well - the system, are probably a good foothold to rely on. C, being used by the same system as well, is also a safe bet - that it will be properly maintained and of reasonable quality.
IPv what?
I remember listening to IPv6 evangelist Rafal Lukawiecki at conferences some 7 years ago. The lecture was entertaining, as Rafal is well travelled, jovial and professional; the summary was "IPv6 is happening now, and you only have a couple of years to adopt it". Or else?
Faulty prediction #1: it is never going to be adopted the way envisioned - at least it should not be; instead, a cheaper solution emerges, one that allows a seamless transition - some servers moving to IPvX-only, while the rest of the servers and hosts remain IPv4-only, and all can see each other.
I think the way it works now is: you fix/upgrade your sw/hw (routers) to support IPv6, acquire an IPv6 address, and wait for everybody else (or at least the majority of internet hosts) to do the same, then you all switch. Or you don't, but you then have two parallel execution paths. I think the cost is the issue here. But with the net being regulated by a standardization committee, tough luck getting accepted with an outside solution. Guess m$ better pulls its 'standards, shmartards' once again.
Design by a group a singular failure?
C++0x is being designed by a committee, Perl 6 is being designed by a 'community'.
Faulty prediction #2: both C++0x and Perl 6 fail. Perl 6 may not fail by itself, it may only bring down the Perl 5 we have; when developer focus shifted to 6, that was at Perl 5's expense. Considering the changes from 5 to 6, Perl 6 is Perl as much as C++ is C.
Examples of long-standing, annoying Perl issues: execute a command line program with redirected stdin/out/err from Perl on Windows, kill the started child and see if readline/<> unblocks - the Perl side does not close the other pipe end, keeps both handles and is therefore unable to detect a close. This is a small bug, something normal in any product of Perl's size, but it has been here since 5.8 (or even before). And starting external programs from Perl is by no means a rare scenario. IPv6 support is in a module, not in the Perl core. Interfacing using XS is broken on lots of platforms (Linux, Windows and I guess Mac OS X/BSD work though).
WSDL, Web services not enterprise
WSDL, or Web Services Description Language, is an interface specification, like IDL was for RPC. The problem is that the generated code is huge (for example, VMware ESX Web Server on Linux - as opposed to Virtual Center, which runs on Windows - has a specification that produces more than 1 million loc, which of course failed to compile. VS did better). There are commercial programs (haven't tried) that claim to be able to generate smaller code, but a programming environment that requires additional, possibly expensive sw from a company with two employees, probably located in Texas - that always works like a charm.
SQL not what SQL wanted to be
A human-readable language for database queries. To be used by programmers and the general public alike. … Not. Google got it right. The worst site searches I've seen use SQL databases for storage and SQL for querying. Either you get no entries (since "we did not index by x or y", which concerns you a lot), or you get a zillion entries, yours being on the 34th page. Does Bing use m$ SQL servers for storage?
SQL is not compatible among different databases, especially when it comes to the resulting performance. The query may be the same but, depending on the underlying architecture (with the same schema/data), execute in times differing by an order of magnitude.
SQL is a leaky (see Joel on Software for the meaning of leaky abstractions) - and unnecessary - abstraction I could live without. Simple queries may be ok (say for reporting purposes), but anything complex that also operates on a large set of data needs to be checked for performance issues and optimized, statement by statement. And I guess reporting info can be nicely picked up using Google's approach. That is why some products (Veritas for one) use a separate database for reporting, so managerial types may run heavy queries w/o affecting the production database.
C - You shoot yourself in the foot.
But you do so circling Mars in a spaceship written in C while communicating with Earth using software written in C. C is an acronym for Gets-Things-Done. Thou shalt not need to worship other gods. Although you may, especially in userland, especially if it is a non-core product component.
List of rules that aren’t:
Goto considered harmful, single function return point, max-40-or-whatever loc per function ...
As a point of coding style, this is open to debate, but as an enforced coding style? Same as for max-40-lines-per-function, the rule is reasonable except when it isn't. The added meat to accommodate both rules often causes more harm than good - added code is an added chance for a bug.
XML for everything
Strings must be escaped or otherwise coded - in base64 for example - to avoid misinterpretation of XML syntax elements (such as <>). So it is not human readable - not text. utf8/ucs-2 handling is not ideal (used Pegasus - may be better in newer versions). JSON, for one, is simpler and suffices in most cases. As for using XML to move data over the network (e.g. web services), you have to pack/unpack everything once more due to string encoding, so sending 100 MB does not mean the memory throughput is going to be 100 MB. Any finite number divided by two is always half as large - which defeats the 'today's machines are fast' argument.
Casting is bad
Well, this one actually stands. In any long-lived project, any type is going to change, while casts may hide the fact. In any case - avoid if possible, except in wrappers.
Free - program must clean all allocated memory at exit
Sane platforms (that does not mean Netware) all have virtual memory, making such an endeavor a) redundant, and b) a last chance to shoot yourself in the foot.
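To put rough numbers on the XML packing overhead mentioned above, here is a minimal sketch of the two usual ways of getting a string through XML: escaping the five predefined entities, or base64-coding the whole thing. xml_escape and base64_encoded_size are hypothetical helper names, not from any particular XML library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Escape the five XML special characters from src into dst.
 * Returns the escaped length (without the terminating NUL),
 * or (size_t) -1 if dst_size is too small.
 */
size_t xml_escape (const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    assert (src != NULL && dst != NULL);
    for (; *src; ++src) {
        char one[2] = { *src, '\0' };
        const char *rep;
        size_t len;
        switch (*src) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   rep = one;      break;
        }
        len = strlen (rep);
        if (used + len + 1 > dst_size)  /* keep room for the NUL */
            return (size_t) -1;
        memcpy (dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dst_size)
        return (size_t) -1;
    dst[used] = '\0';
    return used;
}

/* Size of n bytes after base64 encoding (padded, no line breaks). */
size_t base64_encoded_size (size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

Base64 alone turns every 3 bytes into 4, so the payload grows by a third before any parsing or copying on the receiving side is counted.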
Illustrations
A simple example suffices if revisited enough times. Also, given enough iterations, any given function will evolve until it handles mail as well.
Start with a simple and reasonably well written function: read the given file's content into a malloc-ed buffer.
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Read the whole file into a malloc-ed buffer; NULL on any failure. */
    void *os_load_file (const char *filename)
    {
        struct stat st;
        ssize_t status;
        void *buf;
        int fd = open (filename, O_RDONLY);
        if (-1 == fd) return NULL;

        status = fstat (fd, &st);
        if (-1 == status) { close (fd); return NULL; }

        buf = malloc (st.st_size);
        if (!buf) { close (fd); return NULL; }

        status = read (fd, buf, st.st_size);
        if (-1 == status) { close (fd); free (buf); return NULL; }

        close (fd);
        return buf;
    }
The cleanup code spreads (free/close). This causes copy & paste syndrome even in average-complexity functions, especially with error handling in the cleanup code (has close failed?).
How about single-return-point version:
    void *os_load_file (const char *filename)
    {
        struct stat st;
        void *buf = NULL;
        int fd = open (filename, O_RDONLY);
        if (-1 != fd) {
            ssize_t status = fstat (fd, &st);
            if (-1 != status) {
                buf = malloc (st.st_size);
                if (buf) {
                    status = read (fd, buf, st.st_size);
                    /* a failed read must not hand back a garbage buffer */
                    if (-1 == status) { free (buf); buf = NULL; }
                }
            }
            close (fd);
        }
        if (!buf) { /* error handling/reporting here */ }
        return buf;
    }
Smaller, but the nesting is noticeable and only gets worse with scale (I've seen code with 13 levels). Most coding styles limit nesting depth which, in combination with the single-return-point style, requires one to artificially break the function up. It does, however, bring new meaning to diagonal reading.
Unwinding goto approach:
    void *os_load_file (const char *filename)
    {
        struct stat st;
        ssize_t status;
        void *buf = NULL;
        int fd = open (filename, O_RDONLY);
        if (-1 == fd) goto end;      /* nothing to close yet */

        status = fstat (fd, &st);
        if (-1 == status) goto unwind;

        buf = malloc (st.st_size);
        if (!buf) goto unwind;

        status = read (fd, buf, st.st_size);
        if (-1 == status) { free (buf); buf = NULL; }

    unwind:
        close (fd); /* close error handling here */
    end:
        if (!buf) { /* error handling here */ }
        return buf;
    }
More loc than the previous case, but it scales well with the number of calls (there are only 3 calls here - open, fstat and read). To illustrate the scaling point, consider this example:

    status = sql_login (&login);
    if (0 != status) goto done;

    status = sql_start_session (&login, &sess);
    if (0 != status) goto sql_logout;       /* no session to end yet */

    status = sql_start_query (&sess, &query);
    if (0 != status) goto sql_end_session;  /* no query to end yet */

    while (0 == (status = sql_fetch (&query))) { .... }

    sql_end_query:
        status = sql_end_query (&query); /* error handling */
    sql_end_session:
        status = sql_end_session (&sess); /* error handling */
    sql_logout:
        status = sql_logout (&login); /* error handling */
    done: ;

Exceptions may not (apart from stack unwinding and non-locality) be such a bad thing after all. However, there is much more to error handling than exceptions (coming in the next issue).
Points of interest
Integer types - notice int, size_t, ssize_t? It is always a mess, especially when combined with cross-platform code, especially when combined with printfs (I64d vs. ld vs. lld). Newer libcs typically make a point of using proper types (ssize_t as opposed to int), plus VS uses its own types - so yes, you can typedef each and every relevant case to the proper int flavor. My advice is - as you value your life or your reason, keep away from the moor - and use int64_t for everything int. You will get lots of warnings when passing a larger int where a smaller one is needed, but that's why we have wrappers. And yes, you need your own printf/scanf code for the I64d lunacy - what the hell was wrong with long long and %lld?
long int is never the right type to use, considering portability and communication between different platforms.
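The "int64_t everywhere plus wrappers" advice could look roughly like this. It is a minimal sketch, assuming the toolchain ships the C99 <inttypes.h> header (older Visual Studio releases may not, hence the I64d complaint); format_i64 and i64_to_int are hypothetical wrapper names.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>

/*
 * Format a 64-bit value portably: PRId64 expands to whatever length
 * modifier the platform needs for int64_t. Returns snprintf's result.
 */
int format_i64 (char *dst, size_t dst_size, int64_t value)
{
    assert (dst != NULL && dst_size > 0);
    return snprintf (dst, dst_size, "%" PRId64, value);
}

/*
 * Narrow an int64_t to int at the boundary of an API that wants int.
 * Returns 0 on success, -1 if the value does not fit.
 */
int i64_to_int (int64_t value, int *out)
{
    assert (out != NULL);
    if (value < INT_MIN || value > INT_MAX)
        return -1;
    *out = (int) value;
    return 0;
}
```

The one cast lives inside the wrapper, behind a range check, which is about the only place where a cast does not hide anything.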
struct stat is clutter; a typedef is a better way (if the system include does not already provide one).
read would benefit from retry on EINTR. Granted, in some cases retry should be left to the caller (say reading from the pipe/socket connected to the child process)
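A retrying read might be sketched as follows, assuming a POSIX system. read_full is a hypothetical name; besides EINTR it also loops over short reads, which is a separate decision and may or may not suit a pipe or socket.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read up to count bytes, retrying when read is interrupted by a signal
 * and continuing after short reads. Returns the number of bytes read
 * (less than count only at end of file), or -1 with errno set.
 */
ssize_t read_full (int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;
    assert (buf != NULL || 0 == count);
    while (done < count) {
        ssize_t n = read (fd, p + done, count - done);
        if (n > 0)
            done += (size_t) n;
        else if (0 == n)
            break;               /* end of file */
        else if (EINTR != errno)
            return -1;           /* real error, errno left for the caller */
        /* EINTR: interrupted before any data arrived, just try again */
    }
    return (ssize_t) done;
}
```

When the retry belongs to the caller, as in the child process case, the loop simply moves one level up.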
The string type is char*, which is always wrong on Windows (unless you use the wide-char system API but do the conversion to/from utf8 yourself). It should be a typedef to wchar_t or to char, depending on the platform.
fstat does not exist on all the platforms (e.g. OpenVMS)
open is not native on, at least, windows – meaning it does not support complete native call (CreateFile) semantics; the share mode may be useful in this case. Should write OS-wrapper for open/CreateFile/etc.
errno and other libc/os error indicators should be stored the moment error is detected as further calls may change them.
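A minimal sketch of capturing the error at the point of detection, assuming POSIX open; os_open_ro is a hypothetical wrapper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a file read-only. On failure the errno value is copied into *err
 * right away, before logging, close, free or anything else can change it.
 * Returns the descriptor, or -1 with *err set.
 */
int os_open_ro (const char *filename, int *err)
{
    int fd;
    assert (filename != NULL && err != NULL);
    *err = 0;
    fd = open (filename, O_RDONLY);
    if (-1 == fd)
        *err = errno;   /* capture first, report later */
    return fd;
}
```

The caller then reports *err, not whatever errno happens to hold by the time the message is printed.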
NULL - the (!buf) line should, strictly speaking, be (NULL == buf). Or so I was often taught. Which is a point to ignore - the moment someone changes NULL to 0xDEADBEEF will be the moment 99% of programs stop working. This is one of the things (NULL being basically a zero) which evolved to be standard by consensus.
The filename may be defined by the user's input, thus a user may choose a huge file (near, but not larger than, the memory available to the process). That would reserve and later, during read, use lots of memory, causing excessive paging and possibly affecting other programs due to disk IO, while our program will still fail. Why? Chances are loading the file is not the last thing the program does, so if we are down to 10% of available memory, what else do we have resources to do? Using a rule-of-thumb limit is ok, provided the limit is visible/documented (a #define in a commonly known place).
The size may change between stat and read; hence the share mode - where available. UNIX does only cooperative (advisory) locking.
Open may fail on a locked file. A retry (where applicable) may be in order. Too bad WaitForSingleObject is of no help here (no HANDLE yet in any case) and there is no CreateFileEx with overlapped semantics (event set when open succeeds).
Malloc may not be the right allocation function – reading to page aligned buffer may be faster than to arbitrary address. Also, reading to page-aligned buffer is mandatory when reading from raw disks on some platforms. Also, malloc may succeed even if there is not enough memory due to the overcommit. In that case, read would cause page fault (and coredump). A signal handler may be in order.
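A page-aligned replacement for the malloc call could be sketched like this, assuming a POSIX system (Windows would need _aligned_malloc or VirtualAlloc instead); os_alloc_page_aligned is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign under strict -std */
#endif

#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate size bytes aligned to the system page size.
 * Release with free(). Returns NULL on failure.
 */
void *os_alloc_page_aligned (size_t size)
{
    void *buf = NULL;
    long page = sysconf (_SC_PAGESIZE);
    assert (size > 0);
    if (page <= 0)
        return NULL;
    if (0 != posix_memalign (&buf, (size_t) page, size))
        return NULL;
    return buf;
}
```

This does nothing about overcommit: the allocation can still succeed without the memory actually being there.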
Programming is?
Practice and it gets better, unless you are grossly incompetent. Or unless you are a parent with other interests in life than excelling in what you do for a living.
Schools treat it more like a science, with NP-completeness and the like. Which you are (if you are programming for a living) never, ever going to need. O-notation - sure, you need to know the scalability. Calculating stability - rarely, unless you are in the finite element/difference business. There are no final solutions (code is left, never perfected); theoretical breakthroughs, if any, either happened a long time ago, or fall into the category of "someone stumbled, after many failed attempts, upon a clever notion of how to code this or that" and invented a good-sounding name - say versioned pointers (have had this in the reading queue for almost a month already). For example, OO is not a theoretical breakthrough - it is a usable common approach. There was a lot of prior art before someone consolidated the practice into a term, wrote a book, and started a world-wide craze that everything must be an object. Refactoring C++ code is a major PITA - it does not age well. But you have nice UML models to endlessly play with while looking smart and delivering nothing. Notice how C++ projects tend to spend a disproportionate amount of time (thus eating implementation time) at the beginning, trying to model life, the universe and everything? Theory plays a minor role.
Unlike art, it is a mundane endeavor. Also, there is a clear concept of working (good) and non-working (not good). It results in (although mainly sub-par) action in the real life - its objects are not people. A painting (save for the magical one) does not backup your data to safe location.
Skill - I would settle for a skill. More a set of skills/capabilities: a bit of mathematics (you can calculate the average disk queue given only #reads and #writes), abstract reasoning and lots of common sense. Since being practical about it beats the shit out of other approaches, it is irrelevant what the definition of programming is.
What I intended to write was a set of approaches that helped (me) produce code in less time, reliable, reasonably efficient and usable. But, at the end, it is always you that helps you write better (or worse). Approaches and procedures are for tombstones, to quote MrBig.
However, you either write code, or you don't. In the latter case, your opinion does not count for much, now, does it?
I'll skip the last pillar, security, as it is, in my mind, associated with the paranoia of the 00s. These are in order of importance, although each one reaches a bit into the previous one's territory:
Functionality - you sell a product, therefore you need a product. Not in 10 years, but by yesterday. Therefore, it is better to deliver a not optimal - but acceptable - product, than to not deliver the perfect product (perfect even in the sense of "significantly better"). No functionality - no Buck Rogers.
Reliability - a program reliable 90% of the time is as good as no program. Plus you get all the bad customer childhood memories even when you finally correct the problems.
Performance - once the magic barrier in slowness is broken, the product is perceived as not working at all. Above the barrier you may be rejected, but for the time being only - a later, faster version may attract the same people.
Usability - the program works, but the puny human responsible for administering it cannot follow your often insane demands. Example: requiring registry and configuration file editing, especially in a multi-host environment when the program itself has no provisions to ease such a task, requiring a reboot so that changes may take effect etc.
Functionality => bugs. Often functionality <= bugs lint, understand c (an static code analysis product), uml, rose, MISRA, coverity, sal, automated tests, source control1 - surely we must be finally getting reasonably bug-free code? Memory management in C/C++ is error prone, so use gc and virtual machines, smart pointers and what not; memory management issues declined, new issues raised.
1 Actually, a great way to find the root cause of problems – like backtracking your steps in the snow using binary search. Or use cleartool annotate.
There must be some correlation between consumers of the language and the language in question. An easy-to-use RAD tool may attract average programmers – since it does not require top expertise, thus the net effect is leveraged. On the other side, language that delivers top performance but is demanding (as assembler is) may attract better programmers, but in smaller numbers, maybe too small to raise a language above certain niche. Then, it should be a point in between these extremes where the first derivative is zero. I guess as the average engineering profile shifted – as it did in the older industries – so did the language of choice. As the saying goes, the programming language is the mirror of the society.
Nothing is cross-platform unless there is only one platform Find java on Netware, or find .NET on anything but m$ (mono, yes). Even decent c++ on ncr. Although Perl usually compiles even on CD-players plus C exists everywhere. Given the way market works, there will be only one platform.
.NET is not serious programming platform yet Count, w/o consulting Google, 5 commercial programs written in it. Then use Google and admit how many of these 2 you knew before? What m$ commercial product is written in some .net language? BTW, this is not to say, that C# is a bad language (a bit cluttered, isn’t it?) – the platform is limited to windows for one, lets somebody else – a framework – handle your optimization needs – good if you are bellow or at average, bad if you aspire to be above the competition. Have noticed how all database engines use own allocation mechanisms; heck, Oracle is a release away from being fully-blown operating system in itself – does not use (or need to use) filesystem, memory management, synchronization primitives or schedule.
Java is not serious programming platform Count, w/o consulting google, 5 commercial programs written in it. Then use google and admit how many of these 3 you knew before? GUI can be written faster than with the MFC, but then again, “a good dove may be faster” than the programmer using mfc. Except for a programmer using X. It is not cross-platform, unless we restrict meaning of multi-platform to Windows, Solaris, HP-UX, AIX and Linux. Use Qt.
Assessing languages only by their syntax/semantics/performance qualities may be flawed In order for your program to run, other programs, like compiler, linker, virtual machine etc. must also be written, ported, maintained etc. It would be interesting to compare total expenses for maintaining/improving Java, .NET, C++ etc. Sun getting sold to Oracle was unrelated to Sun investing in Java - but I don’t believe their java business brought them a lots of income either. Example regarding compiler complexity: C++ templates are not trivial to implement, thus, for a long time, all but trivial template usage cases didn’t work on aix/hp-ux and lesser unices. Example regarding run-time environment complexity: I remember the case when cu had issue with java module interfacing with oracle - came out there were 3 different VM versions on the same host (64-bit AIX 5.3) - two 32-bit and one 64-bit version. Own code may be bug-free, partly thanks to the JVM, but JVM, as any program does, has own issues user cannot fix, only report and hope for a quick response. The point is, it is not your program only that should count in any analysis - it is also everything else that enables that program to run. Deployment is not auto-magical. System libraries, being used by, well - system, are probably a good foothold to rely on. C, being used by the same system as well, is also a safe bet – that it will be properly maintained and of reasonable quality.
IPv what? Remember listening to IPv6 evangelist Rafal Lukawiecki at conferences some 7 years ago. The lecture was entertaining as Rafal is well travelled, jovial and professional; summary was “IPv6 is happening now, and you only have couple of years to adopt”. Or else?
Faulty prediction #1: it is never going to be adopted the way envisioned – at least it should not; instead, cheaper solution emerges, one that allows seamless transition – some servers moving to IPvX-only, while rest of servers and hosts remain IPv4 only, and all can see each other.
Think the way it works now, is, you fix/upgrade your sw/hw (routers) to support IPv6, acquire IPv6 IP, and wait for everybody else (or, at least simple internet hosts majority) to do the same, then you all switch. Or you don’t, but you than have two parallel execution paths. Think the cost is an issue here. But net being regulated by standardization committee, tough luck being accepted with outside solution. Guess m$ better pulls its 'standards, shmartards' once again.
Design by a group a singular failure? C++0x is being designed by a committee, perl 6 being designed by a 'community'.
Faulty prediction #2: both C++0x and perl 6 fail. perl6 may not fail on itself, may only bring down the perl5 we have; when developer focus shifted on 6, that was at perl5 expense. Considering changes from 5 to 6, perl6 is perl as much as C++ is a C.
Example of a long standing annoying perl issues: execute command line program, with redirected stdin/out/err from perl on windows, kill started child and see if readline/<> unblocks – perl side does not close the other pipe end, keeps both handles and is therefore unable to detect a close. This is a small bug, something normal in any product of perl size, but is here since 5.8 (or even before). And starting external programs from perl is by no means a rare scenario. IPv6 support is in the module, not in the perl core. Interfacing using XS is broken on lots of platforms (Linux, Windows and I guess macox/bsd work though).
WSDL, Web services not enterprise WSDL, or Web Service Definition Language is an interface specification, like idl was for RPC. The problem is that generated code is huge (for example, VMware ESX Web Server on Linux – as opposed to Virtual Center which runs on Windows – has a specification that produces more than 1 million loc, which of course failed to compile. VS did better). There are commercial programs (haven’t tried) that claim to be able to generate smaller code, but the programming environment that requires additional, possibly expensive sw from a company with two employees, probably located in Texas – that always works like a charm.
SQL not what SQL wanted to be Humanly readable language for database queries. To be used by programmers and general public alike. … Not. Google got it right. The worst site searches I’ve seen use SQL databases for storage and SQL for querying. Either you get no entries (since "we did not index by x or y" which concerns you a lot), or you get zillion entries, yours being on the 34th page. Does bing use m$ sql servers for storage?
SQL is not compatible among different databases, especially when it comes to resulting performance. Query may be the same, but, depending on the underlying architecture (with the same schema/data), execute in times differing an order of magnitude.
SQL is leaking (see Joel on software for meaning of leaking abstractions) – and unnecessary – abstraction I could live without. Simple queries may be ok (say for reporting purposes), but anything complex, plus operating on large set of data needs to be checked for performance issues and optimized, statement by statement. And I guess reporting info can be nicely picked up using Google's approach. That is why some products (Veritas for one) use separate database for reporting so managerial types may run heavy queries w/o affecting production database.
C - You shoot yourself in the foot. But you do so circling the Mars in the spaceship written in C while communicating with Earth using software written in C. C is an acronym for Gets-Things-Done. Thou shalt not need to worship other gods. Although you may, especially in the userland, especially if it is non-core product component.
List of rules that aren’t:
goto considered harmful, single function return point, max-40-or-whatever loc per function ...
As a point of coding style, this is open to debate, but as an enforced coding style? Same as for max-40-lines-per-function, the rule is reasonable except when it isn't. The added meat to accommodate both rules often causes more harm than good - added code is an added chance for a bug.
XML for everything
Strings must be escaped or otherwise coded – in base64 for example – to avoid misinterpretation of XML syntax elements (such as <>). So, it is not human readable - not text. utf8/ucs-2 handling is not ideal (used Pegasus – may be better in newer versions). JSON, for one, is simpler and suffices in most cases. As for using XML to move data over the network (e.g. web services), you have to pack/unpack everything once more due to string encoding, so sending 100Mb does not mean the memory throughput is going to be 100Mb. Any finite number divided by two is always twice smaller – defeats the 'today’s machines are fast' argument.
Casting is bad
Well, this one actually stands. In any longer living project, any type is going to change, while casts may hide the fact. In any case - avoid if possible, except in wrappers.
Free – program must clean all allocated memory at exit
Sane platforms (that does not mean Netware) all have virtual memory, making such an endeavor a) redundant, and b) the last chance to shoot yourself in the foot.
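To put a rough number on the XML packing overhead mentioned above: padded base64 turns every group of up to 3 bytes into 4 characters, so the payload grows by about a third before any markup is added. A minimal sketch of that arithmetic (the function name base64_encoded_len is made up for this illustration and is not taken from any particular library):

```c
#include <assert.h>
#include <stddef.h>

/* Length of padded base64 output for n input bytes:
   every group of up to 3 input bytes becomes 4 output characters.
   base64_encoded_len is a hypothetical helper name. */
size_t base64_encoded_len (size_t n)
{
    /* keeps n + 2 and the multiplication below from wrapping */
    assert (n <= (size_t)-1 / 2);
    return 4 * ((n + 2) / 3);
}
```

So a 100-byte string travels as 136 characters, and the receiving side still has to decode it back into a second buffer.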
Illustrations
Simple example suffices if revisited enough times. Also, given enough iterations, any given function will evolve until it handles mail as well.
Start with a simple and reasonably well written function: read the given file's content into a malloc-ed buffer.

    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    void *os_load_file (const char *filename)
    {
        struct stat st;
        ssize_t status;
        void *buf;
        int fd = open (filename, O_RDONLY);
        if (-1 == fd)
            return NULL;

        status = fstat (fd, &st);
        if (-1 == status) {
            close (fd);
            return NULL;
        }

        buf = malloc (st.st_size);
        if (!buf) {
            close (fd);
            return NULL;
        }
        status = read (fd, buf, st.st_size);
        if (-1 == status) { /* test the result, not the function name */
            close (fd);
            free (buf);
            return NULL;
        }
        close (fd);
        return buf;
    }
The cleanup code spreads (free/close). This causes copy & paste syndrome even in average-complexity functions, especially with error handling in cleanup code (has close failed?)
How about a single-return-point version:

    void *os_load_file (const char *filename)
    {
        struct stat st;
        void *buf = NULL;
        int fd = open (filename, O_RDONLY);
        if (-1 != fd) {
            ssize_t status = fstat (fd, &st);
            if (-1 != status) {
                buf = malloc (st.st_size);
                if (buf) {
                    status = read (fd, buf, st.st_size);
                    if (-1 == status) { /* do not return a garbage buffer */
                        free (buf);
                        buf = NULL;
                    }
                }
            }
            close (fd);
        }
        if (!buf) {
            /* error handling/reporting here */
        }
        return buf;
    }
Smaller, but the nesting is noticeable and only gets worse with scale (I've seen code with 13 levels). Most coding styles limit nesting depth which, in combination with the single-return-point style, requires one to artificially break up the function. It does, however, bring new meaning to diagonal reading.
Unwinding goto approach:

    void *os_load_file (const char *filename)
    {
        struct stat st;
        ssize_t status;
        void *buf = NULL;
        int fd = open (filename, O_RDONLY);
        if (-1 == fd)
            goto end; /* nothing to close yet */

        status = fstat (fd, &st);
        if (-1 == status)
            goto unwind;

        buf = malloc (st.st_size);
        if (!buf)
            goto unwind;
        status = read (fd, buf, st.st_size);
        if (-1 == status) {
            free (buf);
            buf = NULL;
        }

    unwind:
        close (fd); /* close error handling here */
    end:
        if (!buf) {
            /* error handling here */
        }
        return buf;
    }
More loc than the previous case, but it scales well with the number of calls (there are only 3 calls here - open, fstat and read). To illustrate the scaling point, consider this example:
    status = sql_login (&login);
    if (0 != status)
        goto done;
    status = sql_start_session (&login, &sess);
    if (0 != status)
        goto sql_logout; /* session never started, only log out */

    status = sql_start_query (&sess, &query);
    if (0 != status)
        goto sql_end_session;
    while (0 == (status = sql_fetch (&query))) {
        ....
    }
    sql_end_query:
    status = sql_end_query (&query); /* error handling */
    sql_end_session:
    status = sql_end_session (&sess); /* error handling */

    sql_logout:
    status = sql_logout (&login); /* error handling */
    done:
    ; /* nothing left to undo */
Exceptions may not (apart from stack unwinding and non-locality) be such a bad thing after all. However, there is much more to error handling than exceptions (coming in the next issue).
Points of interest
Integer types – notice int, size_t, ssize_t? It is always a mess, especially when combined with cross-platform code, especially when combined with printfs (I64d vs. ld vs. lld). Newer libcs typically make a point of using proper types (ssize_t as opposed to int), plus VS uses its own types - so yes, you can typedef each and every relevant case to the proper int flavor. My advice is - as you value your life or your reason keep away from the moor – and use int64_t for everything int. You will get lots of warnings when passing a larger int where a smaller one is needed, but that’s why we have wrappers. And yes, you need your own printf/scanf code for the I64d lunacy – what the hell was wrong with long long and %lld?
The long int is never a right type to use, considering portability and communication between different platforms.
struct stat is clutter; a typedef is a better way (if the system include does not already provide one)
read would benefit from retry on EINTR. Granted, in some cases retry should be left to the caller (say reading from the pipe/socket connected to the child process)
string type is char* which is always wrong on windows (unless you use wide-char system API but do conversion to/from utf8 yourself). Should typedef to wchar_t or to the char, depending on the platform.
fstat does not exist on all the platforms (e.g. OpenVMS)
open is not native on, at least, windows – meaning it does not support complete native call (CreateFile) semantics; the share mode may be useful in this case. Should write OS-wrapper for open/CreateFile/etc.
errno and other libc/os error indicators should be stored the moment error is detected as further calls may change them.
NULL – the (!buf) line should, strictly speaking, be (NULL == buf). Or so I was often taught. Which is a point to ignore - the moment someone changes NULL to 0xDEADBEEF will be the moment 99% of programs stop working. This is one of the things (NULL being basically a zero) which evolved to be standard by consensus.
Filename may be defined by user's input, thus a user may choose a huge file (near, but not larger than the memory available to the process). That would reserve, and later, during read, use lots of memory causing excessive paging, possibly affecting other programs due to disk IO, while our program will still fail. Why? Chances are loading file is not the last thing the program does, so if we are down to 10% of available memory what else do we have resources to do? Using rule-of-thumb limit is ok, provided limit is visible/documented (#define in commonly known place)
Size may change between stat and read; hence share mode - where available. UNIX does only cooperative locking.
Open may fail on a locked file. Retry (where applicable) may be in order. Too bad WaitForSingleObject is of no help here (no HANDLE yet in any case) and there is no CreateFileEx with overlapped semantics (event set when open succeeds).
Malloc may not be the right allocation function – reading to page aligned buffer may be faster than to arbitrary address. Also, reading to page-aligned buffer is mandatory when reading from raw disks on some platforms. Also, malloc may succeed even if there is not enough memory due to the overcommit. In that case, read would cause page fault (and coredump). A signal handler may be in order.
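A few of the points above (int64_t for sizes, retry on EINTR, storing errno the moment the error is detected, a visible rule-of-thumb size limit, size changing between stat and read) can be folded back into the loader. What follows is only a sketch under assumptions: POSIX plus a C99 <inttypes.h> (older compilers may not ship it), and the names os_load_file_checked, read_full and OS_LOAD_FILE_MAX, as well as the 64 MB figure, are made up for this illustration. It does nothing about Windows wrappers, page alignment or overcommit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical rule-of-thumb limit; the point is that it is visible. */
#define OS_LOAD_FILE_MAX ((int64_t)64 * 1024 * 1024)

/* read() that retries on EINTR and continues after short reads.
   Returns bytes read (less than len only at end of file) or -1. */
static int64_t read_full (int fd, void *buf, int64_t len)
{
    int64_t done = 0;
    while (done < len) {
        ssize_t n = read (fd, (char *)buf + done, (size_t)(len - done));
        if (n > 0)
            done += n;
        else if (0 == n)
            break;              /* EOF: file shrank after fstat */
        else if (EINTR != errno)
            return -1;          /* errno still holds the read error */
    }
    return done;
}

/* Returns a malloc-ed buffer and stores the byte count in *size.
   On failure returns NULL and stores in *err the errno captured
   at the point of failure (0 on success). */
void *os_load_file_checked (const char *filename, int64_t *size, int *err)
{
    struct stat st;
    void *buf = NULL;
    int saved = 0;
    int fd;

    assert (filename && size && err);
    fd = open (filename, O_RDONLY);
    if (-1 == fd) { saved = errno; goto end; }
    if (-1 == fstat (fd, &st)) { saved = errno; goto unwind; }
    if ((int64_t)st.st_size > OS_LOAD_FILE_MAX) {
        fprintf (stderr, "%s: %" PRId64 " bytes exceeds OS_LOAD_FILE_MAX\n",
                 filename, (int64_t)st.st_size);
        saved = EFBIG;
        goto unwind;
    }
    /* malloc (0) may return NULL, so ask for at least one byte */
    buf = malloc (st.st_size > 0 ? (size_t)st.st_size : 1);
    if (!buf) { saved = ENOMEM; goto unwind; }
    *size = read_full (fd, buf, (int64_t)st.st_size);
    if (*size < 0) { saved = errno; free (buf); buf = NULL; }
unwind:
    close (fd);                 /* cannot clobber the error: it is in saved */
end:
    *err = saved;
    return buf;
}
```

The caller gets the real byte count back in *size, which may be smaller than what fstat reported if the file shrank in the meantime. Whether the retry loop belongs here or in the caller is the same judgment call as noted above for pipes and sockets.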
Programming is?
Practice and it gets better, unless you are grossly incompetent. Or, unless you are a parent with other interests in life than excelling in what you do for a living.
Schools treat it more like science with NP-completeness and the like. Which you are (if you are programming for a living) never, ever, going to need. O-notation - sure, you need to know the scalability. Calculate stability - rarely, unless you are in the finite element/difference business. There are no final solutions (code is left, never perfected); theoretical breakthroughs, if any, either happened a long time ago, or fall into the category of "someone stumbled, after many failed attempts, upon a clever notion how to code this or that", and invented a good-sounding name - say versioned pointers (have this in the reading queue for almost a month already). For example, OO is not a theoretical breakthrough - it is a usable common approach. There was a lot of prior art before someone consolidated practice into a term, wrote a book, and started a world-wide craze that everything must be an object. Refactoring C++ code is a major pita - it does not age well. But you have nice UML models to endlessly play with while looking smart and delivering nothing. Notice how C++ projects tend to spend a disproportionate amount of time (thus eating implementation time) at the beginning, trying to model life, the universe and everything? Theory plays a minor role.
Unlike art, it is a mundane endeavor. Also, there is a clear concept of working (good) and non-working (not good). It results in (although mainly sub-par) action in real life - its objects are not people. A painting (save for the magical one) does not back up your data to a safe location.
Skill – I would settle for a skill. More a set of skills/capabilities: a bit of mathematics (you can calculate average disk queue given only #reads and #writes), abstract reasoning and lots of common sense. Since being practical about it beats the shit out of other approaches, it is irrelevant what the definition of programming is.
What I intended to write was a set of approaches that helped (me) produce code in less time, reliable, reasonably efficient and usable. But, at the end, it is always you that helps you write better (or worse). Approaches and procedures are for tombstones, to quote MrBig.
However, you either write code, or you don't. In the latter case, your opinion does not count for much, now, does it?
I’ll skip the last pillar, security, as it is, in my mind, associated with the paranoia of the 00s. These are in the order of importance, although each one gets a bit into the previous one's territory:
Functionality - you sell a product, therefore, you need a product. Not in 10 years, but by yesterday. Therefore, it is better to deliver not optimal – but acceptable – product, than to not deliver the perfect product (perfect even in the sense of “significantly better”). No functionality - no Buck Rogers.
Reliability – program reliable 90% of the time is as good as no program. Plus you get all the bad customer childhood memories even when you finally correct problems.
Performance – once the magic barrier of slowness is broken, the product is perceived as not working at all. Above the barrier, you may be rejected, but for the time being only - a later, faster version may attract the same people.
Usability – the program works, but the puny human responsible for administering it cannot follow your often insane demands. Example: requiring registry and configuration file editing, especially in a multi-host environment when the program itself has no provisions to ease such a task, requiring a reboot so changes may come into effect, etc.