The 3 Levels of Software Duplication

A software project of any non-trivial size can be said to have three organizational levels:

  1. The Modular level - your project might be composed of several modules.
  2. The Structural level - each module is separated into several directories.
  3. The Code level - the lowest level, where the details of your project emerge.

Each of these levels is influenced by several factors: the programming language(s) used, the organization of the software creators (see Conway’s law), and the various memes occupying the programmers’ brains at the moment of creation (e.g. OOP, TDD, functional programming, etc.).

Duplication happens at all of these levels. 

  • Modular: complete libraries or programs are rewritten because of license restrictions, different philosophies, or the use of different programming languages.
  • Structural: every time someone spends time deciding what to call a “bin” or “target” directory, or a “src” directory, an angel weeps.
  • Code: No doubt there is a gigaton of duplicated code out there. For example, if you’re a Java developer, how many times have you written hashCode() and equals() methods by hand? (Use a library such as Guava to do this instead.)
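To make the Code-level point concrete, here is a minimal sketch of handing the repetitive parts of equals() and hashCode() to library helpers. It uses the JDK’s own java.util.Objects rather than Guava, so it compiles without extra dependencies; Guava’s Objects.equal and Objects.hashCode fill the same role. The Point class and its fields are hypothetical, invented only for this illustration.

```java
import java.util.Objects;

// Hypothetical value class, used only to illustrate the idea.
final class Point {
    private final int x;
    private final int y;
    private final String label; // may be null

    Point(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Point)) {
            return false;
        }
        Point that = (Point) other;
        // Objects.equals is null-safe, so no hand-written null checks are needed.
        return x == that.x && y == that.y && Objects.equals(label, that.label);
    }

    @Override
    public int hashCode() {
        // Objects.hash combines the fields; no hand-rolled prime arithmetic.
        return Objects.hash(x, y, label);
    }
}
```

The point is less the specific helper than the habit: the null handling and hash mixing were written once, by someone else, and do not need to be written again.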

Currently there are tools to help reduce some duplication, mainly at two of these levels: Modular and Code. At both levels, you can search for existing libraries on SourceForge, GitHub, Koders, Krugle, and so on. At the Code level, you can also run CPD (or something like it) on your code-base to at least reduce copy-pasted code within your own project. At the Structural level, certain frameworks suggest or enforce a directory structure to some extent (Grails and Maven, for example). There is room for improvement in all of these areas; one example is the idea of function hashes and OSS project hashes.
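One way to read the “function hash” idea, sketched here purely as an assumption since the post does not define it: normalize a function’s source text and hash it, so that identical functions can be looked up by their hash. The FunctionHash class, its whitespace-only normalization, and the choice of SHA-256 are all illustrative choices, not an existing tool’s behavior.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprints a function by hashing its normalized source.
final class FunctionHash {

    // Collapse every run of whitespace to a single space so that
    // re-indented or re-wrapped copies normalize to the same text.
    static String normalize(String source) {
        return source.replaceAll("\\s+", " ").trim();
    }

    // Return the SHA-256 digest of the normalized source as lowercase hex.
    static String hash(String source) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(normalize(source).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

This only catches copies that differ in layout. A renamed variable produces a different hash, so a real tool would need to normalize on tokens or syntax trees instead of raw text.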

As software tools mature and evolve, more and more duplication will go away. That is the way of progress. Software developers are hired to solve problems, not to code.  As a software developer myself, I’m still learning this. 

Remember, as you’re starting a new project or module, or just coding, think: “if something like this has been done before, how could I find it?” There is so much open-source code out there, it’s almost criminal not to take advantage of it.