You wrote a good piece of software, sold it on the Internet, and expect to make a good fortune from it. But you know that people can (and will) tinker with your software and distribute it illegally. Sooner or later it will end up on warez sites, no matter how hard you worked on the copy protection scheme. And we all know there’s pretty much nothing we can do about that.
But what if you could determine who has been distributing your software illegally? You might be able to terminate the contract, cutting the licensee off from future updates and thus hindering their ability to illegally distribute future versions of your software. Or maybe you could even sue.
Basically, we can determine who the culprit is by distributing a slightly different version of the software to each customer, with the difference being an embedded customer ID or similar information. If a copy somehow ends up on warez sites, you should be able to download and analyze it to determine who the licensee is. That works only if they were unable to break the steganography, and ideally they will only find out about it after distributing the copy, not before.
The art of steganography lies in concealing the information so that the licensee cannot work out how to decipher it or, worse, impersonate another licensee.
Let’s consider the following C code snippet:
sprintf(out, "Hello world. This is %s, and you are %s.", foo, bar);
Adding an extra space after the last period hardly changes the output of the program:
sprintf(out, "Hello world. This is %s, and you are %s. ", foo, bar);
It would be almost invisible to unsuspecting casual users. But to us, that’s one bit of information!
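To make the "one bit per string" idea concrete, here is a minimal sketch in Python (chosen for brevity, not because the technique depends on it). It assumes a hypothetical list of message strings, none of which ends in a space, and stores bit i of a customer ID as the presence or absence of a trailing space on string i. The names `embed_id`, `extract_id` and `MESSAGES` are made up for this illustration.

```python
def embed_id(messages, customer_id):
    """Encode customer_id, one bit per message: trailing space = 1, none = 0."""
    if customer_id < 0 or customer_id >= 2 ** len(messages):
        raise ValueError("customer_id does not fit in the available messages")
    marked = []
    for position, message in enumerate(messages):
        bit = (customer_id >> position) & 1
        marked.append(message + " " if bit else message)
    return marked


def extract_id(marked):
    """Recover the customer ID by checking each message for a trailing space."""
    customer_id = 0
    for position, message in enumerate(marked):
        if message.endswith(" "):
            customer_id |= 1 << position
    return customer_id


# Hypothetical string constants taken from the program being marked.
MESSAGES = [
    "Hello world. This is %s, and you are %s.",
    "File saved.",
    "Are you sure you want to quit?",
    "Done.",
]

marked = embed_id(MESSAGES, 11)
print(extract_id(marked))  # 11
```

Four strings only give four bits, so a real program would need many more carriers, and a scheme this naive is easy to spot once someone knows to look for it.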
Beginner steganographers might be tempted to overuse whitespace to conceal information in ‘visible source software’, such as software written in ASP or PHP. For example:
$foo=& new Bar($blah);
can be turned into the following without changing what it does:
$foo =& new Bar($blah);
Notice the extra space before =& and after the semicolon.
It is all too easy, and there are plenty of places to do it. But if it is done this way, someone can feed the code into a lexical analyzer and strip out all the steganographic information. In my opinion, the best place to conceal information is in constants (strings) and identifiers (variable names, function names, etc.), just as when doing it within compiled applications. However, it should be noted that there is software already capable of doing this today.
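The lexer argument can be demonstrated with a small sketch. As a stand-in for a PHP lexer, it uses Python source and Python’s standard `tokenize` module; the snippets being compared and the `token_stream` helper are assumptions for illustration only. Whitespace between tokens disappears from the token stream, while a change inside a string constant survives.

```python
import io
import tokenize

# Token types that carry meaning; layout tokens (newlines, indents) are ignored.
SIGNIFICANT = {tokenize.NAME, tokenize.OP, tokenize.STRING, tokenize.NUMBER}


def token_stream(source):
    """Return the (type, text) pairs a lexer sees, ignoring layout tokens."""
    reader = io.StringIO(source).readline
    return [
        (tok.type, tok.string)
        for tok in tokenize.generate_tokens(reader)
        if tok.type in SIGNIFICANT
    ]


plain = 'foo = Bar(blah)\n'
spaced = 'foo  =  Bar(blah) \n'      # extra spaces carry the hidden bits
constant_plain = 'msg = "Done."\n'
constant_marked = 'msg = "Done. "\n'  # hidden bit lives inside the string

# Same tokens: re-emitting the code from its tokens erases the whitespace mark.
print(token_stream(plain) == token_stream(spaced))

# Different tokens: the mark inside the constant survives lexing.
print(token_stream(constant_plain) == token_stream(constant_marked))
```

Anyone who pretty-prints the code from its token stream gets identical output for the first pair, which is why marks hidden in constants and identifiers are harder to wash out.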
Another thing we need to consider is checksumming, so that we can determine whether the steganographic information is still intact and has not been tampered with. Some redundancy will also help greatly.
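One possible way to combine a checksum with redundancy is sketched below. The layout is purely an assumption for illustration: a 16-bit customer ID, an 8-bit check value taken from the low byte of `zlib.crc32`, and the whole frame repeated three times so that a majority vote can repair isolated damage. A real scheme would likely want a keyed check value so that a licensee cannot forge someone else’s mark.

```python
import zlib

ID_BITS = 16     # assumed width of the customer ID
CHECK_BITS = 8   # assumed width of the check value
COPIES = 3       # how many times the frame is repeated


def checksum(customer_id):
    """8-bit check value: the low byte of the CRC-32 of the ID bytes."""
    return zlib.crc32(customer_id.to_bytes(2, "big")) & 0xFF


def to_bits(value, width):
    """Least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def encode(customer_id):
    """ID bits followed by check bits, the whole frame repeated COPIES times."""
    if not 0 <= customer_id < 2 ** ID_BITS:
        raise ValueError("customer_id out of range")
    frame = to_bits(customer_id, ID_BITS) + to_bits(checksum(customer_id), CHECK_BITS)
    return frame * COPIES


def decode(bits):
    """Majority-vote each bit position, then verify the check value.

    Returns the customer ID, or None if the mark looks tampered with.
    """
    size = ID_BITS + CHECK_BITS
    if len(bits) != size * COPIES:
        return None
    frame = [
        1 if sum(bits[i + size * c] for c in range(COPIES)) * 2 > COPIES else 0
        for i in range(size)
    ]
    customer_id = from_bits(frame[:ID_BITS])
    if from_bits(frame[ID_BITS:]) != checksum(customer_id):
        return None
    return customer_id


bits = encode(4242)
bits[5] ^= 1          # one damaged bit is outvoted by the other two copies
print(decode(bits))   # 4242
```

The resulting bit list would then be fed to whatever carrier hides the bits (trailing spaces, identifier choices, and so on).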
Steganography might be less effective with very popular software, where people have access to many copies (or rather, not-quite-identical copies) of the software in question. They could simply compare the versions to determine where the bits are. But it might be very effective with web applications (lots of users, but only a few people have access to the software itself) and PDA applications (a user typically doesn’t share their device with another PDA user). An appz cracker would need to obtain extra licenses to determine where the steganography bits are, and even that is not 100% certain.
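The comparison attack is easy to sketch. The example below assumes a hypothetical marker that stores bit i of the customer ID as a trailing space on string i, then compares two licensed copies. It also shows why the attack is not guaranteed to find everything: positions where both licensees happen to carry the same bit stay hidden.

```python
def mark(strings, customer_id):
    """Hypothetical marker: bit i of the ID adds a trailing space to string i."""
    return [s + " " if (customer_id >> i) & 1 else s for i, s in enumerate(strings)]


def differing_positions(copy_a, copy_b):
    """Indexes of the string constants that differ between two licensed copies."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


# Hypothetical string constants extracted from two copies of the same program.
STRINGS = ["Hello.", "File saved.", "Quit?", "Done."]

copy_a = mark(STRINGS, 0b1011)
copy_b = mark(STRINGS, 0b0110)

# Bit 1 is set in both IDs, so string 1 is identical and goes unnoticed.
print(differing_positions(copy_a, copy_b))  # [0, 2, 3]
```

With more copies in hand the cracker uncovers more positions, which is why popular software is the weakest case for this approach.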
Of course distributing software this way also has its drawbacks:
- More processing power needed. But CPU cycles are getting cheaper and cheaper today. We can compile a moderately sized C file within seconds.
- Can’t distribute your software using mirror sites; all downloads must go through a specific site.
- Almost impossible to distribute patches for updates, since every customer’s copy is different.
I’ve yet to find software that does exactly what I’ve written above, but there are several that operate on text files:
- stego: store steganographic bits by justifying a text file.
- spammimic: turn a short message into spam and turn the spam back into the original message.
- StegoMagic: hide any kind of file or message in TEXT, WAV, 24-bit BMP and 256-colour BMP files.
- snow: conceal messages in ASCII text by appending whitespace to the end of lines.
Most of the effort in the steganography arena goes into image steganography, which is closely related to watermarking. But as you can see, I’m more interested in steganography with text and document files, and particularly with application source code.