Steganography for Controlling Software Distribution

You wrote good software, sold it on the Internet, and expect to make a good fortune from it. But you know that people can (and will) tinker with your software and distribute it illegally. Sooner or later your software will end up on warez sites, no matter how hard you worked on that copy protection scheme. And we all know there’s pretty much nothing we can do about that.

But what if you were able to determine who has been distributing your software illegally? Maybe you could terminate the contract and stop him from getting future updates, thus hindering his ability to illegally distribute future versions of your software. Or maybe you could even sue him.

Enter [steganography](http://en.wikipedia.org/wiki/Steganography).

Basically, we can determine who the culprit is by distributing a slightly different version of the software to each customer, with the difference being an embedded customer ID or similar information. If a copy somehow ends up on a warez site, you should be able to download and analyze it to determine who the licensee is. That is, provided they were unable to break the steganography; ideally they will only find out about it after distributing the copy, not before.

The art of steganography lies in concealing the information so that the licensee cannot decipher it or, worse, impersonate another licensee.

Let’s consider the following C code snippet:


sprintf(out, "Hello world. This is %s, and you are %s.", foo, bar);

Adding an extra space after the last period barely changes the output of the program:


sprintf(out, "Hello world. This is %s, and you are %s. ", foo, bar);

It would be almost invisible to the unsuspecting casual users. But to us, that’s one [bit](http://en.wikipedia.org/wiki/Bit) of information!
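As a rough illustration of the idea, here is a minimal Python sketch (not the code of any existing tool) that treats each string constant as a one-bit slot: a trailing space means 1, no trailing space means 0. The names `embed_bits` and `extract_bits` and the sample messages are made up for this example.

```python
def embed_bits(messages, bits):
    """Return a copy of messages in which a trailing space marks a 1 bit."""
    if len(bits) > len(messages):
        raise ValueError("not enough string constants to carry the payload")
    marked = []
    for i, text in enumerate(messages):
        base = text.rstrip(" ")                 # start from the unmarked form
        bit = bits[i] if i < len(bits) else 0   # unused slots stay unmarked
        marked.append(base + " " if bit else base)
    return marked


def extract_bits(messages, count):
    """Read back the first `count` bits from the trailing spaces."""
    return [1 if text.endswith(" ") else 0 for text in messages[:count]]


# Hypothetical string constants pulled from a program's source.
messages = [
    "Hello world. This is %s, and you are %s.",
    "Goodbye.",
    "Press any key to continue.",
    "Done.",
]
customer_bits = [1, 0, 1, 1]

marked = embed_bits(messages, customer_bits)
recovered = extract_bits(marked, len(customer_bits))
print(recovered)  # [1, 0, 1, 1]
```

With n such string constants, a copy can carry n bits, so even a few dozen messages are enough to tell a large number of customers apart.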

Beginner steganographers might be tempted to overuse whitespace to conceal information in ‘visible source’ software, such as software written in ASP or PHP. For example:


$foo=& new Bar($blah);

can be turned into the following without changing what it does:


$foo =& new Bar($blah); 

Notice the extra space before =& and after the semicolon.

It is all too easy, and there are plenty of places to do it. But if done this way, someone can feed the code into a [lexical analyzer](http://en.wikipedia.org/wiki/Lexical_analyzer) and strip all the steganographic information. In my opinion, the best way to conceal information is in constants (strings) and identifiers (variable names, function names, etc.), just as one would within compiled applications. However, it should be noted that software capable of doing this already exists today.
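To make the identifier idea concrete, here is a small Python sketch under a few assumptions of my own: the source is kept as a template with placeholders such as `__ID0__`, and each placeholder has a hypothetical pair of interchangeable names (`SLOTS`), where the choice of name encodes one bit. None of this comes from an existing tool.

```python
import re

# Hypothetical pairs of interchangeable identifier names.
# Picking the first name encodes a 0 bit, the second a 1 bit.
SLOTS = [
    ("tmp", "temp"),
    ("idx", "index"),
    ("buf", "buffer"),
]

# Illustrative C-like template; the __IDn__ placeholders are an assumption.
TEMPLATE = (
    "int __ID0__ = 0;\n"
    "int __ID1__ = 0;\n"
    "char __ID2__[64];\n"
    "__ID2__[__ID1__] = (char)__ID0__;\n"
)


def mark_source(template, bits):
    """Fill each placeholder with the name selected by the matching bit."""
    source = template
    for i, bit in enumerate(bits):
        source = source.replace("__ID%d__" % i, SLOTS[i][bit])
    return source


def read_mark(source):
    """Recover the bits by checking which name of each pair is present."""
    words = set(re.findall(r"[A-Za-z_]\w*", source))
    bits = []
    for zero_name, one_name in SLOTS:
        if one_name in words:
            bits.append(1)
        elif zero_name in words:
            bits.append(0)
        else:
            bits.append(None)  # slot was renamed or removed
    return bits


marked_source = mark_source(TEMPLATE, [1, 0, 1])
# Collapsing all whitespace, as a reformatter might, leaves the mark intact.
normalized = re.sub(r"\s+", " ", marked_source)
print(read_mark(normalized))  # [1, 0, 1]
```

Unlike the extra-space trick, this mark survives whitespace normalization, although a tool that renames identifiers would still erase it.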

Another thing we need to consider is [checksumming](http://en.wikipedia.org/wiki/Checksum), so that we can determine whether the steganographic information is still intact and has not been tampered with. Some redundancy will also help greatly.
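One possible way to combine a checksum with redundancy is sketched below in Python. The sizes are arbitrary assumptions for illustration: a 16-bit customer ID, the low 8 bits of a CRC-32 as the checksum, and every bit stored three times so that a majority vote can repair isolated damage.

```python
import zlib

ID_BITS = 16   # assumed width of the customer id
CRC_BITS = 8   # low 8 bits of CRC-32, an arbitrary choice for this sketch
REPEAT = 3     # each payload bit is stored three times


def to_bits(value, width):
    """Most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def checksum(customer_id):
    return zlib.crc32(customer_id.to_bytes(2, "big")) & 0xFF


def encode(customer_id):
    """Customer id followed by its checksum, each bit repeated REPEAT times."""
    payload = to_bits(customer_id, ID_BITS) + to_bits(checksum(customer_id), CRC_BITS)
    return [bit for bit in payload for _ in range(REPEAT)]


def decode(bits):
    """Return the customer id, or None if the mark fails its checksum."""
    if len(bits) != (ID_BITS + CRC_BITS) * REPEAT:
        return None
    payload = []
    for i in range(0, len(bits), REPEAT):
        group = bits[i:i + REPEAT]
        payload.append(1 if sum(group) * 2 > REPEAT else 0)  # majority vote
    customer_id = from_bits(payload[:ID_BITS])
    stored_crc = from_bits(payload[ID_BITS:])
    return customer_id if stored_crc == checksum(customer_id) else None


bits = encode(1234)

damaged = list(bits)
damaged[5] ^= 1                      # one flipped bit is outvoted
print(decode(damaged))               # 1234

tampered = bits[:-CRC_BITS * REPEAT] + [b ^ 1 for b in bits[-CRC_BITS * REPEAT:]]
print(decode(tampered))              # None, the checksum no longer matches
```

The resulting 72 bits can then be fed into whatever embedding is used, one bit per slot. A plain CRC only detects accidental or clumsy damage; preventing one licensee from forging another's ID would need a keyed scheme, which this sketch does not attempt.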

Steganography might be less effective with very popular software, where people have access to many copies (or not-quite-identical copies) of the software in question. They could simply compare the versions to determine where the bits are. But it might be very effective for web applications (lots of users, but only a few people have access to the software itself) and PDA applications (users typically don’t share their devices with other PDA users). A cracker would need to obtain extra licenses to determine where the steganographic bits are, and even that is not 100% certain to reveal them.
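The comparison attack, and why it is not fully reliable, can be shown with a short Python sketch. The copies here are hypothetical lists of string constants marked with the trailing-space trick; the names `mark` and `differing_slots` are made up for the example.

```python
def differing_slots(copy_a, copy_b):
    """Positions at which two customers' copies are not identical."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


# Hypothetical unmarked string constants.
base = ["Hello.", "Goodbye.", "Done.", "Error."]


def mark(bits):
    """Trailing space = 1 bit, as in the extra-space example."""
    return [text + " " if bit else text for text, bit in zip(base, bits)]


copy_alice = mark([1, 0, 1, 0])
copy_bob = mark([1, 1, 0, 0])

exposed = differing_slots(copy_alice, copy_bob)
print(exposed)  # [1, 2]
```

Only slots 1 and 2 are exposed. Slots 0 and 3 carry the same bit in both copies, so they look like ordinary program text to someone holding just these two licenses.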

Of course distributing software this way also has its drawbacks:

* More processing power is needed, since each customer’s copy must be built separately. But CPU cycles keep getting cheaper; we can compile a moderately sized C file within seconds.
* You can’t distribute your software through mirror sites; all downloads must go through a specific site.
* It is almost impossible to distribute patches for updates.

I’ve yet to find software that does exactly what I’ve written above, but there are several programs that operate on text files:

* [stego](http://francyzone.onlywebs.com/stego.php): store steganographic bits by justifying a text file.
* [spammimic](http://www.spammimic.com/): turn a short message into spam and turn the spam back into the original message.
* [StegoMagic](http://www.programmersheaven.com/download/38361/download.aspx): hides any kind of file or message in TEXT, WAV, 24-bit BMP and 256-colour BMP files.
* [snow](http://www.darkside.com.au/snow/): conceal messages in ASCII text by appending whitespace to the end of lines.

Most of the effort in the steganography arena goes into image steganography, also known as [watermarking](http://en.wikipedia.org/wiki/Digital_watermarking). But as you can see, I’m more interested in steganography in text and document files, and particularly in application source code.

64 comments

  1. Believe me, there are a lot more good people who buy software than bad people who download it from warez sites. That’s my personal experience.

  2. Nice article. Mostly, though, the people who use stego are photographers who embed it in their photos, or writers who embed it as secret words, although it exists for audio formats too. The reason is certainly that they don’t want their work distributed illegally, and stego is hidden by nature (unless someone deliberately checks for it). But the point is that software very rarely uses stego, except when it is distributed centrally, for example the iTunes Music Shop or DRM-based copy protection (video/audio). But Dani is right: there will always be creative people (in the naughty sense) :D

  3. If people download it from warez sites, that means the software is good; just consider it charity. :) What is actually hard is making software that ends up on warez sites, meaning what is hard is making good software. :)

  4. Wow, so that is what steganography means? I had to look up the meaning for my assignment, hehe, thanks…

    At first I thought my lecturer had misspelled stenography (the kind journalists use)…

    Wow, this is a threat to crackers… (but the smarter the police, the smarter the thieves, and that is what makes this world interesting)

  5. Pri, I got the impression that steganography is only applicable to applications that aren’t widely distributed (since we have to have a different version of the code for each copy), probably B2B software.

    But how about software that is distributed widely (say, MS Office)?

  6. @-)@-)… it took me a long time to digest this post and still… @-)
    just checking in
    Pri, you have insomnia, huh, hihihi, it’s 2:07 |-)|-)

  7. agree with #3.

    Especially when you are dealing with corporations. They always prefer buying, because there is a lot more to it than just the software. For example, after-sales service and support is one of the most important parts of a software product.

    Note: so that if anything goes wrong, there is someone to blame :p

  8. Believe it or not, I sometimes *do* embed such messages in my main code. Nice to know there are people who have thought about it firsthand.
    Yes, I distribute a different version (only the minor number differs) to different clients. I know this is bad, but for my circumstances (fewer than 10 distribution boxes) it is enough.

    However, I’d love to see where steganography goes. It’s nice to know about, nonetheless.

    By the way, in typical LoB (Line of Business) applications, it is not only the software that carries the risk of being compromised but also the database.
    You do realize that some programmers can, to some extent, easily reverse engineer the entire application from the database schema alone.

  9. Steganography… reminds me of The Da Vinci Code, haha..
    Both are about storing hidden information.

    Supposedly the famous painters of old had special watermarking techniques that can (now) be seen with X-rays :)

  10. I thought that fewer than 5% of people were bad people who download from warez sites. And this doesn’t seem to make any sense for mass-distributed software.

    So? In my humble opinion: just consider it charity. :D

  11. Hehe.. I worked myself half to death on my encryption code for Blogger, and someone copied it just by adding a little bit.. annoying, right? But that’s the risk of the online world..

    Thanks for sharing, I want to give it a try.. As for watermarks, I’ve been using them for a long time..

    Please write about this kind of thing often.. Thank you so much.. :)

  12. #23:

    I agree too. If someone copies my software, I consider it nothing to lose, much like someone asking for a light from our cigarette: it doesn’t diminish the flame on ours.
    Except for sensitive applications, such as a CMS or something like that™.

  13. Hehehe, I hear Pri’s nemesis is already advanced in this kind of thing; he can detect image files with this technology :d

  14. It would be almost invisible to the unsuspecting casual users. But to us, that’s one bit of information!

    that’s not one bit actually, that is one byte of information…

    But what if you are able to determine who has been distributing your software illegally? Maybe you will be able to terminate the contract, stopping him from getting future updates, and thus hindering his ability to illegally distribute future version of your software. Or maybe you can even sue him.

    Strange, why is the customer (distributor) blamed when there is a copy of the software? It isn’t necessarily them who made the illegal copy…

    Steganography won’t solve the piracy problem; it just makes software distribution harder. You can only tell which customer’s copy was pirated, not who the pirate is… useless…

  15. Isn’t a serial number alone enough to trace the licensee? Why go to the trouble of using stego? :-?
    Download the software, decode the serial number, then trace the licensee. The result is the same as what Priyadi wrote above.

  16. #49:

    that’s not one bit actually, that is one byte of information…

    wrong. because we can only insert space there, not arbitrary ascii character. unless of course if we are inserting up to 8 spaces :).

    Steganography won’t solve the piracy problem; it just makes software distribution harder. You can only tell which customer’s copy was pirated, not who the pirate is… useless…

    True. But at least it is a lead toward the actual pirate, which is better than no lead at all.

  17. wrong. because we can only insert space there, not arbitrary ascii character. unless of course if we are inserting up to 8 spaces

    hey ‘space’ is ASCII character, please refer to ASCII table (#32), and one ASCII character is 1 Byte that is 8 bits…

  18. #53: that’s still one bit of steganographic information, it is either the space is there or it is not, and it can’t be other character than a space. that’s a yes or no proposition, a boolean value, a bit. it is 1 if the space is there, and it is 0 when it is not.

  19. ok I got that if you refer to steganographic information or precisely “data entropy” not “data size”, but according to data produced by your quoted code, there will be 1 byte expansion not 1 bit…

  20. :-” I still don’t understand what the topic up there is. Just use Indonesian, will you :-w, or just dub it so I don’t get dizzy :(.

  21. I once asked my students: WHAT IS THE MOST IMPORTANT LANGUAGE FOR A PROGRAMMER?
    They all named various programming languages, but I then told them they were all wrong. For a programmer, or anyone working in the computer field, the most important language is English!

    IT and IS develop very fast, and almost all texts and information about new products are in English. If we wait for a book in Indonesian, by the time it is published the technology has moved on and the book is already outdated. Look at the Internet too: what is the ratio of information in English to information in Indonesian? I am sure there is thousands of times more in English.

    In one Indian state, namely Bangalore, IT exports account for about 60% of the state’s total exports; imagine if Indonesia’s IT exports reached 10% of our total exports! India is advanced in IT not because they are all geniuses, but because the language of instruction in their schools from elementary school onward is English; it happens that they were once a British colony.

    So push yourself to use English, even if only passively (able to read but not speak). Sorry, I don’t mean to lecture, but I want to make us aware of how important this language of international communication is.

    Regards,
    HTY

  22. Nice to meet you, Priyadi

    Using steganography can indeed fool people by disguising the real thing behind something perceptible. We can send a secret message behind a picture of the lovely Mona Lisa or between the verses of Whitney Houston’s Greatest Love of All. It can also be hidden among gossip news, recipes, even love letters.

    However, because of that “insertion”, the stego output takes on the size of the carrier medium. A short news item inserted into a Mona Lisa painting grows enormously in data volume, like slipping a handful of straw into a pile of wood. Therefore, for the purpose of sending “disguised” data, stego is not very efficient; if the size of the disguised data being sent is not kept in check, it will burden the communication channel instead. It is a different story if the output size and its distribution are not a concern.

    Just my two cents,
    Salaam,
    Bahtiar HS

  23. #59: in the discussion above, steganography is used to embed a customer identity; its size will be insignificant compared to the message that carries it.

  24. Does anyone know where to download steganography source code, preferably in Delphi? I’ve googled everywhere and can’t find any.
    I want to learn more about steganography :)

  25. Speaking of stego…..??? Does anyone have the source code… the one for video….
    Please help..???? \:d/ please….:((:((
