
Code Obfuscation


Code obfuscation comes in handy for protecting copyrighted or patented software. Java source is compiled to an intermediate bytecode that is then interpreted by the JVM, and this bytecode is so standardized and well documented that it can be decompiled to match the original code almost exactly. It therefore becomes necessary to obfuscate the bytecode so that it cannot easily be decompiled to reveal the original source code.


Methods used to obfuscate bytecode

  1. Remove the debugging information such as variable names and line number information.

  2. Mangle the names of variables, methods, package-names and classes.

  3. Encode all string literals and provide a function to decode them at run time. This does not affect the final output of the executable, but the decompiled code looks pretty ugly and is not immediately understandable. Such encoding can be done for other literals too, like integer and float literals.

  4. At the cost of some efficiency, introduce code which is equivalent in functionality but considerably more complicated, by making use of goto statements, conditions that are always true, and expanded loops with some valid junk statements in between.

  5. Some code obfuscators can even insert bytecode that could not have come from compilable source code. Such statements do not affect the interpretation of the bytecode, but they make decompilers fail because the decompilers cannot translate them back into source. Even if a decompiler succeeds, its output for such non-compilable code is extremely difficult to understand. Bytecode execution remains unaffected because bytecode interpreters are typically more relaxed in error checking than the compiler, assuming that the compiler has already done that part.

  6. Remove unused code.

  7. Insert extra unused code.

  8. Make use of function overloading and give the same name to all functions that have different signatures. Imagine trying to understand code in which every function in a class (or in the entire code base) has been renamed to ‘a()’.

  9. Change the line number information. Line number information is present in bytecode to help debug a program, and decompilers use it to reconstruct the original source code more accurately. Obfuscators therefore mangle this information to confuse the decompilers further.
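Methods 3 and 4 above can be illustrated with a small hand-written sketch. It is only an approximation of what an obfuscator might generate: the class name, the XOR scheme and the single-byte key are all assumptions made for illustration, and real tools use their own (usually stronger) encodings and work on bytecode rather than source.

```java
import java.nio.charset.StandardCharsets;

class StringObfuscationSketch {
    // Hypothetical single-byte key, chosen only for this illustration.
    private static final int KEY = 0x5A;

    // What an obfuscator might do at build time: XOR every byte with the key.
    static byte[] encode(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return bytes;
    }

    // The decoder shipped with the obfuscated program. XOR is its own inverse.
    static String decode(byte[] encoded) {
        byte[] bytes = encoded.clone();
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Originally just: return "hello";
    static String greeting(int x) {
        // Always-true condition: the product of two consecutive ints is even,
        // and this still holds when the multiplication overflows.
        if ((x * (x + 1)) % 2 == 0) {
            // The literal "hello" no longer appears in the class file.
            return decode(new byte[] {0x32, 0x3F, 0x36, 0x36, 0x35});
        }
        return "junk branch that is never taken";
    }
}
```

The behaviour of greeting() is unchanged for every input, but a decompiler now shows a byte array and a puzzling condition instead of a readable string.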


Problems with obfuscation

The above methods can sometimes cause problems in the actual execution of the program if the obfuscator is not careful to avoid the following pitfalls:


  1. Dynamic class loading

Dynamic class loading (using Class.forName() or ClassLoader.loadClass()) can fail if the package or class names are mangled by the obfuscator. Although modern obfuscators are careful to replace static strings used in such dynamic class loader invocations, problems arise when the class or package name is an input from the user or is constructed dynamically by string manipulation.
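A minimal sketch of the failure mode, assuming a hypothetical nested class CsvExporter: the class name is assembled at run time, so an obfuscator has no string literal to rewrite. Looking up a name with no matching class stands in here for what happens once the real class has been renamed.

```java
class DynamicLoadingSketch {
    // Stand-in for a class that exists under this name before obfuscation.
    static class CsvExporter {}

    // The class name is built by string manipulation, so a renaming
    // obfuscator cannot see it or keep it in sync with the mangled name.
    static boolean canLoad(String simpleName) {
        String className = DynamicLoadingSketch.class.getName() + "$" + simpleName;
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // This is what the caller hits when the name no longer exists.
            return false;
        }
    }
}
```

Here canLoad("CsvExporter") succeeds, while a name that matches no class (as "CsvExporter" itself would after being renamed to something like "a") ends in ClassNotFoundException.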

  2. Reflection

Code using reflection (for example, Class.getMethod() or Class.getField()) runs into the same problem as dynamic class loading if name mangling is performed.
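The sketch below imitates this with two hypothetical classes: Original has the method name the caller expects, and Renamed is what the same class might look like after name mangling. The lookup string "computeTotal" stays the same in both cases because the obfuscator cannot know it refers to a method.

```java
import java.lang.reflect.Method;

class ReflectionSketch {
    // The class as the developer wrote it.
    static class Original {
        public int computeTotal() { return 42; }
    }

    // Roughly what a renaming obfuscator might leave behind (assumed names).
    static class Renamed {
        public int a() { return 42; }
    }

    // Looks up a public no-argument method by name and returns its result
    // as text, or the name of the failure if the lookup does not succeed.
    static String invokeByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return String.valueOf(method.invoke(target));
        } catch (NoSuchMethodException e) {
            return "NoSuchMethodException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

Calling invokeByName(new Original(), "computeTotal") returns "42", whereas the same call on Renamed reports NoSuchMethodException.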


  3. Serialization

It is not possible to deserialize an obfuscated class into a non-obfuscated class or vice versa. So, care must be taken to use the same class for serialization and deserialization. Either use the obfuscated class for both operations or use the non-obfuscated class for both, but do not mix and match.
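The reason is that the standard Java serialized form records the class name and the field names. The following sketch, using a hypothetical Invoice class, serializes an object and checks that both names appear in the raw bytes. If either is mangled on only one side, the reader can no longer match the stream to its class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SerializationSketch {
    // Hypothetical serializable class used only for this illustration.
    static class Invoice implements Serializable {
        private static final long serialVersionUID = 1L;
        int amount = 100;
    }

    // Serializes an Invoice and reports whether the given text occurs
    // in the resulting byte stream.
    static boolean serializedFormMentions(String text) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Invoice());
            out.flush();
            // ISO-8859-1 maps every byte to one char, so ASCII names survive.
            String raw = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
            return raw.contains(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both serializedFormMentions(Invoice.class.getName()) and serializedFormMentions("amount") hold, which is why a stream written by one build cannot be read by a build with different names.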

  4. Naming Convention Violations

Sometimes the code is expected to contain some well-defined method names which the callers of the class assume to be present. If such method names are obfuscated, the code becomes unusable. This is especially true in EJB, where method signatures are more a matter of convention than something specified in an interface or base class.
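As a simplified stand-in for such convention-driven callers (this is not EJB itself), the sketch below discovers properties purely from the getXxx() naming convention. The Customer and A classes and the discoverProperties() helper are hypothetical names introduced for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NamingConventionSketch {
    // Follows the getter convention the caller relies on.
    static class Customer {
        public String getName() { return "Ada"; }
        public int getAge() { return 36; }
    }

    // The same class after a renaming obfuscator has been at work (assumed).
    static class A {
        public String a() { return "Ada"; }
        public int b() { return 36; }
    }

    // Finds properties by convention only: a no-argument method named getXxx
    // is taken to expose a property called xxx.
    static List<String> discoverProperties(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            String name = method.getName();
            if (name.startsWith("get") && name.length() > 3
                    && method.getParameterCount() == 0) {
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        Collections.sort(names); // reflection gives no ordering guarantee
        return names;
    }
}
```

For Customer the helper finds the properties age and name. For A it finds nothing, even though the behaviour of the class is identical.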


  5. Maintenance Nightmares

Finally, obfuscated stack traces of exceptions can be a nightmare for the developer. So if the code is not yet mature and is being run for the very first time without adequate bug fixing, obfuscation may lead to some big maintenance nightmares.

Some obfuscators, however, provide a utility which can reconstruct the original stack trace of exceptions even from the obfuscated code. This is done by keeping a reverse-mapping file of the obfuscations performed on the code. Such a mapping file holds the original as well as the mangled names, along with the mangled line numbers. When an exception occurs in obfuscated code, these utilities look up the mapping file and reconstruct the original exception as far as possible.
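The idea can be sketched as a lookup from mangled names back to original ones. The in-memory mapping, its contents and the frame format handled below are assumptions for illustration. Real tools define their own mapping file formats and also restore line numbers, which this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RetraceSketch {
    // Matches the qualified method name in a frame such as "at a.b.c(...)".
    private static final Pattern FRAME = Pattern.compile("at ([\\w.$]+)\\(");

    // Hypothetical reverse mapping: mangled method -> original method.
    static Map<String, String> sampleMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("a.b.c", "com.example.OrderService.placeOrder");
        mapping.put("a.b.d", "com.example.OrderService.cancelOrder");
        return mapping;
    }

    // Rewrites one stack trace line, leaving it untouched when the frame
    // is not recognized or has no entry in the mapping.
    static String restore(String frame, Map<String, String> mapping) {
        Matcher matcher = FRAME.matcher(frame);
        if (!matcher.find()) {
            return frame;
        }
        String original = mapping.get(matcher.group(1));
        if (original == null) {
            return frame;
        }
        return frame.substring(0, matcher.start(1)) + original
                + frame.substring(matcher.end(1));
    }
}
```

With the sample mapping, the frame "at a.b.c(Unknown Source)" is restored to "at com.example.OrderService.placeOrder(Unknown Source)", while frames from unmapped code such as the JDK pass through unchanged.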









Site Owner: Sachin Goyal