Programming: Issue #5 [Cryptography][I/O][Python]: File Cryptography

After a brief delay due to some crappy Java work that had to be done, here I am again with another Issue about cryptography in Python. This one has real-world utility, as the code I’m going to explain here can be used to encrypt and decrypt any kind of file. I’ve tested it myself on various short .txt files, trying to catch edge cases, and also on some .mp3 files, all under Linux.

Here is the encoder function:


from Crypto.Cipher import AES
import hashlib
import os

def fileEncoderAES(password, fileinput, fileoutput, padding=b" "):
	# Functions to handle files.
	# We delete any existing 'fileoutput' in the directory,
	# to avoid data overlapping.
	# Both files are opened in binary mode ('rb'/'wb') so that any kind
	# of file is read and written byte for byte.
	f = open(fileinput, 'rb')
	if(os.path.isfile(fileoutput)):
		os.remove(fileoutput)
	f2 = open(fileoutput, 'wb')

	# Symmetric-Key and variable initialization.
	# blocksize is counted in bytes; 128 is a multiple of the
	# 16-byte AES block, as CBC mode requires.
	blocksize = 128
	filetotal = b""
	# hashlib works on bytes, so a text password is encoded first.
	if not isinstance(password, bytes):
		password = password.encode("utf-8")
	key = bytes(hashlib.sha256(password).digest())
	mode = AES.MODE_CBC #  ECB CBC
	encryptor = AES.new(key, mode, key[:8] + key[-8:])

	# Main iteration. Encodes blocksize-byte blocks until end of file.
	while True:
		r = f.read(blocksize)
		if(len(r) < blocksize):
			# Fill last block with padding character
			r = r + (blocksize-len(r)) * padding
			filetotal = filetotal + encryptor.encrypt(r)
			break
		filetotal = filetotal + encryptor.encrypt(r)

	# Save and close files
	f2.write(filetotal)
	f.close()
	f2.close()

The first thing we have to do is specify and open the files that are going to be used for the encoding. We need the file that we want to encode and the destination file. Both filenames are passed to the function as parameters.

If there is already a file with the name we set for the destination file, this program will destroy it. This is to avoid ending up with a corrupted file that mixes old and new data. Strictly speaking, opening a file with “w” already truncates it, so the explicit removal is just a safety net; the overlap happens when an existing file is opened in a mode that doesn’t truncate, such as “r+”. If such a file contains the string “123456789” and we write “bbb” directly on it, the resulting file won’t contain just “bbb”, but “bbb456789”. Since managing files is not the main purpose of this Issue and I didn’t need more complex handling than this for myself, I just kept it simple.
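
A quick sketch of how an existing file reacts to being written again, depending on the mode it is opened with. The file name demo.txt and the temporary folder are made up for this illustration: “r+” leaves the old data in place, while “w” truncates it first.

```python
import os
import tempfile

# Hypothetical scratch file, only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("123456789")

# "r+" keeps the old contents and writes over them from the start.
with open(path, "r+") as f:
    f.write("bbb")
with open(path) as f:
    overlapped = f.read()

# "w" truncates the file first, so nothing of the old data survives.
with open(path, "w") as f:
    f.write("bbb")
with open(path) as f:
    truncated = f.read()

print(overlapped)  # bbb456789
print(truncated)   # bbb
```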

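Note that the second parameter of the open() function establishes how the file is going to be used. “r” stands for read-only and “w” stands for write-only. There are others, including appending and read/write, and adding a “b” (as in “rb” or “wb”) opens the file in binary mode, which is the safe choice for non-text files such as .mp3.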

In this function, we receive a password as a parameter. With the

key = bytes(hashlib.sha256(password).digest())

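line we derive a fixed-size key from our variable-length password, meaning that a password of any length can be used to generate a key. SHA-256 always returns 32 bytes, which makes this an AES-256 key. Any hash whose digest is 16, 24 or 32 bytes long would work as well; MD5, for instance, gives a 16-byte (AES-128) key.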
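
To see the fixed key size in practice, here is a minimal sketch using only the standard library. The helper name password_to_key is made up for the example. The last lines show, purely as an assumption about what one might prefer and not what the library in this post does, a slower salted derivation with hashlib.pbkdf2_hmac.

```python
import hashlib

def password_to_key(password):
    # Hypothetical helper: hash a text password into 32 bytes (AES-256 size).
    return hashlib.sha256(password.encode("utf-8")).digest()

short_key = password_to_key("abc")
long_key = password_to_key("a much longer passphrase with spaces")

# Assumption for illustration: a slower, salted derivation. The salt below
# is a placeholder and would have to be stored next to the encrypted file.
salt = b"example-salt-16b"
slow_key = hashlib.pbkdf2_hmac("sha256", b"abc", salt, 100000, dklen=32)

print(len(short_key), len(long_key), len(slow_key))  # 32 32 32
```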


mode = AES.MODE_CBC # ECB CBC
encryptor = AES.new(key, mode, key[:8] + key[-8:])

There is another difference from the first cryptography Issue: here I’m using CBC mode for the AES cipher. In ECB mode every block is encrypted independently with the same key, so two identical plaintext blocks always produce identical ciphertext blocks. In CBC mode, on the other hand, each plaintext block is XORed with the previous ciphertext block before being encrypted, meaning that two consecutive “xxxxxxxxxxxxxxxx” blocks won’t be encoded as the same data. This offers much better protection against pattern analysis on files that tend to have repeated data. To use CBC mode, the cipher requires an initialization vector (IV) with a length of 16 bytes (128 bits). Ideally it should be random and different for every file; so that we don’t have to define it manually, I just take the first and last 8 bytes of our key. Keep in mind that this makes the IV the same for every file encrypted with the same password.
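
The chaining idea can be sketched without any crypto library. In the example below, toy_block_encrypt is a made-up stand-in for a block cipher (it is NOT AES and cannot be decrypted); it only serves to show how identical blocks behave in each mode.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key, block):
    # Stand-in for a real block cipher, NOT AES and not reversible:
    # it only gives a deterministic 16-byte output per (key, block).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def ecb_encrypt(key, data):
    # Every block is processed on its own.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return [toy_block_encrypt(key, b) for b in blocks]

def cbc_encrypt(key, iv, data):
    # Each block is XORed with the previous output (the IV for the first one).
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    out, previous = [], iv
    for b in blocks:
        previous = toy_block_encrypt(key, xor_bytes(b, previous))
        out.append(previous)
    return out

key = hashlib.sha256(b"password").digest()
iv = key[:8] + key[-8:]   # same IV construction as in the post
data = b"x" * 32          # two identical 16-byte blocks

ecb = ecb_encrypt(key, data)
cbc = cbc_encrypt(key, iv, data)
print(ecb[0] == ecb[1])  # True: the repetition is visible
print(cbc[0] == cbc[1])  # False: chaining hides it
```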

The main iteration is pretty simple: it keeps reading blocks of 128 bytes from the input file (a multiple of the 16-byte AES block), encoding them and storing them in a variable, and repeating this process until the end of the file is reached. If the last block is shorter than 128 bytes, we fill it with a padding character (blank spaces by default) until it reaches 128 bytes, and then we store it and stop the iteration.
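
The read-and-pad loop can be looked at on its own, without the encryption step. This is only a sketch: padded_blocks is a hypothetical helper name, and an in-memory stream stands in for the input file.

```python
import io

def padded_blocks(stream, blocksize=128, padding=b" "):
    # Yield blocksize-byte chunks; the final chunk is filled with padding.
    # A stream whose length is an exact multiple of blocksize gets one
    # extra block made only of padding.
    while True:
        chunk = stream.read(blocksize)
        if len(chunk) < blocksize:
            yield chunk + (blocksize - len(chunk)) * padding
            return
        yield chunk

# 300 bytes = 128 + 128 + 44, so the last block carries 84 padding bytes.
blocks = list(padded_blocks(io.BytesIO(b"a" * 300)))
print([len(b) for b in blocks])           # [128, 128, 128]
print(blocks[-1] == b"a" * 44 + b" " * 84)  # True
```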

The last statements of the function write the result to disk and close the files, releasing the resources they were using.

As for the Decoder function, it’s pretty much the Encoder inverted. The only thing that changes is the padding handling, since here we have to strip it rather than add it. The main iteration is simpler, since we know we’ll always receive a file whose size is a multiple of 128 bytes thanks to the padding added by the Encoder function, so we don’t have to handle that special case.


def fileDecoderAES(password, fileinput, fileoutput, padding=b" "):
	# Functions to handle files.
	# We delete any existing 'fileoutput' in the directory,
	# to avoid data overlapping.
	if(os.path.isfile(fileoutput)):
		os.remove(fileoutput)
	filetotal = b""
	# Binary mode: the encoded data is raw bytes, not text.
	f3 = open(fileinput, 'rb')
	f4 = open(fileoutput, 'wb')

	# Symmetric-Key and variable initialization.
	blocksize = 128
	# hashlib works on bytes, so a text password is encoded first.
	if not isinstance(password, bytes):
		password = password.encode("utf-8")
	key = bytes(hashlib.sha256(password).digest())
	mode = AES.MODE_CBC #  ECB CBC
	decryptor = AES.new(key, mode, key[:8] + key[-8:])

	# Main iteration. Decodes blocksize-byte blocks until end of file.
	while True:
		r = f3.read(blocksize)
		if not r:
			break
		filetotal = filetotal + decryptor.decrypt(r)

	# Clean padding at the end of file.
	while(filetotal[-1:] == padding):
		filetotal = filetotal[:-1]

	# Save and close files
	f4.write(filetotal)
	f3.close()
	f4.close()
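
One limitation of stripping a padding character is that a file which itself ends in that character (spaces by default) loses those bytes on the way back. A common way around this is PKCS#7-style padding, where every padding byte stores the padding length. The sketch below is not part of the library in this post; pad and unpad are hypothetical helper names, and it assumes a block size below 256 bytes.

```python
def pad(data, blocksize=128):
    # Always add between 1 and blocksize bytes, each equal to the count added.
    n = blocksize - (len(data) % blocksize)
    return data + bytes([n]) * n

def unpad(data):
    # The last byte says how many bytes to drop.
    n = data[-1]
    if n < 1 or n > len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]

original = b"text ending in spaces   "
roundtrip_ok = unpad(pad(original)) == original
spaces_ok = original.ljust(128).rstrip(b" ") == original

print(roundtrip_ok)  # True
print(spaces_ok)     # False: the trailing spaces are lost
```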

In this library I’ve also included a function to calculate a file’s MD5, just to be sure I was handling the padding correctly. If the original file and the decoded file have the same MD5, you can be quite sure they have the same content.


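def fileMD5(fileinput):
	# Read in binary mode so the hash covers the exact bytes on disk.
	# The 'with' block closes the file even though we return inside it.
	with open(fileinput, 'rb') as f:
		return hashlib.md5(f.read()).hexdigest()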
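
For big files it may be preferable not to read everything at once. A minimal sketch of a chunked variant follows; file_md5_chunked and the scratch file names are made up for the example.

```python
import hashlib
import os
import tempfile

def file_md5_chunked(path, chunksize=65536):
    # Hypothetical variant: hash the file piece by piece in binary mode,
    # so large files never have to fit in memory at once.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunksize), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two scratch files with identical bytes should give identical hashes.
payload = b"\x00\x01some bytes\xff" * 1000
folder = tempfile.mkdtemp()
first = os.path.join(folder, "a.bin")
second = os.path.join(folder, "b.bin")
for name in (first, second):
    with open(name, "wb") as f:
        f.write(payload)

print(file_md5_chunked(first) == file_md5_chunked(second))  # True
```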

I’ll probably make another post more focused on file handling, since I guess it can be kinda useful; I had some trouble with it when writing this one (I found the ‘overwriting error’ the hard way).

I’m quite excited about this Issue, as it is the first one with a real use that anyone can benefit from. The complete library for this Issue can be found here:

Issue 5 files

P.S.: When encoding data, the extension of the output file doesn’t really matter. I’ve always tested it by creating a .txt file with the encoded data, and it worked well for all file extensions.
