Python: print “Hello DFIR World!”

Wednesday, April 8, 2015 Posted by Corey Harrell
Coursera's mission is to "provide universal access to the world's best education." Judging by their extensive course listing, they appear to be delivering on that mission, since the courses are free for anyone to take. I had known about Coursera for some time but only recently took one of their courses, Programming for Everybody (Python). In this post I'm sharing some thoughts about my Coursera experience, the course I took, and how I immediately used what I learned.

Why Python? Why Coursera?


Python is a language used often in information security and DFIR. Its usage ranges from simple scripts to extensive programs. My interest in Python was modest; I wanted to be able to modify (if needed) the Python tools I use and to write automation scripts to make my job easier. Despite the wealth of resources available to learn Python, I wanted a more structured environment to learn the basics: one that leverages lectures, weekly readings, and weekly assignments to explore the topic. My plan was to learn the basics and then explore how Python applies to information security using the books Black Hat Python and Violent Python. Browsing through the Coursera offerings I found the course Programming for Everybody (Python). The course “aims to teach everyone to learn the basics of programming computers using Python. The course has no pre-requisites and avoids all but the simplest mathematics.” It teaches the basics in a span of 10 weeks without the traditional approach of learning to code through mathematics; the course was exactly what I was looking for.

Programming for Everybody (Python)


I’m not providing a full-fledged course review but I did want to provide some thoughts on this course. The course itself is “designed to be a first programming course using the popular Python programming language.” This is important and worth repeating. The course is designed to be someone’s first programming course. If you already know how to code in a different language then this course isn’t for you. I didn’t necessarily fit the target audience since I know how to script in both batch and Perl. However, I knew going in that this was a beginner’s course, so I expected things would move slowly. I could easily overlook this one aspect since my interest was to build a foundation in Python. The course leveraged some pretty cool technology for an online format. The recorded lectures used a split screen showing the professor and his slides, and he could write on the slides as he taught. The assignments had an auto grader: students complete assignments by executing their programs and the grader confirms whether the program was written correctly. The textbook is Python for Informatics: Exploring Information, which focuses more on solving data analysis problems than on math problems like traditional programming texts. The basics covered include: variables, conditional code, functions, loops/iteration, strings, files, lists, dictionaries, tuples, and regular expressions.

Overall, spending the past 10 weeks completing this course was time well spent. Sure, at times I wished things moved faster, but I did achieve what I wanted to: exploring the basics of the Python language so I have a foundation before exploring how the language applies to security work. The last thing I wanted to mention about the course is something I highly respect. The entire course, from the textbook to the lecture videos, is licensed under a Creative Commons Attribution license, making it available for pretty much anyone to use.

Applying What I Learned


The way I tend to judge courses, trainings, and books is by how much of the content can be applied to my work. If the curriculum is not relevant to one’s work then what is the point in wasting time completing it? It’s just my opinion but judging courses and trainings in this manner has proven to be effective. To illustrate this point as it applies to the Programming for Everybody (Python) course, I’m showing how the basics I learned solved a recent issue. One issue I was facing was how to automate parsing online content and consuming it in a SIEM. This is a typical issue for those wishing to use open source threat intelligence feeds. One approach is to manually parse the content into a machine readable form that your SIEM and tools can use. Another, better approach is to automate as much as possible through scripting. I took the latter approach by creating a simple script to automate this process. Those interested in Python usage in DFIR should check out David Cowen's Automating DFIR series or Tom Yarrish's Year of Python series.

There are various open source threat intelligence feeds one can incorporate into their enterprise detection program. Kyle Maxwell’s presentation Open Source Threat Intelligence touched on some of them. For this post, I’m only discussing one, and it was something I wanted to know how to do. Tor is an anonymity service that enables people to hide where they are coming from as they surf the Internet. Tor has a lot of legitimate uses and just because someone is using it does not mean they are doing something wrong. Being able to flag users connecting to your network from Tor can add context to other activity. Is the SQL injection IDS alert a false positive? Is the SQL injection IDS alert coming from someone who is also using Tor a false positive? See what I mean by adding context? This was an issue that needed a Python solution (or at least a solution where I could apply what I learned).

To add Tor context to activity in my SIEM I first had to identify the IP addresses of the Tor exit nodes. Traffic from users of the service will show the IP address of the exit node they are going through. The Tor Project FAQ answers the question "I want to ban the Tor network from my service." After trying to discourage people from blocking, it presents two options: the Tor exit relay list or a DNS-based list. The Tor exit relay list webpage has a link to the current list of exit addresses. The screenshot below shows how this information is presented:


Now we’ll explore the script I wrote to parse the Tor exit node IP addresses into a form my SIEM can consume, which is a text file with one IP address per line. The first part, as shown in the image below, imports the urllib2 module that is used to open URLs. This part wasn’t covered in the course but wasn’t too difficult to figure out by Googling. The last line in the image creates a dictionary called urls. A dictionary associates a key with a value, and in this case the key is tor-exit and the value is the URL of the Tor exit relay list. Using a dictionary allows the script to be extended to support other feeds without significant changes.
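Since the screenshot is not reproduced here, the following is a minimal sketch of what this first part might look like, not the original script. It is written for Python 3, where urllib2's functionality lives in urllib.request. The URL is an assumption and should be confirmed against the link on the Tor exit relay list page.

```python
# Python 3 sketch; the original script imported urllib2 (Python 2).
import urllib.request

# Feed name -> feed URL.
# ASSUMPTION: this URL is illustrative; confirm it on the Tor exit relay list page.
urls = {
    "tor-exit": "https://check.torproject.org/exit-addresses",
}

# Supporting another feed later would only need one more key/value pair.
for name, url in urls.items():
    print(name, "->", url)
```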


The next portion of the script, as shown below, is where the first for loop occurs. The for loop processes each entry (key and value pair) in the urls dictionary. The try and except is a way to account for errors such as a URL not working. Inside the try section the URL is opened into a variable named file and then read into a variable named data using the readlines() method of the object urllib2 returns. Lastly, a file named after the key is created to store the output, and its file handle is named output.
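A rough Python 3 sketch of this loop follows. It is an illustration under assumptions rather than the original code: the function name open_feeds, the opener parameter (there so the download can be swapped out for testing), the 30 second timeout, and the ".txt" suffix on the output name are all hypothetical choices.

```python
import os
import urllib.request

def open_feeds(urls, out_dir=".", opener=urllib.request.urlopen):
    """Yield (name, lines, output_path) for each feed that downloads cleanly.

    `opener` and the '<key>.txt' output name are assumptions for this sketch.
    """
    for name, url in urls.items():
        try:
            # Open the URL and read the whole feed into a list of lines.
            with opener(url, timeout=30) as response:
                data = response.read().decode("utf-8", errors="replace").splitlines()
        except OSError as error:
            # URLError and socket timeouts are OSError subclasses in Python 3.
            print("Skipping %s: %s" % (name, error))
            continue
        # The output file is named after the dictionary key.
        yield name, data, os.path.join(out_dir, name + ".txt")
```

Catching the error and moving on to the next entry means one dead URL does not stop the remaining feeds from being processed.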


The next part of the script, shown in the image below, is specific to each threat feed being parsed. This accounts for the differences in the way threat feeds present data. The if statement checks whether the key matches “tor-exit” and, if it does, the second for loop executes. This for loop reads each line in the data variable (that is, the data listed at the URL). As each line is read, additional actions are performed, such as skipping blank lines and any line that doesn’t start with the string “ExitAddress”. For the lines that do start with this string, the line is broken up into a list named words. Basically, it splits the line into different values using the space as a separator. The IP address is the second value, so it is the second element of the words list (words[1]). The IP address is then written to the output file, and once all lines are processed a message is displayed saying processing completed.
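The parsing logic described above can be sketched as follows. The function name extract_exit_addresses is hypothetical, and the sample lines are made up for illustration (the fingerprint is invented and the IP addresses come from ranges reserved for documentation); only the "ExitAddress <IP> ..." layout is taken from the description.

```python
def extract_exit_addresses(lines):
    """Return the IP address from every line that starts with 'ExitAddress'."""
    addresses = []
    for line in lines:
        line = line.strip()
        if not line:                              # skip blank lines
            continue
        if not line.startswith("ExitAddress"):    # skip every other record type
            continue
        words = line.split()                      # split on whitespace
        if len(words) > 1:                        # guard against a truncated line
            addresses.append(words[1])            # second value is the IP address
    return addresses

# Made-up sample data in the shape of the exit list.
sample = [
    "ExitNode 0000000000000000000000000000000000000000",
    "Published 2015-04-07 20:11:13",
    "ExitAddress 192.0.2.10 2015-04-07 21:07:40",
    "",
    "ExitAddress 198.51.100.7 2015-04-07 20:15:02",
]

# Prints one address per line:
# 192.0.2.10
# 198.51.100.7
print("\n".join(extract_exit_addresses(sample)))
```

Writing the result to disk is then a matter of opening the output file and writing each address followed by a newline.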


The screenshot below shows the script running.


The end result is a text file containing the Tor exit IP addresses with one address per line. This text file can then be automatically consumed by my SIEM or I can use it when analyzing web logs to flag any activity involving Tor.
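As one illustration of the web log use case, here is a minimal sketch that flags log lines coming from a Tor exit address. It assumes the client IP is the first whitespace-separated field of each log line (as in the common Apache access log layout); the function names and sample data are hypothetical.

```python
def load_exit_ips(path):
    """Read the one-IP-per-line text file into a set for fast lookups."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}

def flag_tor_hits(log_lines, exit_ips):
    """Return the log lines whose first field is a known Tor exit IP.

    ASSUMPTION: the client IP is the first whitespace-separated field.
    """
    hits = []
    for line in log_lines:
        fields = line.split()
        if fields and fields[0] in exit_ips:
            hits.append(line)
    return hits

# Made-up example using documentation IP ranges.
exit_ips = {"192.0.2.10"}
log = [
    '192.0.2.10 - - [08/Apr/2015:10:00:00 +0000] "GET /login.php?id=1 HTTP/1.1" 200 512',
    '203.0.113.5 - - [08/Apr/2015:10:00:01 +0000] "GET / HTTP/1.1" 200 1024',
]
for hit in flag_tor_hits(log, exit_ips):
    print("Tor exit:", hit)
```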


It’s Basic but Works


Harlan recently said in his Blogging post “it doesn't matter how new you are to the industry, or if you've been in the industry for 15 years...there's always something new that can be shared, whether it's data, or even just a perspective.” My hope is that this post will be useful to others who are not programmers but want to learn Python. Coursera is a good option that can teach you the basics. Even just learning the basics can extend your DFIR capabilities, as demonstrated by my simple script.
  1. Very cool, Corey. Thanks for posting this. I wonder how many others out there reading your post are going to find this useful...probably quite a few.

  2. +1 Great post, Corey. I've made several aborted attempts to learn Python and still have it as a goal. I think my problem is similar to yours in that I needed a reason to use it instead of just diving in. In a previous attempt, I wrote a short Python script to solve a problem I had and it worked perfectly. It was really more of a glorified batch file, but it did what I needed it to.
    Anyway, great post!

  3. Harlan and Ken, thanks to you both for posting a comment. To be honest I wasn't sure how well this would go over compared to the other topics I blogged about over the past few months.

  4. It's an encouraging post and an answer for those who work in the DFIR field and want to learn Python but not sure how to start. They got the answer with an example how the basics learned was applied. Thanks Corey.

  5. Excellent thanks for this Corey, I intend signing up for this. Was wondering if you had any thoughts on using the legacy version (2.x) of Python referenced by Python for Informatics: Exploring Information vs the current (v3) version which promises little or no backward compatibility?
    I had already downloaded v3 before I looked at the book and as a non-programmer found myself intimidated by immediate Syntax errors - which I managed to work out but wonder what else may be in store?
    Normally I'd guess that the current version (not x.0 though :)) would be preferable to a legacy version - and yes, I know it's free :)
    Cheers
    Peter

  6. Corey - excellent post. Just goes to show how useful and flexible Python can be for DFIR.

    Cults14 - The community is trying to change to v3, but most of the third party libraries are still v2 only. To be honest, I would suggest that you use whichever version you're learning from and go from there. :)
