JavaScript String Analysis: Abstract Interpretation with Automata and Widening
Abstract
JavaScript is dynamically typed, and objects can have their methods and properties modified at runtime. As a result, static analysis tools are hard to implement, and security analysis tools are limited. We propose abstractly interpreting JavaScript strings as finite state automata, together with a widening operator on automata that guarantees the analysis converges. We augment TAJS (Type Analyzer for JavaScript), an open-source dataflow analysis tool for JavaScript, with these changes so that it can precisely approximate strings in real-world JavaScript code. Our framework can be applied safely to the subset of JavaScript that contains no for..in loops; further work may extend it to the entire language. We hope this work will be extended to vulnerability detection, targeting exploits such as SQL injection and cross-site scripting (XSS) attacks.
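The following is a minimal Python sketch of two ideas: abstracting strings as finite state automata, and using a widening operator to force convergence. It assumes an epsilon-free NFA representation and one simple widening: join the two automata, then merge states that accept the same words of length at most `k`. All names (`NFA`, `literal`, `concat`, `join`, `widen`, `included`, `analyze_loop`) are hypothetical, and this is not TAJS's implementation.

Merging states only adds paths, so the widened automaton still over-approximates both inputs. There are also only finitely many sets of words of length at most `k` over the characters in the program. That bounds the number of states, so the increasing chain of loop iterates has to stabilize.

```python
# Hypothetical sketch of an automaton string domain with widening; not TAJS's code.
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Edge = Tuple[int, str, int]  # (source state, character, target state)


@dataclass(frozen=True)
class NFA:
    """Epsilon-free NFA; its language is the set of strings a variable may hold."""
    start: int
    accept: FrozenSet[int]
    edges: FrozenSet[Edge]

    def states(self):
        ends = {q for (s, _, d) in self.edges for q in (s, d)}
        return {self.start} | set(self.accept) | ends

    def step(self, current, ch):
        return {d for (s, c, d) in self.edges if s in current and c == ch}

    def accepts(self, word):
        current = {self.start}
        for ch in word:
            current = self.step(current, ch)
        return bool(current & self.accept)


def literal(word):
    """Abstraction of a string constant: a chain accepting exactly `word`."""
    edges = {(i, ch, i + 1) for i, ch in enumerate(word)}
    return NFA(0, frozenset({len(word)}), frozenset(edges))


def concat(a, word):
    """Abstract `s + word` for a constant `word`."""
    if not word:
        return a
    base = max(a.states()) + 1
    edges = set(a.edges) | {(f, word[0], base) for f in a.accept}
    edges |= {(base + i, ch, base + i + 1) for i, ch in enumerate(word[1:])}
    return NFA(a.start, frozenset({base + len(word) - 1}), frozenset(edges))


def join(a, b):
    """Least upper bound: accepts L(a) | L(b) via a fresh start state."""
    off = max(a.states()) + 1  # rename b's states apart from a's
    b = NFA(b.start + off, frozenset(q + off for q in b.accept),
            frozenset((s + off, c, d + off) for (s, c, d) in b.edges))
    new = max(b.states()) + 1
    edges = set(a.edges) | set(b.edges)
    edges |= {(new, c, d) for (s, c, d) in edges if s in (a.start, b.start)}
    accept = set(a.accept) | set(b.accept)
    if a.start in accept or b.start in accept:
        accept.add(new)
    return NFA(new, frozenset(accept), frozenset(edges))


def futures(a, q, k):
    """Accepted words of length <= k readable from state q."""
    words = {""} if q in a.accept else set()
    if k > 0:
        for (s, c, d) in a.edges:
            if s == q:
                words |= {c + w for w in futures(a, d, k - 1)}
    return frozenset(words)


def widen(a, b, k=1):
    """Join, then merge states with equal k-bounded futures (adds paths only)."""
    u = join(a, b)
    ids, cls = {}, {}
    for q in sorted(u.states()):
        cls[q] = ids.setdefault(futures(u, q, k), len(ids))
    return NFA(cls[u.start], frozenset(cls[q] for q in u.accept),
               frozenset((cls[s], c, cls[d]) for (s, c, d) in u.edges))


def included(a, b):
    """Decide L(a) <= L(b) by pairing a's states with subsets of b's states."""
    seen, todo = set(), [(a.start, frozenset({b.start}))]
    while todo:
        qa, qb = todo.pop()
        if (qa, qb) in seen:
            continue
        seen.add((qa, qb))
        if qa in a.accept and not (qb & b.accept):
            return False
        todo.extend((d, frozenset(b.step(qb, c))) for (s, c, d) in a.edges if s == qa)
    return True


def analyze_loop(init, suffix, k=1):
    """Loop-head invariant for: s = init; while (?) { s = s + suffix; }"""
    x = literal(init)
    while True:
        nxt = widen(x, concat(x, suffix), k)
        if included(nxt, x):  # x already covers x + suffix: a post-fixpoint
            return x
        x = nxt


loop_invariant = analyze_loop("", "a")
```

Here `analyze_loop("", "a")` models `s = ""; while (...) { s = s + "a"; }`. Without widening, each iteration would add one longer string and the iteration would never terminate. With widening, it stops at an automaton accepting every string of `a`s, including the empty string. A larger `k` typically keeps more precision at the cost of more states.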
The problem
On both the client side and the server side, JavaScript is ubiquitous online, so it's important that web applications using JavaScript are secure. Unlike Java and C++, JavaScript is dynamically typed, and objects can even have their methods and properties modified at runtime. Because the language is so loosely structured, it's easy to introduce bugs and security vulnerabilities into JavaScript programs. That's why we need good tools that can catch these vulnerabilities before deployment.
What we're doing
It's called abstract interpretation, and yes, it's as cool as it sounds. It's a form of static analysis: deriving information about a computer program without actually running it. A few static analysis tools are available for JavaScript right now, but none that we found includes a suite of string analysis tools precise enough to prevent critical and widespread vulnerabilities such as SQL injection and cross-site scripting attacks. Our solution, rooted in the theoretical framework of abstract string domains, aims to do just that. It represents each string in a JavaScript program by the set of possible values it could contain.
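To make "representing strings by the possible values they could contain" concrete, here is a minimal Python sketch using a deliberately simpler domain than automata. Each string is represented by a bounded set of possible values, which collapses to TOP ("any string") once it grows too large. The names `const`, `join`, `concat`, `bound`, and `TOP` and the bound `MAX_SIZE = 4` are hypothetical choices for illustration. The sketch analyzes a small JavaScript snippet without running it, covering both branches of a condition whose value is unknown.

```python
# Hypothetical sketch: a bounded "set of possible strings" domain, much
# simpler than the automata domain; names and MAX_SIZE are assumptions.
from typing import FrozenSet, Optional

MAX_SIZE = 4  # beyond this many possible values, give up and go to TOP
StrSet = Optional[FrozenSet[str]]  # None stands for TOP: "any string at all"
TOP: StrSet = None


def const(s: str) -> StrSet:
    """A string literal has exactly one possible value."""
    return frozenset({s})


def bound(values: FrozenSet[str]) -> StrSet:
    """Keep the set only while it stays small; otherwise lose precision."""
    return values if len(values) <= MAX_SIZE else TOP


def join(a: StrSet, b: StrSet) -> StrSet:
    """Where two control-flow paths merge, either path's value is possible."""
    if a is TOP or b is TOP:
        return TOP
    return bound(a | b)


def concat(a: StrSet, b: StrSet) -> StrSet:
    """Abstract `a + b`: every pairing of a possible left and right value."""
    if a is TOP or b is TOP:
        return TOP
    return bound(frozenset(x + y for x in a for y in b))


# Abstractly "run" this JavaScript without knowing `flag` or `userInput`:
#   let table = flag ? "users" : "admins";
#   let query = "SELECT * FROM " + table;
#   let unsafe = "SELECT * FROM users WHERE id=" + userInput;
table = join(const("users"), const("admins"))
query = concat(const("SELECT * FROM "), table)
unsafe = concat(const("SELECT * FROM users WHERE id="), TOP)  # input unknown
print(sorted(query))  # ['SELECT * FROM admins', 'SELECT * FROM users']
print(unsafe)         # None, i.e. TOP
```

Because `userInput` is unknown, `unsafe` may be any string at all. A vulnerability checker built on such a domain might report this situation. A finite set also cannot describe the unbounded set of strings that a loop like `s = s + "a"` can build: the set just grows until it hits TOP. That is one motivation for automata with widening.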