Semi-automated Discovery of Server-Based Information Oversharing Vulnerabilities in Android Applications

William Koch, Boston University, USA
Abdelberi Chaabane, Northeastern University, USA
Manuel Egele, Boston University, USA
William Robertson, Northeastern University, USA
Engin Kirda, Northeastern University, USA

ABSTRACT
Modern applications are often split into separate client and server tiers that communicate via message passing over the network. One well-understood threat to privacy for such applications is the leakage of sensitive user information either in transit or at the server. In response, an array of defensive techniques has been developed to identify or block unintended or malicious information leakage. However, prior work has primarily considered privacy leaks originating at the client and directed at the server, while leakage in the reverse direction – from the server to the client – is comparatively under-studied. Whether and to what degree this leakage constitutes a threat remains an open question. We answer this question in the affirmative with Hush, a technique for semi-automatically identifying Server-based InFormation OvershariNg (SIFON) vulnerabilities in multi-tier applications. In particular, the technique detects SIFON vulnerabilities using a heuristic that overshared sensitive information from server-side APIs will not be displayed by the application’s user interface. The technique first performs a scalable static program analysis to screen applications for potential vulnerabilities, and then attempts to confirm these candidates as true vulnerabilities with a partially-automated dynamic analysis. Our evaluation over a large corpus of Android applications demonstrates the effectiveness of the technique by discovering several previously-unknown SIFON vulnerabilities in eight applications.

CCS CONCEPTS
• Security and privacy → Domain-specific security and privacy architectures;

KEYWORDS
information leakage, software analysis, Android testing

ACM Reference format:
William Koch, Abdelberi Chaabane, Manuel Egele, William Robertson, and Engin Kirda. 2017. Semi-automated Discovery of Server-Based Information Oversharing Vulnerabilities in Android Applications. In Proceedings of 26th International Symposium on Software Testing and Analysis, Santa Barbara, CA, USA, July 2017 (ISSTA’17), 11 pages.
https://doi.org/10.1145/3092703.3092708

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ISSTA’17, July 2017, Santa Barbara, CA, USA
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-5076-1/17/07...$15.00
https://doi.org/10.1145/3092703.3092708

1 INTRODUCTION
Modern mobile applications (apps) typically employ a multi-tier architecture. This has been largely due to the explosive growth of cloud computing platforms, such as Amazon AWS, Microsoft Azure and Heroku, allowing developers to conveniently manage and operate scalable web services [13]. As a result, apps can provide rich user experiences and are no longer limited by client-device hardware. In such settings, the cloud essentially provides an extension of the client’s computation and data storage capabilities.

This application architecture often results in sensitive information flows from user devices to centralized server-side logic and storage tiers in the cloud. Users place trust in the app to securely transfer and store their sensitive information. Unfortunately, despite the benefits of multi-tier architectures, the decoupled nature of the tiers opens up new security and privacy concerns for the app that would not have occurred if the app were self-contained.

Much research has investigated ways to detect and prevent leakage of sensitive user information. Prior work has identified advertising libraries that exfiltrate sensitive user data from mobile devices [29]. Malware has also been observed to send sensitive user data to command and control (C&C) servers [3, 28, 32]. Furthermore, client-side data leakages are exacerbated by access control and permission vulnerabilities in Android [38]. To address these issues, many extensions to Android have been proposed to enhance its security and prevent data leakages [5, 30, 52]. Additionally, side-channel information leaks, such as the timing and size of requests, can reveal insights about a user’s online activities, even over an encrypted channel [9, 44]. However, existing research primarily considers information leakage from the client to the server.

In this work, we ask the question of whether sensitive information leakage can occur in the reverse direction, namely from the server to the client. To answer this question, we devise an approach
that semi-automatically identifies instances of Server-based InFormation OvershariNg (SIFON) vulnerabilities. In particular, we focus on detecting SIFON vulnerabilities by identifying apps that perform client-side access control, using the heuristic that overshared information from server-side APIs will not be displayed by the app’s user interface.

We built a prototype of this approach for the Android platform, called Hush, and evaluated it over a large corpus of 31,559 Android apps drawn from the Google Play Store. Our evaluation demonstrates that SIFON vulnerabilities indeed exist in the wild, exposing sensitive user or corporate information to adversaries or market competitors. These vulnerabilities arise from missing or invalid access control policies on an app’s cloud back-end. Essentially, SIFON vulnerabilities manifest if web services overshare data while access control is pushed to and enforced by the client instead of the server. We have reported our findings to the developers of eight apps we studied and have received confirmation from two of them.

In a culture of increasingly rapid development cycles [43], Hush would provide independent and enterprise development teams a tool to identify and address SIFON vulnerabilities before vulnerable services are deployed in the wild.

To summarize, the main contributions of this paper are the following.
• We introduce a novel class of Server-based InFormation OvershariNg (SIFON) vulnerabilities.
• We propose an approach called Hush that leverages both static and dynamic program analysis techniques to confirm the existence of SIFON vulnerabilities in Android apps.
• We develop a prototype implementation of Hush and evaluate it over a large corpus of Android apps drawn from the Google Play Store. Our evaluation demonstrates that SIFON vulnerabilities exist in real apps, manifesting as serious leakages of sensitive user or proprietary corporate information.

2 BACKGROUND

The rise of cloud computing has led to the predominance of a multi-tier application architecture for modern applications, especially in the web and mobile domains. In this architecture, applications are split into several tiers, where each tier is responsible for a clearly defined set of tasks and communicates with other tiers through message passing. One example of such a multi-tier application is a typical mobile app, where the app implements the user interface and some portion of the application logic, but also invokes cloud-based services and consumes the results. These services are often web-based – i.e., they are invoked via HTTP(S) requests, and the data is returned in the form of JSON, XML, Google Protocol Buffers, or a similar format.

To illustrate this, we introduce a simplified running example of a mobile social networking app authored in Java for the Android platform. A central concept in this app is that of a user profile that has multiple representations within the cloud, on the wire, and within the app itself. To load a user profile, the app invokes a cloud-based web service using HTTP(S) to request a user profile with a given identifier. The cloud service returns a JSON message that represents this user, shown in Listing 1.

    { "firstName": "Donald",
      "lastName": "Knuth",
      "email": "donald.k@spambox.us" }

Listing 1: JSON representation of a user profile.

To convert this representation into a form that can be computed on more easily, the mobile app deserializes the user profile into a Java object, referred to as the model. This typically happens with the help of a serialization library (e.g., GSON) specifically designed for this purpose. An example of this deserialization process is shown in Listings 2 and 3.

    public class Profile {
        public String firstName;
        public String lastName;
        public String email;
    }

Listing 2: Java model of a user profile.

    InputStream is = getProfileInputStream();
    Reader r = new InputStreamReader(is);
    Profile p = new Gson().fromJson(r, Profile.class);

Listing 3: Deserialization of a user profile using the GSON library.

After the JSON message has been deserialized into an instance of the Profile class, the app is ready to present the details to the user. However, the developer might have recognized that the email is sensitive information and should not be displayed. Instead of updating the cloud service endpoint to only return safe information, the developer chose to implement local access control by only displaying the first and last name of the user in the UI (Listing 4).

    tv0.setText("First Name: ");
    tv1.setText(p.firstName);
    tv2.setText("Last Name: ");
    tv3.setText(p.lastName);

Listing 4: User interface code to display the safe elements of a deserialized user profile.

This example contains the essential elements of a server-based information oversharing, or SIFON, vulnerability. The user that owns this profile has released this sensitive information to the social networking app’s cloud back-end under the assumption that it would be properly handled. However, while the developers have made an effort to prevent the release of this information by sanitizing the mobile app’s user interface, the cloud service API is nevertheless releasing this information to what should be considered an untrusted client device. We note that client-side defenses such as the Android permissions system, taint tracking, or information flow control would not prevent this vulnerability or its exploitation. As we show in Section 6, taint tracking can be used to identify the vulnerability; however, the problem originates from improper implementation of access control on the server, and therefore defense cannot be achieved client-side.

Unfortunately, this simplified example is not entirely fictitious. App A is a dating app in which our analysis discovered a SIFON vulnerability similar to the one in the simplified example.
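The server-side alternative that the running example calls for can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `responseFor` helper and a boolean `requesterIsRelated` policy input; neither name comes from the paper, and a real back-end would derive the relationship from its own data. The point is that the email field is withheld before serialization, so no client ever receives it without authorization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProfileResponseSketch {
    /** Server-side record with the same fields as the Profile model in Listing 2. */
    static final class Profile {
        final String firstName;
        final String lastName;
        final String email;

        Profile(String firstName, String lastName, String email) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.email = email;
        }
    }

    /**
     * Builds the payload for one requester (hypothetical helper).
     * The email is added only when the requester is related to the profile owner.
     */
    static Map<String, String> responseFor(Profile p, boolean requesterIsRelated) {
        Map<String, String> payload = new LinkedHashMap<>();
        payload.put("firstName", p.firstName);
        payload.put("lastName", p.lastName);
        if (requesterIsRelated) {
            payload.put("email", p.email);
        }
        return payload;
    }

    public static void main(String[] args) {
        Profile p = new Profile("Donald", "Knuth", "donald.k@spambox.us");
        System.out.println(responseFor(p, false)); // unrelated requester: no email key
        System.out.println(responseFor(p, true));  // related requester: email included
    }
}
```

With this shape, the client-side UI code of Listing 4 needs no filtering at all, because an unrelated requester's payload never contains the sensitive field.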

The developers of App A considered the first and last name, email address, and date of birth of a user’s profile to be sensitive information and thus do not display them in the UI to other unrelated users. The SIFON vulnerability exists because the sensitive information is sent indiscriminately to all users who request a profile. If the server selectively provided sensitive information only to users who have a relationship (e.g., friends on the social network) with the data owner, no vulnerability would exist. Our goal with this work is to develop an analysis that can identify instances of SIFON vulnerabilities. In the remainder of the paper, we present an analysis and a prototype implementation to that end.

3 THREAT MODEL

SIFON vulnerabilities are a consequence of implementation flaws in the (server-) application code that supports client applications. That is, properly implemented access controls on the server side would prevent SIFON vulnerabilities. Thus, our threat model does not consider attackers who have the capability to circumvent proper access controls on the server side.

Instead, we consider an attacker who is looking to collect sensitive information without targeting a specific victim, and operates under a budget in terms of money and time. Thus, our attacker model is opportunistic in the sense that the attacker can afford to spend some time to try to siphon sensitive information off an online service. However, the budget restriction incentivizes the attacker to shift focus to other targets if the currently analyzed service does not yield any sensitive information.

We assume targeted services and their accompanying client apps are benign but might be vulnerable. Furthermore, we assume that the attacker can create a regular (i.e., unprivileged) user account on the targeted service if necessary, but has not engaged in social engineering or otherwise duped any victim into disclosing or making available their private data. For instance, social network applications, such as Facebook or App A, provide users with the functionality to establish friendships, and connected users can access each other’s sensitive information. In this paper we only consider SIFON vulnerabilities where an entirely unrelated attacker can access sensitive information of his victims, because the back-end server shares this information indiscriminately.

Our interpretation of what constitutes sensitive information is based on Trend Micro’s analysis of the Privacy Rights Clearinghouse (PRC)’s Data Breaches database [31]. In this context, sensitive data encompasses personally identifiable information (PII), financial data, health data, education data, payment cards, and credentials.

4 SYSTEM OVERVIEW
We developed an approach, called Hush, for detecting SIFON vulnerabilities on the Android platform that are due to cloud service API oversharing. The goal of the approach is to detect SIFON vulnerabilities in a scalable manner, and it is structured as a three-stage analysis pipeline: (i) preprocessing, (ii) static vulnerability candidate detection (S-Hush), and (iii) dynamic confirmation of vulnerabilities (D-Hush). An overview of Hush is shown in Figure 1.

Preprocessing. The goal of the preprocessing stage is to triage applications submitted for analysis and gather initial information for input to the subsequent static and dynamic analysis stages.

S-Hush. The static analysis stage serves as a scalable, automated triage phase to determine whether any sensitive data obtained from the invocation of a cloud service API is hidden from a user (i.e., not displayed in the user interface). The static analysis allows for the efficient identification of apps that should be forwarded to the subsequent dynamic analysis for confirmation.

This static analysis is implemented as a multi-stage data flow analysis in which the output of the first stage determines the configuration for the second. The first stage identifies data flows from program points that receive data from the network (i.e., sources) to points where that data is deserialized into a Java object (i.e., sinks). Then, the second stage identifies flows from these deserialization points (sources) to user interface elements (sinks). Note that the sinks from the first stage become sources for the second stage analysis. If particular deserialized object fields are hidden – i.e., they do not flow to a UI element – then this data is considered to be a potential instance of cloud service oversharing. These oversharing instances are categorized according to whether they are sensitive or not. If deserialized object fields are hidden and considered sensitive, then these fields are labeled as candidate SIFON vulnerabilities and the app is forwarded to D-Hush for confirmation.

The reason that the static analysis alone is not sufficient to directly declare hidden, sensitive deserialized object fields as SIFON vulnerabilities is three-fold. First, the static analysis only reports possible flows, but these flows might not necessarily occur at runtime. Second, it is impossible for the static analysis to determine whether the cloud service actually returns data to populate the potentially vulnerable object fields. If no data is actually returned, then the fields will be empty, and no vulnerability will exist in practice. Third, the static analysis is oblivious to the app’s functionality and intent. It is possible that the hidden data is necessary for the app’s functionality and, given the context in the app, may not be considered sensitive. For example, hidden GPS data could be used to position restaurant locations on a map in one app, while unintentionally leaking a user’s location in another app, resulting in a SIFON vulnerability.

D-Hush. The dynamic analysis stage takes as input an app that contains candidate SIFON vulnerabilities and a set of methods to hook as identified by the preprocessing module. The app is instrumented at these methods to dynamically track information flows at runtime from deserialized messages to the user interface. The app is then executed and manually explored by an analyst. During this exploration, D-Hush captures how information flows into model object fields and how it is accessed there. If there is no access to sensitive model object fields, D-Hush reports a confirmed SIFON vulnerability.

We note that achieving high dynamic coverage of GUI-based applications with automated tools has proven to be a challenging task. While significant research progress has been made on this front, triggering advanced application functionality requires complex inputs that are currently beyond the reach of any automated tool [12]. However, future improvements in this fundamental enabling capability could be easily adopted by our approach. In the following sections, we describe the technical details of the preprocessing, S-Hush, and D-Hush analysis stages.
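The decision logic of the two analysis stages described in this overview can be summarized as set operations over model field names. The sketch below is illustrative and is not the Hush implementation: the class, method, and parameter names are hypothetical, and the requirement that a field be populated at runtime is our reading of the second limitation of the static analysis (empty fields imply no vulnerability in practice).

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

final class HushPipelineSketch {
    /**
     * S-Hush (sketch): a field is a candidate if it is deserialized from the
     * network, never flows to the UI (hidden), and is classified as sensitive.
     */
    static Set<String> candidates(Set<String> deserialized,
                                  Set<String> shownInUi,
                                  Predicate<String> isSensitive) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : deserialized) {
            if (!shownInUi.contains(field) && isSensitive.test(field)) {
                result.add(field);
            }
        }
        return result;
    }

    /**
     * D-Hush (sketch): a candidate is confirmed if the server populated it at
     * runtime (assumption, see lead-in) and the app never accessed it.
     */
    static Set<String> confirmed(Set<String> candidates,
                                 Set<String> populatedAtRuntime,
                                 Set<String> accessedAtRuntime) {
        Set<String> result = new LinkedHashSet<>(candidates);
        result.retainAll(populatedAtRuntime);
        result.removeAll(accessedAtRuntime);
        return result;
    }
}
```

For the running example, the deserialized fields are `firstName`, `lastName`, and `email`; only the first two reach the UI, so `email` is the sole candidate, and it is confirmed only if the server actually returns a value for it and the app never reads that value.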

Figure 1: Overview of the Hush analysis pipeline. Android apps are preprocessed to handle program obfuscation and cull apps that cannot contain SIFON vulnerabilities. Candidate SIFON vulnerabilities are identified in a static analysis stage, and a subsequent dynamic analysis stage confirms or rejects candidate vulnerabilities.

5 PREPROCESSING
The first stage of the Hush analysis pipeline begins by extracting information about the app packages and method signatures. This functionality allows us to filter out apps that we consider not susceptible to SIFON vulnerabilities. It also generates inputs for both S-Hush and D-Hush. In particular, the triage step discards all apps that do not request the INTERNET permission (i.e., they cannot invoke cloud services over the network) or do not contain a (known) serialization library.

S-Hush and D-Hush require the identification of methods from three categories: those that invoke network communication APIs, those that deserialize network data, and those that manipulate the user interface. The preprocessing stage, thus, first disassembles the app and then performs a lightweight static analysis to extract this information.

One important challenge in this respect is that some apps are obfuscated, which can prevent the automated identification of their methods. While generically deobfuscating apps can be challenging, Hush only needs to address the obfuscation of deserialization libraries. As network sources and UI sinks are provided and implemented by the Android framework, an app developer cannot easily obfuscate these methods.

As Android apps are easy to decompile and reverse engineer, application repackaging represents a serious problem [21, 46, 51]. To counteract such techniques, the Google Android Team included the ProGuard [23] obfuscation tool in the Android Developer Tools (ADT). Moreover, in recent versions of the ADT, ProGuard is enabled by default. One of the obfuscation strategies employed by ProGuard transforms method names and rewrites call sites to use the transformed names. For example, ProGuard might convert com.google.gson.Gson.fromJson to com.a.b.j.a. As ProGuard needs to transform call sites and target methods synchronously, this approach cannot be used to obfuscate method calls to framework-provided APIs (e.g., network and UI APIs). Therefore, the preprocessor must incorporate a deobfuscation step to reverse transformations such as those performed by ProGuard. We report on our heuristics-based implementation in Section 8.1. The final output of this module is the set of sinks for S-Hush and the set of functions to hook and their category for D-Hush.

6 S-HUSH
S-Hush is a fully automated, scalable static analysis to identify data flows that represent potential SIFON vulnerabilities in Android apps. At its core, S-Hush is a two-stage data-flow analysis that (i) identifies flows from program points that receive data from the network to deserialization of that data into a model, and (ii) identifies flows from individual fields of a deserialized model into user interface elements. The output of the analysis is a list of deserialized model fields that do not appear in UI elements. These elements are subject to a classification step that heuristically infers whether individual fields are likely to contain sensitive information. Hidden and likely sensitive deserialized model fields are then forwarded to D-Hush as candidate SIFON vulnerabilities for confirmation. In the following, we elaborate on each of these steps of the static analysis.

6.1 Model Deserialization
The first step of the static analysis takes as input an app to analyze, a precomputed database of standard Android API methods that receive data from the network (sources), and the list of call sites for deserialization libraries found by the preprocessing stage (sinks). The goal of this step is to identify flows of data received from the network to deserialization points.

As a precursor, the app is disassembled to recover its Dalvik bytecode representation. A class hierarchy analysis (CHA), control flow analysis (CFA), and call graph extraction are then performed on the app bytecode to recover a super-control flow graph (sCFG) that superimposes control flow graphs (CFGs) for individual methods onto the program call graph. The analysis then recovers a directed acyclic graph (DAG) $G = (S, F)$, where $S$ is the set of program variables and $F$ is the set of edges that represent transfers of data between variables. Using the provided database of network APIs (i.e., sources), program variables are labeled as network sources $S_{\mathrm{source}} \subseteq S$. Similarly, program variables that flow to input arguments of deserialization API methods are labeled as deserialization sinks $S_{\mathrm{sink}} \subseteq S$ using input from the preprocessor.

A forward data flow analysis is then performed, beginning from each labeled network source in the app. This analysis iterates using a regular worklist algorithm until a fixpoint is reached. During the analysis, a standard operational semantics is used to model the propagation of data between variables, updating the DAG ($G$) in an incremental fashion.

Once a fixpoint has been found, a reachability analysis is performed from all network sources to deserialization sinks to obtain a relation $(\rightsquigarrow) \mapsto S \times S$ that indicates whether, given a pair of vertices, data flows from one to another. If a flow is ever found during this reachability analysis where

$$\exists f \ \text{s.t.}\ s \overset{f}{\rightsquigarrow} t, \quad s \in S_{\mathrm{source}},\ t \in S_{\mathrm{sink}},\ f \in F^{*}$$

then the analysis records the sink as a potential source of a deserialized model $d \in D$. This set ($D$) of deserialization points serves as input to the next step of the analysis.

6.2 Hidden Model Field Identification

The second step of the static analysis takes as input the app to analyze, the set of deserialization points $D$ (sources), and a precomputed database of APIs that render text in UI elements (sinks). The goal of this step is to enumerate flows from individual fields of a deserialized data model to user interface elements and, importantly, to identify model fields for which no such flow exists.

The first task of this step is to identify, for each deserialization point $d \in D$, the type of the model that could be deserialized. The model type is required since the analysis needs to know the total set of fields comprising the object (as well as any nested member objects). This information can be recovered as the deserialization library also needs to know the model type to instantiate at runtime. In practice, the target model type typically appears as a java.lang.Class parameter to the deserialization method. The analysis uses this information to recover the deserialized model type.

Models often nest other models, which the analysis handles by recursively identifying their types in order to enumerate nested fields. One challenge that arises in this context is sub-models that are contained in collection classes which use Java generics. Hush operates directly on Java bytecode, and generics are subject to type erasure in accordance with the Java language specification. Thus, it is possible that type information is lost when generic containers are used. Fortunately, we found that for the Android platform the original type is helpfully preserved in the form of a Dalvik annotation (dalvik.annotation.Signature) that allows the analysis to recover this information from the bytecode. Although annotations are not mandatory in Dalvik code, the Signature annotation is required by most deserialization libraries; therefore, ProGuard and other obfuscators must be configured to keep these accordingly [27].

With the full tree of deserialized model classes in hand, the analysis then proceeds to perform a second round of forward data flow analysis. Here, individual data object fields are treated as sources $S_{\mathrm{source}}$, and API methods that set data to be displayed in user interface elements are considered sinks $S_{\mathrm{sink}}$. Similarly to the previous step, an iterative fixpoint computation is performed to construct a DAG $G = (S, F)$ that encodes possible flows between program variables. Once a fixpoint is reached, a second reachability analysis is performed that identifies all data object fields whose data can flow to a UI element. In contrast to the previous step, however, this step of the analysis reports those fields for which no flows to UI elements exist. That is,

$$\nexists f \ \text{s.t.}\ s \overset{f}{\rightsquigarrow} t, \quad s \in S_{\mathrm{source}},\ t \in S_{\mathrm{sink}},\ f \in F^{*}.$$

These hidden fields deserialized from network input are considered potential SIFON vulnerabilities, and serve as input to the final classification step of the static analysis.

6.3 Hidden Model Field Classification
The final step of the static analysis stage is a heuristic post-filter that classifies the hidden fields identified in the previous two steps as to whether they are likely to contain sensitive information. This filter is necessary to reduce the workload on the subsequent dynamic analysis stage by focusing effort on fields that are more likely to be considered sensitive personal or corporate information.

This classification step takes as input both the set of hidden fields as well as their corresponding variable names. These field names are easily recovered from the app bytecode. A simple heuristic is then applied that checks each field name against a database of sensitive data patterns that were manually compiled from keywords extracted from breach reports in the Chronology of Data Breaches database [40]. Each breach report states the type of information compromised for a variety of organizations including businesses, government, military, medical providers and educational institutions. For example, we derive patterns from the keywords location, date of birth, and gender, manually extracted from a breach report on July 24, 2013 stating that Tinder’s mobile app leaked this user information. Furthermore, we extracted keywords from the Android API representing PII. For example, SubscriberId returns the IMSI for a GSM phone, and DeviceId returns the IMEI for a GSM phone and the MEID or ESN for CDMA phones [25].

We note that while this technique is simple, it works well in practice, as supported by the evaluation in Section 9. The pattern database can be adjusted using domain-specific knowledge depending on the type of app being analyzed, and more advanced methods based on natural language processing or machine learning of sensitive keywords could also be leveraged to automatically infer this database if necessary.

Finally, we also note that, in principle, these data object field names could be obfuscated. However, in practice we found this not to be the case, as deserialization libraries typically match network message fields directly to data object fields based on their names. Thus, model field name obfuscation would require the developer to synchronize the obfuscated field names with the wire protocol – seemingly a fragile engineering exercise which we do not expect to take place in benign apps.

The output of this final step of the static analysis is a set of data object fields that are likely to be sensitive and hidden, i.e., not displayed, in the app’s user interface. These fields are considered candidate SIFON vulnerabilities, and are forwarded to the next stage of the analysis for dynamic confirmation.

6.4 Example
To illustrate the static analysis, we apply it to the code depicted in Figure 2, which is based on the running example introduced in Section 2. The first analysis identifies a flow where a Profile data model is deserialized from data received over the network, $s_0 \overset{f_0}{\rightsquigarrow} t_0$. The second step of the analysis identifies two flows where the profile’s fields are displayed to the user, $s_0 \overset{f_0}{\rightsquigarrow} t_0$ and $s_0 \overset{f_1}{\rightsquigarrow} t_1$. However, no flow is found from the profile’s email field; this field is flagged as a potential SIFON vulnerability. Finally, the classification step identifies the email field as likely to be sensitive, and outputs this field as a candidate SIFON vulnerability.
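The hidden-field identification and classification steps can be made concrete with a small sketch over an explicit flow graph. It is a simplification under stated assumptions: the graph is supplied directly as an adjacency map rather than recovered from Dalvik bytecode, vertex names such as `p.email` and `tv1.setText` are illustrative labels taken from the running example, and the keyword set is a hypothetical stand-in for the pattern database (keywords are assumed to be lower-case).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class HiddenFieldSketch {
    /** Worklist reachability: every vertex reachable from start, including start itself. */
    static Set<String> reachable(Map<String, List<String>> flows, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(start);
        while (!worklist.isEmpty()) {
            String v = worklist.remove();
            if (seen.add(v)) { // first visit: enqueue the successors
                worklist.addAll(flows.getOrDefault(v, List.of()));
            }
        }
        return seen;
    }

    /** Fields (sources) from which no UI sink is reachable, i.e. hidden fields. */
    static Set<String> hiddenFields(Map<String, List<String>> flows,
                                    Set<String> fields, Set<String> uiSinks) {
        Set<String> hidden = new TreeSet<>();
        for (String field : fields) {
            Set<String> reached = reachable(flows, field);
            reached.retainAll(uiSinks);
            if (reached.isEmpty()) {
                hidden.add(field);
            }
        }
        return hidden;
    }

    /** Name-based heuristic: the field name contains one of the lower-case keywords. */
    static boolean likelySensitive(String fieldName, Set<String> keywords) {
        String name = fieldName.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (name.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

Applied to the running example, the only edges are `p.firstName` to `tv1.setText` and `p.lastName` to `tv3.setText`, so `p.email` is the single hidden field, and a keyword set containing `email` marks it as likely sensitive. Substring matching is deliberately crude here; the paper's pattern database may match differently.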

7.2 Application Seeding


The second step of the dynamic analysis stage is a manual preparatory step that overcomes a fundamental difficulty in dynamically analyzing complex AUTs. In particular, many apps require an initial configuration in order to exhibit the majority of their functionality. The canonical example is the requirement to create a user account with the app’s cloud back-end. Without an account, the available functionality exposed by an AUT is often limited, resulting in poor coverage of the AUT at runtime. Therefore, the dynamic analysis stage requires an analyst to manually perform any necessary and arbitrarily complicated initial configuration, including but not limited to user account creation and the configuration of the app’s settings.

7.3 UI Exploration and Data Collection


The third step of the dynamic analysis explores the UI state space
Figure 2: S-Hush applied to the running example from Section 2. of the app. Hooks triggered during this step record information ac-
cording to the category of the call site that has been instrumented.
The main goal of this manual analysis is to maximize coverage of
7 D-HUSH the AUT by exercising the UI state space as exhaustively as possi-
ble. We note that this exploration does not entail providing invalid,
In this section, we describe D-Hush, the confirmation stage of the unexpected, or random data as inputs – i.e., the analyst does not
Hush analysis pipeline. The goal of this stage is to confirm the fuzz the app, nor tamper with network traffic as a means of dis-
set of candidate SIFON vulnerabilities reported by the prior static covering data leakage. Rather, the AUT is explored exactly as if a
analysis of an app using a semi-automated dynamic analysis. Re- regular user is using it.
call that this step is necessary as the static analysis can only reason
about the structure of models and fields. However, it is impossible 7.4 SIFON Vulnerability Confirmation
for any static analysis to attest whether sensitive fields are popu-
lated with data at runtime. The fourth step of D-Hush processes the outputs generated dur-
The reason that the dynamic analysis is not completely auto- ing the dynamic exploration step. Given the output of the static
mated is due to current limitations in automatic exploration of analysis stage, the goal of this step is to confirm that(i) statically-
GUI-based apps. The flows discovered with S-Hush are commonly identified data flows exist at runtime from network sources to de-
too complicated to be realized by an automated dynamic UI ex- serialization sinks, (ii) statically-labeled candidate model object
ploration system (for an example refer to Section 9). However, D- fields are populated with data by the cloud service.
Hush is flexible and allows us to immediately benefit from future To that end, this step processes the logs generated by the hooks
improvements in automated UI exploration. in temporal order on a per-thread basis. From this, a first bipar-
D-Hush consists of four sequential steps:(i) hooking setup, tite graph B = (N , D, F ) is generated where N is the set of
(ii) app seeding, (iii) user interface exploration and data collection, network sources, D is the set of deserialization sinks, and F is
and (iv) SIFON vulnerability confirmation. In the remainder of this the set of edges representing flows from N to D. For each flow
section, we elaborate on the details of each of the dynamic analysis f ∈ F ∗ , the dynamically-generated logs are analyzed to enumer-
steps. ate the fields that were populated with data by the deserialization
operation. These populated fields are now used as source (N ′ ) to
create a new bipartite graph B ′ = (N ′, D ′, F ′ ), where D ′ is the
7.1 Hooking Setup set of methods invoked on the deserialized object during runtime.
The first step of this analysis stage uses the information extracted Sources that have no flows to sinks represent fields that are pop-
in the initial preprocessing stage to place hooks in the app under ulated by the cloud backend but are never used by the app. These
test (AUT). These hooks are used to log data at specific program object fields are labeled as confirmed vulnerabilities to be reported
points during dynamic exploration, which includes call sites for and reviewed by the analyst.
methods that receive data from the network, call sites that deseri-
alize data model objects, and call sites that populate UI elements. 8 HUSH IMPLEMENTATION
For each of these categories of call sites, a specific set of informa- In this section, we discuss details of our open source prototype
tion is logged. For network call sites, the data received from the implementation of Hush [34] and elaborate on specific challenges
remote endpoint is collected. For UI call sites, the data set for the that we had to overcome to realize Hush. Our Hush prototype
UI element is recorded. For deserialization call sites the input data was implemented on top of a series of open source tools In par-
and the output object model is logged. Furthermore, we instrument ticular, the prototype focuses on the Google Gson [22] deserial-
the resulting deserialized object to hook all method invocations, in- ization library. According to AppBrain [2], this library is the #1
cluding property accesses, on this object. serialization solution and is included by 14.66% of all installed


Android apps. Beyond Gson, Hush supports the Google Protocol Buffers [24], FasterXML Jackson [19], and FlexJson [8] data serialization libraries. We note that Hush supports any data deserialization method for which the model can be automatically extracted. During the analysis, when a deserialization point is reached, Hush will attempt to extract the model class from a method parameter of type java.lang.Class. If this parameter does not exist, Hush uses the method’s class as the model. This approach provides the flexibility necessary to support arbitrary deserialization libraries by simply adding new deserialization methods to the configuration. No code changes are necessary to support additional libraries.

8.1 Preprocessing
Method signatures are extracted from the app byte code using the Androguard [1] reverse engineering framework. Additionally, the presence of the INTERNET permission is checked with aapt, a tool included in the Android SDK.
Before Hush can perform its intended analysis task, obfuscated apps must be deobfuscated. Specifically, our prototype aims to automatically identify invocations of the deserialization method com.google.gson.Gson.fromJson by heuristic signature matching. To this end, we perform a one-time, offline, lightweight static source code analysis over all versions of the Gson library to extract a set of strings that uniquely identify the class com.google.gson.Gson (e.g., error messages). Deobfuscation occurs by first decompiling the AUT and identifying candidate Gson classes by matching against the previously identified strings. Next, the name and signature of each fromJson function in each one of these classes are extracted using signature matching. Note that the fromJson method in com.google.gson.Gson has four different signatures. As three of these signatures only take Java primitive types as parameters, deobfuscation is trivial through method signature matching since Java primitive types are never obfuscated (otherwise, method resolution would fail). In particular, it is sufficient to extract call sites to methods with an identical signature to these three fromJson methods, even if the name of the method has been obfuscated. The fourth signature, however, takes as one of its parameters the type com.google.gson.stream.JsonReader, which is subject to obfuscation. In this case, we simply omit this parameter from the signature and match against the remaining parameter of java.lang.reflect.Type and a return value of type java.lang.Object. For each identified function, we output the package name, the class name, and the function signature. Despite the heuristic nature of this deobfuscation algorithm, the reverse transformation works well in practice.

8.2 S-Hush
The S-Hush static analysis stage is written in Java and uses a modified version of FlowDroid [3] to perform static data flow analysis. To obtain the required static coverage necessary to find hidden data from a network response in modern Android apps, several modifications to FlowDroid were necessary [33], including AsyncTask and Fragment support. Fragments are used by 67% of the apps in our dataset and thus not supporting this functionality in the analysis was not an option.
We specify network input sources in a precomputed database derived from enumerating all methods in the Android framework that receive a response from an HTTP(S) connection. As examples, these include the getInputStream method from the HttpURLConnection and java.net.URLConnection classes, and getEntity from the org.apache.http.HttpResponse class. Fortunately, as these methods are part of the Android framework, we do not need to worry about obfuscation. User interface element sinks were manually compiled by identifying methods in the Android SDK that allow text to be displayed to the user. Examples of these sink methods include setText from android.widget.TextView, setTitle from android.app.AlertDialog, and the loadData method from the android.webkit.WebView class.

8.3 D-Hush
The prototype implementation of the D-Hush dynamic analysis stage relies on the Xposed hooking framework [42]. Xposed operates by replacing the Zygote process with an extended version of the app_process executable that launches Zygote (on Android, Zygote is the parent of all apps). When a new Dalvik virtual machine is created, this Xposed-modified version of app_process loads external packages, in this case the hooking and logging code required to collect the information described in Section 7.
The hooking initialization procedure takes as input the set of methods to hook provided by the preprocessing step. For both UI and network methods, we used the same set as for S-Hush (see Section 8.2). For deserialization methods, however, we include logging for another JSON library, org.json.JSONObject, that cannot easily be analyzed statically. (For a more detailed discussion please refer to Section 10.1.) The goal of this additional feature is two-fold: first, to demonstrate that SIFON vulnerabilities affect several deserialization libraries and second, to support our claim that Hush is library independent and hence highly extensible. Thus, we analyze two different categories of deserialization: JSON data deserialized to models using Gson, and JSON data mapped to dictionaries, or key-value pairs, using org.json.JSONObject. Note that the second family is constructed dynamically, where dictionary keys are retrieved from the server at runtime and cannot be tracked by S-Hush. The dynamic analysis, however, can process these objects and extract the necessary information to detect SIFON vulnerabilities. During the hooking setup, if the data is mapped to a model, both data and model are logged. If the data is mapped to a JSON dictionary, then the dictionary’s key-value mappings are logged too. Moreover, as JSON dictionaries are not mapped to models, accessing the object data is achieved through a set of accessor and mutator methods that should be hooked to track data modification.
Detecting SIFON vulnerabilities in the JSONObject scenario is slightly different from Gson. The main idea remains the same: data that is sent by the server and never used is considered oversharing. As the constructor of the JSONObject class is hooked, the data (i.e., key-value pairs) contained in the resulting object is also known. This includes the set of keys that are read (through the hooking of accessor methods) and those that are written to (through the hooking of mutator methods). Keys that are created but never used (either read or written) are considered leaked information.
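The JSONObject detection logic described above amounts to bookkeeping over hook logs: record the keys present when an object is constructed, record every key touched by an accessor or mutator, and report the difference. The Python sketch below illustrates only this bookkeeping under stated assumptions. The event tuple format `(kind, object_id, payload)`, the event names, and the function name `unused_keys` are hypothetical and do not reflect the log format of the Hush prototype or the Xposed API.

```python
def unused_keys(events):
    """Return, per object id, the keys that were created but never read or written."""
    created = {}  # object id -> keys present at construction time
    touched = {}  # object id -> keys seen by accessor or mutator hooks
    for kind, obj_id, payload in events:
        if kind == "construct":
            created[obj_id] = set(payload)
            touched.setdefault(obj_id, set())
        elif kind in ("get", "put"):
            touched.setdefault(obj_id, set()).add(payload)
    return {obj: keys - touched.get(obj, set()) for obj, keys in created.items()}

# Hypothetical hook log for one deserialized JSON object.
log = [
    ("construct", 1, ["name", "email", "zip"]),
    ("get", 1, "name"),  # accessor hook fired
    ("put", 1, "zip"),   # mutator hook fired
]

leaked = unused_keys(log)
```

Here the email key is sent by the server but never accessed, so it is the only key reported for object 1.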


We note that both algorithms rely on data accesses (read or write) to assess whether a field is used or not. This process might generate false negatives. For Gson hooking, the static analysis can flag a method as accessing a field even though this access is in a branch that is never taken. The JSON algorithm is more subtle as SIFON vulnerability detection is fully based on dynamic analysis. The idea is that a key might be accessed for an internal computation but never shown to the user. For instance, we noticed that in social and dating apps, user email addresses are usually sent by the server but hidden from the UI.
We therefore provide an additional operating mode (called Fuzzy mode) for users aiming to detect all SIFON vulnerabilities even if the data is used internally by the app. The downside of this technique, however, is a high number of false positives. More precisely, the algorithm performs a fuzzy string matching between the data received from the server and information displayed to the user. If a value is sent but not displayed, the algorithm flags it as suspicious. To do so, we proceed as follows: for all JSON objects received from the server, key-value pairs are extracted. Then, all strings written to the UI are collected via the hooking of UI methods set during the first phase. Finally, the algorithm compares the values received from the server and the strings set in the UI with a fuzzy string comparison algorithm.

9 EVALUATION
We evaluate the effectiveness of Hush over a large corpus (i.e., 31,559) of Android apps. To this end, we analyzed all apps in the social category of Google Play as archived by the Playdrone [46] project. We chose this category, as apps therein are likely to handle sensitive information that should not be shared indiscriminately beyond the owner of the data and the app provider.
Experimental Setup To identify likely SIFON vulnerabilities, S-Hush was run in parallel using 8 workers, each allocated 20 GB of RAM and sharing 16 2.27 GHz CPUs. To ensure forward progress of our analysis through the dataset, we set a timeout of 2 hours for the analysis of each app. D-Hush was executed on a MacBook Pro with 16 GB of RAM and two 2.6 GHz CPUs. The dynamic analysis used the Genymotion Android emulator running Android 4.1.1 with Xposed framework version 2.6.1.
S-Hush Results We started with 31,559 apps contained in the Playdrone data set under the social category. 5,481 of these apps passed the preprocessing filter (see Section 5), that is, all apps that request the INTERNET permission, communicate with the network, and are found to invoke a (potentially obfuscated) deserialization method. Further, 10.46% of these apps contained obfuscated deserialization methods. During the analysis, 315 apps resulted in FlowDroid requesting more than the allocated memory (i.e., ran out of memory), 177 did not finish within the 2 hour time limit (i.e., ran out of time), and 553 terminated due to fatal errors (e.g., FlowDroid runtime errors).
The analysis found that 998 apps have static data flows from network sources to deserialization points, among which 951 also have flows between deserialized object fields and UI elements. Finally, 126 apps contained models with sensitive information that is not displayed to the user. These apps are then forwarded to D-Hush for confirmation.
The average analysis time for an app was 5 minutes 33 seconds, which supports our decision to set the timeout to 2 hours. Furthermore, the average memory usage of the static analysis tasks was 2.39 GB. In total, S-Hush processed 5,481 apps and we manually investigated the 177 apps whose analysis was terminated due to the timeout. This investigation revealed that the single most prevalent reason (i.e., 161/177 apps) for the timeout was that the interprocedural, finite, distributive subset (IFDS) solver used by FlowDroid got stuck and did not make further progress in the analysis.
D-Hush Results S-Hush reported 126 apps with potential SIFON vulnerabilities. These apps were first examined to determine compatibility with D-Hush. We found 38 were not runnable due to either a programming bug (i.e., the app crashes at startup, 11 apps) or network endpoints that were not reachable at the time of our analysis (27 apps). Additionally, we were unable to analyze 12 apps, as 6 apps had user interfaces in non-Latin languages, which prevented the authors from meaningfully engaging with the UI, and 6 apps required special credentials (e.g., a special PIN code). Thus, we were left with 76 apps that potentially contain SIFON vulnerabilities. Our evaluation confirmed SIFON vulnerabilities in a total of eight apps. Table 1 presents the sensitive data that was found to be leaked in each app accompanied by the number of installs in the Google Play Store. In the following sections, we select App A and App B as case studies to explore the SIFON vulnerabilities in more detail.

9.1 Case Study: App A
App A is an online dating app. S-Hush identified this app as having a potential SIFON vulnerability due to the server responding with the first and last name, date of birth, ZIP code, and email address during a profile request, regardless of which user is sending the request. The detailed results of the static analysis are displayed in Table 2.
The data is deserialized to the object specified by “Model.” “Shown” represents the number of fields presented to the user while “Hidden” are those not displayed by a UI element. “Sensitive” represents the number of fields that are hidden and also considered sensitive. This is the most important metric when determining if the app may have a SIFON vulnerability. Additionally, we report the number of fields in the model that are never used anywhere in the app as “Unused.” The same metrics are reported for each case study. App A is a textbook example of a SIFON vulnerability. S-Hush identified that the Contact model is deserialized from three network responses, and the hidden model fields which are considered sensitive are different across all three cases. This indicates that the developers implemented a local access control policy in an attempt to secure the data. App A was then forwarded to D-Hush for validation. Upon installation of the app, manual interaction was required to create an account. Dynamic analysis confirmed that when a search is performed, the server responds with a list of matches. When we click one of these matches, a user to whom we are not connected, the server returns the information as identified by the static analysis. These results are then deserialized by the Gson library. Overall, the dynamic analysis identified 67 fields, out of which 52 were shown to the user. From the remaining 15, one field was not sent by the server, seven were sent with “---”


(i.e., three dashes) as content, and seven contained sensitive information. In addition, this particular app appears to use integers for the member IDs, potentially allowing an adversary to enumerate member records to perform unauthorized bulk collection of sensitive user data.

Table 1: Overshared information per app. (*) indicates fields discovered with the fuzzy mode.
App | Leaked Data | Number of Installs
App A | first and last name, DOB, last action*, ZIP, gender*, user ID*, email, profile status | 10,000 - 50,000
App B | email, home page, street, ZIP code, phone number | 10,000 - 50,000
App C | client OS*, email*, friend list*, user ID*, latitude, longitude | 10,000 - 50,000
App D | latitude, longitude, last action | 5,000 - 10,000
App E | phone number | 1,000 - 5,000
App F | userRelationID, latitude*, longitude* | 500 - 1,000
App G | address, DOB, phone number, email*, deviceOS, Facebook ID, encrypted latitude, longitude, and password | 100 - 500
App H | DOB, hashed password, last login, user type | 100 - 500

Table 2: Static analysis results for App A (top) and App B (middle)
Model | Shown | Hidden | Sensitive | Unused | Total
LoginObj | 7 | 1 | 1 | 0 | 8
BlogArticle | 4 | 4 | 0 | 0 | 8
InboxMsg | 7 | 5 | 0 | 0 | 12
Contact | 50 | 7 | 1 | 1 | 57
Contact | 41 | 16 | 1 | 10 | 57
Contact | 43 | 14 | 1 | 8 | 57

UserEntity | 4 | 24 | 8 | 2 | 28
VerisionEntity | 0 | 1 | 0 | 0 | 1
RegisterEntity | 3 | 15 | 0 | 14 | 18

9.2 Case Study: App B
App B is a social network connecting dog owners, breeders, and dogs. S-Hush identified this app as having a potential SIFON vulnerability due to the server responding with the first and last name, street address, ZIP code, country, phone, and email address during a profile request, while keeping this data hidden from the user. The detailed results of S-Hush are displayed in Table 2. When the server responds to a request to obtain the profile, the data returned is deserialized in the UserEntity model. This app only displays data from four of the 28 model fields, eight of which are considered sensitive. App B was then forwarded to D-Hush for validation. This app also required the creation of a user account. However, account creation is not supported from the mobile client. Thus, we had to register the account through the app’s accompanying website. Overall, 146 fields were analyzed, 60 were shown to the user, and five were identified as sensitive. Among the 86 unused fields, 46 are never sent by the server, 3 are empty, and 37 have some value. A quick analysis of these values showed that the AnimalOwnerEntity is the only model leaking sensitive information. Among its seven unused fields, the first and last name are not considered sensitive, and thus there are five detected leaks. The SIFON vulnerability is triggered as follows. When browsing a user profile, a list of dogs owned by the user is presented. By clicking on a dog, a request for the dog’s details is made to the server. The response includes the owner’s information (i.e., the AnimalOwnerEntity model). This information is sent but never shown to the user. This result shows how difficult it can be to prevent SIFON vulnerabilities: while explicit user data (i.e., UserEntity) was never sent by the server, that same data was nevertheless leaked through a different model (AnimalOwnerEntity).

10 CHALLENGES AND LIMITATIONS
As Hush relies on static and dynamic analysis techniques, Hush is subject to their fundamental limitations, such as limited path coverage for dynamic analysis and potentially high false positives due to over-approximation in the static analysis phase. However, beyond these globally applicable limitations, our prototype also suffers from challenges and limitations that result from our implementation. These limitations mainly arise from shortcomings of our static and dynamic analyses.

10.1 Static Analysis Limitations
The preprocessing step in Hush uses heuristics to deobfuscate apps before submitting them for static analysis. Thus, obfuscation techniques that go beyond merely renaming methods can potentially thwart this step. However, recall that Hush aims at analyzing benign apps and thus we would not anticipate advanced obfuscation techniques to be beneficial to regular app developers.
As demonstrated in Section 8, S-Hush is flexible enough to handle a variety of serialization libraries. However, one method of deserializing network data into Java data structures is based on Android’s JSONObject class. Instead of deserializing network data into a model of a specific type, JSONObject will simply deserialize a JSON string into a nested java.util.Map. The app can then access individual values in this structure by indexing the map with a key value. The Map instances returned by the JSONObject class use regular strings as keys. Thus, accurately reasoning about these nested structures would require a precise handling of string values and operations, a capability currently not supported by S-Hush.
Finally, as S-Hush is implemented on top of FlowDroid, it inherits FlowDroid’s soundness characteristics. Like most static analysis systems that target complex apps, FlowDroid and by extension S-Hush cannot be sound “as this would make the analysis unscalable or imprecise to the point of being useless” [37].

10.2 Dynamic Analysis Limitations
Android apps are interactive and event-driven. Thus, inputs are normally in the form of events that correspond to user interactions (UI events), or system events, such as an incoming phone call or text message. While testing tools can generate such events automatically, previous research (e.g., [12]) concluded that simple approaches outperform advanced user interface exploration techniques along many dimensions. For the use case considered in


Hush, the most important dimension is code coverage. Unfortunately, Choudhary et al. [12] found that code coverage of any existing automated GUI exploration tool is sobering (∼48%) with Monkey scoring the highest. Based on these results, we experimented with Monkey [26] to see whether we would achieve sufficient coverage to confirm the suspected SIFON vulnerabilities automatically. While this approach worked for App H, none of the other vulnerabilities could be confirmed by this fully automated technique. The reason is that the user interactions needed to manifest the vulnerabilities were too specific for Monkey to trigger.
However, in a scenario where a test engineer applies Hush to detect SIFON vulnerabilities for software testing purposes, the engineer could simply trigger (or create if necessary) test cases that mimic the corresponding user interactions. Testing frameworks such as Espresso [18] already support such user interface testing. Similar to improved static analysis tools, Hush can immediately benefit from improved UI exploration techniques once these techniques manage to generate the complex user interactions required to trigger SIFON vulnerabilities.

11 RELATED WORK
There have been many research efforts to detect, measure, and protect against sensitive information leakage in mobile platforms. These efforts fall into three major categories.
Static Analysis For static analysis, a natural starting point is the Android permission system. Hence, it has been widely scrutinized. For example, at a system level, PScout [4] extracts and analyzes the permission specification from Android OS source code and shows that at least 22% of the permissions are not documented. Stowaway [20] considers the same problem from the app’s point of view, and shows that many apps are over-privileged. Note that as SIFON vulnerabilities occur at the cloud service, enforcing permissions at the client side cannot be an effective countermeasure.
A popular use of static analysis is for vulnerability discovery. For example, CryptoLint [14] uses program slicing to find cryptographic misuse in Android apps. CHEX [38], in comparison, scans Android apps for component hijacking vulnerabilities. Compared to related work, Hush is a novel detection system specifically for SIFON vulnerabilities, which have not been investigated in prior work to the best of our knowledge.
Outgoing data leakage from mobile apps to external, third-party services has been widely studied through static analysis [3, 15, 28, 29, 32, 48]. For example, LeakMiner [48], FlowDroid [3], DroidSafe [28], and DidFail [32] perform static data flow analysis to identify data leakages in Android apps. Note that these tools consider only information leakage originating from the client to a third-party service. S-Hush, in contrast, aims to identify sensitive, hidden data received by the app from a network endpoint.
Dynamic Analysis TaintDroid [16] was the first taint tracking framework to uncover information leakage at runtime for the Android platform. Dynamic instrumentation is now widely used for mobile malware analysis. Sandbox systems such as Andrubis [36] and DroidBox [35] use custom instrumentation of the Android system coupled with taint tracking. VMI-based dynamic analysis systems such as DroidScope [47] and CopperDroid [45] are proposed as a general-purpose VM-based out-of-the-box framework to reconstruct Android malware behaviors. SmartDroid [50] and AppsPlayground [41] improve code coverage by intelligently stimulating the app during dynamic analysis to reveal malicious behavior.
Similarly to existing static analysis systems, current dynamic analysis systems for mobile apps are tailored to detect either information leakage from the client to an external server, or specific malicious activities on the device. Hush, in contrast, specifically targets SIFON vulnerabilities that, to our knowledge, have not been studied before.
Access Control Enforcement Several approaches have been proposed to enhance the security of the Android OS, and prevent data leakage. For example, Kirin [17] enforces the security of apps by limiting permissions for the app being installed. Frameworks such as XManDroid [6], Saint [39], TrustDroid [49], and many others [7] focus on controlling the communication between components in different apps. In comparison, TISSA [52], MockDroid [5], and AppFence [30] allow the specification of fine-grained policies, and the substitution of fake information returned from the Android API. Note that all these protection mechanisms are deployed on the client, and, hence, cannot prevent SIFON vulnerabilities. Finally, access control can be guaranteed by the programming language. Frameworks such as swift [10] and SIF [11] aim at building web applications that are secure by construction. While such frameworks can be used to prevent SIFON vulnerabilities, they are not widely adopted as they require a specific programming language (i.e., JIF) and the data to be annotated.

12 CONCLUSION
In this work, we presented SIFON, or server-based information oversharing, a new class of security vulnerabilities in multi-tier applications. SIFON vulnerabilities arise due to oversharing of information from server-side APIs that is not displayed by the application’s user interface. We described Hush, a semi-automated approach to discover and confirm the presence of SIFON vulnerabilities. Hush first performs a scalable multi-stage static data flow analysis to screen applications for potential vulnerabilities, and then confirms the presence of candidate vulnerabilities with a human-assisted dynamic analysis. We implemented a prototype of Hush for the Android platform and demonstrated that it is possible to quickly scan thousands of Android applications for SIFON vulnerabilities with minimal effort. Our work is a first step towards a systematic, fully automated framework for server-side information leakage discovery and mitigation.

ACKNOWLEDGMENTS
This material is based on research sponsored by DARPA under agreement number FA8750-15-2-0084. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.


REFERENCES
[1] Androguard Team. 2015. Androguard. https://github.com/androguard/androguard. (2015).
[2] AppBrain. 2015. AppBrain Stats. http://www.appbrain.com/stats/libraries/dev. (2015).
[3] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. 2014. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM.
[4] Kathy Wain Yee Au, Yi Fan Zhou, Zhen Huang, and David Lie. 2012. PScout: Analyzing the Android Permission Specification. In Proceedings of the ACM Conference on Computer and Communications Security (CCS). ACM.
[5] Alastair R. Beresford, Andrew Rice, Nicholas Skehin, and Ripduman Sohan. 2011. MockDroid: Trading Privacy for Application Functionality on Smartphones. In Proceedings of the Workshop on Mobile Computing Systems and Applications (HotMobile).
[6] Sven Bugiel, Lucas Davi, Alexandra Dmitrienko, Thomas Fischer, and Ahmad-Reza Sadeghi. 2011. XManDroid: A New Android Evolution to Mitigate Privilege Escalation Attacks. Technical Report.
[28] Michael Gordon, Deokhwan Kim, Jeff Perkins, Limei Gilham, Nguyen Nguyen, and Martin Rinard. 2015. Information-Flow Analysis of Android Applications in DroidSafe. In Proceedings of the ISOC Network and Distributed Security Symposium (NDSS). Internet Society.
[29] Michael C. Grace, Wu Zhou, Xuxian Jiang, and Ahmad-Reza Sadeghi. 2012. Unsafe Exposure Analysis of Mobile In-app Advertisements. In Proceedings of the ACM Conference on Security and Privacy in Wireless and Mobile Networks (WISEC).
[30] Peter Hornyack, Seungyeop Han, Jaeyeon Jung, Stuart Schechter, and David Wetherall. 2011. These Aren’t the Droids You’re Looking For: Retrofitting Android to Protect Data from Imperious Applications. In Proceedings of the ACM Conference on Computer and Communications Security (CCS).
[31] Numaan Huq. 2015. Follow the Data: Dissecting Data Breaches and Debunking Myths. (2015).
[32] William Klieber, Lori Flynn, Amar Bhosale, Limin Jia, and Lujo Bauer. 2014. Android taint flow analysis for app sets. In Proceedings of the 3rd ACM SIGPLAN International Workshop on the State of the Art in Java Program Analysis.
[33] William Koch, Abdelberi Chaabane, Manuel Egele, William Robertson, and Engin Kirda. 2017. FlowDroid Modifications for Hush. https://github.com/BUseclab/soot-infoflow-android. (2017).
[7] Sven Bugiel, Stephen Heuser, and Ahmad-Reza Sadeghi. 2013. Flexible and Fine- [34] William Koch, Abdelberi Chaabane, Manuel Egele, William Robertson, and En-
grained Mandatory Access Control on Android for Diverse Security and Privacy gin Kirda. 2017. Hush. https://github.com/BUseclab/hush. (2017).
Policies. In Presented as part of the 22nd USENIX Security Symposium. [35] P. Lantz. February 2011. Android Application Sandbox. http://code.google.com/
[8] Charlie Hubbard . 2015. FLEXJSON. http://flexjson.sourceforge.net/. (2015). p/droidbox/. (February 2011).
[9] Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. 2010. Side-channel [36] Martina Lindorfer, Matthias Neugschwandtner, Lukas Weichselbaum, Yanick
leaks in web applications: A reality today, a challenge tomorrow. In Security and Fratantonio, Victor van der Veen, and Christian Platzer. 2014. Andrubis –
Privacy (SP), 2010 IEEE Symposium on. IEEE, 191–206. 1,000,000 Apps Later: A View on Current Android Malware Behaviors. In Pro-
[10] Stephen Chong, Jed Liu, Andrew C. Myers, Xin Qi, K. Vikram, Lantian Zheng, ceedings of the International Workshop on Building Analysis Datasets and Gath-
and Xin Zheng. 2007. Secure Web Applications via Automatic Partitioning. In ering Experience Returns for Security.
Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Prin- [37] Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondrej Lhoták, J. Nel-
ciples (SOSP ’07). ACM, New York, NY, USA. son Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders
[11] Stephen Chong, K. Vikram, and Andrew C. Myers. 2007. SIF: Enforcing Con- Mller, and Dimitrios Vardoulakis. 2015. In Defense of Soundiness: A Manifesto.
fidentiality and Integrity in Web Applications. In Proceedings of 16th USENIX Commun. ACM (2015).
Security Symposium on USENIX Security Symposium (SS’07). USENIX Associa- [38] Long Lu, Zhichun Li, Zhenyu Wu, Wenke Lee, and Guofei Jiang. 2012. CHEX:
tion, Berkeley, CA, USA, 1:1–1:16. Statically Vetting Android Apps for Component Hijacking Vulnerabilities. In
[12] Shauvik Roy Choudhary, Alessandra Gorla, and Alessandro Orso. 2015. Auto- Proceedings of the ACM Conference on Computer and Communications Security
mated Test Input Generation for Android: Are We There Yet? CoRR (2015). (CCS).
[13] Hoang T Dinh, Chonho Lee, Dusit Niyato, and Ping Wang. 2013. A survey of [39] Machigar Ongtang, Stephen McLaughlin, William Enck, and Patrick McDaniel.
mobile cloud computing: architecture, applications, and approaches. Wireless 2012. Semantically rich application-centric security in Android. Security and
communications and mobile computing 13, 18 (2013), 1587–1611. Communication Networks (2012).
[14] Manuel Egele, David Brumley, Yanick Fratantonio, and Christopher Kruegel. [40] Privacy Rights Clearinghouse. 2015. Chronology of Data Breaches. http://www.
2013. An Empirical Study of Cryptographic Misuse in Android Applications. privacyrights.org/data-breach. (2015).
In Proceedings of the ACM Conference on Computer and Communications Secu- [41] Vaibhav Rastogi, Yan Chen, and William Enck. 2013. AppsPlayground: Auto-
rity (CCS). ACM. matic Security Analysis of Smartphone Applications. In Proceedings of the Third
[15] Manuel Egele, Christopher Kruegel, Engin Kirda, and Giovanni Vigna. 2011. In ACM Conference on Data and Application Security and Privacy (CODASPY).
18th Annual Network and Distributed System Security Symposium (NDSS). San [42] rovo89. 2015. Xposed Module Repository. http://repo.xposed.info/. (2015).
Diego, UNITED STATES. [43] Ken Schwaber. 2004. Agile project management with Scrum. Microsoft Press.
[16] William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon [44] Qixiang Sun, Daniel R Simon, Yi-Min Wang, Wilf Russell, Venkata N Padman-
Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. 2014. abhan, and Lili Qiu. 2002. Statistical identification of encrypted web browsing
TaintDroid: an Information-Flow Tracking System for Realtime Privacy Moni- traffic. In Security and Privacy, 2002. Proceedings. 2002 IEEE Symposium on. IEEE,
toring on Smartphones. ACM Transactions on Computer Systems (TOCS) 32, 2 19–30.
(2014). [45] Kimberly Tam, Salahuddin Khan, Aristide Fattori, and Lorenzo Cavallaro. 2015.
[17] William Enck, Machigar Ongtang, and Patrick McDaniel. 2009. On Lightweight CopperDroid: Automatic Reconstruction of Android Malware Behaviors. In Pro-
Mobile Phone Application Certification. In Proceedings of the ACM Conference ceedings of the ISOC Network and Distributed Security Symposium (NDSS).
on Computer and Communications Security (CCS). ACM. [46] Nicolas Viennot, Edward Garcia, and Jason Nieh. 2014. A Measurement Study
[18] Facebook. 2015. Espresso: Functional UI Testing Framework. http://developer. of Google Play. In Proceedings of the International Conference on Measurement
android.com/tools/testing-support-library/index.html#Espresso. (2015). and Modeling of Computer Systems. ACM.
[19] FasterXML, LLC. 2015. FasterXML, LLC. https://github.com/FasterXML. (2015). [47] Lok Kwong Yan and Heng Yin. 2012. DroidScope: Seamlessly Reconstructing
[20] Adrienne Porter Felt, Erika Chin, Steve Hanna, Dawn Song, and David Wagner. the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis. In
2011. Android permissions demystified. In Proceedings of the 18th ACM confer- Proceedings of the 21st USENIX Conference on Security Symposium.
ence on Computer and communications security. ACM, 627–638. [48] Zhemin Yang and Min Yang. 2012. LeakMiner: Detect Information Leakage on
[21] Clint Gibler, Ryan Stevens, Jonathan Crussell, Hao Chen, Hui Zang, and Android with Static Taint Analysis. In Proceedings of the 2012 Third World Con-
Heesook Choi. 2013. AdRob: Examining the Landscape and Impact of Android gress on Software Engineering (WCSE).
Application Plagiarism. In Proceedings of the Annual International Conference on [49] Zhibo Zhao and Fernando C. Colon Osono. 2012. TrustDroid: Preventing the use
Mobile Systems, Applications, and Services. ACM. of SmartPhones for information leaking in corporate networks through the used
[22] Google, Inc. 2015. Gson Deserialization Library. https://sites.google.com/site/ of static analysis taint tracking. 2013 8th International Conference on Malicious
gson/. (2015). and Unwanted Software: "The Americas" (MALWARE) (2012).
[23] Google, Inc. 2015. ProGuard. http://developer.android.com/tools/help/proguard. [50] Cong Zheng, Shixiong Zhu, Shuaifu Dai, Guofei Gu, Xiaorui Gong, Xinhui Han,
html. (2015). and Wei Zou. 2012. SmartDroid: An Automatic System for Revealing UI-based
[24] Google, Inc. 2015. Protocol Buffers. https://developers.google.com/ Trigger Conditions in Android Applications. In Proceedings of the Second ACM
protocol-buffers/. (2015). Workshop on Security and Privacy in Smartphones and Mobile Devices.
[25] Google, Inc. 2015. TelephonyManager, Android Developers. http://developer. [51] Yajin Zhou and Xuxian Jiang. 2012. Dissecting Android Malware: Characteriza-
android.com/reference/android/telephony/TelephonyManager.html. (2015). tion and Evolution. In Proceedings of the IEEE Symposium on Security and Privacy
[26] Google, Inc. 2015. The Monkey UI android testing tool. http://developer.android. (Oakland). IEEE Computer Society.
com/tools/help/monkey.html. (2015). [52] Yajin Zhou, Xinwen Zhang, Xuxian Jiang, and Vincent W. Freeh. 2011. Taming
[27] Google, Inc. 2017. Proguard configuration for Gson. https://github.com/google/ Information-stealing Smartphone Applications (on Android). In Proceedings of
gson/blob/master/examples/android-proguard-example/proguard.cfg. (2017). the 4th International Conference on Trust and Trustworthy Computing.
