柏克莱大学研究:关于GFW审查负载量的评估报告

本文介绍美国柏克莱大学研究人员Paul Pearce, David Kantola and Krsna Widmer写的一份关于GFW审查负载量的评估报告。美博园转载于此,供研究GFW的朋友参考,原文是英文,美博园没有全文翻译,只是简单将摘要翻译以便中文读者了解大概。他们认为:对于中国的GFW,其机制、触发器和有效性检测已经有很多研究,但还没有研究来试图估计GFW的审查量或验证其负载分配。他们通过分析GFW审查的成百上千的IP连接成千上万的审查数据的泄漏来执行此评估。分析表明:每小时有超过4900万的伪造SYN-ACK报文通过国际路由进入中国。

全文摘录如下:

Using Data Leakage To Infer The Censorship Volume Of The Great Firewall Of China
Paul Pearce, David Kantola and Krsna Widmer
{pearce, dkantola, krsna}@berkeley.edu

Abstract
The Great Firewall of China (GFW) has been recognized as one of the most sophisticated state-level Internet censorship system in the world [1]. There has been much work on identifying its mechanisms,triggers, and measuring its effectiveness [1, 2, 3, 4, 5], but no work attempting to estimate the volume of censorship performed by the GFW or trying to identify how that load is distributed.

In this work we begin to formulate an estimation of the volume of connections censored by the GFW.We perform this estimation through the analysis of information leaked by the GFW from thousands of censored connections across hundreds of IP addresses. Our analysis shows that over 49,000,000 forged SYN-ACK packets may be generated per hour across an international route into China.
1
Introduction
The Great Firewall of China (GFW) is the name given to a collection of sophisticated state sponsored
censorship mechanisms deployed throughout China for the purpose of preventing Chinese citizens from
accessing specific content [1, 4, 6, 7]. These mechanisms are deployed at a various locations throughout
the country [6], and use a variety of techniques to censor Internet traffic flowing within and through
China [6, 7, 1, 4].
Among the arsenal of tools used by the GFW are IP blacklisting, DNS hi-jacking, deep packet inspection
(DPI), TCP Reset (RST) injection, and TCP session hi-jacking through forged SYN+ACKs [4, 7, 8]. In
this work we focus on the TCP packet injection based censorship.
Interestingly, the GFW leaks some amount of internal state as part of it’s censorship activities. Specifi-
cally, the injected RST+ACK and SYN+ACK packets contain fields with observable structure beyond the
meaning of the fields themselves.
1
The goal of this work is two-fold. First, we wish to to enumerate the information leaked by these injected
packets. Second, leveraging the kind of information leaked we want to begin to estimate the volume of
connections censored by the GFW.
To achieve these goals we perform a measurement study using software designed to trigger censorship
from IP addresses based both within and outside of China. As part of this measurement study we collect
day-long traces from several IP addresses, as well as hour-long traces from 508 different neighboring IP
addresses. We then perform detailed analysis on the network traces collected to quantify what information
has been leaked.
Using the information leaked from the GFW we show diurnal patterns in the volume of censorship, as
well as provide preliminary estimates to the actual volume itself. We also learn some properties of what is
presumed to be the internal load-balancing structure of the GFW.
Our preliminary results show that the number of censored connections flowing through one international
gateway may have been as high as 49,000,000 connections an hour.
2
Censorship In Practice
In this section we outline how keyword based GFW censorship works using packet injection, as well as
enumerate the properties of the injected packets.
2.1
Censorship Mechanics
For keyword filtering, the GFW functions very similarity to a on-path1network intrusion detection system.
As packets flow through the network they are sent to what is suspected to be a cluster of machines for
examination. This network topology is shown in Figure 1.
Figure 2 shows the network-level technical censorship mechanism. In this example the user or the service
has included a censored keyword in the payload of the connection. Once the GFW identifies a censored
keyword has occurred in the payload, it takes steps to censor the connection. This is accomplished through
a RST+ACK packet being injected into the TCP stream. The RST+ACK packet is forged to appear as
if it came from the opposite party and is sent to both the user and the service. This results in both client
and the service tearing down the connection. When the GFW injects RST+ACK packets a race condition
is created between the user receiving the content and the injected RST+ACK packets.
Should the client attempt to establish another connection with a short period of time, the GFW will inject
additional forged SYN+ACK packets after the client’s SYN [9]. Since the sequence number in the injected
packets do not match the actual server’s, this desynchronizes the connection. This is shown in Figure 2.
1On-path means the device is not directly on the path from points A to B, but rather on a side path. This means a device
on the path between A and B must forward the packets to the censorship device. A router-based firewall deployed in your
home would be an example of an in-path device.
2
Figure 1: Diagram showing the network topology and the GFW as an on-path device. Packets are forwarded from
within the network to the GFW for inspection. The GFW can then choose to inject new packets such as TCP RSTs
or SYN+ACKs. The true path between user and service is shown in blue, with the forwarded and injected path
shown in red.
2.2
Injected Packets
In this section we outline the characteristics of the injected forged packets that the GFW generates as well
as the patterns observed in these packets.
2.2.1
Collection and Filtering
Using a client located in the USA and a server we had access to in China we searched for censored keywords
while recording all network traffic. Analyzing this traffic we found pattens in particular fields of both the
TCP and IP layer. We then manually compared the traffic and were able to determine that the injected
forged packets from the GFW were distinct. The key difference between real traffic and forged traffic
was that the forged traffic had no TCP options. To find the forged SYN/ACKs and RST/ACKs we
used the following Wireshark filter:
(tcp.flags.syn ==1 && tcp.flags.ack ==1 && !tcp.options)
|| (tcp.flags.reset == 1 && tcp.flags.ack == 1 && !tcp.options ).
3
Figure 2: Diagram showing the sequence of packets that may occur as a user is trying to contact a service. When
a censored keyword occurs in the payload the GFW will react injecting packets into the stream.
Figure 3: TCP Window (WNDW) values over time. Each point is a the WNDW value from a forged SYN+ACK
in response to a censored HTTP request.
2.2.2
Patterns
Specific fields of the TCP and IP layer in the forged packets has some interesting patterns. In the TCP
layer the Window Size Value (WNDW) field would increase with each censored connection. This pattern
4
can be seen in Figure 3. The field had a range starting in the 100’s going to the 5000’s. When WNDW
reached the 5000’s mark it would loop back to the 100’s. This finding was consistent with prior work [7, 9].
In the IP layer of these packets the TTL and Identification fields were of interest. The TTL field would
increase in a similar manner as the WNDW field of the TCP layer. The TTL field ranged from 30’s to
200’s. Previous work [9] showed the TTL is a function equal to WNDW modulo 64 + k, where k is the
number of hops. We were not able to observe this behavior. This may indicate a change in the GFW
behavior since the previous work.
The Identification field had a different behavior from the other fields of interest. Instead of increasing it
decreased. The range of the Identification field started in the 64,000’s and would decrease down to the
1000’s. It would oscillate up and down with each forged packet received and did not appear to signify any
comprehensible counter type activity. The Identification field would be in it’s high range when WNDW
was in the lower ranges and vice versa. Prior work [9] suggested that the Identification field was a function
as well, but function they suggested did not hold with our findings.
3
Measurement Methodology
In this section we outline the methodology used to probe the GFW, record data, and estimate censorship
volume.
Based on the observed counting behavior of the forged SYN+ACK TCP WNDW field from Figure 3 we
designed a set of experiments to probe the behavior of this counter.
The hypothesis of our experiments is that the WNDW value can be used to infer the volume of censorship
using simple counting between samples.
3.1
Programmatically Triggering Censorship
To track the behavior of the WNDW field we first needed to device a program to elicit the desire
SYN+ACK’s. We built a C program to connect to a specific set of Chinese search engines and search
for various censored keywords. The design of the program was such that the packets emitted were very
controlled and specific to target censorship.
The program was built such that as soon as the GFW injected a forged RST packet it immediately
began a process of sending SYN packets in an attempt to elicit forged SYN+ACK’s. It would send these
SYN’s in quick succession in order to obtain fine-grained counter information. Once the “black-list” forged
SYN+ACK period ended the program would restart the process.
We then used tcpdump to record all traffic of later offline analysis.
5
3.2
Probing From Within China
As an initial step we attempted to access censored material from a handful of IP’s within China. Given
the sensitive nature of using Chinese IP’s for censorship this behavior was very controlled and very limited.
We carried out only enough experiments to get a general sense of the behavior of the GFW with respect
to our experiments.
3.3
Probing From Outside China
Following our probing from within China we performed the same actions from IP’s outside of China,
targeting Chinese search engines. We discovered that the behavior observed from within China matched
the behavior observed on traffic directed towards China. This would appear to imply that the GFW does
not discriminate between inbound and outbound traffic.
For the remainder of our work we performed our measurements from IP’s outside of China connecting to
Chinese search engines.
3.4
Probing From Numerous IPs
Given the counter behavior of Figure 3 we needed to assertion how the WNDW value varied across IP
addresses. To answer this question we ran our program simultaneously across two contiguous ranges of
256 IP addresses, both within the same range of 65,536 (two /24’s in the same /16).
4
Results
4.1
Window Between IPs
Figure 4 shows the forged SYN+ACK TCP WNDW values IP’s encountering censorship at the same time.
These two samples show that the counters are not in sync. Although they start at a similar value (not
typical) they have radically different rates of change. This indicates that there are multiple counters being
sampled from.
We find this behavior occurs when the source IP or the destination IP varies. Varying source and destination
port appear to have no effect no the counter WNDW samples from.
6
Figure 4: Example of two different IP’s sampling from different counters.
4.2
Diurnal Load
Figure 5 shows a plot of the rate of change of the WNDW value from two distinct source IP addresses
from a full day of traffic. Each point corresponds to the rate of change of WNDW between two consecutive
forged SYN+ACKs. The solid line shows indicates a 1-hour rolling median of the data points.
This data shows that the rate of change of the WNDW value corresponds closely to the diurnal load
experienced on networks. During early evening in China (UTC+8) the rate of change is highest. During
the night, rate of change is lowest. Although each counter is distinct, both follow the same overall trend.
This result supports the hypothesis that the WNDW value can be used to measure load.
4.3
Estimating Volume
By probing a given counter over time, we can calculate how many events have occurred using simple
subtraction. We known from our observations that each censorship event increments the counter by 5. So
if we probe a given counter twice and get back a value of 10, we know that 1 censorship event occurred
besides our own.
There are several steps necessary to infer the volume of censorship being conducted by the GFW as a
whole.
1. Calculate the number of censored connections for counters we’ve observed.
7
Figure 5: Rate of change of the WNDW field over a 24-hour window. Solid lines are 1 hour rolling medians of the
data.
Figure 6: Example of two different IP’s from from two separate /24’s that are sampling from the same counter.
8
2. From our samples, determine which samples come from the same counter (counter collisions).
3. From our collisions, estimate the total number of counters on a given network path.
4. Repeat the entire experiment for various routes within and through China to enumerate all counter.
In this section we describe our work in steps 1 − 3. Step 4 is left for future work.
4.3.1
Calculating Censored Connections For A Single Counter
The technique for calculating the number of censored connections for a single counter is very straight
forward. First we calculate the delta, ∆adj, between adjacent data-points, and then sum all n of them.
When a counter wraps since we don’t know the exact wrap-around value we take a pessimistic approach
and call that a delta of 1. The total number of censored connections, Tcfor this counter is then given by:
Tc=1
5[P∆adj− 5n].
From performing this calculation across of sample of 508 IP’s for one hour, we find that:
• Max Tc= 105,237
• Min Tc= 5,174
• Average Tc= 13,826
• Median Tc= 10,656
4.3.2
Finding Counter Collisions
In order to estimate the total number of counters, we first need to count the number of times two sample
IP’s shared the same counter (henceforth referred to as counter collisions). Figure 6 shows an example of
counter collision between two sample IP’s.
For this experiment we relied on a 1 hour trace over two /24’s within the same /16, which contained 508
usable IP’s. Since there are over 60,000 possible combinations to consider, an automated mechanism was
needed.
To identify all collisions, we took the following approach:
9
Figure 7: Two sample IP’s that were clustered together into a single counter by the Levenshtein Distance algorithm
at edit distance 5. These are in-fact not the same counter, but rather a false positive. Manual checking of all
Levenshtein matches at small edit distances removes false positives.
1. For a given sample, divide the data into mono-tone regions. The boundaries of a monotone region
are defined as set of points that are monotonically increasing.
2. Within each monotone region, use a least-squares approximation to fit a line to the data.
3. Create a sequence of the x-intercept values of each region.
Once we had a sequence of values for each sample, we used a modified2version of the Levenshtein Distance
algorithm [10] to calculate the edit distance between all pairs of samples.
After running the Levenshtein algorithm we found a total of 13 counter collisions. These all occurred
within edit distances 0−5. Edit distances 0−4 yielded no false positives and 11 of the 13 collisions. Edit
distance 5 had 2 collisions and 16 false-positives. Figure 7 shows an example of a false-positive. Manually
examining the results for larger edit distances did not yield further true positives.
4.3.3
Estimating The Number Of Counters
In order to estimate the total number of counters, we will assume that a given connection is assigned a
counter uniformly at random. Based on n = 508 samples and E = 13 collisions, we can now estimate
2The basic Levenshtein Distance algorithm for sequences looks for equality between two values i and j. In this application
we want i and j to be within a threshold, since these values are derived from noisy measurements and line fitting.
10
the total number of counters using the Chao-Shen entropy estimation [11]. Chao-Shen provides a good
estimation of the Shannon Entropy, S, with a low number of samples [12]. From the Shannon Entropy we
can estimate the number of bins as 2S. To perform the actual calculation we used R’s entropy package
provided by Hausser et al. [12].
From this calculation we find that the expected number of bins, k = 9603.
The use of the Chao-Shen entropy estimator is backed up by the calculation of the expected number of
hash collisions for n samples into a hash space of size k, given by: E = n − k + k?1 −1
k
?n[13]. Plugging
in our n and estimated k we find the expected number of collisions to be 13.17, very close to our observed
number of collisions of 13.
4.3.4
Preliminary Volume Result
Based on the estimation of the number of counters k = 9603 and using the min Tc= 5,174 per counter
we arrive at the estimation of 49,685,822 connection censored by this cluster of counters over a one hour
period.
5
Future Work
We envision four distinct directions for future work.
5.1
Verifying Existing Measurements
The 49,000,000 number is surprising. If this is erroneous, there are several possible causes.
1. We may have missed counter collisions. This would directly impact the counter estimation. A closer
look at our clustering technique using edit distance must be undertaken.
2. Connections don’t map the counters uniformly at random.
3. Counters don’t increment as expected.
However, it is possible that this number is accurate given that the quantity that we are measuring is the
number of censored connections. Let’s say a user connects to domain X, and as the HTML loads a censored
keyword occurs. Every connection a user makes to that domain would result in a censored connection.
If the page has hundreds of sources all for domain X, that would be hundreds of censored connections.
Further, automated tools or users repeatedly hitting refresh would also lead to inflation in the number of
connections.
11
5.2
Measuring From More IPs
The more samples we have the better our estimation of both the number of counters and the volume per
counter. We’d like to probe from more IP’s as well as probe for longer, to fine-tune our measurements and
have a complete 24-hour picture.
5.3
Measuring Different Routes
Prior work has been done to measure where in the internet topology censorship occurs within China [6].
Leveraging this work we could conduct independent experiments to map out “clusters-of-counters” and to
build a more complete picture of the total amount of censorship within the country.
5.4
Mapping Connections To Something More Qualitative
Number of connections censored is less interesting when you consider the case of multiple resources, auto-
mated tools, and users clicking refresh. More interesting is the number of censorship “events” or separate
activities that encounter censorship, as well as the number of users that encounter censorship. What is
needed is a way to model both website properties and user behavior to map from connections to users or
visits.
6
Related Work
There is a large body of literature devoted to the GFW ranging from technical implementation [4] to
evasion [7]. However, we are aware of no work that has attempted to quantify the amount of censorship
performed by the GFW.
Park et al. [4] performed a wide-scale measurement study of HTML based filtering by the GFW. Their work
describes the limitations of a HTML based system and suggest that these limitations likely resulted in China
discontinuing their operations on many routes into the country. This work suggests that the landscape of
censorship is changing, and that new studies must be performed from multiple vantage points.
Verkamp et al. [1] enumerate censorship techniques in not only China but other countries around the world.
They describe the technical implementations from packet injection to DNS techniques.
Xu et al. [6] perform a comprehensive study aimed at determining where censorship occurs within the
network. Using traditional TTL techniques they map what AS’s within China censorship occurs, and find
evidence that different ISP’s implement filtering in different ways.
The West-Chamber project [7] gives detailed documentation related to the technical properties of the
censored packets. West-Chamber’s goal is to build evasion tools rather than to measure, and focus on
properties could lend themselves to evasion.
12
7
Conclusion
Utilizing information leaked by the Great Firewall of China (GFW) as part of it’s censorship mechanisms
we are able to learn a great deal about the behavior of the GFW itself. From this leaked information we
are able to observe diurnal load patterns and embedded counters.
Using basic counting combined with multiple vantage points, linear fitting, and entropy approximations
we’ve begun the work necessary to calculate the number of connections censored by the GFW. This work
marks the first steps in a larger effort to estimate censorship at a country-wide scale.
References
[1] John-Paul Verkamp and Minaxi Gupta. Inferring Mechanics of Web Censorship Around the World.
In Free and Open Communications on the Internet, Bellevue, WA, USA, 2012. USENIX Association.
[2] J. R. Crandall, D. Zinn, M. Byrd, E. Barr, and R. East. ConceptDoppler: a weather tracker for
Internet censorship. In Proceedings of the 14th ACM Conference on Computer and Communications
Security (CCS), pages 352–365, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-703-2. doi:
10.1145/1315245.1315290. URL http://www.csd.uoc.gr/~hy558/papers/conceptdoppler.pdf.
[3] A. Sfakianakis, E. Athanasopoulos, and S. Ioannidis. CensMon: A web censorship monitor. 2011.
URL http://www.usenix.org/event/foci11/tech/final_files/Sfakianakis.pdf.
[4] J. C. Park and J. R. Crandall. Empirical study of a national-scale distributed intrusion detection
system: Backbone-level filtering of HTML responses in China. In Proceedings of the 30th IEEE Inter-
national Conference on Distributed Computing Systems (ICDCS), pages 315–326. IEEE, 2010. URL
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.191.206&rep=rep1&type=pdf.
[5] Antonio M. Espinoza and Jedidiah R. Crandall. Automated Named Entity Extraction for Tracking
Censorship of Current Events. In USENIX Workshop on Free and Open Communications on the
Internet, San Francisco, CA, USA, 2011. USENIX Association.
[6] X. Xu, Z. Mao, and J. Halderman. Internet censorship in China: Where does the filtering occur? In
Neil Spring and George Riley, editors, Passive and Active Measurement, volume 6579 of Lecture Notes
in Computer Science, pages 133–142. Springer Berlin / Heidelberg, 2011. ISBN 978-3-642-19259-3.
URL http://www.cs.umich.edu/~zmao/Papers/china-censorship-pam11.pdf.
[7] Introduction to project west-chamber. URL https://scholarzhang.googlecode.com/svn/trunk/
west-chamber/README.html.
[8] Graham Lowe, Patrick Winters, and Michael L. Marcus. The Great DNS Wall of China. Technical
report, New York University, 2007.
[9] Intrusion defense system of evaluation and problem. URL http://www.chinagfw.org/2009/09/gfw_
21.html.
[10] VI Levenshtein. On bounds for packings in n-dimensional euclidean space. In Soviet Math. Dokl,
volume 20, pages 417–421, 1979.
13
[11] Anne Chao and Tsung-Jen Shen. Nonparametric estimation of shannons index of diversity when
there are unseen species in sample. Environmental and Ecological Statistics, 10:429–443, 2003. ISSN
1352-8505. doi: 10.1023/A:1026096204727. URL http://dx.doi.org/10.1023/A%3A1026096204727.
[12] Jean Hausser and Korbinian Strimmer. Entropy inference and the james-stein estimator, with appli-
cation to nonlinear gene association networks. J. Mach. Learn. Res., 10:1469–1484, December 2009.
ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1577069.1755833.
[13] Ken Bogart and Cliff Stein. Discrete Math in Computer Science, pages 246–247.

原文引自:GFW Paper

原文链接:https://allinfa.com/using-data-leakage-infer-censorship-volume-gfw.html
原文标题:柏克莱大学研究:关于GFW审查负载量的评估报告 - 美博园
※ 除声明转载,美博园文章均为"原创",软件版权归原作者,转载请以上面超链接注明来源!

本文TinyURL短网址: http://tinyurl.com/y9kfplsd
本文Google短网址: https://goo.gl/eagK6A

如喜欢本站请订阅: 或者 点此【RSS订阅真相网】

网 友 留 言

2条评论 in “柏克莱大学研究:关于GFW审查负载量的评估报告”
  1. Andrzeja says:

    2013-12-07 - I2P 0.9.9 
    http://www.i2p2.de/
    http://www.i2p2.de/download.html

    (美博园)

这里是你留言评论的地方

9 + 6 =
Copyright © 2007 - 2018 , Design by 美博园. 版权所有. 若有版权问题请留言通知本站管理员. 【回到顶部】