Journal of Communications and Information Networks, Vol.2, No.3, Sept. 2017
DOI: 10.1007/s41650-017-0010-1
© Posts & Telecom Press and Springer Singapore 2017
Research paper

Big data in telecommunication operators: data, platform and practices
Zhen Wang1, Guofu Wei1, Yaling Zhan1, Yanhuan Sun2*
1. Information Center, China Telecom Co., Ltd. Anhui branch, Hefei 230001, China
2. Key Laboratory of Wireless-Optical Communications, Chinese Academy of Science,
University of Science and Technology of China, Hefei 230017, China

* Corresponding author, Email: stracy@mail.ustc.edu.cn

Abstract: In the age of information explosion, big data has brought challenges but also great opportunities
that support a wide range of applications for people in all walks of life. Faced with the continuous and intense
competition from OTT service providers, traditional telecommunications service providers have been forced
to undergo enterprise transformation. Fortunately, these providers have natural and unique advantages in
terms of both data sources and data scale, all of which give them a competitive advantage. Multiple foreign
mainstream telecom operators have already applied big data for their own growth, from internal business to
external applications. Armed with big data, domestic telecom companies are also innovating business models.
This paper will introduce three aspects of big data in the telecommunications industry. First, the unique
characteristics and advantages of communications industry big data are discussed. Second, the development
of the big data platform architecture is introduced in detail, which incorporates five crucial sub-systems.
We highlight the data collection and data processing systems. Finally, three internal or external application
areas based on big data analysis are discussed, namely basic business, network construction, and intelligent
tracing. Our work sheds light on how to deal with big data for telecommunications enterprise development.

Keywords: telecommunication operator, enterprise transformation, big data, platform architecture, practical applications

-----------------------------------------------------------------------------------------------------
Citation: Z. Wang, G. F. Wei, Y. L. Zhan, et al. Big data in telecommunication operators: data, platform
and practices [J]. Journal of communications and information networks, 2017, 2(3): 78-91.
-----------------------------------------------------------------------------------------------------

Manuscript received Dec. 30, 2016; accepted Feb. 8, 2017

This work is supported partially by Key Program of National Natural Science Foundation of China (No. 61631018), the Fundamental Research Funds for the Central Universities and Huawei Technology Innovative Research on Wireless Big Data.

1 Introduction

With the explosive increase in global data, the term “big data” has been used to describe huge datasets. Doug Laney[1] was the first to mention the 3V’s of big data management: volume, velocity, and variety. The data generation capacity has never been more powerful and enormous since the birth of information technology in the early 19th century[2]. The overall mobile data traffic is expected to reach 30.6 exabytes per month by 2020, an eight-fold increase over 2015[3]. In 2013, IBM issued the report “applications of big data to the real world”, which stated that the internal data of enterprises were the main sources of big data[4].

Thus, for enterprises, the fundamental challenges of big data applications are exploring the large volumes of data and extracting useful information or knowledge for future decision making[5]. Big data has begun supporting a wide range of potential applications. For instance, Facebook has used social networking data to track user interest patterns and carry out precision marketing, which has yielded profitable results. In 2014, Alibaba launched the “DMP”, which enabled businesses to implement different marketing strategies for different people based on the analysis of user information obtained through this product. Applications of big data in other fields include tracking movie box office receipts[6], healthcare systems[7,8], customer surveys[9], and user characteristics analysis[10]. All these big data applications are gradually transforming the way we live, work, and study.

Faced with continuous and intense competition from OTT service providers, traditional telecommunications service providers must undergo enterprise transformation. Fortunately, these operators have access to rich data sources and huge datasets, which other industries do not have. Large numbers of customers generate loads of behavioral data every second of the day, including calling, messaging, networking, and other kinds of information. Even when the customer is inactive, location-based data is generated. Moreover, combined with registration and business information, customer billing data can be obtained.

Consequently, the vast amounts of data that operators hold can potentially outpace the ability of existing CDR-based processing to improve our daily lives[11]. Telecommunications data can be used to optimize operations and drive operational business intelligence to realize immediate business opportunities[12]. Multiple foreign mainstream telecom operators have already applied big data for their own development. Orange Business Services, for instance, used big data to enhance the accuracy of its churn detection. Spain’s Telefonica Dynamic Insights obtained reliable predictions of user behavior by packaging and analyzing data. In 2014, Verizon built data centers in California to implement precision marketing[13]. Domestic operators are also innovating their business models by exploring the use of big data.

This paper provides detailed discussions of three aspects of big data in the telecommunications industry. Section 2 discusses the sources and unique advantages of communications industry big data compared to other industries. Section 3 introduces the framework of the industry’s big data platforms in detail, from collection systems to storage systems to application systems. Section 4 details three internal and external applications based on big data. Finally, this paper ends with a summary and directions for future research.

2 Data sources and advantages

In this section, the major sources of wireless big data and the advantages of operators are introduced.

2.1 Data sources

As providers of basic network services, telecom operators aim to provide an information channel between people and equipment, and between different types of equipment[14]. Operators themselves are the producers of big data. Data generated in a communications network is the main source of Internet big data.

Communications data is mainly derived from the following three sources[15].

• Data in IT system: user attributes, business consumption information, and terminal information data collected from CRM, billing systems, and terminal self-registration platforms, respectively. Basic user profiles and characteristics can be described from these data.

• Data in access network and core network: mobile signaling, DPI, M2M data, etc. These data accumulate in wired/wireless networks whenever clients use voice, SMS, or networking services. The underlying structure of the data is complex; hence, targeted analysis and processing are needed for different types of data to achieve scenario-based descriptions of user locations and preferences.

• Data in operators’ own Internet applications: online business hall data, palm business office data, wing payment data, etc. All data, including user access modes, addresses, times, business preferences, and investment and consumption habits, are completely
stored in the background of the application and can be obtained directly.

In terms of “Volume”, hundreds of millions of users’ behavioral data are already in the terabyte or even petabyte range. In terms of “Variety”, communications data covers all businesses, customers, and channels, as well as Internet data, human attributes, position trajectories, and terminal information. In terms of “Velocity”, the quality of communication services should meet the real-time requirements of various applications.

2.2 Advantages

In China, three operators have the largest number of users compared to all other industries, i.e., approximately 1.3 billion mobile users and 300 million fixed broadband users[16]. The massive number of users, combined with the operators’ own industry policies, provides the following advantages to operators.

• Authentic user information. Owing to the existing real name system, non-real name users have limited services and are required by law to register. This not only ensures the authenticity of user information, but also guarantees that the data has one-to-one correspondence with a real person.

• Comprehensive and intact information. Unlike Internet companies that can only interact with users through their own App business, operators can access all behavioral data on users in the network all the time, such as when and where they used the service, terminal type, website accessed, products searched, hot topic interests, etc. With enough storage and computing power, we can efficiently and completely uncover all these behaviors. Moreover, with the availability of authentic user information, complete and accurate descriptions of user profiles and features can be obtained.

• Identifiable and relatable data. User identifiers in the operator system include the mobile phone number, ID card number, terminal ID, cookies, and many other types of information. These data can be related to financial, Internet, hotel, transportation usage, and other business-related data to break down the isolated data and develop a real sense of the big data cloud, under the premise that user privacy is guaranteed.

• Continuous and real-time data. Compared to Internet service providers, telecom operators can obtain position tracking data through the cellular network protocol even when users only power on their devices and have no data connection (Wi-Fi/3G/4G). This mechanism guarantees real-time and continuous data collection, which is especially powerful in real-time applications such as issuing early traffic warnings.

3 Big data platform

In this section, we introduce the telecommunications operator’s internal big data platform in detail. The overall design of the big data platform is based on the principles of data concentration, degree of openness, and cloud computing. It aims to provide secure access, storage, sharing, analysis, applications, and management. It helps construct an enterprise-level and future-oriented data center. Moreover, the platform will create an open and shared public data environment. The above attributes can guarantee the application implementation in all internal departments.

3.1 Overall framework

As shown in Fig. 1, the platform mainly consists of five parts: data collection system, data storage system, data processing system, open mobile system, and management system.

The overall framework has a distinct hierarchy and arrangement. The selected technologies and components are mature and stable. On one hand, the framework can satisfy the data processing requirements in the current data environment. On the other hand, all the included technologies are supposed to be in line with the future direction of big data.

3.2 Data collection system

The data collection system is the basic part of the platform. It provides a variety of data access tools
and aggregates the critical structured and unstructured system data from all enterprise management departments, front-end and back-end. By combining data from the offline acquisition and real-time acquisition phases, the system can break down the isolated information and aggregate all original data into the unified platform.

Figure 1 Big data platform framework
[Layered diagram. Collection layer: real-time and off-line collection from CRM, billing, DPI, signalling, and Web crawler sources, covering small and large volume data. Processing layer: on-line and off-line calculation. Storage layer: in-memory database, KV database, relational database, MPP database, and distributed file storage. Query layer: real-time query, quasi-real-time statistics, and query. Open mobile layer: data analysis, data mining, and data service. Application layer: data marketing, cooperation, decision, and operation. Side panels: data management (schedule, monitor, resource, quality) and security management (desensitization, encryption, data watermark, permission).]

3.2.1 Collection interfaces

There are many different systems in the communications enterprise. Thus, interfaces are needed to connect the collection system to other kinds of systems. Some data are stored in files while others are real-time data. There are two kinds of interfaces. The data collection interface collects data from various interior source systems. The service interface manages data sharing and transfer among different intermediate systems. Tab. 1 introduces the different data interfaces.

3.2.2 Collection technologies

In this part, we introduce three common collection technologies.

Synchronization technology based on relational database: Both dblink and OGG (Oracle GoldenGate) are synchronization technologies for Oracle databases. OGG is a comprehensive software package for real-time data integration and replication in heterogeneous IT environments. The product enables high availability solutions, real-time data integration, transactional change data capture, and data replication, transformation, and verification between operational and analytical enterprise systems.

Applied scenarios: Dblink is mainly used for data synchronization between Oracle databases. It is often used in full-scale synchronization. OGG uses the database file synchronization mode. Because of its high efficiency and small influence on the source system, it is currently used in production systems and other time-sensitive applications, such as the synchronization of attribute tables, orders, and lists.

Interaction technology between HDFS and tables based on Sqoop: Apache Sqoop (SQL-to-Hadoop) was designed to help the RDBMS and Hadoop achieve efficient big data exchange. With the help of Sqoop, users can transfer relational database data
to related systems in Hadoop, such as HBase and Hive. Sqoop can also extract data from the Hadoop system and then export it to the relational database.

Applied scenarios: The development of businesses and applications, especially the impact of big data, has led to the exponential growth of enterprise data. Data formats are becoming increasingly diverse, such as text, video, Web crawler data, and many other structured and unstructured data. The traditional dblink and OGG synchronization technologies have failed to meet the demands of the industry. Hence, the Hadoop open source framework for data processing was introduced. Because of the use of the HDFS file storage mode, Sqoop is a good solution to the synchronization problem between the relational database and the distributed file system. Currently, the data stored on the Hadoop platform includes all user information, subsidies, sales, orders, DPI, signaling, and other structured or unstructured data. These data are collected by Sqoop components and can meet the subsequent processing requirements of big data SQL engines such as Impala, Spark, and Hive.

Table 1 Data interface

interface type | end system | specification | interface mode
---------------------------------------------------------------
data collection | national platform of internet log | mobile network DPI, mobile network AAA data | file
data collection | DPI platform of network operation | fixed network DPI: source IP, AD subscriber ID, timestamp, request URL, user agent, referrer URL, destination IP, cookie, user port, destination port, etc. | file
data collection | DPI platform of network operation | fixed network AAA data, including WLAN authentication and broadband user authentication | quasi real-time
data collection | OIDD platform | OIDD system signaling data | file, real-time
data collection | ODMS platform, TSM platform | UDB, ISMP, business pilot, WLAN hotspot management and other value-added business data | file
data collection | billing system | mobile network billing details (calling and called), SMS billing, flow billing | file
data collection | billing system | fixed network billing details (calling and called) | file
data service | ability product and application platform | all types of original list of external business | file, real-time
data service | provincial IT system | provincial roaming data issued | file

Incremental document collection technology based on Flume: Flume NG is a distributed, reliable, and available system provided by Cloudera. It can efficiently collect, aggregate, and move massive amounts of log data from different sources and store them in a centralized data storage system. It is a lightweight and simple tool that can easily adapt to various collection methods and balance loads.

Applied scenarios: Flume technology is mainly used for the log collection of each system. The development of cloud application systems, distributed architectures, and increasing node numbers make daily operations and maintenance processes increasingly difficult, with problems such as log dispersion, storage pressure, non-standardized log formats, non-unified query channels, and non-automatic push of abnormal information. These problems spurred us to build a log database. The business applications cover the track analysis of operation and maintenance personnel, operations staff, business processes in the business hall, and user Web page access. For example, a clerk reports the part of the business that is inefficient and provides a specific order number. Then, operations and maintenance personnel, according to the analysis of customer tracks, can identify the time-consuming link, customer waiting time, and pure system operation time. Based on the above steps, we can determine the real reasons for the inefficiency and provide recommendations for the optimization
and management of the IT system.

One company’s current collection system is shown in Tab. 2.

Table 2 Data collection sources and data scale of a company

business type | frequency | capacity/TB | increment/TB | processing memory | total storage/TB | duration/month
--------------------------------------------------------------------------------------------------------------
customer, account and user information | day | 1.00 | 0.90 | 2.00 | 31.00 | 1
inventory data integration | day/month | 1.20 | 1.20 | 2.40 | 37.20 | 1
mobile network DPI | day | 1.90 | 1.90 | 3.80 | 68.40 | 36
fixed network DPI, ITV | day | 5.00 | 5.00 | 10.00 | 155.00 | 1
wing payment | month | 0.01 | 0.01 | - | 0.24 | 24
port A signaling | day | 0.60 | 0.50 | 2.00 | 37.20 | 2
OSS data | day | 0.20 | 0.10 | 0.60 | 74.40 | 12
income, bill | month | 3.00 | 0.80 | 2.60 | 75.00 | 25
statements | day/month | 1.70 | 0.20 | 3.40 | 40.80 | 24
group data | month | 0.30 | 0.01 | - | 7.20 | 24
total | - | 14.91 | 10.62 | 26.80 | 526.44 | -

3.3 Data processing system

The data processing system is the core of the platform, providing deep mining and analysis services. Using the distributed storage and parallel computing framework combined with many kinds of computing engines, this system can accomplish fast and distributed computing for structured, semi-structured, and unstructured information resources.

3.3.1 Processing architecture

In order to achieve efficient collaboration in data processing and meet the requirements of different applications, we divided the system into a real-time module and an offline module as shown in Fig. 2.

Real-time scenario: Real-time data, including mobile broadband/product development, terminal sales, package development, 4G flow, and gross income, are all displayed by instrument panels, progress bars, trend charts, regional hotspot maps, and other forms. The development status and progress are self-explanatory. Personnel can make timely adjustments to marketing decisions by utilizing the screen display and rolling update.

Off-line scenario: Business development and application complexities put heavy pressure on storage and processing. Moreover, the ODS and EDW hardware platforms basically use minicomputers or integrated machines, which are hard to manage. Fortunately, open source technologies can integrate both structured data (e.g., BSS, OSS, MSS) and unstructured data (e.g., mobile DPI and fixed-network DPI). After the construction of the offline analysis platform, we can observe the daily key indicators. The specific steps are as follows:

1. check the external table data according to certain rules, such as volatility and consistency;
2. check and insert the data into internal tables in the interface layer, and add time stamps and partitions;
3. store the lightly summarized and detailed data generated by the model calculation in HDFS format;
4. process based on business logic.

3.3.2 Processing level

Tab. 3 shows the data processing level of one provincial telecommunication company.

3.4 Other systems

This section introduces the other three systems, namely the data storage system, open mobile platforms, and management systems.

Functioning as the support of data analysis and sharing, data storage systems can store and query
structured, semi-structured, and unstructured data. In order to achieve efficient data transfer, there are four layers in the storage system.

• Interface layer: this layer aims at peripheral data sources and is responsible for data collection and preprocessing. It can manage external data sources, interface types, format requirements, scheduling methods, and supervision of data acquisition and exchange.

• Integration layer: this layer integrates the isolated business model to establish a set of theme-oriented enterprise data models.

• Intermediary layer: this layer refines the integration layer information for the purpose of application. It can reduce the degree of coupling between models through the fragmented way of processing and storage, which supports fast and agile data processing and assembly.

• Summary layer: this layer can provide data analysis, data mining, and ad hoc queries for cross-domain data.

Figure 2 Data processing system frame
[Block diagram. Data sources (stream data from real-time online services, log servers, and data capture; batch data such as files, objects, Webpage data, dictionaries, multimedia, and logs) pass through data collecting into two branches. Online processing: online calculation (stream computing, real-time monitor, recommendation, on-line analysis) and online storage (real-time cache, persistent data, database cache, KV storage). Off-line processing: off-line storage (structured/unstructured data, static data) and off-line calculation (IO intensive, computation-intensive, iterative, SQL-like), with applications such as search ranking, recommended calculation, advertising algorithm, and security detection, under task dependent control.]

Table 3 Data processing scale

data type | data scale | preprocessing normal value | preprocessing maximum | message queue normal value | message queue maximum | flow preprocessing normal value | flow preprocessing maximum
--------------------------------------------------------------------------------------------------------------
information | 2.3 million/day | 125 pieces/s | 500 pieces/s | 125 pieces/s | 500 pieces/s | 125 pieces/s | 500 pieces/s
file | 85 million/day | 1 000 pieces/s | 2 000 pieces/s

The open mobile platform supports both internal data applications and external business. First, it is a platform for external businesses using the multitenant mode. Second, the operator is the platform operator as well as one of the tenants. The platform needs to assign users and permissions to tenants, and provide user-level independent storage space, well allocated computing resources, secure data protection, etc. The multitenant mode needs to make full use of the data analysis capacity and help tenants apply for resources. It can also perform the intelligent management of tenant resources by recycling those with high idle rates and expanding limited resources.

The management system has two parts: data management and security management. The data management module is responsible for process scheduling
and monitoring, generation of the main data and index database, and data resource management. The security management module is responsible for user rights, data access, access control, data desensitization, data encryption, watermarks, and other system management functions.

4 Practice and applications

This section introduces big data analysis-based practices from three perspectives. The first one is their application to normal business. Then, it shows their effects on network optimization. Finally, we will introduce a business in which the telecommunications operator collaborates with the government.

4.1 3G/4G upgrading

4G has become a key business for telecom operators since the release of TDD/FDD LTE licenses. It has a strong influence on future user profiles. At present, the terminal-SIM matching rate is relatively low. The number of matching users for the Anhui province is 3 800 000, which accounts for 37.1% of the total as of July 2016. Thus, using big data to enhance 4G terminal sales is an effective way to validate the present study. The first step is using data mining to identify potential users. ARPU can then be taken into account, followed by targeted marketing. This way, both the user scale and value can be enhanced.

4.1.1 Dataset

The sample consists of 4 600 000 customers of a provincial company as of April 2016. They used neither 4G terminals nor LTE flow. Over the next three months, the number of 4G terminal upgrades was 334 000 (i.e., the number of positive samples). Because of the large difference between positive and negative samples, we applied some balancing measures. Meanwhile, 70% of the sample was designated as the training set and the remainder as the test set.

In order to ensure the purity of the data, we need to check the data on user information, self-registration, billing, terminal type, etc. to handle abnormal values, outliers, and missing values.

Then we generate derivative variables by combining business rules. We cluster the ARPU and flow into three categories and generate an ARPU-rank field (1, 2, 3) and a flow-rank field (1, 2, 3), respectively. We also calculate other derived fields, including the overflow consumption and the matching degree between ARPU and terminal price.

4.1.2 Model and algorithm

First, filter the valid input variables. The number of input variables follows a short and refined principle. Too many input variables are likely to cause problems, such as interference and over-fitting, which can lead to a decline in the stability of the model. There are two methods to select variables: choosing by business analysis and choosing in accordance with the correlation coefficient. When the correlation coefficient between two variables is equal to or greater than 0.6, this indicates a moderate or stronger linear relationship between the two variables. Here, we only need to keep one variable.

In order to ensure the universality of the model, we need to divide the data into the training set and test set. The model is constructed on the training set. The hit rate, coverage, and applicability are verified on the test set. The availability is guaranteed by cross validation.

By comparing the indicators obtained by the automatic classifier node in the SPSS Modeler, we find that the decision tree algorithm has the best overall performance among the different algorithms shown in Tab. 4.

Table 4 Performance of different classifiers

algorithm | precision | recall
------------------------------
decision tree | 70% | 80%
neural network | 71% | 50%
logistic regression | 69% | 55%

Fig. 3 shows the key factors chosen by the feature selection module of SPSS.
86 Journal of Communications and Information Networks

[Figure 3: chart of the key impact indicators: flow rank, key brand search times, 4G keyword search times, large flow applications, phone duration, over consumption, contract expiration, ARPU rank, matching degree, video preference, game preference, access duration, and age. Data labels in the extracted text: 25%, 21%, 13%, 10%, 9%.]
Figure 3 Key impact indicators

According to the key indicators mentioned above, we screened four categories of mobile users who had neither a 4G terminal nor LTE flow at the end of July.

Cluster 1: Preferred large flow applications such as videos and games. The monthly average flow was greater than or equal to 500 MB.

Cluster 2: Highly concerned about 4G terminals. Frequently searched terminal brands and used keywords like 4G.

Cluster 3: Preferred high-speed applications; played games and used other large flow applications frequently. ARPU was greater than or equal to 59 Yuan.

Cluster 4: Preferred certain terminal brands when the contract or replacement period expires.

4.1.3 Results and discussions

[Figure 4: bar chart of success ratio (%) for the previous samples, the current samples, and clusters 1 to 4. Data labels in the extracted text: 22.8, 16.6, 14.8, 14.0, 10.9, 7.3.]
Figure 4 Practical effect of proposed model

From August to September 2016, we divided 1 500 000 users into four clusters and performed target marketing, mainly through CRM pop-ups and phone calls. Overall, 223 000 users replaced their phones. The outcomes of the marketing activities improved significantly: the success ratio was 7% higher than in the previous campaign, especially for cluster 1, as shown in Fig. 4.

Mining potential clients and target marketing have proven to be quite effective. It may be possible to use additional types of classifiers to enhance algorithm performance and choose the crucial indicators more accurately. Moreover, users can be clustered according to more detailed standards, which may further improve the success ratio.

4.2 Big data driven network construction

4G users will switch to the CDMA Ev-Do network in areas where LTE network coverage is weak or non-existent. This is commonly known as “cutting-down flow”, where users obtain network access from the 3G base station and the packets are forwarded through the 4G core network (connected by the eHRPD in the 3G core network). However, same-frequency interference and other complex wireless disruptions may still cause users to cut down to the 3G base station, even though the capacity and bandwidth are sufficient. To keep these events from degrading users' perception of 4G service quality, personnel can optimize the network or plan a new site. By analyzing the traffic thermodynamic diagram, we can determine which areas need expanded LTE network coverage and report these conclusions to the wireless network optimization center or construction center.

4.2.1 Model and algorithm

Resolving the frequent cutting-downs in core business districts will enhance the user experience and balance the LTE network load. We can identify the cutting-down station by implementing range determination, data integration, and thermodynamic diagram analysis.

1. Automatic classification of base station coverage area based on grid holography. Based on the holography grid GIS platform, we developed the automatic classification of base station coverage areas. Using GIS mapping and spatial calculation, we realized the intelligent recognition of base stations in core business circles, transportation hubs, and other key areas. GIS platform inputs: local network IDs, base station IDs, macro base station labels, base station latitudes, and base station longitudes. GIS platform outputs: coverage names, coverage types, corresponding base station IDs, and local network IDs.
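As a minimal sketch of step 1, the snippet below assigns base stations to named coverage areas with a ray-casting point-in-polygon test and returns rows shaped like the GIS outputs listed above. The class names, field names, sample polygon, and coordinates are illustrative assumptions; the paper does not describe the GIS platform's actual interface or its holography-grid method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BaseStation:
    # Mirrors the GIS inputs named in the text (field names are hypothetical).
    local_network_id: str
    station_id: str
    is_macro: bool
    lat: float
    lon: float


@dataclass
class CoverageArea:
    # A named key area described by a simple polygon of (lon, lat) vertices.
    name: str
    kind: str
    polygon: List[Tuple[float, float]]


def point_in_polygon(lon: float, lat: float, polygon) -> bool:
    """Ray casting: count polygon edges crossed by a ray going east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify(stations, areas):
    """Return (coverage name, coverage type, station ID, local network ID) rows."""
    rows = []
    for s in stations:
        for a in areas:
            if point_in_polygon(s.lon, s.lat, a.polygon):
                rows.append((a.name, a.kind, s.station_id, s.local_network_id))
                break  # first matching area wins in this sketch
    return rows


# Hypothetical sample data: one square "core business circle" and two stations.
areas = [CoverageArea("Example business circle", "core business circle",
                      [(117.0, 30.5), (117.1, 30.5), (117.1, 30.6), (117.0, 30.6)])]
stations = [BaseStation("LN01", "BS001", True, 30.55, 117.05),
            BaseStation("LN01", "BS002", True, 30.70, 117.30)]
result = classify(stations, areas)
print(result)
```

Stations that fall outside every polygon are simply omitted here; a production system would more likely label them with a default coverage type.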
2. Form flow integration tables. We integrated and cleaned data on bill payments, including 1X, Ev-Do, LTE, and prepaid bills, to form a traffic-wide table. The tables contain the traffic flow type, Internet connection modes, up and down flow, time, places visited, base stations, and other features. Combining these tables with the base station tables in the GIS platform, we can analyze the whole network traffic efficiently by zone and time period.

3. Thermodynamic diagram display. To facilitate visualization, we developed a thermodynamic diagram display function. We selected some of the province's core business centers to analyze the cutting-down flow. The thermodynamic map can provide more information on the distribution of top cutting-down stations.

As is shown in Fig. 5, darker colors indicate heavier cutting-down flow. Additional optimizations should be performed in these darker regions.

[Figure 5: thermodynamic map of cutting-down flow; darker regions carry more cutting-down flow]
Figure 5 Condition of cutting-down flow

4.2.2 Results and discussions

We cooperated with the Anqing branch to optimize wireless signal coverage in core business centers. In the process, the number of cutting-downs decreased significantly and the amount of network traffic increased. The number of cutting-downs decreased from 27 000 to 11 000 per station each month, and the average daily traffic increased from 11.2 GB to 18.6 GB.

By using big data in network construction, we not only enhance the user experience and the quality of service in busy districts but also save resources in low flow regions, which improves resource utilization. In our future work, we will focus on applying this analysis to the city’s WLAN establishment to further optimize the infrastructure.

4.3 Intelligent trace

With the growth of the mobile Internet user base, the total number of active domestic devices is expected to reach 1.3 billion by June 2016 [17]. The widespread popularity of mobile phones will generate massive signaling data. Mobile phones transmit user location information to base stations during events such as calls, SMS, Internet access, and periodic updates. Based on the above features, analyzing signaling data offers a new way to obtain city planning information. We can exploit the massive individual user characteristics, which accurately reflect the overall characteristics, especially spatial movement. Moreover, this scheme is low cost and stable. Only a few pieces of installed equipment are needed to enable real-time traffic data collection over a wide area within a short time, with little impact on the network. Compared with traditional traffic survey technology, signaling analysis has many advantages, such as wider coverage, larger samples, lower cost, and long-term
continuous monitoring. This section explores the application of the intelligent tracing system based on the mining of spatial and temporal mobile data.

We aim to build a support environment and interactive interface for the intelligent footprint system. We will also establish a spatial-temporal analysis standard for big data and industry. Moreover, we will promote multidimensional interconnections in order to achieve efficient organization, orderly management, reasonable use, and high value when analyzing spatial-temporal big data.

4.3.1 Model and algorithm

1. Real time data acquisition technology based on mobile signaling. Based on real-time monitoring and using specialized signaling acquisition software and hardware, operators can filter and analyze specific signaling processes and obtain information about base stations and signaling. This technology can locate signals from small cells to large regions, leading to personalized services in road monitoring applications. The data recorded include the user IDs, time stamps, positions, and other location information. It updates every 5 min to ensure the accuracy and continuity of user location information.

2. Moving sequence detection. There are many uncertainties and disturbances in the signaling time sequence. For example, a user’s signal may suddenly move far away from the trace, which we call “flying points”. Another case is shown in Fig. 6, where the user does not move at all but handovers occur frequently. The reason is the overlapping region, as represented by the red areas. Hence, we need to conduct preprocessing to filter out the abnormal data and obtain the real moving sequence by real-time flow computing technology.

[Figure 6: signaling sequence across cells C1 to C9, contrasting the real trace with the inaccurate trace]
Figure 6 Signaling sequence

3. Key location mining algorithm. In location mining, we chose the DBSCAN algorithm, which is based on density, instead of the usual K-means algorithm. This is because 1) the number of clusters K is difficult to determine in advance; 2) K-means tends to form round clusters and is not suitable for squares, rivers, or places with other shapes; 3) K-means is very sensitive to noise, and the trace data is noisy. DBSCAN, on the other hand, is adaptable and avoids the above-mentioned problems.

4. Spatial-temporal trajectory real-time road matching algorithm. The base station coverage radius is about a few hundred meters, and the base station is often far away from the road. Information is updated frequently, generally in less than 1 min. Faced with high frequency and high error rates, a map-matching algorithm is essential to map the base station location to road-level positions accurately and in real time. Thus, we based the algorithm on the commonly used probabilistic graphical model to conduct road network matching. Because of the characteristics of signaling, some optimization is needed to ensure the accuracy and efficiency of the algorithm. A road test can provide powerful support for this algorithm. First, install the technical analytic device in the car. Then, record the vehicle trajectory and collect the corresponding handover sequence. Finally, use the marked data, such as time, latitude, and longitude, to adjust the parameters.

4.3.2 Results and discussions

Based on the above models, we can construct the intelligent traffic analysis platform in cities, which will provide dynamic crowd analysis, real-time data on traffic conditions, traffic behavior analysis, urban planning support, etc. Some specific applications are described below.

The first is traffic demand analysis and road planning. Based on the analysis of round-the-clock crowd movement, we established the planning OD (Origin Destination) matrix. Then, using tracking modeling, we analyzed the main trajectory. According to the established traffic grid and the movement coordinates, we can obtain the real road load demand, which is based on the crowd flow. The OD trajectory at different time periods can generate a full time OD trajectory diagram, as shown in Tabs. 5 and 6.

Table 5 OD matrix of 8:00 am∼9:00 am

O\D    J01   J02   J03   J04   S01   S02   S03   S04   S05     SUM
J01     50     3    11     5    19    15    19    38    36     196
J02      5    38    69     1    14     5    19    21    67     239
J03      8    53    82     3    13    10     8    17    59     253
J04     20     1     1     0     9    22    11    44     8     116
S01      3     8    22     1    53     6     7    21    76     197
S02      7     4     6    23     7    45     6    45    19     162
S03      7     7     8     2    14     6     0    17    34      95
S04     36    18    29    32    36    42     7   140    74     414
S05     16    19    51     1    31     8    19    34   117     296
SUM    152   151   279    68   196   159    96   377   490   1 968

Table 6 OD matrix of 18:00∼19:00

O\D    J01   J02   J03   J04   S01   S02   S03   S04   S05     SUM
J01     55     8    11    15    17     9    14    34    17     180
J02      9    42    61     4    17     6    14    20    57     230
J03      8    81   115     2    20     8    17    14    45     310
J04     14     2     4     0     3    24     6    38     7      98
S01     23    26    26    12   150    39    20    80   111     487
S02     12     3     6    39     7   109     4    69    18     267
S03     15    19    10     8    16     7     0    26    26     127
S04     59    18    31    36    34    60    14   243    39     534
S05     41    70   112    16   128    39    57    80   277     820
SUM    236   269   376   132   392   301   146   604   597   3 053

These tables show the population flow from the corresponding origin (O) to the corresponding destination (D). J** and S** represent the residential and commercial zones, respectively. In Tab. 5, many people go to the business district in the morning for work. Tab. 6 shows that some of these people go home after work while others spend their leisure time in the business district.

The second application is the display of road conditions. We calculated the speed of each moving sequence and displayed the real-time road conditions by a spatial-temporal real-time road matching algorithm. Testing of the area proved that the results of our algorithm accurately represented the actual conditions. Fig. 7 shows that traffic jams usually occur at the time people go to work or when they go home, at 8:00 and 18:00, respectively. We also observed that the rush hour on weekdays starts approximately 2 h earlier than on weekends. Moreover, the overall condition of the road is slightly better on weekends.

[Figure 7: line chart of road speed (km·h−1, axis range 54 to 63) against time of day (04:30 to 20:30) for weekdays and weekends]
Figure 7 Display of road speed

Based on mobile phone terminal position information, data such as real-time population flow, road congestion, etc., can be obtained with high precision. On the one hand, operators can provide travel recommendations for users. On the other hand, this can provide the basis for traffic management departments to control traffic, such as controlling the timing of traffic lights and adjusting bus routes. Moreover, unexpected events such as gathering crowds can be prevented. The analysis also facilitates quick responses to abnormal situations. There are many significant applications that we must develop in the future.

5 Conclusion and future work

Big data is transforming the way we live and do business in this era of information explosion. It has brought a lot of challenges, but also great opportunities to the communications industry. Faced with continuous and intense competition from OTT service providers, how to exploit big data to achieve enterprise transformation is an important topic. This paper analyzed three aspects of big data: the big data characteristics of the communications industry, big data platform architectures, and big data applications.

In the product development and application process, according to national laws and regulations, we addressed the management of information safety and set up a hierarchical information protection mechanism. From system architecture design to application development, we adopted multiple technologies to protect user information, such as encryption and watermarking.

We will conduct in-depth research in two areas: internal data application and external cooperation. The first is enhancing the efficiency and effects of the application of big data by establishing production systems for internal customers’ online interactions and paying attention to the customer experience by performing interface and process optimization. Through rapid iteration and response of the system function, the system can become an indispensable tool for front-line sales and service support. The second is to provide better services in terms of the security of user information in external connections. We will further improve the multiple levels of privacy protection technologies during the construction of the big data platform.

Appendix

This part introduces some specific abbreviations and their meanings.

Table 7 Abbreviations

OTT     Over The Top
SMS     Short Message Service
BSS     Business Support System
CDR     Call Data Record
OSS     Operation Support System
MSS     Management Support System
EDA     Enterprise Data Architecture
CRM     Customer Relationship Management
DPI     Deep Packet Inspection
OIDD    Open Information Dynamic Data
ODMS    Operation Data Management System
AAA     Authentication, Authorization and Accounting
ISMP    Integrated Services Management Platform
UDB     User Database
M2M     Machine to Machine
TSM     Trusted Service Manager
RDBMS   Relational Database Management System
WLAN    Wireless Local Area Network
OGG     Oracle Golden Gate
IT      Information Technology
ARPU    Average Revenue Per User

References

[1] D. Laney. 3D data management: controlling data volume, velocity and variety [J]. META group research note, 2001, 6: 70.
[2] X. D. Wu, X. Q. Zhu, G. Q. Wu, et al. Data mining with big data [J]. IEEE transactions on knowledge and data engineering, 2014, 26(1): 97-107.
[3] C. V. N. Index. Global mobile data traffic forecast update, 2015-2020 white paper [R]. 2016, 4.
[4] M. Chen, S. W. Mao, Y. H. Liu. Big data: a survey [J]. Mobile networks and applications, 2014, 19(2): 171-209.
[5] J. Leskovec, A. Rajaraman, J. D. Ullman. Mining of massive datasets [M]. Cambridge: Cambridge University Press, 2014.
[6] M. Mestyán, T. Yasseri, J. Kertész. Early prediction of movie box office success based on Wikipedia activity big data [J]. PloS one, 2013, 8(8): e71226.
[7] W. Raghupathi, V. Raghupathi. Big data analytics in healthcare: promise and potential [J]. Health information science and systems, 2014, 2(1): 1.
[8] V. Yadav, M. Verma, V. D. Kaushik. Big data analytics for health systems [C]//IEEE International Conference on Green Computing and Internet of Things (ICGCIoT), Delhi, India, 2015: 253-258.
[9] P. J. Li, Y. H. Yan, C. M. Wang, et al. Customer voice sensor: a comprehensive opinion mining system for call center conversation [C]//IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 2016: 324-329.
[10] A. C. E. S. Lima, L. N. de Castro. Predicting temperament from Twitter data [C]//The 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, 2016: 599-604.
[11] W. Fan, A. Bifet. Mining big data: current status, and forecast to the future [J]. ACM SIGKDD explorations newsletter, 2013, 14(2): 1-5.
[12] R. L. Villars, C. W. Olofson, M. Eastwood. Big data: What it is and why you should care [J]. White paper,
IDC, 2011.
[13] X. F. Zheng. Big data application and revelation of foreign telecom operators [J]. Mobile communications, 2015, 39(13): 29-33.
[14] Y. J. Huang, M. Feng, S. Y. Ding, et al. Big data development strategy for telecom operators (in Chinese) [J]. Telecommunication science, 2013, 29(3): 6-11.
[15] K. F. Chen, H. C. Zhou. Research on realization mode of telecom operators’ big data resource and its strategy [J]. Mobile communications, 2016, 40(1): 63-67.
[16] Ministry of Industry and Information Technology of the People’s Republic of China. Completion of the main indicators of the communications industry by Nov. 2016 [EB/OL]. http://www.miit.gov.cn/n1146312/n1146904/n1648372/c5427058/content.html.
[17] People’s Daily Online. Annual report on China’s mobile Internet development (2016) [R]. China: People’s network, 2016.

About the authors

Zhen Wang was born in Xi’an, Shaanxi. He received the B.S., M.S., and Ph.D. degrees all from University of Science and Technology of China (USTC), Hefei, Anhui, in 2002, 2005, 2009 respectively. He is currently the general manager of Information Center of China Telecom Co., Ltd. Anhui branch. He engaged in wireless big data and big data applications at Samsung Electronics Communications Research Institute, Suwon, Korea, from 2008 to 2010. He published nearly ten papers in core journals and got three patents. His research interests are the applications of big data. (Email: 15305516521@189.cn)

Guofu Wei was born in Guang’an Sichuan. He received the B.S. degree in mathematics from University of Science and Technology of China (USTC), Hefei, Anhui, in 1997, and the Ph.D. degree in computational mathematics from USTC, in 2002. He is now a senior economist at China Telecom Co., Ltd. Anhui branch. His research interests include big data modeling and big data operations. (Email: weiguofu@189.cn)

Yaling Zhan was born in Tongcheng, Anhui. She received the B.S. degree in computer science and technology from Chongqing University of Posts and Telecommunications in 2004. She is now an assistant engineer at China Telecom Co., Ltd. Anhui branch. She deeply participates in the construction of enterprise internal application system. She is also in charge of the target acquisition of precision marketing. Her research interests include the big data management, modeling and operation. (Email: 18955159329@189.cn)

Yanhuan Sun [corresponding author] was born in Yantai, Shandong. She received the B.S. degree in information engineering from Xidian University, Xi’an, China, in 2015. She is currently working toward the M.S. degree in information and communication systems at University of Science and Technology of China (USTC). She received the IEEE Wireless Personal Multimedia Communications 2016 Best Student Paper Award in 2016. Her research interests include machine type communication and big data application. (Email: stracy@mail.ustc.edu.cn)
