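The Business Problem
Every day that a cyber threat goes undetected raises the likelihood of data theft and deepens the long-term impact on the business. To minimize the damage from cybercrime, companies need to get better at detecting abnormal activity on their networks quickly, so that they can isolate problems and respond in time. They also need to get better at analyzing historical data and uncovering patterns, which helps them recognize unusual activity sooner. All of this calls for systems built on advanced data relationship models.
Common antivirus software, firewalls, and event management tools can detect "known" threats, but "unknown" threats keep taking new forms and are hard for such tools to identify right away.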
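The Technical Challenge
Today, companies typically use historical data and logs to identify patterns and relationships within their data, analyzing archival data together with streaming data to quickly spot discrepancies and potential threats. This usually involves analyzing the following:
Signatures of past breaches, to build models of known cyber threats
Multiple data points (such as time of activity, frequency of activity, and location) and how they relate to the individual user and to past trends
Relationships between anomalies in social networks and key individuals in the company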
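The second item above, comparing a data point such as activity frequency with a user's historical norm, can be sketched as a simple z-score check. This is only an illustrative sketch: the function names `zscore` and `is_anomalous`, the threshold of 3 standard deviations, and the sample login counts are assumptions made for this example and do not come from the article or from any product.

```python
from statistics import mean, stdev

def zscore(history, value):
    """Return how many sample standard deviations `value` lies from the mean of `history`.

    `history` must contain at least two observations (a requirement of `stdev`).
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma

def is_anomalous(history, value, threshold=3.0):
    """Flag `value` when it is more than `threshold` deviations from the norm (assumed cutoff)."""
    return abs(zscore(history, value)) > threshold

# Hypothetical daily login counts for one user over two weeks.
logins = [4, 5, 6, 5, 4, 5, 6, 5, 4, 6, 5, 5, 4, 6]

print(is_anomalous(logins, 5))   # a typical day
print(is_anomalous(logins, 40))  # a sudden burst of logins
```

A real deployment would likely combine several such signals (time of day, location, frequency) rather than rely on a single threshold.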
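A system that analyzes these items must include a complex data ingest layer, where streaming data is transformed, and a graph-like storage layer that persists the relationships within the data so that they can be queried later.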
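The two layers described above can be sketched in a few lines of plain Python. This is a toy, in-memory illustration under stated assumptions: the `GraphStore` class, the `transform` and `ingest` functions, the `ts,user,ip,host` log format, and the edge labels are all hypothetical and are not the API of ThingSpan or of any graph database.

```python
from collections import defaultdict

class GraphStore:
    """Toy in-memory graph store: edges carry a label and a timestamp (hypothetical design)."""

    def __init__(self):
        self._out = defaultdict(list)  # node -> [(label, neighbor, ts), ...]

    def add_edge(self, src, label, dst, ts):
        self._out[src].append((label, dst, ts))

    def neighbors(self, node, label=None):
        """Return neighbors of `node`, optionally restricted to one edge label."""
        return [dst for (lbl, dst, _ts) in self._out.get(node, [])
                if label is None or lbl == label]

def transform(raw):
    """Ingest step: turn one raw log line 'ts,user,ip,host' into labeled edges."""
    ts, user, ip, host = raw.strip().split(",")
    ts = int(ts)
    return [
        (f"user:{user}", "LOGGED_IN_FROM", f"ip:{ip}", ts),
        (f"ip:{ip}", "CONNECTED_TO", f"host:{host}", ts),
    ]

def ingest(stream, store):
    """Consume a stream of raw lines, transform each one, and persist the relationships."""
    for raw in stream:
        for src, label, dst, ts in transform(raw):
            store.add_edge(src, label, dst, ts)

# Hypothetical event stream; in practice this would arrive from a message queue.
stream = [
    "1000,alice,10.0.0.5,db01",
    "1005,bob,10.0.0.9,web02",
    "1010,alice,203.0.113.7,db01",
]
store = GraphStore()
ingest(stream, store)

# Which addresses has alice logged in from?
print(store.neighbors("user:alice", "LOGGED_IN_FROM"))
```

Storing the relationship itself as an edge, rather than re-deriving it from raw logs at query time, is what makes the later lookup a direct traversal.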
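Breaches on the Rise
783: the number of data breaches reported by U.S. companies in 2014, a record high;
$2 trillion: the expected global cost of data breaches by 2019;
Cautionary Case Study: Ashley Madison
32 million: users of the online dating service that was hacked in 2015;
15,000: roughly 15,000 U.S. government workers were exposed, with implications for national security;
$760 million: damages sought in a class action lawsuit;
The Need for Real-Time Response
Average time taken to discover a data breach:
98 days: financial firms;
7 months: retailers;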
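As the volume of data to be consumed and analyzed keeps growing, building an ingest layer that can consume, transform, and store streaming data while also creating and maintaining information about the relationships between transactions becomes very difficult. For many companies it is often a stumbling block.
In addition, most graph stores available today cannot scale to multiple billions of nodes and edges while also supporting billions of analyses and queries per day. That level of performance and scale is what it takes to identify emerging threats and respond before greater damage is done.
Most companies rely on solutions built around a custom ingest layer to maintain relationships, which neither scale nor deliver the required query times. Such solutions have the following limitations:
Complex ingest-path solutions, which are expensive and hard to maintain
Inefficient, costly workarounds, such as pre-computing sub-graphs to reach acceptable query times, which limit functionality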
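One plausible way to see why pre-computed sub-graphs limit functionality is that a cached neighborhood only answers the question it was built for and goes stale as soon as new relationships arrive. The sketch below illustrates that reading; the `k_hop` function, the node names, and the choice of a 2-hop neighborhood are assumptions made for this example, not details from the article.

```python
from collections import defaultdict, deque

def k_hop(adj, start, k):
    """Breadth-first search: every node reachable from `start` in at most k hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen

# Hypothetical relationship graph.
adj = defaultdict(set)
adj["alice"].add("laptop-1")
adj["laptop-1"].add("db01")

# The workaround: pre-compute the 2-hop sub-graph once and serve queries from the cache.
cache = {"alice": k_hop(adj, "alice", 2)}

# A new relationship arrives from the stream after the cache was built.
adj["laptop-1"].add("fileserver")

stale = cache["alice"]           # cached answer, missing the new edge
fresh = k_hop(adj, "alice", 2)   # live traversal, sees the new edge
print(sorted(stale))
print(sorted(fresh))
```

The cache also cannot answer a 3-hop question or a query filtered by edge type without being rebuilt, which is the kind of restriction a live, scalable graph query avoids.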
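The ThingSpan Solution
ThingSpan, Objectivity's Fast Data solution platform, is integrated with Hadoop and Spark to help companies build reliable cybersecurity solutions. It lets them ingest, transform, and consume massive and varied data streams in order to create and persist complex, scalable graph structures. These structures can operate at petabyte scale and efficiently support complex, continuous queries.
ThingSpan leverages open-source tools: it supports the Hadoop and Spark ecosystem on top of a high-performance, distributed graph database. It runs natively on the Hadoop Distributed File System (HDFS) as a YARN application and uses Spark for workflow and data transformation. It also supports streaming systems based on Kafka, Flume, and other distributed messaging tools. Through its integration with Spark via DataFrames, ThingSpan can ingest streaming data while maintaining relationships as first-class logical models.
This model accommodates enriched and transformed data, which simplifies cybersecurity applications and supports the complex, multi-dimensional queries that relationship analysis requires. With this relationship-oriented approach to fusing fast streaming data with static, historical, and transactional data, ThingSpan delivers optimal intelligence for fighting cybercrime. Companies can now draw business insight from real-time streaming data and Big Data efficiently and at scale, and can thereby prevent future security breaches.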
Original English Text
How Discovering Data Relationships Can Fight Cybercrime
The Business Problem
Every day that cyber threats go undetected results in the potential for more data theft, creating increased long-term repercussions to businesses. In order to minimize the damage from cybercrime, organizations need the ability to quickly identify abnormal activity on their networks so that they can quickly isolate the problem and react accordingly. They need the ability to access historical data and analyze it to uncover patterns, so that they will be able to more quickly discern when unusual activity is occurring. This involves advanced relationship and pattern discovery processes.
While “known” threats can often be identified by common anti-virus software, firewalls, and event management tools, “unknown” threats take new forms, and may not be immediately spotted based on common queries.
The Technical Challenge
Organizations today make use of historical data and logs to recognize patterns and connections within their data, analyzing archival data alongside streaming data to quickly ascertain discrepancies and potential threats. This typically involves analyzing the following:
Signatures of past breaches to identify known instances of cyber threats.
Multiple data points (e.g. time of activity, frequency of activity, location) and how they relate to historical norms for both the individual user and past trends.
Relationships between anomalies in social networks associated with key individuals in the organization.
Systems set up for these include a complex data ingest layer, where streaming data is transformed, and a graph-like storage layer where this data and the relationships between various transactions can be persisted, then rapidly and continuously queried.
This can often be a challenge, as the volumes of data that must be consumed and analyzed continue to increase. The creation of an ingest layer that can consume, transform and store streaming data while creating and maintaining information about the relationships between transactions becomes very complicated. In many cases, it becomes a stumbling block.
Breaches on the Rise
· 783: The number of reported breaches at U.S. organizations in 2014, a record high
· $2 trillion: The global cost of breaches expected by 2019 [1]
Cautionary Case Study: Ashley Madison
· 32 million: Users of the online dating service that were hacked in 2015
· 15,000: U.S. government workers exposed, implicating national security
· $760 million: Damages claimed in a class action lawsuit [2]
The Need for Real-Time Response
Average time taken to discover data breaches:
· 98 days for financial firms
· 7 months for retailers [3]
In addition, most of the graph stores available today are not designed to scale to multi-billion nodes and edges while supporting billions of transactions that need to be analyzed and queried per day: this is the level of performance and scalability needed to identify emerging threats quickly enough to stop them before significant damage is done. As a result, most organizations rely on solutions based on a custom-built ingest layer feeding into a graph database to maintain relationships, which neither scale nor support required query times. The resulting solutions suffer from limitations that include:
Complex ingest path solutions, which are expensive and difficult to maintain.
Inefficient and expensive hacks like pre-computing sub-graphs for acceptable query times, and therefore limited functionality.
The ThingSpan Solution
ThingSpan, Objectivity’s Fast Data solution platform, is integrated with Hadoop and Spark to give organizations the capability to build a fully supportable cybersecurity solution. It does this by enabling organizations to ingest, transform and consume massive and varied data streams to create and persist complex, scalable graph structures. These structures can operate at petabyte scale and efficiently support complex, continuous queries.
ThingSpan leverages open-source tools by supporting the Hadoop and Spark ecosystem atop a high-performance, distributed graph database purpose-built for relationship and pattern discovery. It runs natively on top of HDFS as a YARN application while using Spark for workflow and data transformation. It is also designed to support streaming systems based on Kafka, Flume and other distributed messaging tools. Integration with Spark via DataFrames allows ThingSpan to ingest streaming data while maintaining and persisting relationships as first-class logical models.
This model allows for enriched and transformed data to simplify the support of complex, multi-dimensional queries associated with cybersecurity applications and analytics. With its relationship-oriented approach to information fusion involving fast, streaming data and static, historical and transactional data, ThingSpan delivers optimal intelligence to fight cybercrime. Now organizations can achieve business insights from Big Data and real-time streaming data with a high degree of efficiency at scale, thereby preventing future security breaches.
Note: This article is excerpted from the self-media account 灯塔大数据; please credit the source when reposting.