Technologies in the Database Field
Distributed Databases
Definition
A distributed database is a set of logically interrelated databases distributed over a computer network. It is the product of combining database technology with network technology, and it has become a branch of the database field in its own right.
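Research on distributed databases began in the late 1970s.
[Figure: schema diagram of a distributed database]
Basic characteristics of a distributed database system (DDBS)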
- Physical distribution: data is stored not at a single site but at multiple sites on the computer network.
- Logical unity: although the data is physically spread across sites, it forms one logical whole that is shared by all users (global users) and managed uniformly by one distributed DBMS.
- Site autonomy: the data at each site is managed by the local DBMS, which can process it autonomously to serve that site's own applications (local applications).
- Cooperation between sites: although each site is highly autonomous, the sites cooperate with one another to form a whole.
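The four characteristics above can be illustrated with a small in-memory sketch. This is only a toy model under stated assumptions: the `Site` and `GlobalCatalog` classes, the key placement rule, and the sample keys are all hypothetical and do not correspond to any real distributed DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site with its own local store (stands in for a local DBMS)."""
    name: str
    rows: dict = field(default_factory=dict)

    def local_get(self, key):
        # Site autonomy: a local application is answered by this site alone.
        return self.rows.get(key)


class GlobalCatalog:
    """Hypothetical global layer that makes the sites look like one database."""

    def __init__(self, sites):
        self.sites = list(sites)

    def _site_for(self, key):
        # Physical distribution: each key lives on exactly one site.
        # Summing character codes keeps the placement deterministic across runs.
        return self.sites[sum(map(ord, key)) % len(self.sites)]

    def put(self, key, value):
        self._site_for(key).rows[key] = value

    def get(self, key):
        # Logical unity: the global user never names a site.
        return self._site_for(key).local_get(key)

    def count_all(self):
        # Cooperation between sites: a global query combines local results.
        return sum(len(site.rows) for site in self.sites)


sites = [Site("site_a"), Site("site_b"), Site("site_c")]
db = GlobalCatalog(sites)
for user in ["alice", "bob", "carol", "dave"]:
    db.put(user, {"name": user})

print(db.get("carol"))
print(db.count_all())
```

A real system would also need replication, distributed transactions, and failure handling, none of which this sketch attempts.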
Data Mining Technology
Data mining is the extraction of hidden, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data.
Application scenarios
In the era of big data, data mining has been widely adopted across fields as one of the most commonly used means of data analysis. Researchers in China and abroad currently focus on its applications in classification, optimization, recognition, and prediction.
Classification
As a populous country, China has seen its public data on health care and population aging grow exponentially alongside rapid advances in science and technology, and extracting value from that data has become an urgent problem. Health care data keeps growing in structure, scale, scope, and complexity, and traditional computational methods can no longer fully meet the needs of medical data analysis. Data mining techniques can classify health care data while accounting for its typical characteristics: heterogeneous formats, missing information (for example, values withheld for privacy reasons), temporal ordering, and redundancy. The results can give doctors and patients more accurate decision support.
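As a rough illustration of classifying records that contain missing values, the sketch below fills each gap with the column mean and then assigns a new record to the class with the nearest centroid. The technique choice, the function names, and the toy records are all assumptions made for this example; the text does not say which algorithms are used in practice, and mean imputation is only one of many options.

```python
from math import dist
from statistics import mean


def impute(records):
    """Replace None with the mean of the observed values in that column."""
    columns = list(zip(*records))
    fill = [mean(v for v in col if v is not None) for col in columns]
    return [[f if v is None else v for v, f in zip(row, fill)] for row in records]


def fit_centroids(records, labels):
    """Average the feature vectors of each class."""
    grouped = {}
    for row, label in zip(records, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Toy, made-up records: [age, systolic blood pressure]; None marks a missing value.
train = [[34, 118], [41, None], [67, 152], [72, 160], [None, 149]]
labels = ["low", "low", "high", "high", "high"]

clean = impute(train)
centroids = fit_centroids(clean, labels)
print(predict(centroids, [70, 155]))
print(predict(centroids, [30, 115]))
```

Real medical data would need feature scaling, validation, and a far more careful treatment of missing values than this sketch provides.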
At the same time, China is rapidly becoming an aging society. The Internet is an important medium for improving services in an aging society, and big data is an important technical means of assessing it. Qu Fang et al. proposed an "Internet + big data" model for elderly care. The whole elderly care service system is built on the aggregation of multi-source heterogeneous information and on data fusion and mining. It combines several information and communication technologies, including communication technology, data mining, and artificial intelligence, into a service system with "Internet + big data" at its core.
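Optimization
Road traffic conditions closely affect how people travel. As cities develop rapidly and living standards improve, the number of motor vehicles keeps growing, bringing problems such as traffic congestion. Data mining can help with optimization problems in road traffic and logistics networks. Pan et al. proposed a data mining prediction model that forecasts short-term traffic conditions in real time, which is a great help to drivers caught in congestion.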
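To make the idea of short-term prediction concrete, here is a minimal sketch that forecasts the next reading with simple exponential smoothing. This is not the model proposed by Pan et al., whose details the text does not give; the function name, the smoothing factor, and the speed readings are assumptions for illustration only.

```python
def exponential_smoothing_forecast(observations, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 follows the latest readings closely;
    alpha close to 0 reacts slowly and smooths out noise.
    """
    if not observations:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = observations[0]
    for value in observations[1:]:
        # Blend the newest reading with the running estimate.
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up average speeds (km/h) on one road segment, oldest first.
speeds = [60.0, 58.0, 40.0, 35.0, 30.0]
print(exponential_smoothing_forecast(speeds, alpha=0.5))
```

Because recent readings carry more weight, the forecast drops as congestion builds, which is the behaviour a short-term predictor needs.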
With the development of science and technology, online shopping has become increasingly popular, which has brought logistics and traffic congestion and even gridlock. JD.com, one of China's largest online trading platforms, uses drones to collect feedback data on road conditions and applies data mining to compute the parameters needed for transport across its logistics network. This helps relieve logistics gridlock, and it led to China's first robot courier delivering its first batch of goods at Renmin University of China. As traffic networks grow longer and more complex, autonomous driving strategies become much harder to realize, and data mining is needed to compute results quickly and extract useful values from complex road information.
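Recognition
Since digital images emerged in the 1950s, they have become an indispensable kind of data in human society. In computer applications, data mining is increasingly used in image recognition, with face recognition and fingerprint recognition as representative examples. Face recognition mines the collected image database, then further analyzes and processes the reliable, latent information in it, laying the groundwork for later analysis and development. Wright et al. described robust face recognition based on sparse representation, with detailed theoretical analysis and a summary of practical results.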
To address the insecurity of user names and passwords in current electronic tax filing systems, Sha Yaqing proposed an identity authentication scheme based on smart cards and fingerprint recognition, which uses fingerprint data to construct new password parameters. Security improved markedly as a result. As data mining technology continues to develop, face and fingerprint recognition based on big data will become increasingly accurate.
Prediction
Prediction is among the most studied problems in every field. Its goal is to use historical data to predict future values or trends. Most historical data is time series data, that is, a sequence of observations arranged in chronological order. With the continuing progress of information technology, time series data is growing day by day in areas such as weather forecasting, oil exploration, and finance. The ultimate goal of time series data mining is to predict future trends and their impact by analyzing the historical data of the series.
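As one simple instance of predicting a trend from historical observations, the sketch below fits a straight line to a series by ordinary least squares and extrapolates it. The function names and the sample series are assumptions for illustration; real time series usually call for models that also handle seasonality and noise.

```python
def fit_linear_trend(series):
    """Least-squares line through the points (0, y0), (1, y1), ..."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(series, steps):
    """Extend the fitted trend `steps` time points past the end of the series."""
    slope, intercept = fit_linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + k) for k in range(steps)]


# Made-up observations in chronological order.
history = [10.0, 12.0, 14.0, 16.0]
print(forecast(history, 2))
```

The observations must be in chronological order, since the index of each value is used as its time coordinate.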
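Big Data Technology
What is big data
Big data refers to data sets so large that mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make better business decisions.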
What is big data technology
Big data technology deals with data sets whose scale greatly exceeds what traditional database software tools can acquire, store, manage, and analyze. Such data has four characteristics: massive scale, rapid flow, diverse types, and low value density.
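Characteristics
- Volume: the size of the data determines the value and potential information of the data under consideration.
- Variety: the diversity of data types.
- Velocity: the speed at which data is acquired.
- Variability: inconsistency in the data, which hampers processing and managing it effectively.
- Veracity: the quality of the data.
- Complexity: the data volume is huge and comes from many channels.
- Value: used sensibly, big data creates high value at low cost.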
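Typical application scenarios
- The Los Angeles Police Department and the University of California cooperated to use big data to predict where crimes would occur.
- Google Flu Trends used search keywords to estimate the spread of influenza.
- Statistician Nate Silver used big data to predict the results of the 2012 US elections.
- The Massachusetts Institute of Technology used mobile phone location data and traffic data for urban planning.
- Macy's real-time pricing: based on demand and inventory, the company's SAS-based system adjusts prices in real time for up to 73 million items. [8]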
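- The medical industry has long faced the challenge of massive and unstructured data. In recent years many countries have actively promoted medical informatization, which has given many medical institutions the funding to do big data analysis. [9]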
Original: https://blog.csdn.net/liujingqi697/article/details/123362614
Author: 江琦697
Title: 数据库领域的技术
