We’re dealing with much more data. Although advances in storage capacity and CPU speed have allowed the databases to keep pace, we’re in a new era where size itself is an important part of the problem, and any significant database needs to be distributed.
由于我们需要处理的数据集越来越大,其存储量已经远远超过了单机的容量,数据处理的需求也远远超过了单机CPU的运算能力。所以我们需要分布式的解决方案。
We require sub-second responses to queries. In the ’80s, most database queries could run overnight as batch jobs. That’s no longer acceptable. While some analytic functions can still run as overnight batch jobs, we’ve seen the web evolve from static files to complex database-backed sites, and that requires sub-second response times for most queries.
我们对数据提供速度的要求越来越高,在80年代,可能很多运算都需要跑一整晚。但是这种事情放在现在就变得不可接受了。对于复杂的统计分析我们可以忍受,但是对于网站应用来说,快速响应是必须的。
We want applications to be up 24/7. Setting up redundant servers for static HTML files is easy, but a database replication in a complex database-backed application is another.
我们需要提供7×24的服务。如果你的网站只有一个静态页,那估计问题不大,只需要做好WebServer的容错性就行了。而如果你是一个背后有数据库,有缓存的动态网站,那你就必须做好数据层的容错及自动故障迁移。
We’re seeing many applications in which the database has to soak up data as fast (or even much faster) than it processes queries: in a logging application, or a distributed sensor application, writes can be much more frequent than reads. Batch-oriented ETL (extract, transform, and load) hasn’t disappeared, and won’t, but capturing high-speed data flows is increasingly important.
很多应用场景需要数据层提供更高的写性能和数据吞吐。比如日志型应用,对写性能的要求可能非常高,当写性能成为瓶颈时,通常我们很难难过升级单机配置来解决。所以分布式的需求在这里变得也很重要。
We’re frequently dealing with changing data or with unstructured data. The data we collect, and how we use it, grows over time in unpredictable ways. Unstructured data isn’t a particularly new feature of the data landscape, since unstructured data has always existed, but we’re increasingly unwilling to force a structure on data a priority.
我们对非结构化数据的存储和处理需求日增,在这个变化的世界,互联网领域的应用可能越来越难像软件开发一样,去预先写义各种数据结构。
We’re willing to sacrifice our sacred cows. We know that consistency and isolation and other properties are very valuable, of course. But so are some other things, like latency and availability and not losing data even if our primary server goes down. The challenges of modern applications make us realize that sometimes we might need to weaken one of these constraints in order to achieve another.
我们的应用场景对一致性,隔离性以及其它一些事务特性的需求可能越来越低,相反的,我们对性能,对扩展性的需求可能越来越高。于是在新的需求下,我们必须做出抉择,放弃一些我们习惯了的优秀功能,去获取一些我们需要的新的特性。