
VLDB Management Challenges

A very large database (VLDB) is defined as a database that contains such a large amount of data that it requires specialized techniques for management and maintenance. There is no set definition for the data size threshold for a VLDB, but challenges generally start to appear around 1 TB of data. Key challenges of VLDBs include configuration, administration, availability during maintenance, backup/recovery times, and decreased performance. Partitioning, clustering, and other techniques can help address some of these challenges.

Uploaded by john949

Very large database

A very large database (originally written very large data base), or VLDB,[1] is a database that contains
so much data that it can require specialized architectural, management, processing and maintenance
methodologies.[2][3][4][5]

Definition
The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at
defining a metric and threshold have been made. Early metrics were the size of the database in a canonical
form via database normalization or the time for a full database operation like a backup. Technology
improvements have continually changed what is considered very large.[6][7]

One definition has suggested that a database has become a VLDB when it is "too large to be maintained
within the window of opportunity… the time when the database is quiet".[8]
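The quiet-window definition can be turned into a rough back-of-the-envelope check. The Python sketch below assumes that a maintenance operation such as a full backup reads every byte once at a constant sustained throughput; the function names, the 500 MB/s throughput and the 6-hour window are hypothetical illustrations, not figures from the cited sources.

```python
TB = 10**12
MB = 10**6
HOUR = 3600


def full_pass_seconds(size_bytes: int, throughput_bytes_per_s: float) -> float:
    """Time for one sequential pass over the whole database (e.g. a full backup).

    Assumes a constant sustained throughput and ignores overheads such as
    locking, compression or verification.
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_s


def fits_quiet_window(size_bytes: int, throughput_bytes_per_s: float,
                      window_seconds: float) -> bool:
    """Hypothetical test of the 'window of opportunity' definition:
    True if a full pass completes while the database is quiet."""
    return full_pass_seconds(size_bytes, throughput_bytes_per_s) <= window_seconds


# Illustrative numbers only: 500 MB/s sustained, 6-hour nightly quiet window.
for size_tb in (1, 10, 30):
    hours = full_pass_seconds(size_tb * TB, 500 * MB) / HOUR
    ok = fits_quiet_window(size_tb * TB, 500 * MB, 6 * HOUR)
    print(f"{size_tb:>3} TB: {hours:5.1f} h, fits 6 h window: {ok}")
```

With these assumed figures, 10 TB still fits the window (about 5.6 hours) while 30 TB needs about 16.7 hours. Real operations add overheads, so the result is an optimistic estimate of elapsed time.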

Sizes of a VLDB database


No absolute amount of data can be cited as the threshold. For example, one cannot say that any database
with more than 1 TB of data is a VLDB. The threshold has shifted over time as computer processing, storage
and backup methods have become better able to handle larger amounts of data.[5] That said, VLDB issues may
start to appear as 1 TB is approached,[8][9] and are very likely to have appeared once about 30 TB is
exceeded.[10]

VLDB challenges
Key areas where a VLDB may present challenges include configuration, storage, performance,
maintenance, administration, availability and server resources.[11]: 11

Configuration

Databases in the VLDB range need careful configuration to reduce the issues that their size
raises.[11]: 36–53 [12]

Administration

The complexities of managing a VLDB can increase exponentially for the database administrator as
database size increases.[13]

Availability and maintenance


Maintenance and recovery operations such as database reorganizations and file copies, which are quite
practical on a non-VLDB, take very significant amounts of time and resources on a VLDB.[14] In particular,
it is typically infeasible to meet a typical recovery time objective (RTO), the maximum time a database is
expected to be unavailable after an interruption, with methods that involve copying files from disk or
other storage archives.[13] To overcome these issues, techniques such as clustering, cloned/replicated/standby
databases, file snapshots, storage snapshots or a backup manager may help achieve the RTO and the required
availability. Individual methods may have limitations, caveats, and licensing and infrastructure
requirements, and some may risk data loss and so fail to meet the recovery point objective
(RPO).[15][16][13][17][18] For many systems, only geographically remote solutions may be acceptable.[19]
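The trade-off between RTO and RPO can be sketched as a simple filter over candidate recovery methods. The option names and all timings below are hypothetical, chosen only to illustrate how a file-copy restore of a large database can miss the RTO while a snapshot can meet the RTO yet miss the RPO.

```python
from dataclasses import dataclass

HOUR = 3600


@dataclass(frozen=True)
class RecoveryOption:
    """One candidate recovery method with assumed, illustrative characteristics."""
    name: str
    restore_seconds: float        # expected time to bring the database back online
    max_data_loss_seconds: float  # worst-case window of lost changes


def meets_objectives(opt: RecoveryOption, rto_seconds: float, rpo_seconds: float) -> bool:
    """True if the option restores within the RTO and loses no more than the RPO."""
    return opt.restore_seconds <= rto_seconds and opt.max_data_loss_seconds <= rpo_seconds


options = [
    # Copying 30 TB back from a nightly backup at an assumed 500 MB/s,
    # with no log replay: slow, and up to 24 h of changes lost.
    RecoveryOption("restore files from backup", 30 * 10**12 / (500 * 10**6), 24 * HOUR),
    # Storage snapshot taken every 4 hours: fast to revert, but may lose up to 4 h.
    RecoveryOption("storage snapshot", 0.25 * HOUR, 4 * HOUR),
    # Synchronous standby: fail over in minutes with no committed data lost.
    RecoveryOption("synchronous standby", 0.1 * HOUR, 0),
]

rto, rpo = 1 * HOUR, 5 * 60  # hypothetical objectives: 1 h RTO, 5 min RPO
for opt in options:
    print(f"{opt.name}: meets objectives = {meets_objectives(opt, rto, rpo)}")
```

With these assumed numbers only the synchronous standby meets both objectives; different objectives or timings change the outcome, which is why each method's limitations have to be weighed individually.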

Backup and recovery

Best practice is to design backup and recovery as part of the overall availability and business
continuity solution.[20][21]

Performance

Given the same infrastructure, performance typically decreases (that is, response time increases) as
database size increases. Some accesses simply have more data to process (scan), which takes proportionally
longer (linear time), while the indexes used to access data may grow slightly in height, perhaps requiring
an extra storage access to reach the data (sub-linear time).[22] Caching can also become less efficient,
because a smaller proportion of the data fits in the cache. Some index structures, such as the B+ tree,
adapt to growth automatically, while others, such as a hash table, may need to be rebuilt.
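The difference between linear scan growth and sub-linear index growth can be illustrated by estimating B+ tree height. This sketch assumes fully packed nodes with a fixed fanout of 500 entries per page (a hypothetical figure; real fanout depends on key and page size), so it gives a minimum height rather than an exact one.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= n_keys.

    Approximates the height of a B+ tree with fully packed nodes of
    `fanout` entries; integer arithmetic avoids floating-point log rounding.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < n_keys:
        levels += 1
        capacity *= fanout
    return levels


FANOUT = 500  # assumed entries per index page
for rows in (10**6, 10**9):
    # A full scan touches every row (linear); an index lookup touches
    # roughly one page per level (logarithmic).
    print(f"{rows:>13,} rows: full scan reads ~{rows:,} rows, "
          f"index lookup reads ~{btree_levels(rows, FANOUT)} pages")
```

Under these assumptions, growing a table from one million to one billion rows multiplies the scan work by 1,000 but adds only one level (from 3 to 4) to an index lookup.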

If an increase in database size causes the number of database accessors to increase, more server and
network resources may be consumed and the risk of contention increases. Solutions for regaining
performance include partitioning, clustering (possibly with sharding) and the use of a database
machine.[23]: 390 [24]
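Sharding splits data across several servers so that each holds only part of it. The following is a minimal sketch of hash-based routing, assuming four hypothetical shard names; it uses `hashlib.sha256` rather than Python's built-in `hash()`, because string hashing in `hash()` is randomized per process and would route the same key differently after a restart.

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    """Route a key to one shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take it modulo the shard count.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # hypothetical servers

for customer_id in ("cust-1001", "cust-1002", "cust-1003"):
    print(customer_id, "->", shard_for(customer_id, SHARDS))
```

Simple modulo routing moves most keys when the number of shards changes; schemes such as consistent hashing are commonly used to limit that reshuffling.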

Partitioning

Partitioning may improve the performance of bulk operations on a VLDB, including backup and
recovery[25] and bulk data movements driven by information lifecycle management (ILM).[26]: 3 [27]: 105–118
It can also reduce contention[27]: 327–329 and allow some query processing to be optimized.[27]: 215–230
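Partition pruning and ILM-style bulk removal can be illustrated with a toy in-memory model. This sketch assumes monthly range partitioning on a hypothetical `order_date` column; database engines implement this internally, and the class below only illustrates the idea rather than any product's API.

```python
from collections import defaultdict
from datetime import date


class MonthPartitionedTable:
    """Toy range-partitioned table: one partition per (year, month) of order_date."""

    def __init__(self) -> None:
        self.partitions: dict[tuple[int, int], list[dict]] = defaultdict(list)

    def insert(self, row: dict) -> None:
        d: date = row["order_date"]
        self.partitions[(d.year, d.month)].append(row)

    def query_range(self, start: date, end: date) -> tuple[list[dict], int]:
        """Return rows with start <= order_date <= end, reading only the
        partitions that overlap the range (partition pruning), plus how
        many partitions were read."""
        lo, hi = (start.year, start.month), (end.year, end.month)
        touched = [k for k in self.partitions if lo <= k <= hi]
        rows = [r for k in touched for r in self.partitions[k]
                if start <= r["order_date"] <= end]
        return rows, len(touched)

    def drop_before(self, year: int, month: int) -> int:
        """ILM-style bulk removal: discard whole old partitions instead of
        deleting rows one by one. Returns the number of partitions dropped."""
        old = [k for k in self.partitions if k < (year, month)]
        for k in old:
            del self.partitions[k]
        return len(old)


table = MonthPartitionedTable()
for m in range(1, 13):
    table.insert({"id": m, "order_date": date(2023, m, 15)})

rows, scanned = table.query_range(date(2023, 3, 1), date(2023, 4, 30))
print(len(rows), scanned)          # → 2 2  (2 rows, from 2 of 12 partitions)
print(table.drop_before(2023, 7))  # → 6
```

The design choice being illustrated is that both the range query and the removal work at partition granularity, so their cost depends on the number of partitions involved rather than on the total table size.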

Storage

To satisfy the needs of a VLDB, the database storage needs low access latency and contention, high
throughput, and high availability.

Server resources
The increasing size of a VLDB may put pressure on server and network resources, and a bottleneck may
appear that requires infrastructure investment to resolve.[13][28]

Relationship to big data


VLDB is not the same as big data; however, the storage aspect of big data may involve a VLDB.[2] That
said, some of the storage solutions supporting big data were designed from the start to support large
volumes of data, so database administrators may not encounter the VLDB issues that older versions of
traditional RDBMSs might encounter.[29]

See also
XLDB

References
1. "Oracle Database Online Documentation 11g Release 1 (11.1) / Database Administration Database Concepts" (https://docs.oracle.com/cd/B28359_01/server.111/b28318/partconc.htm#CNCPT011). Oracle. 18 Very Large Databases (VLDB). Retrieved 3 October 2018.
2. "Very Large Database (VLDB)" (https://www.techopedia.com/definition/14731/very-large-database-vldb). Techopedia. Archived (https://web.archive.org/web/20180704224849/https://www.techopedia.com/definition/14731/very-large-database-vldb) from the original on 4 July 2018. Retrieved 3 October 2018.
3. Gaines, R. S.; Gammill, R. Very Large Data Bases: An Emerging Research Area. Informal working paper. RAND Corporation.
4. Data Processing Magazine (https://books.google.com/books?id=3JYgAAAAMAAJ). North American Publishing Company. 1964. pp. 18, 58.
5. Widlake, Martin (18 September 2009). "What is a VLDB?" (https://mwidlake.wordpress.com/2009/09/18/what-is-a-vldb/). mwidlake. Archived (https://web.archive.org/web/20181006114729/https://mwidlake.wordpress.com/2009/09/18/what-is-a-vldb/) from the original on 6 October 2018. Retrieved 7 October 2018.
6. Sidley, Edgar H. (1 April 1980). Encyclopedia of Computer Science and Technology: Volume 14 - Very Large Data Base Systems to Zero-Memory and Markov Information Source (https://books.google.com/books?id=KUgNGCJB4agC). CRC Press. pp. 1–18. ISBN 9780824722142.
7. Gerritsen, Rob; Morgan, Howard; Zisman, Michael (June 1977). "On some metrics for databases or what is a very large database?" (https://doi.org/10.1145%2F984382.984393). ACM SIGMOD Record. 9 (1): 50–74. doi:10.1145/984382.984393 (https://doi.org/10.1145%2F984382.984393). ISSN 0163-5808 (https://www.worldcat.org/issn/0163-5808). S2CID 6359244 (https://api.semanticscholar.org/CorpusID:6359244).
8. Rankins, Ray; Jensen, Paul; Bertucci, Paul (18 December 2002). "21" (https://archive.org/details/microsoftsqlserv00rayr). Microsoft SQL Server 2000 (https://archive.org/details/microsoftsqlserv00rayr) (2nd ed.). SAMS. ISBN 978-0672324673. Administering Very Large SQL Server Databases.
9. "Oracle Database Release 18 - VLDB and Partitioning Guide" (https://docs.oracle.com/en/database/oracle/oracle-database/18/vldbg/partition-intro.html). Oracle. 1 Introduction to Very Large Databases. Archived (https://web.archive.org/web/20181003205734/https://docs.oracle.com/en/database/oracle/oracle-database/18/vldbg/partition-intro.html) from the original on 3 October 2018. Retrieved 3 October 2018.
10. "The Very Large Database Problem - How to Backup & Recover 30–100 TB Databases" (http://cdn2.hubspot.net/hubfs/214442/Actifio_For_Very_Large_Databases_White_Paper.pdf) (PDF). Actifio. Archived (https://web.archive.org/web/20180219182335/http://cdn2.hubspot.net/hubfs/214442/Actifio_For_Very_Large_Databases_White_Paper.pdf) (PDF) from the original on 19 February 2018.
11. Hussain, Syed Jaffer (2014). "Tuning & Applying Best Practices On Very Large Databases (VLDB)" (http://www.aioug.org/sangam14/images/Sangam14/Presentations/201461_Hussain_ppt.pdf) (PDF). Sangam: AIOUG. Archived (https://web.archive.org/web/20181004205048/http://www.aioug.org/sangam14/images/Sangam14/Presentations/201461_Hussain_ppt.pdf) (PDF) from the original on 4 October 2018.
12. Chaves, Warner (7 January 2015). "Top 10 Must-Do Items for your SQL Server Very Large Database" (http://sqlturbo.com/top-10-must-do-items-for-your-sql-server-very-large-database/). SQLTURBO. Archived (https://web.archive.org/web/20171213085742/http://sqlturbo.com/top-10-must-do-items-for-your-sql-server-very-large-database/) from the original on 13 December 2017. Retrieved 5 October 2018.
13. Furman, Dimitri (22 January 2018). Rajesh Setlem; Mike Weiner; Xiaochen Wu (eds.). "SQL Server VLDB in Azure: DBA Tasks Made Simple" (https://blogs.msdn.microsoft.com/sqlcat/2018/01/22/sql-server-vldb-in-azure-dba-tasks-made-simple/). MSDN. Archived (https://web.archive.org/web/20181006072244/https://blogs.msdn.microsoft.com/sqlcat/2018/01/22/sql-server-vldb-in-azure-dba-tasks-made-simple/) from the original on 6 October 2018. Retrieved 6 October 2018.
14. "Specialized Requirements for Relational Data Warehouse Servers" (https://web.archive.org/web/19971010114605/http://www.redbrick.com/rbs-g/whitepapers/tenreq_wp.html). Red Brick Systems, Inc. 21 June 1996. Archived from the original (http://www.redbrick.com/rbs-g/whitepapers/tenreq_wp.html) on 10 October 1997.
15. "Cluster design considerations" (https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/bp-clusterDesign.html). Couchbase. Archived (https://web.archive.org/web/20181017195247/https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/bp-clusterDesign.html) from the original on 17 October 2018. Retrieved 17 October 2017.
16. "Cross Datacenter Replication (XDCR)" (https://developer.couchbase.com/documentation/server/3.x/admin/XDCR/xdcr-intro.html). Couchbase. Archived (https://web.archive.org/web/20181017195516/https://developer.couchbase.com/documentation/server/3.x/admin/XDCR/xdcr-intro.html) from the original on 17 October 2018. Retrieved 17 October 2017.
17. Chien, Tim. "Snapshots Are NOT Backups" (https://www.oracle.com/technetwork/database/availability/rman-fra-snapshot-322251.html). Oracle technetwork. Archived (https://web.archive.org/web/20180907091910/https://www.oracle.com/technetwork/database/availability/rman-fra-snapshot-322251.html) from the original on 7 September 2018. Retrieved 10 October 2018.
18. "Using a split mirror as a backup image" (https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0006423.html). IBM Knowledge Center. Archived (https://archive.today/20180109160158/https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0006423.html) from the original on 9 January 2018. Retrieved 10 October 2018.
19. "Chapter 1 High Availability and Scalability" (https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html). dev.mysql. Archived (https://web.archive.org/web/20161215030829/https://dev.mysql.com/doc/mysql-ha-scalability/en/ha-overview.html) from the original on 15 December 2016. Retrieved 12 October 2018.
20. Brooks, Charlotte; Leung, Clem; Mirza, Aslam; Neal, Curtis; Qiu, Yin Lei; Sing, John; Wong, Francis TH; Wright, Ian R (March 2007). "Chapter 1. Three Business solution segments defined". IBM System Storage Business Continuity: Part 2 Solutions Guide. IBM Redbooks. ISBN 978-0738489728.
21. Akhtar, Ali Navid; Buchholtz, Jeff; Ryan, Michael; Setty, Kumar (2012). "Database Backup and Recovery Best Practices" (https://www.isaca.org/Journal/archives/2012/Volume-1/Pages/Database-Backup-and-Recovery-Best-Practices.aspx). Archived (https://web.archive.org/web/20180629131442/https://www.isaca.org/Journal/archives/2012/Volume-1/Pages/Database-Backup-and-Recovery-Best-Practices.aspx) from the original on 29 June 2018. Retrieved 12 October 2012.
22. Tariq, Ovais (14 July 2011). "Understanding B+tree Indexes and how they Impact Performance" (http://www.ovaistariq.net/733/understanding-btree-indexes-and-how-they-impact-performance/). ovaistariq.net. Archived (https://web.archive.org/web/20180207203602/http://www.ovaistariq.net/733/understanding-btree-indexes-and-how-they-impact-performance/) from the original on 7 February 2018. Retrieved 10 October 2018.
23. Shrestha, Raju (2017). High Availability and Performance of Database in the Cloud - Traditional Master-slave Replication versus Modern Cluster-based Solutions (https://www.researchgate.net/publication/317299391). 7th International Conference on Cloud Computing and Services. Vol. 1: CLOSER. SCITEPRESS – Science and Technology Publications, Lda. doi:10.5220/0006294604130420 (https://doi.org/10.5220%2F0006294604130420). ISBN 978-989-758-243-1. Archived (https://web.archive.org/web/20181017152557/https://www.researchgate.net/publication/317299391_High_Availability_and_Performance_of_Database_in_the_Cloud_-_Traditional_Master-slave_Replication_versus_Modern_Cluster-based_Solutions) from the original on 17 October 2018.
24. "Encyclopedia" (https://www.pcmag.com/encyclopedia/term/40879/database-machine). Definition of: database machine. Archived (https://web.archive.org/web/20160704205410/http://www.pcmag.com/encyclopedia/term/40879/database-machine) from the original on 4 July 2016. Retrieved 10 October 2018.
25. Burleson, Donald (26 March 2015). "Oracle Backup VLDB tips" (http://www.dba-oracle.com/t_backup_vldb.htm). Burleson Consulting. Archived (https://web.archive.org/web/20170630223240/http://www.dba-oracle.com/t_backup_vldb.htm) from the original on 30 June 2017. Retrieved 11 October 2016.
26. "Oracle Partitioning in Oracle Database 12c Release 2 Extreme Data Management and Performance for every System" (https://www.oracle.com/technetwork/database/options/partitioning/partitioning-wp-12c-1896137.pdf) (PDF). Oracle. March 2017. Archived (https://web.archive.org/web/20171215074909/https://www.oracle.com/technetwork/database/options/partitioning/partitioning-wp-12c-1896137.pdf) (PDF) from the original on 15 December 2017. Retrieved 17 October 2018.
27. Teske, Thomas (8 February 2018). Get the best out of Oracle Partitioning - A practical guide and reference (https://indico.cern.ch/event/697301/attachments/1598206/2532649/Partitioning_guide_v18.pdf) (PDF) (Speech). CERN. Hermann Bär. 40-S2-C01 - Salle Curie (CERN): Oracle. Archived (https://web.archive.org/web/20181012172456/https://indico.cern.ch/event/697301/attachments/1598206/2532649/Partitioning_guide_v18.pdf) (PDF) from the original on 12 October 2018. Retrieved 12 October 2018.
28. Steel, Phil; Poggemeyer, Liza; Plett, Corey (1 August 2018). "Server Hardware Performance Considerations" (https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/hardware/). Microsoft IT Pro Center. Archived (https://web.archive.org/web/20181017175544/https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/hardware/) from the original on 17 October 2018. Retrieved 17 October 2018.
29. Li, Yishan; Manoharan, Sathiamoorthy (2013). A performance comparison of SQL and NoSQL databases. 2013 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). IEEE. p. 15. doi:10.1109/PACRIM.2013.6625441 (https://doi.org/10.1109%2FPACRIM.2013.6625441). ISBN 978-1-4799-1501-9.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Very_large_database&oldid=1136354393"
