KEMBAR78
Insert PDF | PDF | Data Management Software | Data
0% found this document useful (0 votes)
77 views7 pages

Insert PDF

This document provides an overview of how inserting a new tuple works in an in-memory column-oriented database like SanssouciDB. There are three possible scenarios when inserting: 1) inserting without adding a new entry to the column dictionary, 2) inserting and adding to the dictionary without resorting it, and 3) inserting and adding to the dictionary while resorting it. The document then gives a detailed example of inserting the name "Karen Schulze" into the "world_population" table to demonstrate inserting without adding a new dictionary entry.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
77 views7 pages

Insert PDF

This document provides an overview of how inserting a new tuple works in an in-memory column-oriented database like SanssouciDB. There are three possible scenarios when inserting: 1) inserting without adding a new entry to the column dictionary, 2) inserting and adding to the dictionary without resorting it, and 3) inserting and adding to the dictionary while resorting it. The document then gives a detailed example of inserting the name "Karen Schulze" into the "world_population" table to demonstrate inserting without adding a new dictionary entry.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Prof.

Hasso Plattner

A Course in
In-Memory Data Management
The Inner Mechanics
of In-Memory Databases

August 31, 2013

This learning material is part of the reading material for Prof.


Plattner’s online lecture "In-Memory Data Management" taking place at
www.openHPI.de. If you have any questions or remarks regarding the
online lecture or the reading material, please give us a note at openhpi-
imdb@hpi.uni-potsdam.de. We are glad to further improve the material.
Chapter 11
Insert

This chapter outlines what happens when inserting a new tuple into a ta-
ble (execution of an insert statement). Compared to a row-based database,
the insert in a column store is a bit more complicated. For a row-oriented
database, the new tuple is simply appended to the end of the table, i.e., the
tuple is stored as one piece. SanssouciDB uses column-orientation to store
the data physically. A detailed description of the di↵erences between row
store and column store is given in Chapter 8. So, adding a new tuple to the
database means to add a new entry to every column that the table comprises
of. Internally, every column consists of a dictionary and an attribute vector
(see Chapter 6). Adding a new entry to a column means to check the dictio-
nary and adding a new value if necessary. Afterwards, the respective value
of the dictionary entry is added to the attribute vector of the column. Since
the dictionary is sorted, adding a new entry to a column results in three
di↵erent scenarios:
1. Adding without a new dictionary entry
2. Adding with a new dictionary entry, without resorting the dictionary
3. Adding with a new dictionary entry, with resorting the dictionary
In this chapter, we will give a step by step explanation of the three di↵erent
scenarios.

11.1 Example

In this example, we insert the data of a new person into the world_population
table (see Figure 11.1) that we used before. The example outlines what
happens for the column lname, representing the last name of a person, and
fname, representing the first name of a person.

73
74
Example 11 Insert

Example"Table:"world_popula&on"
recID& fname& lname& gender& country& city& birthday&
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
..." ..." ..." ..." ..." ..." ..."

INSERT"INTO"world_popula&on"
VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
2"

Fig. 11.1: Example Database Table Named world_population

11.1.1 Inserting without New Dictionary Entry

To demonstrate a scenario were we have an insert without a new entry to the


dictionary, we will look at the insert of the last name attribute to the lname
INSERT Without
column of our world_population table. Attribute vector and dictionary of the
lname column are initially filled as displayed in Figure 11.2.
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 0" 0" Albrecht"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 1" 1" Berg"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 3" 2" Meyer"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 2" 3" Schulze"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 3" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.2: Initial Status of the lname Column

To add the string Schulze to the column, we need to look up whether it


already exists in the dictionary. Since there is another person named Sophie
3"
Schulze (recordID four of the world_population table) in the database, the
dictionary for the lname column already contains an entry with the string
Schulze. As one can see from Figure 11.3, the dictionary position of Schulze
is "3".
Since Schulze is on position 3 of the dictionary, we append 3 to the end of
the attribute vector (see Figure 11.4).
INSERT Without
11.1 Example New Dictionary Entry 75

INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 0" 0" Albrecht"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 1" 1" Berg"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 3" 2" Meyer"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 2" 3" Schulze"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 3" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"

INSERT Without Dic&onary"(D)"

New Dictionary Entry


Fig. 11.3: Position of the String Schulze in the Dictionary of the lname Column

INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 0" 0" Albrecht"
4"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 1" 1" Berg"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 3" 2" Meyer"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 2" 3" Schulze"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 3" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" 3" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.4: Appending valueID of Schulze to the End of the Attribute Vector

11.1.2 Inserting with New Dictionary Entry


5"

When inserting the first name, the first name dictionary is scanned for the
INSERT With
string Karen. As shown in Figure 11.5, this name is not present in the dictio-
nary, yet.
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 2" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Mar&n"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Michael"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" Sophie" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Schulze"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.5: Dictionary for First Name Column

11"
76 11 Insert

INSERT With
Therefore, the name is appended to the end of the first name dictionary
(see Figure 11.6).
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 2" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Mar&n"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Michael"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" Sophie" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Karen" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.6: Addition of Karen to fname Dictionary

As outlined in Chapter 6, the dictionary needs to be kept sorted. After


appending
12"
Karen to the end of the dictionary, the dictionary needs to be
resorted. Therefore, as shown in Figure 11.7, a new dictionary is created
with a sorted order. In the new dictionary most of the values have been
INSERT With
moved to a new position. For instance, the valueID for Michael changed
from 3 to 4.
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"(old)" D"(new)"
fname& lname& gender& country& city& birthday&
0" 2" 0" Anton" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" Hanna" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Mar&n" 2" Karen"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Michael" 3" Mar&n" 3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" Sophie" 4" Michael" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Karen" 5" Sophie" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.7: Resorting the fname Dictionary

Based on the changed valueIDs of the new first name dictionary, all val-
ueIDs
13" of the first name attribute vector need to be updated as well. Fig-
ure 11.8 shows the changes to the attribute vector. For instance at position 1,
the valueID for Michael is changed from 3 to 4.
In case the newly added dictionary value is inserted at the end based
on the sorting order of the dictionary, those two steps are omitted. The
dictionary does not need to be resorted and therefore the attribute vector
does not need to be rebuild.
INSERT With
New Dictionary Entry
11.2 Performance Considerations 77

INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV"(old)" AV"(new)" D"(new)"
fname& lname& gender& country& city& birthday&
0" 2" 0" 3" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" 4" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" 1" 2" Karen"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" 0" 3" Mar&n" 3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" 5" 4" Michael" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Sophie" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.8: Rebuilding the fname Attribute Vector

INSERT With
Finally the valueID 2, representing the dictionary position of the string
Karen,
14" is appended to the attribute vector (see Figure 11.9).
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 3" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 4" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Karen"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Mar&n"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 5" 4" Michael" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" 2" 5" Sophie" 5" Karen" Schulze"
..." ..." ..." ..." ..." ..." ..."

A]ribute"Vector"(AV)"
Dic&onary"(D)"

Fig. 11.9: Appending the valueID representing Karen to the Attribute Vector

15"

11.2 Performance Considerations

When thinking of the world_population example, there are about 8 billion peo-
ple and 5 million unique first names. Every new entry to the dictionary may
cause an overhead regarding resorting of the dictionary and reorganization
of the respective attribute vector. Triggering resorting and reorganization
at every single insert would lead to a performance penalty, which compro-
mises the overall performance of the system. Therefore, an additional insert
layer needs to be added, the di↵erential bu↵er. Chapter 25 explains in detail
how write performance is kept at a high level using periodic merges of the
di↵erential bu↵er and the main store.
78 11 Insert

The vulnerability of a column to reorganization heavily depends on the


column cardinality (the number of distinct values in a dictionary). When
the dictionary only has a few entries, it is most likely that a column needs
to be reorganized with a new insert. However, especially with attributes of
low column cardinality, e.g., gender or country, the likelihood of reorganiza-
tion decreases over time, since most of the possible values for the respective
column have been inserted into the dictionary already. In real world applica-
tions, the dictionary only changes occasionally after it has reached a certain
size. The additional steps necessary for new unique dictionary entries will
occur less frequent and therefore expensive reorganization becomes less fre-
quent.

You might also like