Prof.
Hasso Plattner
A Course in
In-Memory Data Management
The Inner Mechanics
of In-Memory Databases
August 31, 2013
This learning material is part of the reading material for Prof.
Plattner’s online lecture "In-Memory Data Management" taking place at
www.openHPI.de. If you have any questions or remarks regarding the
online lecture or the reading material, please give us a note at openhpi-
imdb@hpi.uni-potsdam.de. We are glad to further improve the material.
Chapter 11
Insert
This chapter outlines what happens when inserting a new tuple into a ta-
ble (execution of an insert statement). Compared to a row-based database,
the insert in a column store is a bit more complicated. For a row-oriented
database, the new tuple is simply appended to the end of the table, i.e., the
tuple is stored as one piece. SanssouciDB uses column-orientation to store
the data physically. A detailed description of the di↵erences between row
store and column store is given in Chapter 8. So, adding a new tuple to the
database means to add a new entry to every column that the table comprises
of. Internally, every column consists of a dictionary and an attribute vector
(see Chapter 6). Adding a new entry to a column means to check the dictio-
nary and adding a new value if necessary. Afterwards, the respective value
of the dictionary entry is added to the attribute vector of the column. Since
the dictionary is sorted, adding a new entry to a column results in three
di↵erent scenarios:
1. Adding without a new dictionary entry
2. Adding with a new dictionary entry, without resorting the dictionary
3. Adding with a new dictionary entry, with resorting the dictionary
In this chapter, we will give a step by step explanation of the three di↵erent
scenarios.
11.1 Example
In this example, we insert the data of a new person into the world_population
table (see Figure 11.1) that we used before. The example outlines what
happens for the column lname, representing the last name of a person, and
fname, representing the first name of a person.
73
74
Example 11 Insert
Example"Table:"world_popula&on"
recID& fname& lname& gender& country& city& birthday&
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
..." ..." ..." ..." ..." ..." ..."
INSERT"INTO"world_popula&on"
VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
2"
Fig. 11.1: Example Database Table Named world_population
11.1.1 Inserting without New Dictionary Entry
To demonstrate a scenario were we have an insert without a new entry to the
dictionary, we will look at the insert of the last name attribute to the lname
INSERT Without
column of our world_population table. Attribute vector and dictionary of the
lname column are initially filled as displayed in Figure 11.2.
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 0" 0" Albrecht"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 1" 1" Berg"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 3" 2" Meyer"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 2" 3" Schulze"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 3" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.2: Initial Status of the lname Column
To add the string Schulze to the column, we need to look up whether it
already exists in the dictionary. Since there is another person named Sophie
3"
Schulze (recordID four of the world_population table) in the database, the
dictionary for the lname column already contains an entry with the string
Schulze. As one can see from Figure 11.3, the dictionary position of Schulze
is "3".
Since Schulze is on position 3 of the dictionary, we append 3 to the end of
the attribute vector (see Figure 11.4).
INSERT Without
11.1 Example New Dictionary Entry 75
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 0" 0" Albrecht"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 1" 1" Berg"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 3" 2" Meyer"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 2" 3" Schulze"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 3" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
INSERT Without Dic&onary"(D)"
New Dictionary Entry
Fig. 11.3: Position of the String Schulze in the Dictionary of the lname Column
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 0" 0" Albrecht"
4"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 1" 1" Berg"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 3" 2" Meyer"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 2" 3" Schulze"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 3" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" 3" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.4: Appending valueID of Schulze to the End of the Attribute Vector
11.1.2 Inserting with New Dictionary Entry
5"
When inserting the first name, the first name dictionary is scanned for the
INSERT With
string Karen. As shown in Figure 11.5, this name is not present in the dictio-
nary, yet.
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 2" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Mar&n"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Michael"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" Sophie" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Schulze"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.5: Dictionary for First Name Column
11"
76 11 Insert
INSERT With
Therefore, the name is appended to the end of the first name dictionary
(see Figure 11.6).
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 2" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Mar&n"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Michael"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" Sophie" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Karen" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.6: Addition of Karen to fname Dictionary
As outlined in Chapter 6, the dictionary needs to be kept sorted. After
appending
12"
Karen to the end of the dictionary, the dictionary needs to be
resorted. Therefore, as shown in Figure 11.7, a new dictionary is created
with a sorted order. In the new dictionary most of the values have been
INSERT With
moved to a new position. For instance, the valueID for Michael changed
from 3 to 4.
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"(old)" D"(new)"
fname& lname& gender& country& city& birthday&
0" 2" 0" Anton" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" Hanna" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Mar&n" 2" Karen"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Michael" 3" Mar&n" 3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" Sophie" 4" Michael" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Karen" 5" Sophie" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.7: Resorting the fname Dictionary
Based on the changed valueIDs of the new first name dictionary, all val-
ueIDs
13" of the first name attribute vector need to be updated as well. Fig-
ure 11.8 shows the changes to the attribute vector. For instance at position 1,
the valueID for Michael is changed from 3 to 4.
In case the newly added dictionary value is inserted at the end based
on the sorting order of the dictionary, those two steps are omitted. The
dictionary does not need to be resorted and therefore the attribute vector
does not need to be rebuild.
INSERT With
New Dictionary Entry
11.2 Performance Considerations 77
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV"(old)" AV"(new)" D"(new)"
fname& lname& gender& country& city& birthday&
0" 2" 0" 3" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 3" 1" 4" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" 1" 2" Karen"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" 0" 3" Mar&n" 3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 4" 4" 5" 4" Michael" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" Sophie" 5" Schulze"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.8: Rebuilding the fname Attribute Vector
INSERT With
Finally the valueID 2, representing the dictionary position of the string
Karen,
14" is appended to the attribute vector (see Figure 11.9).
New Dictionary Entry
INSERT"INTO"world_popula&on"VALUES"(Karen,"Schulze,"f,"GER,"Rostock,"0662062012)"
AV" D"
fname& lname& gender& country& city& birthday&
0" 3" 0" Anton"
0" Mar&n" Albrecht" m" GER" Berlin" 0860561955"
1" 4" 1" Hanna"
1" Michael" Berg" m" GER" Berlin" 0360561970"
2" 1" 2" Karen"
2" Hanna" Schulze" f" GER" Hamburg" 0460461968"
3" 0" 3" Mar&n"
3" Anton" Meyer" m" AUT" Innsbruck" 1062061992"
4" 5" 4" Michael" 4" Sophie" Schulze" f" GER" Potsdam" 0960361977"
5" 2" 5" Sophie" 5" Karen" Schulze"
..." ..." ..." ..." ..." ..." ..."
A]ribute"Vector"(AV)"
Dic&onary"(D)"
Fig. 11.9: Appending the valueID representing Karen to the Attribute Vector
15"
11.2 Performance Considerations
When thinking of the world_population example, there are about 8 billion peo-
ple and 5 million unique first names. Every new entry to the dictionary may
cause an overhead regarding resorting of the dictionary and reorganization
of the respective attribute vector. Triggering resorting and reorganization
at every single insert would lead to a performance penalty, which compro-
mises the overall performance of the system. Therefore, an additional insert
layer needs to be added, the di↵erential bu↵er. Chapter 25 explains in detail
how write performance is kept at a high level using periodic merges of the
di↵erential bu↵er and the main store.
78 11 Insert
The vulnerability of a column to reorganization heavily depends on the
column cardinality (the number of distinct values in a dictionary). When
the dictionary only has a few entries, it is most likely that a column needs
to be reorganized with a new insert. However, especially with attributes of
low column cardinality, e.g., gender or country, the likelihood of reorganiza-
tion decreases over time, since most of the possible values for the respective
column have been inserted into the dictionary already. In real world applica-
tions, the dictionary only changes occasionally after it has reached a certain
size. The additional steps necessary for new unique dictionary entries will
occur less frequent and therefore expensive reorganization becomes less fre-
quent.