BASICS OF MEMORY
MANAGEMENT IN PYTHON
Nina Zakharenko
WHY SHOULD YOU CARE?
Knowing about memory
management helps you write more
efficient code.
WHAT WILL YOU GET?
∎Vocabulary
∎Basic Concepts
∎Foundation
WHAT WON’T YOU GET?
You won’t be an expert at the end
of this talk.
WHAT’S A
VARIABLE?
What’s a C-style variable?
Memory
variable   location   value
a          0x3E8      101
b          0x3E9      101
These values live in a fixed
size bucket.
Can only hold same-sized
data, or an overflow occurs.
What’s a C-style variable?
Memory
location Value
0x3E8 101
0x3E9 101
Later…
110
The data in this
memory location is
overwritten.
PYTHON
HAS NAMES,
NOT
VARIABLES
How are python objects stored in memory?
names
references
objects
A name is just a label
for an object.
In python, each object can have
lots of names.
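A minimal sketch of the idea above, assuming CPython. The names scores and alias are made up for illustration; the point is only that assignment attaches a second label to the same object rather than copying it.

```python
# Two names, one object: assignment binds a name, it does not copy.
scores = [90, 85]
alias = scores            # a second name for the same list object

alias.append(70)          # mutate the object through one name...
print(scores)             # ...and the change is visible through the other
print(alias is scores)    # identity check: both names label one object
```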
Different Types of Objects
Simple
• numbers
• strings
Containers
• dict
• list
• user-defined classes
What is a reference?
A name or a container object
pointing at another object.
What is a
reference count?
How can we increase the ref count?
300
x = 300
x
references:	1
+1
How can we increase the ref count?
300
x	=	300	
y	=	300
x
references:	2
y
+1
How can we increase the ref count?
300
z	=	[300,	300]
x
references:	4
y
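The counts on these slides can be observed, roughly, with sys.getrefcount. This is a sketch that assumes CPython and uses a fresh object() instead of 300, because the interpreter may share or cache integer constants. The function name refcount_deltas is hypothetical, and only the differences between readings are meaningful, since the call itself may add a temporary reference.

```python
import sys

def refcount_deltas():
    """Return how much the count grows after each new reference."""
    obj = object()                       # fresh object, referenced by one name
    base = sys.getrefcount(obj)          # baseline reading
    y = obj                              # a second name: +1
    after_name = sys.getrefcount(obj) - base
    z = [obj, obj]                       # a list holding it twice: +2 more
    after_list = sys.getrefcount(obj) - base
    return after_name, after_list

print(refcount_deltas())
```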
Decrease Ref Count - del
300
x	=	300	
y	=	300	
del	x
references:	1
y
What does del	do?
The del statement doesn’t delete
objects.
It:
• removes that name as a reference
to that object
• reduces the ref count by 1
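To make the two bullets concrete, here is a small sketch (a list is used instead of 300 so that the object is clearly ours). After del x the name is gone, but the object survives because y still refers to it.

```python
x = [300]
y = x            # two names for one list
del x            # removes the name x only; the list itself is untouched

print(y)         # the object is still reachable through y
try:
    x
except NameError as err:
    print("x is gone:", err)
```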
Decrease Ref Count - Change Reference
300
x = 300
y = 300
y = None
references: 0 (once x has also been removed, as in the del example)
Decrease Ref Count - Going out of Scope
def	print_word():	
				word	=	'Seven'	
				print('Word	is	'	+	word)	
ref count +1
‘Seven’ is out of
scope.
ref count -1
print_word()
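One way to watch this happen is with a weak reference, which observes an object without keeping it alive. The sketch below assumes CPython's immediate cleanup and uses a hypothetical Word class, because plain str objects cannot be weakly referenced.

```python
import weakref

class Word:
    """Hypothetical stand-in for the string on the slide."""
    def __init__(self, text):
        self.text = text

def make_word():
    word = Word("Seven")          # local name: ref count +1
    # Returning a weak reference does not keep the object alive.
    # When the function returns, the local name disappears: ref count -1.
    return weakref.ref(word)

probe = make_word()
print(probe())                    # None once the object has been freed
```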
local vs. global namespace
■If refcounts decrease when an object
goes out of scope, what happens to
objects in the global namespace?
■They never go out of scope while the program runs, so
their refcount never reaches 0.
■Avoid putting large or complex
objects in the global namespace.
Every python object
holds 3 things
∎Its type
∎Its value
∎A reference count
PyObject
type integer
refcount 2
value 300
Names References
x
y
x	=	300	
y	=	300
print(	id(x)	)	
>	28501818
print(	id(y)	)	
>	28501818
print(x is y)
>	True
* don’t try this in an interactive
environment (REPL)
GARBAGE
COLLECTION
What is Garbage
Collection?
A way for a program to
automatically release memory
when the object taking up that
space is no longer in use.
Two Main Types of Garbage Collection
Reference
Counting
Tracing
How does reference counting garbage
collection work?
Add and Remove References
Refcount Reaches 0
Cascading Effect
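The cascading effect can be sketched as follows, assuming CPython. Box and Item are hypothetical classes: when the last reference to the box goes away, the box is freed, which drops the only reference to the item, which is then freed as well.

```python
import weakref

class Item:
    pass

class Box:
    def __init__(self, item):
        self.item = item          # the box holds the only strong reference

box = Box(Item())
item_probe = weakref.ref(box.item)   # observe the item without owning it

alive_before = item_probe() is not None
del box                              # box refcount reaches 0 -> item refcount reaches 0
alive_after = item_probe() is not None

print(alive_before, alive_after)
```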
Reference Counting Garbage Collection
The Good
• Easy to implement
• When refcount is 0, objects are immediately deleted.
The Bad
• Space overhead: reference count is stored for every object
• Execution overhead: reference count changed on every assignment
Reference Counting Garbage Collection
The Ugly
• Not generally thread-safe
• Reference counting doesn’t detect cyclical references
Cyclical References By Example
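class Node:
    def __init__(self, value):
        self.value = value
        self.next_node = None
    def next(self, next_node):
        # Store the link under a different name so it doesn't shadow this method.
        self.next_node = next_node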
What’s a cyclical reference?
root: rc = 1
left: rc = 3
right: rc = 2
root	=	Node('root')	
left	=	Node('left')	
right = Node('right')
root.next(left)	
left.next(right)	
right.next(left)
What’s a cyclical reference?
del root
del left
del right
root: rc = 0
left: rc = 1
right: rc = 1
Reference counting alone will not
garbage collect objects with cyclical
references.
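A sketch of that claim, assuming CPython. The automatic collector is switched off so that only reference counting is at work; the two nodes stay alive after their names are deleted and disappear only when gc.collect() is called by hand. The attribute name next_node is an assumption made for this example.

```python
import gc
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next_node = None

gc.disable()                      # keep the automatic collector out of the demo
left = Node("left")
right = Node("right")
left.next_node = right
right.next_node = left            # left <-> right now form a cycle

probe = weakref.ref(left)
del left, right                   # no names remain, but each node holds the other

alive_after_del = probe() is not None      # refcounts never reached 0
gc.collect()                               # run the cycle detector by hand
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)
```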
Two Main Types of Garbage Collection
Reference
Counting
Tracing
Tracing Garbage Collection
■source: http://webappguru.blogspot.com/2015/11/mark-and-sweep-garbage-collection.html
What does Python use?
Reference Counting + Generational
Generational Garbage Collection is
based on the theory that most
objects die young.
■ source: http://cs.ucsb.edu/~ckrintz/racelab/gc/papers/hoelzle-jvm98.pdf
Python maintains a list of every object
created as a program is run.
Actually, it makes 3.
generation 0
generation 1
generation 2
Newly created objects are stored in generation 0.
Only container objects with a
refcount greater than 0 will be
stored in a generation list.
When the number of objects in a
generation reaches a threshold,
python runs a garbage collection
algorithm on that generation, and
any generations younger than it.
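The thresholds and current counters can be inspected through the standard gc module. The exact numbers depend on the Python version and on what the program has done so far, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counters per generation

print("thresholds:", thresholds)
print("counts:    ", counts)
```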
What happens during a generational garbage
collection cycle?
Python makes a list for objects to discard.
It runs an algorithm to detect reference cycles.
If an object has no outside references, it’s put on
the discard list.
When the cycle is done, it frees up the objects on
the discard list.
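The steps above can be sketched with a toy model. This is not CPython's actual code: objects are plain strings, and the function find_unreachable, its refcounts table, and its edges table are all hypothetical. The idea it illustrates is to subtract references that come from inside the tracked set, keep everything still referenced from outside (plus whatever those objects reach), and put the rest on the discard list.

```python
def find_unreachable(refcounts, edges):
    """Toy cycle detector.

    refcounts: object name -> total reference count
    edges:     object name -> names of the tracked objects it refers to
    """
    # Step 1: work on a copy so the real counts are untouched.
    outside = dict(refcounts)

    # Step 2: subtract every reference that comes from another tracked object.
    for targets in edges.values():
        for target in targets:
            outside[target] -= 1

    # Step 3: objects with outside references survive, and so does
    # everything that can be reached from them.
    reachable = set()
    stack = [name for name, count in outside.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(edges.get(name, []))

    # Step 4: whatever is left is the discard list.
    return sorted(set(refcounts) - reachable)

refcounts = {"left": 1, "right": 1, "config": 1, "child": 1}
edges = {"left": ["right"], "right": ["left"], "config": ["child"], "child": []}
discard = find_unreachable(refcounts, edges)
print(discard)
```

Here left and right only reference each other, so they end up on the discard list; config is still held by a name, and child is kept alive through config.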
After a garbage collection cycle,
objects that survived will be
promoted to the next generation.
Objects in the last generation (2)
stay there as the program executes.
When the ref count reaches 0, you
get immediate clean up.
If you have a cycle, you need to wait
for garbage collection.
REFERENCE
COUNTING
GOTCHAS
Reference counting is not generally
thread-safe.
We’ll see why this is a big deal™
later.
Remember our cycle from before?
left: rc = 1
right: rc = 1
Cyclical references get cleaned up
by generational garbage collection.
Cyclical Reference Cleanup
Except in Python versions before 3.4 (including Python 2), if they have a
__del__ method.
**fixed in python 3.4! - https://www.python.org/dev/peps/pep-0442/
Gotcha!
The __del__	magic method
■ Sometimes called a “destructor”
■Not the del statement.
■ Runs before an object is removed
from memory
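A small sketch of the difference between the del statement and the __del__ method, assuming CPython's immediate cleanup. TempFile and the events list are hypothetical names used only for this demo.

```python
events = []

class TempFile:
    """Hypothetical resource holder."""
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Called by the interpreter just before the object is freed.
        events.append("cleanup " + self.name)

f = TempFile("a.txt")
del f            # the del statement only removes the name...
print(events)    # ...__del__ ran because the refcount then reached 0
```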
__slots__
What are __slots__?
class	Dog(object):	
				pass	
buddy	=	Dog()	
buddy.name	=	'Buddy'
print(buddy.__dict__)	
{'name':	'Buddy'}
What are __slots__?
'Pug'.name	=	'Fred'
AttributeError																													
Traceback	(most	recent	call	last)	
---->	1	'Pug'.name	=	'Fred'	
AttributeError:	'str'	object	has	no	attribute	
'name'
What are __slots__?
class Point(object):
    __slots__ = ('x', 'y')
point = Point()
point.x = 5
point.y = 7
point.name = "Fred"
Traceback (most recent call last):
  File "point.py", line 8, in <module>
    point.name = "Fred"
AttributeError: 'Point' object has no attribute 'name'
What is the type of __slots__?
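Putting the two cases side by side, as a sketch: a regular class carries a per-instance __dict__ and accepts any attribute, while a class with __slots__ has no __dict__ and rejects names that were not declared.

```python
class Dog:
    pass

class Point:
    __slots__ = ("x", "y")

point = Point()
point.x, point.y = 5, 7

print(hasattr(Dog(), "__dict__"))    # regular instances have a __dict__
print(hasattr(point, "__dict__"))    # slotted instances do not
print(type(Point.__slots__))         # here __slots__ was written as a tuple

try:
    point.name = "Fred"              # not listed in __slots__
except AttributeError as err:
    print("rejected:", err)
```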
size of dict vs. size of tuple
import	sys	
sys.getsizeof(dict())	
sys.getsizeof(tuple())	
sizeof dict: 288 bytes
sizeof tuple: 48 bytes
(exact sizes depend on the Python version and platform)
When would we want to use __slots__?
■ If we’re going to be creating many
instances of a class
■If we know in advance what
properties the class should have
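A rough way to check whether __slots__ pays off for a given class is to measure it. This sketch uses the standard tracemalloc module; RegularPoint, SlottedPoint and bytes_for are hypothetical names, and the absolute numbers vary by Python version, so only the comparison matters.

```python
import tracemalloc

class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

def bytes_for(cls, count=10_000):
    """Approximate memory allocated while creating `count` instances."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    points = [cls(1, 2) for _ in range(count)]   # kept alive while measuring
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

regular_bytes = bytes_for(RegularPoint)
slotted_bytes = bytes_for(SlottedPoint)
print("regular:", regular_bytes, "slotted:", slotted_bytes)
```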
WHAT’S A
GIL?
GLOBAL
INTERPRETER
LOCK
Only one thread can run in the
interpreter at a time.
Advantages / Disadvantages of a GIL
Upside
Fast & Simple Garbage Collection
Downside
In a python program, no matter how many
threads exist, only one thread will be
executed at a time.
Want to take advantage of multiple CPUs?
■Use multi-processing instead of multi-threading.
■Each process will have its own GIL; it’s
on the developer to figure out a way to
share information between processes.
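A minimal sketch of the multi-processing approach, using the standard multiprocessing module. The function count_primes and the workload sizes are made up for illustration. Each worker is a separate process with its own interpreter and its own GIL, and Pool.map takes care of sending inputs out and bringing results back.

```python
from multiprocessing import Pool

def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [20_000, 20_000, 20_000, 20_000]
    with Pool() as pool:                      # one worker per CPU by default
        results = pool.map(count_primes, limits)
    print(results)
```

The if __name__ == "__main__" guard matters: on platforms that start workers by re-importing the script, it keeps each worker from creating a pool of its own.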
If the GIL limits us,
can’t we just remove
it?
additional reading: https://docs.python.org/3/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock
For better or for
worse, the GIL is
here to stay!
WHAT DID
WE LEARN?
Garbage collection is
pretty good.
Now you know how
memory is managed.
Consider
python3
Or, for scientific
applications numpy
& pandas.
Thanks!
@nnja
nina.writes.code@gmail.com
[TODO SLIDESHARE LINK]
Bonus
Material
Additional Reading
• Great explanation of generational garbage collection and python’s
reference detection algorithm.
• https://www.quora.com/How-does-garbage-collection-in-Python-work
• Weak Reference Documentation
• https://docs.python.org/3/library/weakref.html
• Python Module of the Week - gc
• https://pymotw.com/2/gc/
• PyPy STM - GIL less Python Interpreter
• http://morepypy.blogspot.com/2015/03/pypy-stm-251-released.html
• Saving 9GB of RAM with python’s __slots__
• http://tech.oyster.com/save-ram-with-python-slots/
Getting in-depth with the GIL
• Dave Beazley - Guide on how the GIL Operates
• http://www.dabeaz.com/python/GIL.pdf
• Dave Beazley - New GIL in Python 3.2
• http://www.dabeaz.com/python/NewGIL.pdf
• Dave Beazley - Inside Look at Infamous GIL Patch
• http://dabeaz.blogspot.com/2011/08/inside-look-at-gil-removal-patch-of.html
Why can’t we use the REPL to follow along at
home?
• Because it doesn’t behave like a typical python
program that’s being executed.
• Further reading: http://stackoverflow.com/questions/25281892/weird-id-result-on-cpython-intobject
PYTHON PRE-LOADS OBJECTS
• Many objects are loaded by Python as the interpreter
starts, then reused instead of being created again.
• Often discussed alongside peephole optimization, which
folds constant expressions when code is compiled.
• Numbers: -5 -> 256
• Single Letter Strings
• Common Exceptions
• Further reading: http://akaptur.com/blog/2014/08/02/the-cpython-peephole-optimizer-and-you/
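The pre-loaded numbers can be observed with an identity check. This sketch assumes CPython, where this is an implementation detail rather than a language guarantee. The integers are built at run time with int() so that the compiler cannot share a constant between the two names.

```python
small_a = int("256")
small_b = int("256")
big_a = int("1000000")
big_b = int("1000000")

print(small_a is small_b)   # same pre-loaded object on CPython
print(big_a is big_b)       # two separate objects
print(big_a == big_b)       # equal in value either way
```

This is also why code should compare numbers with == rather than is.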
Common Question - Why doesn’t a python program
shrink in memory after garbage collection?
• The freed memory is fragmented.
• i.e. it’s not freed in one contiguous block.
• When we say memory is freed during garbage
collection, it’s released back to python to use
for other objects, and not necessarily to the
system.
• After garbage collection, the size of the
python program likely won’t go down.
How does python store container objects?
nums
PyListObject
type list
refcount 1
size 3
capacity 10
value: references to the PyObjects below
PyObject
type integer
refcount 1
value -10
PyObject
type integer
refcount 2
value -9
Credits
Big thanks to:
• Faris Chebib & The Salt Lake City Python Meetup
• The many friends & co-workers who lent me their eyes &
ears, particularly Steve Holden
Special thanks to all the people who made and released
these awesome resources for free:
■ Presentation template by SlidesCarnival
■ Photographs by Unsplash
■ Icons by iconsdb