Amirkabir University of Technology
Department of Computer Engineering and Information Technology
Image Classification with
Deep Convolutional
Neural Networks
Sepehr Rasouli
Outline
• Introduction to Image Classification
& Deep Networks
• Proposed Method
• Main Idea
• Data Set
• Architecture
• Techniques
• Comparison & Results
• Conclusion
Image Classification
Why Deep Learning?
• “Shallow” vs. “deep” architectures
  Learn a feature hierarchy all the way from pixels to classifier
[Figure: a shallow pipeline (hand-designed feature extraction, then a trainable classifier) versus a deep pipeline (learned Layer 1 ... Layer N, then a simpler classifier)]
Our Method
• Deep Convolutional Neural Network
• 5 convolutional and 3 fully connected layers
• 650,000 neurons, 60 million parameters
• Techniques used to boost performance
• ReLU nonlinearity
• Training on Multiple GPUs
• Overlapping max pooling
• Data Augmentation
• Dropout
Overall Architecture
• Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5–6 days)
• 650,000 neurons, 60 million parameters, 630 million connections
• The last layer contains 1,000 neurons, which produce a distribution over the 1,000 class labels
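Note: the slide does not list the exact layer hyperparameters, so the sketch below borrows the widely used single-tower AlexNet-style sizes as an assumption. It is only meant to illustrate the 5-conv + 3-FC layout and the roughly 60-million-parameter count, not the authors' two-GPU implementation.

```python
# Hedged sketch of a 5-conv / 3-FC network in the spirit of this slide.
# Layer sizes follow the single-tower AlexNet-style variant (an assumption;
# the original model was split across two GPUs).
import torch
import torch.nn as nn

class AlexNetLike(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                 # overlapping pooling
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),   # 1,000-way output; softmax gives the class distribution
        )

    def forward(self, x):
        x = self.features(x)          # 5 convolutional layers
        x = torch.flatten(x, 1)
        return self.classifier(x)     # 3 fully connected layers

model = AlexNetLike()
print(sum(p.numel() for p in model.parameters()))   # roughly 61 million parameters
```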
Dataset
• ImageNet
§ Over 15 million high-quality labeled images
§ About 22,000 categories
§ Collected from the web, labeled by humans on Amazon's Mechanical
Turk
§ Variable-resolution images
• ILSVRC Competition
§ ImageNet Large Scale Visual Recognition Challenge
§ Annual competition of image classification at large scale
§ Subset of ImageNet
§ 1,000 categories with about 1,000 images each
§ 1.2M images in 1K categories
§ Classification: make 5 guesses about the image label (the top-5 criterion)
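The "five guesses" rule is the top-5 criterion used for the results later in the deck: a prediction counts as correct if the true label is among the five highest-scoring classes. A minimal NumPy sketch (with made-up scores and labels) follows.

```python
import numpy as np

def top5_error(scores: np.ndarray, labels: np.ndarray) -> float:
    """scores: (N, 1000) class scores; labels: (N,) true class indices."""
    # Indices of the 5 highest-scoring classes per image (order does not matter).
    top5 = np.argpartition(scores, -5, axis=1)[:, -5:]
    correct = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - correct.mean()

# Toy example: random scores for 4 images over 1,000 classes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 1000))
labels = rng.integers(0, 1000, size=4)
print(top5_error(scores, labels))
```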
Rectified Linear Units
x = w₁·f(Z₁) + w₂·f(Z₂) + w₃·f(Z₃)

x is called the total input to the neuron, and f(x) is its output.

f(x) = tanh(x): very bad (slow to train)
f(x) = max(0, x): very good (quick to train)
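As a small illustration (not from the slide), the NumPy snippet below evaluates the total-input formula with both activations; the weights and inputs are made-up numbers.

```python
import numpy as np

def tanh_unit(z):                 # saturating activation: slow to train
    return np.tanh(z)

def relu_unit(z):                 # rectified linear unit: f(z) = max(0, z)
    return np.maximum(0.0, z)

# Total input to a neuron: x = w1*f(Z1) + w2*f(Z2) + w3*f(Z3)
w = np.array([0.4, -0.2, 0.7])    # made-up weights
z = np.array([1.5, -0.3, 0.8])    # made-up inputs from the previous layer

x_tanh = np.dot(w, tanh_unit(z))
x_relu = np.dot(w, relu_unit(z))
print(x_tanh, x_relu)
```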
Rectified Linear Units
• Biological plausibility: One-sided, compared
to the antisymmetry of tanh.
• Sparse activation: For example, in a randomly
initialized network, only about 50% of hidden
units are activated (having a non-zero output).
• Efficient gradient propagation: No vanishing
gradient problem or exploding effect.
• Efficient computation: only comparison, addition, and multiplication are needed
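The sparsity and gradient claims above can be checked with a few lines of NumPy (a sketch under assumed layer sizes, not from the slide): with zero-mean random weights roughly half of the ReLU units fire, and the ReLU derivative is simply 1 for active units and 0 otherwise.

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(scale=0.01, size=(4096, 256))   # zero-mean random init (assumed sizes)
x = rng.normal(size=256)                       # random input vector

pre = W @ x                                    # pre-activations
act = np.maximum(0.0, pre)                     # ReLU outputs
print((act > 0).mean())                        # roughly 0.5: about 50% of units active

grad = (pre > 0).astype(float)                 # dReLU/dpre: 1 for active units, 0 otherwise
```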
Training on Multiple GPUs
• The network is spread across two GPUs
• Each GTX 580 GPU has 3 GB of memory
• The architecture is particularly well suited to cross-GPU parallelization
• Uses a very efficient GPU implementation of the CNN
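The slides do not show the parallelization code; as a rough sketch of the idea (each GPU holds half of a layer's filters), the PyTorch fragment below splits one 96-filter convolution across two devices and falls back to the CPU when fewer GPUs are available. Device names and layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Pick two devices if available; otherwise run both halves on the CPU.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 1 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else dev0)

# A 96-filter first convolution split into two 48-filter halves, one per device.
half_a = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2).to(dev0)
half_b = nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2).to(dev1)

x = torch.randn(8, 3, 224, 224)                  # a dummy batch of images
out_a = half_a(x.to(dev0))                       # computed on device 0
out_b = half_b(x.to(dev1))                       # computed on device 1
out = torch.cat([out_a, out_b.to(dev0)], dim=1)  # gather: 96 feature maps total
print(out.shape)                                 # torch.Size([8, 96, 55, 55])
```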
Results & Comparison
• ILSVRC-2010 test set

Model                                        Top-1    Top-5
Sparse coding [3] (ILSVRC-2010 winner)       47.1%    28.2%
SIFT + FVs [4] (previous best published)     45.7%    25.7%
CNN (our method)                             37.5%    17.0%

Comparison of results on the ILSVRC-2010 test set; the first two rows are the best results achieved by others.
Conclusion
• A large, deep convolutional neural network for large-scale image classification was proposed
• 5 convolutional layers, 3 fully connected layers
• 650,000 neurons, 60 million parameters
• Several techniques for boosting performance
• The proposed method won ILSVRC-2012
• Achieved a winning top-5 error rate of 15.3%, compared to 26.2% for the second-best entry
References
[1] http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx
[2] http://web.engr.illinois.edu/~slazebni/spring14/lec24_cnn.pdf
[3] A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge 2010. www.imagenet.org/challenges, 2010.
[4] J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1665–1672. IEEE, 2011.
Thank you for your attention
Any Questions?
Results 2012
• ILSVRC-2012 results
  Proposed method: top-5 error rate of 16.422%
  Runner-up: top-5 error rate of 26.172%
Convolutional NNs
Pooling
• Spatial Pooling
• Non-overlapping / overlapping regions
• Sum or max
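A minimal NumPy sketch of spatial pooling over a single feature map (not from the slide): a stride smaller than the window gives the overlapping variant used in this network, a stride equal to the window gives non-overlapping pooling, and sum pooling just swaps max for sum.

```python
import numpy as np

def pool2d(fmap: np.ndarray, window: int = 3, stride: int = 2, op=np.max) -> np.ndarray:
    """Spatial pooling of one feature map; overlapping when stride < window."""
    h, w = fmap.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = fmap[i * stride:i * stride + window, j * stride:j * stride + window]
            out[i, j] = op(patch)
    return out

fmap = np.arange(36, dtype=float).reshape(6, 6)       # toy 6x6 feature map
print(pool2d(fmap, window=3, stride=2, op=np.max))    # overlapping max pooling
print(pool2d(fmap, window=2, stride=2, op=np.sum))    # non-overlapping sum pooling
```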
Dropout
• Independently set each hidden unit's activity to zero with probability 0.5
• Used in the two globally connected hidden layers at the net's output
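A hedged NumPy sketch of the rule above: at training time each hidden activation is zeroed independently with probability 0.5; at test time every unit is kept and its output is halved so the expected input to the next layer matches training. The array sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations: np.ndarray, p_drop: float = 0.5, train: bool = True) -> np.ndarray:
    if train:
        keep_mask = rng.random(activations.shape) >= p_drop   # drop each unit with prob. p_drop
        return activations * keep_mask
    # Test time: keep every unit but scale outputs so expectations match training.
    return activations * (1.0 - p_drop)

h = rng.normal(size=8)             # toy hidden-layer activations
print(dropout(h, train=True))      # about half the entries zeroed
print(dropout(h, train=False))     # all entries halved
```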