Project 2 Data Mining Part 1
Project II
Data Mining a Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set Characteristics: Multivariate
Attribute Characteristics: Categorical
Number of Instances: 8124
Number of Attributes: 22
Area: Life
Date Donated: 1987

This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.

Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
• 22 independent attributes
• 1 class attribute (Can you eat it?)
Edible: 4,208 (51.8%)
Poisonous: 3,916 (48.2%)
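
The class shares above follow directly from the two counts. A minimal check in Python, using only the counts stated on the slide (the variable names are illustrative):

```python
# Class counts as stated on the slide.
edible, poisonous = 4208, 3916
total = edible + poisonous  # should match the 8124 instances

# Share of each class, as a percentage rounded to one decimal place.
shares = {
    "edible": round(100 * edible / total, 1),
    "poisonous": round(100 * poisonous / total, 1),
}
print(total, shares)
```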
Mushroom Dataset
22 Attributes Total
18 Intrinsic to the Mushroom
4 Others
  1 Habitat
  1 Population
  1 Bruises
  1 Odor
Odor attribute, 1R Learner
The Simplest Rule 98.52% Acc.
A = almond
C = creosote
F = foul
L = anise
M = musty
N = none
P = pungent
S = spicy
Y = fishy

[Figure: chart of the odor attribute; x-axis values: a, c, f, l, m, n, p, s, y]
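
A 1R learner builds one rule per attribute (each attribute value predicts its majority class) and keeps the attribute whose rule makes the fewest errors. The sketch below illustrates the idea in Python; the function name `one_r` and the toy records are hypothetical and are not taken from the real dataset.

```python
from collections import Counter, defaultdict


def one_r(records, attributes, target):
    """Return (best_attribute, rule, errors) for a 1R learner."""
    best = None
    for attr in attributes:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in records:
            counts[row[attr]][row[target]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in records if rule[row[attr]] != row[target])
        # Keep the attribute with the fewest training errors.
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy records, for illustration only.
toy = [
    {"odor": "almond", "habitat": "woods", "class": "E"},
    {"odor": "foul", "habitat": "woods", "class": "P"},
    {"odor": "none", "habitat": "paths", "class": "E"},
    {"odor": "foul", "habitat": "paths", "class": "P"},
    {"odor": "anise", "habitat": "woods", "class": "E"},
    {"odor": "pungent", "habitat": "paths", "class": "P"},
]
best_attr, best_rule, best_errors = one_r(toy, ["odor", "habitat"], "class")
print(best_attr, best_errors)
```

On these toy records the odor rule makes no errors while the habitat rule makes two, so odor is selected.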
J48 Tree 100% Classification
E = Edible, P = Poisonous

odor
  almond: E
  creosote: P
  foul: P
  anise: E
  musty: P
  none: split on spore-print-color
    black: E
    brown: E
    buff: E
    chocolate: E
    green: P
    orange: E
    purple: E
    white: split on gill-size
      broad: E
      narrow: split on gill-spacing
        close: P
        crowded: split on population
          abundant: E
          clustered: P
          numerous: E
          scattered: E
          several: E
          solitary: E
        distant: E
    yellow: E
  pungent: P
  spicy: P
  fishy: P
Simplest rule-set (Benchmark)
A mushroom is poisonous if any of these rules applies:
1. Odor is not almond, anise, or none
(120 poisonous cases missed, 98.52% accuracy)

2. Spore-print-color = green
(48 cases missed, 99.41% accuracy)

3. Odor = none and stalk-surface-below-ring = scaly
 and stalk-color-above-ring is not brown
(8 cases missed, 99.90% accuracy)

4. Habitat = leaves and cap-color = white
   (alternatively: population = clustered and cap-color = white)
(100% accuracy)
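
The rule set can be read as a chain of checks, and the quoted accuracies follow from the missed-case counts over the 8124 instances. The sketch below is one way to express both in Python. It assumes each mushroom is a dictionary keyed by the attribute names used above, with full-word values rather than the single-letter codes; `is_poisonous` and `accuracy` are illustrative names.

```python
def is_poisonous(m):
    """Apply the four benchmark rules; m maps attribute names to values."""
    if m["odor"] not in ("almond", "anise", "none"):
        return True  # rule 1
    if m["spore-print-color"] == "green":
        return True  # rule 2
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True  # rule 3
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True  # rule 4 (first variant)
    return False


TOTAL = 8124  # number of instances in the dataset


def accuracy(missed, total=TOTAL):
    """Accuracy in percent when `missed` cases are misclassified."""
    return 100.0 * (total - missed) / total


# Missed-case counts quoted for rules 1, 2 and 3.
rates = [round(accuracy(n), 2) for n in (120, 48, 8)]
print(rates)
```

The three rounded values agree with the 98.52%, 99.41% and 99.90% figures quoted in the rule set.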
Habitat Insights
Waste is safe, but stay away from paths

[Figure: chart by habitat; categories: Woods, Grasses, Leaves, Meadows, Paths, Urban, Waste]
Population Insights
Mushrooms travel safer in groups

[Figure: chart by population; categories: Abundant, Clustered, Numerous, Scattered, Several, Solitary]
Information → Knowledge

[Figure, two panels: "Population Data" and "% Rates vs. Mushrooms"; categories: Abundant, Clustered, Numerous, Scattered, Several, Solitary; series: % Poisonous, % Edible; y-axis from 0% to 120%]
Poisonous/Edible Ratio vs. Mushroom Population Density

[Figure: scatter plot; x-axis: Mushroom Density (points labeled solitary, several, scattered, numerous, clustered, abundant); y-axis: Poisonous/Edible Ratio, from -50% to 300%; "several" is the highest point, "numerous" and "abundant" sit at about 0%]
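
The ratio plotted here is the number of poisonous samples divided by the number of edible samples within each population category, expressed as a percentage. A minimal sketch in Python, with hypothetical counts and category names (the slides do not give the per-category counts):

```python
def poison_edible_ratio(poisonous, edible):
    """Poisonous/edible ratio in percent; None if there are no edible samples."""
    if edible == 0:
        return None
    return 100.0 * poisonous / edible


# Hypothetical (poisonous, edible) counts, for illustration only.
toy_counts = {"group_a": (30, 10), "group_b": (0, 25)}
ratios = {name: poison_edible_ratio(p, e) for name, (p, e) in toy_counts.items()}
print(ratios)
```

A ratio above 100% means poisonous samples outnumber edible ones in that category, and 0% means the category contains no poisonous samples.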
Conclusions
• If it stinks, don't eat it: 98.52% accuracy

• If it doesn't stink and its spore-print color is not green, then you have a 99.41% chance of survival

• Odor and spore-print color may be the best attributes statistically, but not in the field
Future Work
• Use more easily identified attributes to classify mushrooms, to produce an easier method of visual classification

• Eliminate nonvisual attributes
  Focus on visual-cue attributes, e.g. habitat, population, cap and stalk

• Compare the two methods
