1.
Import the dataset and do usual exploratory analysis steps like checking the
   structure & characteristics of the dataset
       1. Get number of rows in the data :
                 select   count(1)   from   `targetcasestudy1995.targetdb.customers' : 99441
                 select   count(1)   from   `targetcasestudy1995.targetdb.geolocation` :1000163
                 select   count(1)   from   `targetcasestudy1995.targetdb.order_items` :112650
                 select   count(1)   from   `targetcasestudy1995.targetdb.order_reviews` :99224
                 select   count(1)   from   `targetcasestudy1995.targetdb.orders`
                 select   count(1)   from   `targetcasestudy1995.targetdb.payments`
                 select   count(1)   from   `targetcasestudy1995.targetdb.products`
                 select   count(1)   from   `targetcasestudy1995.targetdb.sellers`
         2. Number of null or missing values in a column
         3. Data type of columns in a table
                  1. DESC `targetcasestudy1995.targetdb.customers`
                  2. DESC `targetcasestudy1995.targetdb.geolocation`
                  3. DESC `targetcasestudy1995.targetdb.order_items`
                  4. DESC `targetcasestudy1995.targetdb.order_reviews`
                  5. DESC `targetcasestudy1995.targetdb.orders`
                  6. DESC `targetcasestudy1995.targetdb.payments`
                  7. DESC `targetcasestudy1995.targetdb.products`
                  8. DESC `targetcasestudy1995.targetdb.sellers`
         4. Get the time period for which the data is given
                  1. select max(order_purchase_timestamp),min(order_purchase_timestamp) from
                     `targetcasestudy1995.targetdb.orders` :
     Row        MAX                                        MIN
     1          2018-10-17 17:30:18 UTC                    2016-09-04 21:15:19 UTC
         5. Number of cities in our dataset
                  1. Select
                     (distinct geolocation_cities) from `targetcasestudy1995.targetdb.geoloca
                     tion`: 8011
         6. Number of states in our dataset
                  1. select count(distinct geolocation_state) from `targetcasestudy1995.targe
                     tdb.geolocation` : 27
In-depth Exploration:
   1. How many orders do we have for each order status?
             select order_status ,count(1) as Count from `targetcasestudy1995.targetdb.order
              s` group by order_status
   2. Is there a growing trend on e-commerce in Brazil? How can we describe a complete
      scenario?
             select EXTRACT(YEAR FROM order_purchase_timestamp) as Year, count(1) as COunt fr
              om `targetcasestudy1995.targetdb.orders` group by 1 order by Year
   3. On what day of week brazilians customers tend to do online purchasing?
             select EXTRACT(DAYOFWEEK FROM order_purchase_timestamp) as DAY, count(1) as Coun
              t from `targetcasestudy1995.targetdb.orders` group by 1 order by Count
      4. What time do Brazilian customers tend to buy (Dawn, Morning, Afternoon or Night)?
Row      Hour   Count
1        5      188
2        4      206
3        3      272
4        6      502
5        2      510
6        1      1170
7        7      1231
8        0      2394
9        8      2967
10       23     4123
11       9      4785
12       18     5769
13       22     5816
14       19     5982
15       12     5995
16       17     6150
17       10     6177
18       20     6193
19       21     6217
20       15     6454
21       13     6518
22       14     6569
23       11     6578
24       16     6675
                select EXTRACT(hour FROM order_purchase_timestamp) as Hour, count(1) as Count fr
         om `targetcasestudy1995.targetdb.orders` group by 1 order by Count
                Brazilian like to shop in the Afternoon
   5. Feature Extraction: Through order_purchase_timestamp in “orders” dataset extract
select  EXTRACT(Year FROM order_purchase_timestamp) as Year,
        EXTRACT(Month FROM order_purchase_timestamp) as Month,
        EXTRACT(Day FROM order_purchase_timestamp) as Day,
        EXTRACT(date FROM order_purchase_timestamp) as date,
        EXTRACT(Dayofweek FROM order_purchase_timestamp) as dayofweek,
        CASE
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 1
             then "Sunday"
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 2
             then "Monday"
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 3
             then "Tuesday"
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 4
             then "Wednesday"
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 5
             then "Thursday"
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 6
             then "Friday"
             when EXTRACT(Dayofweek FROM order_purchase_timestamp) = 7
             then "Saturday"
          end as day_of_week_name,
        EXTRACT(time FROM order_purchase_timestamp) as time,
         count(1) as Count from `targetcasestudy1995.targetdb.orders` group by 1,2,3,4,5,6,7 o
rder by Count
   2. Evolution of E-commerce orders in the Brazil region:
         1. Get month on month orders by region
2. Total of customer orders by state
     Highest Orders are from SP state
3. Top 10 brazilian cities most no. of orders
Row     customer_city    Count_Of_Orders
1       sao paulo        15540
2       rio de janeiro   6882
3       belo horizonte   2773
4       brasilia         2131
5       curitiba         1521
6       campinas         1444
7       porto alegre     1379
8       salvador         1245
9       guarulhos        1189
        sao bernardo
10      do campo         938
4. How are customers distributed in Brazil
Row    customer_state   Count_Of_Customers
1      SP               41746
2      RJ               12852
3      MG               11635
4      RS               5466
5      PR               5045
6      SC               3637
7      BA               3380
8      DF               2140
9      ES               2033
10     GO               2020
11     PE               1652
12     CE               1336
13     PA               975
14     MT               907
15     MA               747
16     MS               715
17     PB               536
18     PI               495
19     RN               485
20     AL               413
21     SE               350
22     TO               280
23     RO               253
24     AM               148
25     AC               81
26     AP               68
27     RR               46
      5. City wise number of unique customers
      Row    customer_city    Count_Of_Customers
      1      sao paulo        15540
      2      rio de janeiro   6882
      3      belo horizonte   2773
      4      brasilia         2131
      5      curitiba         1521
4. Impact on Economy: Analyze the money movemented by e-commerce by looking at
   order prices, freight and others.
   Answer the following questions:
1. Total amount sold in 2017 between Jan to August
2. Total amount sold in 2018 between Jan to august
3. % increase from 2017 to 2018: 16.65%
Step 3: Join (orders+order_items) table from previous step with “customers” table on
Customer_id and find:
                  1. Mean & Sum of price by customer state
                     Mean:
SUM:
       2. Mean & Sum of freight value by customer state
          MEAN:
SUM:
5. Analysis on sales, freight and delivery time
1. Calculating days between purchasing, delivering and estimated delivery         :
       3. Grouping data by state, take mean of freight_value, time_to_delivery,
       diff_estimated_delivery
4. Sort the data to get the following:
a. Top 5 states with highest/lowest average freight value
b. Top 5 states with highest/lowest average time to delivery
c. Top 5 states where delivery is really fast/ not so fast compared to estimated date
6. Payment type analysis: Join “payments” dataset with the existing data on order_id
a. Count of orders for different payment types
b. Distribution of payment installments and count of orders
c. Count of orders for different payment types Month over Month
Row   payment_installments   month   count_of_payments
1     1                      8       5235
2     1                      5       4977
3     1                      7       4970
4     1                      3       4891
5     1                      4       4675
6     1                      6       4495
7     1                      2       4441
8     1                      1       4230
9     1                      11      3571
10    1                      12      2817
Row      payment_installments   month   count_of_payments
11       1                      10      2311
12       1                      9       1996
13       2                      5       1318
14       2                      8       129
      6. Actionable Insights:
             Delivery time and estimated time are high in many case.
             Delivery and estimates are not near to each other for orders
             Small cities do not have much costumers and orders as well.
             Night time has few purchases
      7. Recommendations
             Delivery time can be improved and 1 day and 2 day delivery should be
                introduced
             Estimates and actual delivery should be near, company should work on their
                estimation software
             Company should focus on Smaller Cities as well.
             Offers should be given at the night time so that people visit the website and
                purchase at night as well