
  • Air quality 2019: Things you should know about Whitefield, Bangalore


    Introduction

This article presents an assessment of air quality around Whitefield (Bangalore, India) for the period Aug 2018 to Dec 2019, based on data from 13 citizen-managed, low-cost, real-time air quality monitors and one government manual monitor. For the assessment, we use the PM2.5 (particulate matter of size 2.5 microns or below, measured in µg/m3) measurements from these 14 monitors at various locations around Whitefield.

Source: https://aircare.mapshalli.org
    Figure 1: Air quality monitors showing current PM2.5 values @Jan-18-2020 10:30AM

For an in-depth analysis, we use data from the AirCare monitor at Ferns Paradise and compare it with data from the AirCare monitor at Windmills.

PM2.5 particles, i.e., particles or droplets of size 2.5 microns or less, are a major component of polluted air and are associated with various negative health effects. There is no safe limit for PM2.5; the WHO guideline value is 10 µg/m3 for the annual average. Long-term exposure to these particles causes increased rates of heart disease, stroke, lung disease, kidney disease, and diabetes. For every 10 µg/m3 increase in PM2.5, life expectancy reduces by roughly one year, and exposure to 10 µg/m3 of PM2.5 is equivalent to smoking half a cigarette per day. You can read about PM2.5 and its harmful effects here.
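These rules of thumb are easy to turn into a back-of-the-envelope calculation. Here is a minimal Python sketch (the linear scaling is the approximation quoted above, not an exact epidemiological model):

# Rules of thumb from above: ~1 year of life lost and ~0.5 cigarettes/day
# per 10 ug/m3 of annual average PM2.5.
def pm25_impact(annual_pm25_ugm3):
    years_lost = annual_pm25_ugm3 / 10.0
    cigarettes_per_day = 0.5 * annual_pm25_ugm3 / 10.0
    return years_lost, cigarettes_per_day

years, cigs = pm25_impact(38.0)   # Whitefield's estimated annual average
print('~%.1f years lost, ~%.1f cigarettes/day' % (years, cigs))
# -> ~3.8 years lost, ~1.9 cigarettes/day, close to the ~3.9 years in the Summary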

Source: https://aircare.mapshalli.org
    Figure 2: PM2.5, last 24 hours averages, and loss of life expectancy @ Jan-18-2020 10:30AM

    Findings & Analysis

    Why do we want to use the USA air quality standards in our analysis?

In the analysis described in this article, we have used the US air quality standards instead of the Indian air quality standards.

    Figure 3: Mismatch of air quality assessment between WHO/USA vs Indian standards.
    Values displayed are for PM2.5 in µg/m3

Indian air quality standards are very relaxed. For example, a 60 µg/m3 24-hour PM2.5 exposure is euphemistically labeled Satisfactory, whereas per USA standards it is considered Unhealthy!

60 µg/m3 of PM2.5 is 6x the WHO guideline, equivalent to smoking 3 cigarettes a day and a loss of 6 years of life!

In India, air quality is reported via an air quality index in the range 0-500. An air quality index is better suited to reporting a single number that combines many pollutants, such as PM2.5, PM10, NO2, etc. However, the research reports that analyze health effects are based on raw PM2.5 values averaged over 24 hours and over a year, so we have used raw PM2.5 numbers for the analysis. The air quality assessment is based on the USA air quality standards described here.
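To make the assessment reproducible, here is a minimal Python sketch that maps a 24-hour average PM2.5 value to its USA category using the published US EPA breakpoints (all values in µg/m3):

# US EPA 24-hour PM2.5 breakpoints (ug/m3) mapped to categories.
# A minimal sketch of the assessment used throughout this article.
EPA_PM25_BREAKPOINTS = [
    (12.0, 'Good'),
    (35.4, 'Moderate'),
    (55.4, 'Unhealthy for Sensitive Groups'),
    (150.4, 'Unhealthy'),
    (250.4, 'Very Unhealthy'),
    (float('inf'), 'Hazardous'),
]

def assess_pm25(pm25_24h_avg):
    for upper, category in EPA_PM25_BREAKPOINTS:
        if pm25_24h_avg <= upper:
            return category

print(assess_pm25(60))   # -> Unhealthy (vs Satisfactory under Indian standards)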

    Yearly snapshot

The following pictures show the daily air quality assessment for two Whitefield locations for every day of the year.

    Figure 4: Ferns Paradise – Daily air quality assessment as per USA air quality standards – PM2.5 µg/m3
    Figure 5: Windmills – Daily air quality assessment as per USA air quality standards – PM2.5 µg/m3

The above two figures show that Ferns Paradise had more Good air quality days than Windmills. Note the high number of Unhealthy air quality days during the months of January and February.

    Table 1: Air quality ranking and loss of life expectancy

The table below shows the air quality rank (best first), the PM2.5 averages for the Aug 2018 to Dec 2019 period, and the loss of life expectancy for Whitefield locations.

    Table 1: Ranking, PM2.5 and loss of life expectancy
    * Based on partial data

The PM2.5 levels are 3.8 times the WHO guideline and fall in the Unhealthy for Sensitive Groups category of the USA air quality assessment, meaning that members of sensitive groups may experience health effects while the general public is not likely to be affected. The following figure shows a comparison of average life expectancy lost due to various causes.

    Source: https://aqli.epic.uchicago.edu/pollution-facts/
    Figure 6: Life expectancy lost due to various causes

    Table 2: 2018 to 2019 reduction in PM2.5 pollution

    The table below shows the yearly PM2.5 pollution level changes from 2018 (Aug-Dec) to 2019 (Aug-Dec):

    Table 2: 2018 – 2019 changes in PM2.5 and ranking of locations
    * Do not have data for the comparison

From 2018 to 2019, the air quality improved from the Unhealthy for Sensitive Groups to the Moderate category. Moderate means that the air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution. All locations except Ramagondanahalli show good improvement in air quality.

    The reduction in PM2.5 levels can be attributed to the following known significant factors:

• 24% increase in wind speed from 2018 to 2019.
• 54% increase in rain from 2018 to 2019.
• Partial shutdowns of the Graphite India Private Ltd factory during Oct-Dec 2018 and its permanent closure in Feb 2019.

There is an unquantified increase in pollution due to the growing number of vehicles, open waste burning, construction, and road dust. The following charts show the source contributions for the year 2015 and 2030 (projected) from http://www.urbanemissions.info/india-apna/bengaluru-india/.

    Source: http://www.urbanemissions.info/india-apna/bengaluru-india/
    Figure 7: 2015 Sources of pollution


    Source: http://www.urbanemissions.info/india-apna/bengaluru-india/
    Figure 8: 2030 Projected sources of pollution

    PM2.5 Monthly History

    Figure 9: PM2.5 History from 11-Aug-2018 to 11-Jan-2020

    Monthly Averages

    Figure 10: Monthly Averages

The Unhealthy (64 µg/m3) air quality level is found during January. The top three months with Unhealthy for Sensitive Groups (47 µg/m3) air quality levels are November, December, and February. The top three months with Moderate (16 µg/m3) air quality levels are June, July, and August.
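All of these monthly (and, later, daily and hourly) profiles are simple group-by aggregations over the raw monitor readings. A minimal pandas sketch is shown below; the file name and column names are hypothetical:

import pandas as pd

# Hypothetical CSV of raw readings: a timestamp column and a pm25 column (ug/m3)
df = pd.read_csv('ferns_paradise.csv', parse_dates=['timestamp'],
                 index_col='timestamp')

monthly = df['pm25'].resample('M').mean()            # monthly averages (Figure 10)
by_day = df['pm25'].groupby(df.index.day).mean()     # day-of-month profile (Figure 12)
by_hour = df['pm25'].groupby(df.index.hour).mean()   # hour-of-day profile (Figure 14)
print(monthly.round(1))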

During the months of June to September, the southwest monsoon brings monthly average wind speeds greater than 6 km/hr that effectively transport away the particulate matter, reducing the pollution. During the same period, the monsoon rain also washes away particulate matter, but to a lesser extent. The monthly average wind speed (0.2 km/hr) and the rainfall are at their lowest in January, causing the particulate matter to linger close to its source.

    Figure 11: Relationship between PM2.5 pollution and Wind speed

Wind speeds greater than 6 km/hr significantly reduce PM2.5 pollution to levels below 20 µg/m3. Converting large tracts of open land into buildings would reduce wind speeds and hence increase pollution levels.

    Day of the Month

    Figure 12: Day of the Month

The days of the month with Unhealthy for Sensitive Groups (40 µg/m3) air quality levels are the 4th, 5th, and 6th. The top three days of the month with Moderate (29 µg/m3) air quality levels are the 2nd, 14th, and 15th.

    Day of the Week

    Figure 13: Day of the Week

PM2.5 pollution levels remain the same regardless of the day of the week.

    Hour of the Day

    Figure 14: Hour of the Day

The hours of the day with Unhealthy for Sensitive Groups (45 µg/m3) air quality are 06:00AM – 08:00AM. The hours of the day with Moderate (25 µg/m3) air quality levels are 01:00PM – 03:00PM.

The air closer to the earth is warmer and denser than the air above; the higher you go, the colder and thinner the air becomes. The air temperature decreases by ~1 degree centigrade for every 100m increase in altitude. The warmer air close to the ground rises, transporting away any particulate matter.

During the winter months (Nov, Dec, and Jan) the skies are clear, the air is calm and stable, and the nights are longer. The winter sun, low in the sky, supplies less heat to the earth's surface, and this heat is quickly radiated away, cooling the air close to the ground. Less dense warm air moves in above to create a warm-air cap/lid called the Inversion Layer, as shown in Figure 15.

    Figure 15: Temperature Inversion

During the winter early morning hours, this inversion layer traps the pollutants along with the cooler, moisture-rich air; the result is called smog, as shown below in Figure 16. This is the primary cause of high pollution during winter mornings.

    Figure 16: Smog trapped over the city of Almaty, Kazakhstan during a temperature inversion.

Once the sun heats up the land and the air later in the day, the inversion layer breaks and the pollutants are carried away.

    Recommendations

    Summary

Annual PM2.5 pollution levels across Whitefield (Bangalore) are estimated to be around 38 µg/m3. This is 3.8 times the WHO guideline of 10 µg/m3. An estimated 3.9 years of life expectancy is lost due to PM2.5 pollution; this is twice the years lost due to smoking!

Mother Nature has helped reduce pollution by 31% year over year through winds and an extended monsoon. Citizens have played a role in shutting down multiple polluting industries around Whitefield.

The number of vehicles, road dust, and open waste burning continue to rise unabated every year.

    Citizens

Citizens should become more aware of air quality standards and apply pressure on the government to follow WHO guidelines and adopt international standards. Become aware of who can influence actions to improve air quality. Residents need to actively promote walkability in the neighborhood and public transport by pushing for more BMTC buses, METRO lines, and suburban rail.

Citizens should avoid outdoor activities like walking, playing, and exercising when air quality levels are Unhealthy for Sensitive Groups or worse; in particular, be aware of the poor air quality until 08:00AM. You can get real-time air quality for your location (or one close by) at https://aircare.mapshalli.org. You can reduce indoor air pollution to less than 10 µg/m3 by installing and operating an indoor air purifier.

    Home Owner Associations

Home owner associations should maintain their diesel generators and eliminate mosquito fogging, which causes significant pollution with no real benefit; instead, use alternate, proven methods of mosquito control.

    Government

    Central Pollution Control Board (CPCB)

On the policy front, the Indian government should come up with realistic air quality standards aligned with WHO guidelines and adopt best practices from other countries.

    Karnataka State Pollution Control Board (KSPCB)

Whitefield, the technical powerhouse of India, has just one manual government air quality monitor! It is high time to replace the manual monitor installed in EPIP, Whitefield with a real-time continuous monitor that provides timely information to citizens.

    BBMP, local Corporators, and the local MLA

    Include improvement of air quality in your manifesto and in the ward improvement plans.

Ensure that the roads are cleared of dust and improve road infrastructure to prevent resuspension of road dust. Increase street cleaning using mechanized cleaners and manually remove dust from roads and footpaths.

Proper disposal of garbage and penalties for garbage burning will also help.

    Acknowledgments

We thank the various individuals and home owner associations for hosting the 12 AirCare air quality monitors and one PurpleAir monitor, and for the energy they spend on a continuous basis keeping the monitors up and running. Rahul Bedi provides us with critical weather data for the analysis.

The individuals we want to thank are: Manoj (RxDx), Dr. Sunitha Maheshwari (RxDx), Clement Jayakumar (Ferns Paradise), Ajit Lakshmiratan (Gulmohar), Zibi Jamal (Wind Mills), Ramakrishnan (Palm Meadows), Srinivas Ganji (Brigade Lakefront), Vivekanand (Gadjoy), Dr. Jagadish Prasad (Femiint Health), Mithun (PSN), Mukesh (Bren Unity), and Rahul Bedi (Pride Orchid).

    Epilogue

    How good are the low cost air quality monitors?

People always ask the following questions: how good are the low-cost sensors, have they been calibrated and recalibrated, and will the data be used and accepted by the government, including the courts?

The following figure shows the correlation between the government monitor and one citizen air quality monitor.

Figure 17: High correlation (R² = 0.9) between government and private monitors
Data shown are monthly PM2.5 averages in µg/m3

The government, including the courts, will not accept data from low-cost air quality monitors operated by citizens. We have shown, however, that the data collected by a large number of low-cost monitors is as good as that from the expensive government monitors. The citizen air quality monitors provide better real-time data and capture the variations from location to location more effectively. We hope this data can be used in discussions to persuade the government to install more real-time air quality monitors.
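For reference, the R² in Figure 17 is a standard squared Pearson correlation over the paired monthly averages. A minimal sketch, with hypothetical numbers standing in for the real monthly data:

import numpy as np

# Hypothetical paired monthly PM2.5 averages (ug/m3); use the real series here
government = np.array([55, 48, 40, 22, 18, 15, 16, 20, 30, 42, 50, 58])
citizen    = np.array([52, 50, 38, 24, 17, 16, 15, 22, 31, 40, 52, 60])

r = np.corrcoef(government, citizen)[0, 1]   # Pearson correlation coefficient
print('R^2 = %.2f' % r**2)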

  • Faster and better transfer learning training with deep neural networks (AI) to detect eye diseases

This is a continuation of my previous article:

    Helping Eye Doctors to see better with machine learning (AI)

In that article, I explained a transfer learning approach to train a deep neural network to 94% accuracy in diagnosing three kinds of eye diseases along with normal eye conditions. In this article, I will explain a different and better approach to transfer learning that achieves >98% accuracy in a fraction of the original training time.

I will first provide a background of the previous implementation and the drawbacks of that approach. Next, I will give an overview of the new approach. The rest of the article explains the new method in detail with annotated Python code samples. I have posted links at the end of the article for you to try out the methodology and the new model.

    Part 1 – Background and Overview

    Transfer learning – using a fully trained model as a whole

The previous article utilized the following method of transfer learning (a minimal Keras sketch follows the list).

• Use the InceptionV3 model previously trained on the imagenet dataset. Remove the fully connected layers and the classifier at the end of the network. Let us call this the base model.
• Lock the base model so that it does not get trained on the training images.
• Attach a few randomly initialized fully connected layers and a 4-way softmax classifier at the end of the network.
• Train the network by feeding the images randomly for multiple iterations (epochs).
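Under these assumptions (the exact head topology from the previous article may differ), a minimal Keras sketch of this frozen-base setup looks like this:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Base model: InceptionV3 trained on imagenet, without the top classifier
base_model = InceptionV3(weights='imagenet', include_top=False,
                         input_shape=(299, 299, 3))
for layer in base_model.layers:
    layer.trainable = False   # lock the base so it is not trained

# New, randomly initialized head: fully connected layer + 4-way softmax
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)      # hypothetical size, for illustration
outputs = Dense(4, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

Note that in every epoch, each image still flows through the whole frozen base model; this is exactly the waste the new approach removes.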

This approach was inefficient for the following reasons:

• Could not achieve the state-of-the-art accuracy of 96%; reached only 94%.
• The best performing model was obtained only after 300 epochs.
• Each epoch took around 12 minutes to train, as the image data was fed through the whole InceptionV3 model plus the new layers in every epoch.
• The whole training run took 100 hours (4 days)!
• The long training time per epoch made it difficult to explore different end-layer topologies, learning rates, and numbers of units in each layer.

    Transfer learning – extract features (bottlenecks), save them and feed to a shallow neural network

In the previous approach, each image was fed to the base model and the output of the base model was fed into the new layers. As the base model parameters (weights) were never updated, we were repeating exactly the same base-model computation in every epoch!

In the new approach, we use the following method:

First, we feed all the images (training and validation) through the base InceptionV3 model and extract its output. We save these outputs, i.e., the features (bottlenecks), and the associated labels in a file.

    Next, build a shallow neural network with the following layers:

• A 2D convolution layer that takes the saved features as input.
• A batch normalization layer to increase speed and accuracy.
• ReLU activation.
• A dropout layer to prevent overfitting.
• A dense layer with 4 units (corresponding to the 4 output classes) with softmax activation.
• The Adam optimizer with a learning rate of 0.001.

Next, we feed the saved features to the shallow network and train the model, saving the best performing model found during training and reducing the learning rate whenever the validation loss remains flat for 5 epochs.

While making predictions, we feed the image first to InceptionV3 (trained on imagenet) and then feed its output to the shallow network. We use the first convolutional layer in the shallow network to create occlusion maps.

    This approach gave the following results:

• Best performing model at 99.10% accuracy.
• Repeatable accuracy at >98%.
• Each epoch takes around 1.5 minutes, compared to 12 minutes before.
• Requires only 50 epochs (75 minutes) to converge, compared to 500 epochs (100 hours) before.
• Model size reduced from 84MB to 1.7MB.


    Part 2 – Implementation

    Extract features using imagenet trained InceptionV3 model

    Refer to: https://github.com/shivshankar20/eyediseases-AI-keras-imagenet-inception/blob/master/Features-Extract.ipynb

    Import the required modules and load the InceptionV3 model

# The __future__ import must come before any other code
from __future__ import print_function

import os
import numpy as np
import h5py

from keras.applications.inception_v3 import InceptionV3, conv2d_bn
from keras.models import Model
from keras.layers import Dropout, Flatten, Dense, Input
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator

# Load InceptionV3 with imagenet weights, without the top classifier layers
conv_base = InceptionV3(weights='imagenet', include_top=False)
    

We import the required modules, including the conv2d_bn function from Keras applications. This handy conv2d_bn function creates a 2D convolution layer, batch normalization, and ReLU activation. (Note that from __future__ imports must appear before any other statements, so that import goes first.)
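For intuition, conv2d_bn(x, filters, num_row, num_col) roughly expands to the following pattern (a sketch of the idea, not the exact Keras source):

from keras.layers import Conv2D, BatchNormalization, Activation

def conv2d_bn_sketch(x, filters, num_row, num_col):
    # Convolution without bias (the batch norm supplies the shift)
    x = Conv2D(filters, (num_row, num_col), padding='same', use_bias=False)(x)
    x = BatchNormalization(scale=False)(x)
    return Activation('relu')(x)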

    We then load the InceptionV3 model with imagenet weights without the top fully connected layers.

    Extract features by feeding images and save the features to a file

train_dir = '../OCT2017/train'
validation_dir = '../OCT2017/test'

def extract_features(file_name, directory, key,
                     sample_count, target_size, batch_size,
                     class_mode='categorical'):
    # Note: `key` only documents the call sites; the h5 keys themselves
    # are 'features-<n>', 'labels-<n>', and 'batches'.
    h5_file = h5py.File(file_name, 'w')
    datagen = ImageDataGenerator(rescale=1./255)

    generator = datagen.flow_from_directory(directory,
        target_size=target_size,
        batch_size=batch_size, class_mode=class_mode)

    samples_processed = 0
    batch_number = 0
    if sample_count == 'all':
        sample_count = generator.n

    print_size = True
    for inputs_batch, labels_batch in generator:
        # Run one batch of images through the frozen base model
        features_batch = conv_base.predict(inputs_batch)

        if print_size:
            print_size = False
            print('Features shape', features_batch.shape)

        samples_processed += inputs_batch.shape[0]
        # Save this batch's features and labels under numbered keys
        h5_file.create_dataset('features-' + str(batch_number), data=features_batch)
        h5_file.create_dataset('labels-' + str(batch_number), data=labels_batch)
        batch_number = batch_number + 1
        print("Batch:%d Sample:%d\r" % (batch_number, samples_processed), end="")
        if samples_processed >= sample_count:
            break  # the generator loops forever; stop after one full pass

    h5_file.create_dataset('batches', data=batch_number)
    h5_file.close()
    return

extract_features('./data/train.h5', train_dir,
   key='train', sample_count='all',
   batch_size=100, target_size=(299, 299))

extract_features('./data/validation.h5', validation_dir,
  key='validation', sample_count='all',
  batch_size=100, target_size=(299, 299))

Using the Keras image generator functionality, we process sample_count images in batches of batch_size images. The output is stored in an h5 file as values with the following keys:

batches: the total number of batches. Each batch has batch_size images; the last batch may have fewer than batch_size images.

features-<batch_number> (example: features-10): extracted features of shape (100, 8, 8, 2048) for batch number 10. Here 100 is the number of images per batch (batch_size) and (8, 8, 2048) is the feature map, the output of the mixed9 layer of InceptionV3.

labels-<batch_number> (example: labels-10): extracted labels of shape (100, 4) for batch number 10. Here 100 is the batch size and 4 is the number of output classes.
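A quick way to sanity-check the saved file is to reopen it and inspect the shapes:

import h5py

# Reopen the saved feature file and inspect what was written
with h5py.File('./data/train.h5', 'r') as h5f:
    print('batches:', h5f['batches'][()])          # total batch count
    print('features-0:', h5f['features-0'].shape)  # e.g. (100, 8, 8, 2048)
    print('labels-0:', h5f['labels-0'].shape)      # e.g. (100, 4)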

    Build and train a shallow neural network

    Refer to: https://github.com/shivshankar20/eyediseases-AI-keras-imagenet-inception/blob/master/Features-Train.ipynb

    Import the required modules

# The __future__ import must come before any other code
from __future__ import print_function

import os
import numpy as np
import h5py
import matplotlib.pyplot as plt

import keras
from keras.applications.inception_v3 import InceptionV3, conv2d_bn
from keras.models import Model
from keras.layers import Dropout, Flatten, Dense, Input
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
%matplotlib inline

    Setup a generator to feed saved features to the model

def features_from_file(path, ctx):
    h5f = h5py.File(path, 'r')
    # .value is the older h5py API; on h5py >= 3 use h5f['batches'][()]
    batch_count = h5f['batches'].value
    print(ctx, 'batches:', batch_count)

    def generator():
        # Loop forever over the saved batches; Keras stops each epoch
        # after steps_per_epoch batches.
        while True:
            for batch_id in range(0, batch_count):
                X = h5f['features-' + str(batch_id)]
                y = h5f['labels-' + str(batch_id)]
                yield X, y

    return batch_count, generator()

train_steps_per_epoch, train_generator = features_from_file('./data/train-ALL.h5', 'train')
validation_steps, validation_data = features_from_file('./data/validation-ALL.h5', 'validation')

Here, we set up two generators to read the features and labels stored in the h5 files. We renamed the h5 files so that we don't overwrite them by mistake during another round of feature extraction.

    Build a shallow neural network model

np.random.seed(7)
inputs = Input(shape=(8, 8, 2048))           # must match the saved feature shape
x = conv2d_bn(inputs, 64, 1, 1)              # 1x1 convolution + batch norm + relu
x = Dropout(0.5)(x)                          # regularization against overfitting
x = Flatten()(x)
outputs = Dense(4, activation='softmax')(x)  # 4-way classifier
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=optimizers.Adam(lr=0.001),
   loss='categorical_crossentropy', metrics=['acc'])
model.summary()

The input shape should match the shape of the saved features. We use Dropout to add regularization so that the model does not overfit the data. The model summary is shown below:

Typically, one would use only fully connected layers here. We use a convolutional layer instead so that we can visualize occlusion maps.

    Train the model, save the best model and tune the learning rate

# Set up callbacks to save the best model and tune the learning rate
callbacks = [
    ModelCheckpoint('./output/model.features.{epoch:02d}-{val_acc:.2f}.hdf5',
        monitor='val_acc', verbose=1, save_best_only=True,
        mode='max', period=1),
    ReduceLROnPlateau(monitor='val_loss', verbose=1,
        factor=0.5, patience=5, min_lr=0.00005)
]
    
    history = model.fit_generator(
       generator=train_generator, 
       steps_per_epoch=train_steps_per_epoch,  
       validation_data=validation_data, 
       validation_steps=validation_steps,
       epochs=100, callbacks=callbacks)

Using the ModelCheckpoint Keras callback, we save the best performing model based on validation accuracy. This check-and-save is done every epoch (the period parameter).

Using the ReduceLROnPlateau Keras callback, we monitor the validation loss. If it remains flat for 5 epochs (the patience parameter), a new learning rate is applied by multiplying the old learning rate by 0.5 (the factor parameter), but the learning rate is never reduced below 0.00005 (the min_lr parameter). With these settings, the schedule can only step through 0.001 → 0.0005 → 0.00025 → 0.000125 → 0.0000625 → 0.00005.

If everything goes well, you should have the best models saved to disk. Please refer to the GitHub repo for the code to display the accuracy and loss graphs.

    Evaluate the model

    Refer to: https://github.com/shivshankar20/eyediseases-AI-keras-imagenet-inception/blob/master/Features-Evaluate.ipynb

    Import the required modules and load the saved model

    import os
    import numpy as np
    
    import keras
    from keras.applications.inception_v3 import InceptionV3
    from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
    
    from keras.models import load_model
    from keras import backend as K
    
    from io import BytesIO
    from PIL import Image
    import cv2
    
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg
    from matplotlib import colors
    
    import requests
    
# Set the learning phase to inference (not training)
K.set_learning_phase(0)
base_model = InceptionV3(weights='imagenet',
  include_top=False)
model = load_model('output/model.24-0.99.hdf5')

We need to load both the imagenet-trained InceptionV3 model and the best saved model.

    Evaluate the model by making predictions and viewing the occlusion maps for multiple images

# Utility functions
classes = ['CNV', 'DME', 'DRUSEN', 'NORMAL']

# Preprocess the input:
# rescale the values to the same range that was used during training
def preprocess_input(x):
    x = img_to_array(x) / 255.
    return np.expand_dims(x, axis=0)

# Prediction for an image path in the local directory
def predict_from_image_path(image_path):
    return predict_image(load_img(image_path, target_size=(299, 299)))

# Prediction for an image URL
def predict_from_image_url(image_url):
    res = requests.get(image_url)
    im = Image.open(BytesIO(res.content))
    # Resize the downloaded image directly rather than passing a file
    # pointer back through load_img
    return predict_image(im.convert('RGB').resize((299, 299)))

# Predict an image: features from the base model, then the shallow model
def predict_image(im):
    x = preprocess_input(im)
    x = base_model.predict(x)
    pred = np.argmax(model.predict(x))
    return pred, classes[pred]

image_names = ['DME/DME-30521-15.jpeg',      'CNV/CNV-154835-1.jpeg',
               'DRUSEN/DRUSEN-95633-5.jpeg', 'NORMAL/NORMAL-12494-3.jpeg']

for image_name in image_names:
    path = '../OCT2017/eval/' + image_name
    print(predict_from_image_path(path))
    grad_CAM(path)  # defined in the repo; see the occlusion map section below

    While making predictions, we need to feed the image to the base model (InceptionV3) and then feed its output to our shallow model.

    Occlusion map

The above image shows which part of the input image the model looked at to make its prediction.

We use the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to produce these occlusion maps. For the grad_CAM source, and for the code to show incorrect predictions, refer to the GitHub repo.
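For readers who want the gist without opening the repo, here is a generic Keras-style Grad-CAM sketch. The layer name is an assumption (check model.summary() for the real one), and the repo's grad_CAM may differ in its details:

import numpy as np
from keras import backend as K

def grad_cam_sketch(model, x, class_idx, layer_name):
    # Gradient of the predicted class score w.r.t. the conv layer output
    class_score = model.output[:, class_idx]
    conv_output = model.get_layer(layer_name).output
    grads = K.gradients(class_score, conv_output)[0]
    pooled_grads = K.mean(grads, axis=(0, 1, 2))   # one weight per channel
    fetch = K.function([model.input], [pooled_grads, conv_output[0]])
    weights, activations = fetch([x])
    heatmap = np.mean(activations * weights, axis=-1)  # weighted channel average
    heatmap = np.maximum(heatmap, 0)                   # keep positive evidence only
    return heatmap / (heatmap.max() + 1e-8)            # normalize to [0, 1]

# Usage sketch: x is the (1, 8, 8, 2048) feature batch fed to the shallow model;
# the resulting 8x8 heatmap is upsampled (e.g., with cv2.resize) over the OCT image.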

    Part 3 – Summary and Download links

In this article, I showed how to feed all the images (training and validation) through the base InceptionV3 model and extract its output. We saved the outputs, i.e., the features (bottlenecks), and the associated labels in a file.

We created a shallow neural network, fed the saved features to it, and trained the model. We saved the best performing model found during training and reduced the learning rate whenever the validation loss remained flat for 5 epochs.

We made predictions by first feeding the image to InceptionV3 (trained on imagenet) and then feeding its output to the shallow network. Using the first convolutional layer in the shallow network, we created occlusion maps.

    This approach gave the following results:

• Best performing model at 99.10% accuracy.
• Repeatable accuracy at >98%.
• Each epoch takes around 1.5 minutes, compared to 12 minutes before.
• Requires only 50 epochs (75 minutes) to converge, compared to 500 epochs (100 hours) before.
• Model size reduced from 84MB to 1.7MB.

    Full source code along with the best performing model is available at:

    https://github.com/shivshankar20/eyediseases-AI-keras-imagenet-inception

Want to know more about the eye diseases and how to set up GPU-based hardware for your training? Please refer to my first article:

    Helping Eye Doctors to see better with machine learning (AI)

I hope you enjoyed reading the article! Please share your feedback and experience.