
Using NLP in Disaster Response

Last Updated on July 25, 2023 by Editorial Team

Author(s): Abhishek Jana

Originally published on Towards AI.

In this project, we’ll build ETL, NLP, and ML pipelines to analyze disaster data from Figure Eight and train a model for an API that classifies disaster messages.

This is one of the most critical problems in data science and machine learning. During a disaster, millions of messages arrive, either directly or via social media, and perhaps only 1 in 1,000 is relevant to disaster response. Relevant messages tend to contain a few critical keywords such as water, blocked roads, or medical supplies. Because the dataset is labeled with categories, we can train an ML model to identify which messages are relevant to disaster response.

Main web interface (image by author)

This project brings together three main components of a data science workflow:

  1. Data Engineering: I extracted, transformed, and loaded the data, then prepared it for model training. Preparation meant cleaning the data by removing bad records (ETL pipeline) and using NLTK to tokenize and lemmatize the text (NLP pipeline). Finally, I used custom features such as StartingVerbExtractor and StartingNounExtractor to add new features to the main dataset (see the sketch after this list).
  2. Model Training: I used an XGBoost classifier to build the ML pipeline for model training.
  3. Model Deployment: I deployed the model as a Flask API.
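The post doesn’t reproduce the code for these steps, so here is a minimal sketch of what the NLP and ML pipeline could look like. The tokenizer, the StartingVerbExtractor internals, and the way the custom feature is combined with the TF-IDF features are assumptions based on the description above (a StartingNounExtractor would follow the same pattern with noun POS tags):

```python
import re

import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from xgboost import XGBClassifier

nltk.download(["punkt", "wordnet", "averaged_perceptron_tagger"], quiet=True)


def tokenize(text):
    """Normalize, tokenize, and lemmatize a raw message."""
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(tok).strip() for tok in word_tokenize(text)]


class StartingVerbExtractor(BaseEstimator, TransformerMixin):
    """Binary feature: does any sentence in the message start with a verb?"""

    def starting_verb(self, text):
        for sentence in nltk.sent_tokenize(text):
            pos_tags = nltk.pos_tag(tokenize(sentence))
            if pos_tags and pos_tags[0][1] in ("VB", "VBP"):
                return 1
        return 0

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.DataFrame(pd.Series(X).apply(self.starting_verb))


# Bag-of-words + TF-IDF text features combined with the custom feature,
# feeding a multi-output XGBoost classifier (one output per disaster category).
pipeline = Pipeline([
    ("features", FeatureUnion([
        ("text", Pipeline([
            ("vect", CountVectorizer(tokenizer=tokenize)),
            ("tfidf", TfidfTransformer()),
        ])),
        ("starting_verb", StartingVerbExtractor()),
    ])),
    ("clf", MultiOutputClassifier(XGBClassifier())),
])
```

Wrapping XGBClassifier in MultiOutputClassifier simply fits one classifier per category column, which is a common way to handle the multi-label nature of this dataset.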

This project was developed on the Anaconda platform using Jupyter Notebook. Detailed instructions on how to install Anaconda can be found here. To create a virtual environment, see here.

In the virtual environment, clone the repository:

git clone https://github.com/abhishek-jana/Disaster-Response-Pipelines.git

Python packages used for this project:

NumPy
pandas
scikit-learn
NLTK
re
SQLAlchemy
pickle
Flask
Plotly

  1. Run the following commands in the project’s root directory to set up your database and model.
  • To run the ETL pipeline that cleans the data and stores it in the database: python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  • To run the ML pipeline that trains the classifier and saves it: python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl

  2. Run the following command in the app’s directory to start the web app: python run.py

The project is structured as follows:

The data folder contains the files “disaster_categories.csv” and “disaster_messages.csv”, from which the messages and categories are extracted. “DisasterResponse.db” is a cleaned version of the dataset saved as a SQLite database. “ETL Pipeline Preparation.ipynb” is the Jupyter notebook explaining the data preparation method, and “process_data.py” is the Python script version of that notebook.
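As a rough sketch of what process_data.py does (the table name and the exact cleaning steps are my assumptions; in the Figure Eight CSV, the categories column is a semicolon-separated string such as "related-1;request-0;..."):

```python
import sys

import pandas as pd
from sqlalchemy import create_engine


def load_data(messages_filepath, categories_filepath):
    """Merge the messages and categories CSVs on their shared id column."""
    messages = pd.read_csv(messages_filepath)
    categories = pd.read_csv(categories_filepath)
    return messages.merge(categories, on="id")


def clean_data(df):
    """Expand the semicolon-delimited categories string into binary columns
    and drop duplicate rows."""
    categories = df["categories"].str.split(";", expand=True)
    categories.columns = categories.iloc[0].str[:-2]          # "related-1" -> "related"
    categories = categories.apply(lambda col: col.str[-1].astype(int))
    return df.drop(columns=["categories"]).join(categories).drop_duplicates()


def save_data(df, database_filepath):
    """Write the cleaned frame to SQLite for the training step
    (the table name "DisasterMessages" is an assumption)."""
    engine = create_engine(f"sqlite:///{database_filepath}")
    df.to_sql("DisasterMessages", engine, index=False, if_exists="replace")


if __name__ == "__main__":
    messages_fp, categories_fp, db_fp = sys.argv[1:4]
    save_data(clean_data(load_data(messages_fp, categories_fp)), db_fp)
```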

“ML Pipeline Preparation.ipynb” is the Jupyter notebook explaining the model training method. The corresponding Python script, “train_classifier.py”, can be found in the “models” folder, and the final trained model is saved there as “classifier.pkl”.

The app folder contains the “run.py” script that renders the visualizations and classification results on the web; the templates folder contains the .html files for the web interface.
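A minimal version of such a Flask app could look like the following; the template names, table name, column layout, and port are assumptions rather than the repository’s actual values:

```python
import pickle

import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the cleaned data and the trained pipeline; the paths and table name
# are assumed to match the earlier steps.
engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("DisasterMessages", engine)
with open("models/classifier.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/")
def index():
    # Landing page with the dataset overview plots.
    return render_template("master.html")


@app.route("/go")
def go():
    # Classify the message typed into the query box, one label per category.
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    results = dict(zip(df.columns[4:], labels))  # assumes category columns start at index 4
    return render_template("go.html", query=query, classification_result=results)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001, debug=True)
```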

The accuracy, precision, and recall are:

Accuracy scores (image by author)

Precision and recall scores (image by author)
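Per-category scores like the ones above can be produced with something along these lines, reusing the pipeline sketched earlier (the column layout of the cleaned table is an assumption):

```python
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sqlalchemy import create_engine

# Load the cleaned table written by process_data.py (table name assumed above).
engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("DisasterMessages", engine)
X = df["message"]
Y = df.iloc[:, 4:]  # assumes one binary column per disaster category after the metadata columns

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
pipeline.fit(X_train, Y_train)      # `pipeline` from the sketch earlier in the post
Y_pred = pipeline.predict(X_test)

# Precision, recall, and F1 per category; with imbalanced labels these say
# more than overall accuracy.
for i, category in enumerate(Y.columns):
    print(category)
    print(classification_report(Y_test.iloc[:, i], Y_pred[:, i], zero_division=0))
```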

Some of the predictions on messages are given as well:

Example classifications for three sample messages (images by author)

In the future, I am planning to work on the following areas of the project:

  1. Testing different estimators and adding new features to the data to improve model accuracy.
  2. Adding more visualizations to better understand the data.
  3. Improving the web interface.
  4. Based on the categories the ML algorithm assigns to a message, recommending organizations to connect with.
  5. Addressing class imbalance. The dataset is imbalanced (some labels, like water, have very few examples), so the README should discuss how this imbalance affects training and whether precision or recall should be emphasized for each category (one possible mitigation is sketched after this list).
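On point 5, one way to counter the imbalance for a rare label such as water is to up-weight its positive class. The snippet below is only an illustration: it reuses tokenize, X_train, and Y_train from the earlier sketches, and the negatives-over-positives heuristic for scale_pos_weight is a common rule of thumb rather than what the project’s model actually does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# Up-weight the rare positive class for a single category ("water");
# scale_pos_weight is roughly (#negative examples / #positive examples).
pos = int(Y_train["water"].sum())
neg = len(Y_train) - pos

water_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=tokenize)),                 # tokenize() from the NLP sketch
    ("clf", XGBClassifier(scale_pos_weight=neg / max(pos, 1))),
])
water_pipeline.fit(X_train, Y_train["water"])
```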

The GitHub link of the project can be found here.

Acknowledgment

I am thankful to the Udacity Data Science Nanodegree program and Figure Eight for motivating this project.

I am also thankful to Figure Eight for making the data publicly available.


Published via Towards AI
