Simplifying MongoDB for Data Scientists: Essential Commands You Should Know

Last Updated on July 17, 2023 by Editorial Team

Author(s): Gaurav Nair

Originally published on Towards AI.

A Guide to NoSQL Fundamentals and MongoDB Commands for Beginners

Image source: https://www.mongodb.com/brand-resources

Table of Contents

  1. Introduction
  2. What is NoSQL?
  3. Limitations of RDBMS and the need for NoSQL
  4. SQL vs NoSQL
  5. MongoDB
  6. How do Document Databases work?
  7. Installing MongoDB
  8. MongoDB Commands
    1. Database Commands
    2. Collection Commands
    3. CRUD Commands
    4. Comparison Operators
    5. Query Modifiers
    6. Field Update Operators
    7. Logical Operators
  9. Conclusion

Introduction

To succeed today, every business must spend a good deal of time and money analyzing past data. This is no longer an era in which companies trust their gut when making decisions. From generating more revenue and minimizing risk to planning strategically for the future, data-driven decisions help businesses in many ways. However, simply having data is not enough. As the British mathematician and data science pioneer Clive Humby said:

“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so, must data be broken down, analyzed for it to have value.”

To derive value from data, a business needs relevant data, tools to store and analyze it efficiently, and the skill to extract meaningful insights. A Relational Database Management System (RDBMS) is a popular and widely used kind of software that lets businesses manage relational databases, that is, store and organize data in tables with rows and columns. But some data has no fixed schema: it can be unstructured and does not fit the traditional RDBMS table format. This is where “NoSQL”, short for “Not only SQL”, comes into play.

What is NoSQL?

NoSQL databases are databases that store data in a format other than relational tables. Depending on the type of the data model, NoSQL databases come in a variety of types:

  1. Document Databases: Document databases store data in the form of documents, usually JSON, BSON, or XML documents. A document can be nested, that is, it can contain other documents or lists. This allows more complex data to be stored within a single document.
  2. Key-Value Databases: These databases store data as key-value pairs. Data is retrieved using the unique key assigned to each element. Values are typically simple data types such as numbers or strings.
  3. Column-Oriented Databases: These databases store data by column, in contrast to the rows of relational databases. This is beneficial when you want to run analytics on a small number of columns, because only those columns need to be read, which makes such operations faster.
  4. Graph Databases: These databases use a graph structure to store data. Nodes store the data, and the connections between nodes define the relationships between them.

Popular NoSQL databases are MongoDB, Cassandra, CouchDB, Redis, etc. We will discuss the fundamentals of MongoDB in this article.

Limitations of RDBMS and the need for NoSQL

Let’s say you have been appointed to maintain the records of all students in a school. You are using a traditional table format to keep all the records. The table has columns such as Name, Student ID, Email, Phone Number, Address, State, and Postal Code. The table looks like this:

Student Records. Image by the Author.

Now suppose there is a need to add home phone numbers for some of the students. You add a new column named ‘home’ to accommodate these numbers. However, if an additional phone number (or any other information, such as an emergency contact number or a second address) needs to be added for only some students, the table will end up with a lot of empty cells.

Updated student records with empty cells. Image by the Author.

If you want to analyze the data in the future, these empty cells will cause a lot of problems.

One solution to this problem is to create multiple tables with a primary key column to link them using joins when needed. However, this can add complexity when dealing with large amounts of data and can make it difficult to add new features or efficiently retrieve data.

An alternative approach is to use a document-based database, where each record is stored as a document. This allows more flexibility in adding details to individual records without the need for empty cells or complex table joins. Documents are easy for computers to process and for humans to read. The above table with empty cells will look like this as a JSON document:

Relational Table with empty cells converted to JSON. Image by the Author.
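Since the image is not reproduced here, the idea can be sketched in plain JavaScript, whose object literals have the same shape as JSON documents. The student names, emails, and phone numbers below are invented for illustration and are not the author's data; the point is only that each document carries just the fields it needs.

```javascript
// Hypothetical student records; every value here is made up for illustration.
const students = [
  {
    name: "Student A",
    student_id: 1,
    email: "a@example.com",
    phone: { mobile: "555-0100" },
  },
  {
    name: "Student B",
    student_id: 2,
    email: "b@example.com",
    // Only this student has extra numbers, so only this document stores them.
    phone: { mobile: "555-0101", home: "555-0102", emergency: "555-0103" },
  },
];

// Count the phone fields stored per student: no empty cells are needed.
const phoneFieldCounts = students.map((s) => Object.keys(s.phone).length);
console.log(phoneFieldCounts); // [ 1, 3 ]
```

In a table, the first student would need empty ‘home’ and ‘emergency’ cells; as a document, those fields are simply absent.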

NoSQL databases were created in response to the limitations of relational databases. These limitations include inflexible schema, difficulty in horizontal scaling to handle large volumes of data and traffic, slower query operations due to complex joins, and a data model that is not diverse enough to fit all types of data and its applications. NoSQL databases address these limitations as they have a schemaless design, employ distributed architecture which helps in horizontal scaling and faster query operations, and support a variety of data types and use cases.

SQL vs. NoSQL

SQL vs NoSQL. Source: https://www.mongodb.com/nosql-explained/nosql-vs-sql

MongoDB

MongoDB is a database program based on the document model. It was released in 2009 and quickly became popular because it let developers work with data in the same document format their applications already used. It also provides native drivers for many programming languages, so developers can work with their data directly from application code. Companies that use MongoDB include Forbes, Toyota, Sanoma, and AstraZeneca, to name a few.

Some of the important features that MongoDB provides are:

  1. Schema Validation: MongoDB uses a flexible schema model, that is, documents inside the collection can have different fields. However, once the application schema is established, we can use schema validation to ensure there are no unintended schema changes.
  2. ACID Compliance: ACID is short for Atomicity, Consistency, Isolation, and Durability. MongoDB supports multi-document ACID transactions. That is, you can perform transactions that involve multiple documents and be confident that the changes will be processed reliably.
  3. Scaling: MongoDB was built with distributed systems in mind, allowing the developers to scale their data vertically and horizontally. Vertical scaling is adding more resources to a single machine while horizontal scaling is adding more machines to a system to spread out your work.
  4. Sharding: Sharding is distributing data across multiple machines. It allows for horizontal scaling by dividing the dataset over multiple servers which in turn increases its capacity while deploying large datasets.
  5. Client-side field-level encryption: This feature allows you to encrypt sensitive data fields before sending them to the server. If you have some sensitive information in your data, such as credit card numbers, you would want to encrypt it, and MongoDB provides that option on the go.
Sharding, Scaling, and the Schemaless structure in MongoDB. Image by the Author.

MongoDB provides various useful resources for anyone getting started. You can check out MongoDB’s basic course here.

Once you are familiar with MongoDB, you also have the option to get a certification directly from them. Getting certified increases your marketability in current as well as future roles. Showcasing the certifications on your resume can attract bigger and better opportunities. MongoDB provides two certification exams: MongoDB Associate DBA Exam and MongoDB Associate Developer Exam. You can learn more about the certifications here.

Advantages of MongoDB

  • Schema-less, flexible storage format (JSON-like documents)
  • Very flexible queries
  • High speed and easily scalable via replication and sharding
  • Available as a highly powerful community edition

MongoDB terms and their counterparts to Relational Databases

Documents: Like rows in relational databases, documents are the units in which MongoDB stores data. Every document starts and ends with curly braces and contains field-value pairs. Values in MongoDB can be of a variety of types, including strings, numbers, arrays, timestamps, etc.

Collections: A collection is a group of documents, similar to a table in relational databases. MongoDB stores data as documents that are gathered together in a collection. A database can store one or more collections.

Mongod: Mongod, or the Mongo Daemon, is the host process for the database. Mongod does all the core work, such as handling data requests from the MongoDB shell or MongoDB drivers, managing data access, and performing background management operations.

Mongosh: Mongosh is the MongoDB Shell, where you can write queries and operations to interact with your database.

RDBMS to NoSQL. Image by the Author.
Collections, Documents, Fields, and Values in MongoDB. Image by the Author.

How do Document Databases work?

As we have already discussed, document databases store information in documents. When data is inserted into MongoDB, it is inserted as a JSON document. This makes it easy for us to work with, as JSON is a human-friendly format that is easily readable and writable.

However, in the backend, MongoDB stores data as BSON, which stands for Binary JSON. BSON is a binary-encoded format that is optimized for storage and retrieval performance. It is similar to JSON in structure but includes additional data types such as dates and binary data.

Examples of JSON and BSON Documents generated using ChatGPT

BSON is the native data format used in MongoDB and is the internal representation of data within the database system. Using BSON documents in the backend provides various benefits to MongoDB, some of them are:

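  1. Efficiency: BSON is designed to be fast to encode, decode, and traverse. Because it stores type and length information alongside each value, a BSON document is not always smaller than its JSON equivalent, but the binary layout lets MongoDB scan and skip fields quickly, which improves read and write performance.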
  2. Data Types: As discussed earlier, BSON supports more data types than JSON, like dates and binary data. These data types allow MongoDB to handle diverse and complex data.
  3. Indexing: Because every BSON value carries its type, MongoDB can compare and order field values consistently when it builds indexes. Indexes on document fields allow faster queries and faster retrieval of data.
  4. Native Integration: By storing data as BSON documents, MongoDB seamlessly works with its drivers and libraries. This allows developers to easily interact with MongoDB databases using their preferred programming language.
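The data type point can be seen without a database. The sketch below is plain JavaScript, not MongoDB code: it round-trips an object through JSON text and shows that a date comes back as a string, because JSON has no date type. The `added_on` field and its value are hypothetical.

```javascript
// A document with a real Date value (the field and date are made up).
const doc = { song_name: "Snow", added_on: new Date("2006-05-09T00:00:00Z") };

// JSON has no date type, so serializing turns the Date into an ISO string.
const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.added_on instanceof Date); // true
console.log(typeof roundTripped.added_on); // string
console.log(roundTripped.added_on);        // 2006-05-09T00:00:00.000Z
```

BSON avoids this loss by storing a dedicated date type, so a date written to MongoDB is read back as a date rather than as text.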

Installing MongoDB

If you wish to manage the deployment yourself, you can take a look at MongoDB Agent, which is a service that helps you perform operations, backup, and monitor the deployment.

If you prefer a hosted MongoDB solution, several service providers offer MongoDB as a service. Some of the hosted MongoDB solutions are:

  1. Clever Cloud
  2. IBM Cloud
  3. ObjectRocket
  4. ScaleGrid
  5. DigitalOcean
  6. MongoDB Atlas

You have the option to download and install MongoDB Community Edition and MongoDB Enterprise Edition. Follow this link for a detailed guide on installing MongoDB on different operating systems.

We will be working with the MongoDB Community Edition for this article.

MongoDB Commands

To become familiar with MongoDB commands, let’s start by creating a database and adding a collection and documents to it. I will be creating a database named ‘fav_songs’ with a collection named ‘playlist’, which will hold song names, artists, release years, and genres.

So, let’s open MongoDB Compass. Before creating a database, we need to connect Compass to a MongoDB deployment. You can do that by clicking on ‘New Connection’. The default port for MongoDB is 27017, so we will leave everything as it is and click on “Connect”.

MongoDB Compass. Image by the Author.

Once connected, we need to go to the Database tab and click on the Create Database button. Fill in the database name, collection name, and click on Create database.

Creating Database and Collection in MongoDB. Image by the Author.

Importing your data

Once you have created your database, you can import your data in MongoDB Compass through the ‘Add Data’ button. You will have the option to select a JSON or CSV file from your local system and you also have the option to insert documents manually. We will start by adding the data manually using the MongoDB Shell.

To add the data, copy the connection string on Compass and open Mongosh.

Connection String in MongoDB. Image by the Author.

After opening Mongosh, you will be prompted to enter the connection string. Paste the copied connection string and hit Enter. Let’s look at the database and collection commands in MongoDB.

Database Commands

Create a new Database or switch to a Database
Syntax — use dbName

use fav_songs

View current Database
Syntax — db

db

View all databases (a new database is listed only once it contains data)
Syntax — show dbs

show dbs

Delete Database
Syntax — db.dropDatabase()

db.dropDatabase()

{ ok: 1, dropped: 'fav_songs' } confirms the database was deleted. Always check which database you are working on with the db command before dropping it, as this operation cannot be reversed.

Database Commands in MongoDB. Image by the Author.

Collection Commands

View Collections
Syntax — show collections

show collections

Drop a collection
Syntax — db.collection.drop()

db.playlist.drop()

Create a collection
Syntax: db.createCollection('collection_name')

db.createCollection('playlist')
Collection Commands in MongoDB. Image by the Author.

As we have seen the basic Database and Collection commands, let’s look at the CRUD (Create, Read, Update, Delete) commands in MongoDB.

CRUD Commands

insertOne() and insertMany(): Insert a single document or multiple documents into the collection.

Syntax — db.collection.insertOne({ <document>})

db.playlist.insertOne({song_name: 'Under The Bridge', artists: 'Red Hot Chilli Peppers', genre: 'Rock, Alternative Rock', year: 1991})

Syntax — db.collection.insertMany([{ <document> }, { <document>}])

db.playlist.insertMany([{song_name: 'Otherside', artists: 'Red Hot Chilli Peppers', genre: 'Rock, Alternative/Indie', year: 1999}, {song_name: 'Snow', artists: 'Red Hot Chilli Peppers', genre: 'Rock, Alternative/ Indie', year: 2006}])
Insert Commands in MongoDB. Image by the Author.

find(): Retrieves documents from a collection.

Syntax — db.collection.find()

db.playlist.find()

The find() method with no parameter matches every document in the collection. Mongosh prints the first 20 results; type it to see the next batch.

Find Command in MongoDB. Image by the Author.

We can also pass a filter with one or more conditions as a parameter to the find() command.

db.playlist.find({song_name: 'Otherside'})
Find Command in MongoDB. Image by the Author.
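When a filter lists several fields, MongoDB treats them as an implicit AND: a document must satisfy all of them. The sketch below models that behavior on an in-memory array in plain JavaScript. `findByEquality` is a hypothetical helper written for this illustration, not part of mongosh, and it handles only simple equality on top-level fields.

```javascript
const playlist = [
  { song_name: "Under The Bridge", artists: "Red Hot Chilli Peppers", year: 1991 },
  { song_name: "Otherside", artists: "Red Hot Chilli Peppers", year: 1999 },
  { song_name: "Snow", artists: "Red Hot Chilli Peppers", year: 2006 },
];

// Hypothetical helper: keep documents whose fields equal every value in the filter.
function findByEquality(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}

// Comparable in spirit to:
//   db.playlist.find({artists: 'Red Hot Chilli Peppers', year: 1999})
const result = findByEquality(playlist, { artists: "Red Hot Chilli Peppers", year: 1999 });
console.log(result.map((d) => d.song_name)); // [ 'Otherside' ]

// An empty filter matches everything, like db.playlist.find()
console.log(findByEquality(playlist, {}).length); // 3
```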

updateOne(): Updates the first document in a collection that matches the filter.

Syntax: db.collection.updateOne(<filter>, <update>)

db.playlist.updateOne({song_name: 'Under The Bridge'}, {$set: {artists: 'RHCP'}})
Update One Command in MongoDB. Image by the Author.

The update operation returns several fields. Let’s understand each of them.

‘acknowledged: true’ indicates that the operation was acknowledged by the server.

‘insertedId: null’ indicates that the update operation did not insert a new document, so there is no inserted id.

‘matchedCount: 1’ indicates the number of documents that matched the filter.

‘modifiedCount: 1’ indicates the number of documents that were modified.

‘upsertedCount: 0’ indicates the number of documents that were upserted. An upsert combines an update and an insert: when the upsert option is set to true, MongoDB updates the matching document if it finds one and otherwise inserts a new document built from the filter and the update. The option is off by default, which is why the count is 0 here.
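To make these counters concrete, here is a small in-memory model in plain JavaScript. `updateOneSet` is a hypothetical function written for this sketch; it only imitates equality filters with $set and is not how the server or drivers are implemented. The song 'Scar Tissue' is likewise an invented example value.

```javascript
// Hypothetical in-memory model of updateOne with $set and an optional upsert.
function updateOneSet(docs, filter, setFields, options = {}) {
  const matches = (doc) =>
    Object.entries(filter).every(([k, v]) => doc[k] === v);
  const target = docs.find(matches);
  if (target) {
    // A matched document counts as modified only if some value really changes.
    const changed = Object.entries(setFields).some(([k, v]) => target[k] !== v);
    Object.assign(target, setFields);
    return { matchedCount: 1, modifiedCount: changed ? 1 : 0, upsertedCount: 0 };
  }
  if (options.upsert) {
    // No match and upsert requested: insert a document built from filter + update.
    docs.push({ ...filter, ...setFields });
    return { matchedCount: 0, modifiedCount: 0, upsertedCount: 1 };
  }
  return { matchedCount: 0, modifiedCount: 0, upsertedCount: 0 };
}

const songs = [{ song_name: "Under The Bridge", artists: "Red Hot Chilli Peppers" }];

const first = updateOneSet(songs, { song_name: "Under The Bridge" }, { artists: "RHCP" });
const second = updateOneSet(songs, { song_name: "Under The Bridge" }, { artists: "RHCP" });
const third = updateOneSet(songs, { song_name: "Scar Tissue" }, { artists: "RHCP" }, { upsert: true });

console.log(first);  // { matchedCount: 1, modifiedCount: 1, upsertedCount: 0 }
console.log(second); // { matchedCount: 1, modifiedCount: 0, upsertedCount: 0 }
console.log(third);  // { matchedCount: 0, modifiedCount: 0, upsertedCount: 1 }
```

The second call shows why matchedCount and modifiedCount can differ: the document matches, but its value is already 'RHCP', so nothing is modified.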

I have also used the $set operator in this query which is a field update operator that replaces the value of a field with a specified value. We will learn more about it in the later part of this article.

updateMany(): Updates all documents in a collection that match the specified filter.

Syntax: db.collection.updateMany(<filter>, <update>)

db.playlist.updateMany({artists: 'Red Hot Chilli Peppers'}, {$set: {artists: 'RHCP'}})
Update Many Command in MongoDB. Image by the Author.

deleteOne(): Deletes the first document in the collection that matches the filter.

Syntax: db.collection.deleteOne(<filter>)

db.playlist.deleteOne({song_name: 'Under The Bridge'})
Delete One Command in MongoDB. Image by the Author.

deleteMany(): Removes all documents that match the specified filter.

Syntax: db.collection.deleteMany(<filter>)

db.playlist.deleteMany({artists: 'RHCP'})
Delete Many Command in MongoDB. Image by the Author.

Because the earlier updateMany() call changed the artist name to 'RHCP', we filter on that value. The deleted count is 2, which means our collection is now empty.

Let’s create a new database to see some of the essential operators in action. If you are following along, create a new employee database and collection.

In Mongosh, run the use database_name command to ensure you are on the database you just created. Copy the below command to insert the data in the collection. The data consists of 5 documents of employee details.

db.employee_data.insertMany([
  {'Name': 'Aarav Gupta', 'Age': 27, 'City': 'Mumbai', 'contact': {'email': "[email protected]", 'phone': '555-123-4567'}, 'Branch': 'Mechanical', 'Passing Year': 2018, 'Salary': 50000},
  {'Name': 'Riya Sharma', 'Age': 31, 'City': 'Delhi', 'contact': {'email': "[email protected]", 'phone': '555-234-5678'}, 'Branch': 'Computers', 'Passing Year': 2015, 'Salary': 65000},
  {'Name': 'Rohan Patel', 'Age': 24, 'City': 'Ahmedabad', 'contact': {'email': "[email protected]", 'phone': '555-345-6789'}, 'Branch': 'Electronics', 'Passing Year': 2019, 'Salary': 45000},
  {'Name': 'Niharika Singh', 'Age': 29, 'City': 'Kolkata', 'contact': {'email': "[email protected]", 'phone': '555-456-7890'}, 'Branch': 'Mechanical', 'Passing Year': 2016, 'Salary': 55000},
  {'Name': 'Raghav Sharma', 'Age': 28, 'City': 'Hyderabad', 'contact': {'email': "[email protected]", 'phone': '555-567-8901'}, 'Branch': 'Mechanical', 'Passing Year': 2017, 'Salary': 52000}
])

Comparison Operators

$eq (=)
Equal to operator. Matches documents in which the field equals the specified value. Let’s see who completed their graduation in 2016.

db.employee_data.find({'Passing Year': {$eq: 2016}})
$eq operator in MongoDB. Image by the Author.

In a similar fashion, we can use the rest of the comparison operators.

$gt (>)
Greater than operator. Matches documents in which the field is greater than the specified value.

$gte (≥)
Greater than or equal to operator. Matches documents in which the field is greater than or equal to the specified value.

$lt (<)
Less than operator. Matches documents in which the field is less than the specified value.

$lte (≤)
Less than or equal to operator. Matches documents in which the field is less than or equal to the specified value.

$ne (≠)
Not equal to operator. Matches documents in which the field is not equal to the specified value.

$in (∈)
Matches any of the values specified in an array. We would like to see everyone from the Computers and Electronics branches. We can use the $in operator to write this query.

db.employee_data.find({'Branch': {$in: ['Computers', 'Electronics']}})
$in operator in MongoDB. Image by the Author.

$nin (∉)
Matches none of the values specified in an array. It is the negation of the $in operator.
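The operators that were described without a query can be tried out on the sample data with a small model in plain JavaScript. The `comparators` table and the `namesWhere` helper are hypothetical names for this sketch; they mimic the operators on an in-memory copy of the five employee documents (contact details omitted) and are not MongoDB's implementation.

```javascript
const employees = [
  { Name: "Aarav Gupta", Age: 27, Branch: "Mechanical", "Passing Year": 2018, Salary: 50000 },
  { Name: "Riya Sharma", Age: 31, Branch: "Computers", "Passing Year": 2015, Salary: 65000 },
  { Name: "Rohan Patel", Age: 24, Branch: "Electronics", "Passing Year": 2019, Salary: 45000 },
  { Name: "Niharika Singh", Age: 29, Branch: "Mechanical", "Passing Year": 2016, Salary: 55000 },
  { Name: "Raghav Sharma", Age: 28, Branch: "Mechanical", "Passing Year": 2017, Salary: 52000 },
];

// Each comparison operator modeled as a plain predicate.
const comparators = {
  $eq: (a, b) => a === b,
  $ne: (a, b) => a !== b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $in: (a, list) => list.includes(a),
  $nin: (a, list) => !list.includes(a),
};

// Hypothetical helper: names of employees whose field passes the comparison.
function namesWhere(field, operator, operand) {
  return employees
    .filter((doc) => comparators[operator](doc[field], operand))
    .map((doc) => doc.Name);
}

// Like db.employee_data.find({Salary: {$gt: 52000}})
console.log(namesWhere("Salary", "$gt", 52000)); // [ 'Riya Sharma', 'Niharika Singh' ]

// Like db.employee_data.find({Age: {$lte: 27}})
console.log(namesWhere("Age", "$lte", 27)); // [ 'Aarav Gupta', 'Rohan Patel' ]

// Like db.employee_data.find({Branch: {$nin: ['Computers', 'Electronics']}})
console.log(namesWhere("Branch", "$nin", ["Computers", "Electronics"]));
// [ 'Aarav Gupta', 'Niharika Singh', 'Raghav Sharma' ]
```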

Query Modifiers

limit(): Limits the number of documents you wish to see.

db.employee_data.find().limit(1)

count(): Returns the number of documents that match the query. Let’s count the employees who are from the Mechanical branch. Note that newer MongoDB releases deprecate count() in favor of db.collection.countDocuments(), which takes the same filter.

db.employee_data.find({Branch: {$eq: 'Mechanical'}}).count()

sort(): Specifies the order in which the query returns the matching documents. Use 1 for ascending order and -1 for descending order. The query below arranges the employees in ascending order of salary.

db.employee_data.find().sort({Salary: 1})
Query modifiers in MongoDB. Image by the Author.
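For readers who think in terms of array operations, the three modifiers map closely onto familiar JavaScript methods. The sketch below is a rough analogy on an in-memory copy of the sample data, not MongoDB code; the variable names are chosen for this illustration.

```javascript
const employees = [
  { Name: "Aarav Gupta", Branch: "Mechanical", Salary: 50000 },
  { Name: "Riya Sharma", Branch: "Computers", Salary: 65000 },
  { Name: "Rohan Patel", Branch: "Electronics", Salary: 45000 },
  { Name: "Niharika Singh", Branch: "Mechanical", Salary: 55000 },
  { Name: "Raghav Sharma", Branch: "Mechanical", Salary: 52000 },
];

// sort({Salary: 1}): ascending order. Copy first so the source array is untouched.
const ascending = [...employees].sort((a, b) => a.Salary - b.Salary);
console.log(ascending.map((e) => e.Name));
// [ 'Rohan Patel', 'Aarav Gupta', 'Raghav Sharma', 'Niharika Singh', 'Riya Sharma' ]

// sort({Salary: -1}).limit(2): the two highest salaries.
const topTwo = [...employees].sort((a, b) => b.Salary - a.Salary).slice(0, 2);
console.log(topTwo.map((e) => e.Name)); // [ 'Riya Sharma', 'Niharika Singh' ]

// find({Branch: 'Mechanical'}).count(): number of matching documents.
const mechanicalCount = employees.filter((e) => e.Branch === "Mechanical").length;
console.log(mechanicalCount); // 3
```

Modifiers can be chained in mongosh in the same way, for example db.employee_data.find().sort({Salary: -1}).limit(2).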

Field Update Operators

$currentDate: Sets the value of a field to the current date, stored either as a date or as a timestamp. Suppose one of the employees, Riya Sharma, joined today and we need to record that. We can do this easily using the query:

db.employee_data.updateOne({Name: 'Riya Sharma'}, {$currentDate: {'joining_date': true}})
$currentDate operator in MongoDB. Image by the Author.

$inc: Increments the value of a field by a specified amount; a negative amount decrements it. In the database, we need to correct the details of one of the employees, Raghav Sharma. The passing year should be 2019 instead of 2017, and the salary should be 50000 instead of 52000. Let’s use the $inc operator to make both changes.

db.employee_data.updateOne({Name: 'Raghav Sharma'}, {$inc: {'Passing Year': 2, 'Salary': -2000}})
$inc operator in MongoDB. Image by the Author.

$min: Updates the value of the field to a specified value if the specified value is less than the current value of the field. Suppose we implement a salary cap for employees, say we set the cap at 60000, we can use the $min operator to ensure that no employee receives a salary higher than this amount. As we are running this query on the entire dataset, we will leave the initial braces or the filter parameter empty.

db.employee_data.updateMany({}, {$min: {Salary: 60000}})
$min operator in MongoDB. Image by the Author.

$max: Updates the value of the field to a specified value if the specified value is greater than the current value of the field. Suppose we want to put a bar on the lowest salary an employee gets, say 50000, we can use the $max operator and run it on the entire data to ensure the minimum salary an employee gets is 50000.

db.employee_data.updateMany({}, {$max: {Salary: 50000}})
$max operator in MongoDB. Image by the Author.

$mul: Multiplies the value of a field by a specified amount. Based on the performance of the employees, the management has shortlisted Riya Sharma and Raghav Sharma for a 50% salary increment, so each new salary will be 150% of the current one. Let’s update their salaries in the database.

db.employee_data.updateMany({Name: {$in: ['Riya Sharma', 'Raghav Sharma']}}, {$mul: {Salary: 1.5}})
$mul operator in MongoDB. Image by the Author.
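The arithmetic behind $inc, $min, $max, and $mul can be checked by replaying the four updates above on the starting salaries. This is a plain JavaScript sketch with hypothetical variable names (`salaries`, `applyToAll`), using Math.min and Math.max to stand in for the operators; it is not MongoDB code.

```javascript
// Starting salaries from the sample data.
const salaries = {
  "Aarav Gupta": 50000,
  "Riya Sharma": 65000,
  "Rohan Patel": 45000,
  "Niharika Singh": 55000,
  "Raghav Sharma": 52000,
};

// Hypothetical helper: apply a function to every salary (an empty filter {}).
const applyToAll = (fn) => {
  for (const name of Object.keys(salaries)) salaries[name] = fn(salaries[name]);
};

// $inc: {Salary: -2000} on Raghav Sharma -> 52000 - 2000 = 50000
salaries["Raghav Sharma"] += -2000;

// $min: {Salary: 60000} keeps the smaller of current value and 60000 (a cap)
applyToAll((s) => Math.min(s, 60000));

// $max: {Salary: 50000} keeps the larger of current value and 50000 (a floor)
applyToAll((s) => Math.max(s, 50000));

// $mul: {Salary: 1.5} on Riya and Raghav -> a 50% raise
for (const name of ["Riya Sharma", "Raghav Sharma"]) salaries[name] *= 1.5;

console.log(salaries);
// Aarav 50000, Riya 90000, Rohan 50000, Niharika 55000, Raghav 75000
```

Note the slightly counterintuitive naming: $min enforces a maximum salary, because it keeps whichever value is smaller, and $max enforces a minimum.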

$rename: Used to rename a field. Let’s change the field, ‘Passing Year’ to ‘passing_year’ in this data.

db.employee_data.updateMany({}, {$rename: {'Passing Year': 'passing_year'}})
$rename operator in MongoDB. Image by the Author.

$set: Sets or replaces the value of a field in a document with a specified value. We have already seen it in action. For instance, suppose we need to update the city of one employee, Rohan Patel, in our database.

db.employee_data.updateOne({Name: 'Rohan Patel'}, {$set: {City: 'Vadodara'}})
$set operator in MongoDB. Image by the Author.

Logical Operators

$and: Returns all documents that match every one of the specified conditions. We would like to fetch an employee’s details using both of their contact details, which we can do with the $and operator.

db.employee_data.find({$and: [{'contact.email': '[email protected]'}, {'contact.phone': '555-456-7890'}]})
$and operator in MongoDB. Image by the Author.

$not: Inverts a query expression, returning the documents that do not match it. For example, let’s find all the employees whose salary is not greater than 50000.

db.employee_data.find({Salary: {$not: {$gt: 50000}}})
$not operator in MongoDB. Image by the Author.

$nor: Returns all documents that match none of the clauses. We would like to find the employees who are neither from Vadodara nor from the Mechanical branch.

db.employee_data.find({$nor: [{City: 'Vadodara'}, {Branch: 'Mechanical'}]})
$nor operator in MongoDB. Image by the Author.

You might expect the $not operator to give the same results, but it does not work that way. $not negates a single operator expression on one field, while $nor takes an array of query expressions and matches only the documents that satisfy none of them.

$or: Returns all documents that match the condition of either clause.

db.employee_data.find({$or: [{Branch: 'Computers'}, {passing_year: 2019}]})
$or operator in MongoDB. Image by the Author.
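The four logical operators can be modeled as ways of combining predicates. The sketch below is plain JavaScript with hypothetical helper names (`and`, `or`, `nor`, `not`, `names`), run on an in-memory copy of the employee data as it would stand after the updates in the previous section. It illustrates the logic only and is not how MongoDB evaluates queries.

```javascript
// Employee data, assuming all the earlier field updates have been applied.
const employees = [
  { Name: "Aarav Gupta", City: "Mumbai", Branch: "Mechanical", passing_year: 2018, Salary: 50000 },
  { Name: "Riya Sharma", City: "Delhi", Branch: "Computers", passing_year: 2015, Salary: 90000 },
  { Name: "Rohan Patel", City: "Vadodara", Branch: "Electronics", passing_year: 2019, Salary: 50000 },
  { Name: "Niharika Singh", City: "Kolkata", Branch: "Mechanical", passing_year: 2016, Salary: 55000 },
  { Name: "Raghav Sharma", City: "Hyderabad", Branch: "Mechanical", passing_year: 2019, Salary: 75000 },
];

// Logical operators as predicate combinators.
const and = (...ps) => (e) => ps.every((p) => p(e)); // all clauses match
const or = (...ps) => (e) => ps.some((p) => p(e));   // at least one clause matches
const nor = (...ps) => (e) => !ps.some((p) => p(e)); // no clause matches
const not = (p) => (e) => !p(e);                     // a single clause inverted

const inVadodara = (e) => e.City === "Vadodara";
const isMechanical = (e) => e.Branch === "Mechanical";
const isComputers = (e) => e.Branch === "Computers";
const passed2019 = (e) => e.passing_year === 2019;
const earnsOver50k = (e) => e.Salary > 50000;

const names = (pred) => employees.filter(pred).map((e) => e.Name);

console.log(names(and(isMechanical, passed2019))); // [ 'Raghav Sharma' ]
console.log(names(or(isComputers, passed2019)));   // [ 'Riya Sharma', 'Rohan Patel', 'Raghav Sharma' ]
console.log(names(nor(inVadodara, isMechanical))); // [ 'Riya Sharma' ]
console.log(names(not(earnsOver50k)));             // [ 'Aarav Gupta', 'Rohan Patel' ]
```

Seen this way, $nor is simply the negation of $or over the same clauses, which is why it can take several expressions while $not wraps only one.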

Conclusion

This is an overview of the essential MongoDB commands that data scientists need to know in order to work with non-relational data, and it is only the tip of the iceberg when it comes to MongoDB’s full capabilities. To become familiar with the commands and operators explained in this article, practice and experiment with them regularly. This will give you a better and more intuitive understanding of how to work with data in MongoDB.

Learning is an ongoing process, and there is always more to discover. Explore and practice different concepts and techniques to expand and deepen your understanding. If you have learned anything from this article, please show your appreciation by leaving a clap (up to 50 claps per user are allowed).

Thank you for taking the time to read and explore with me! 🙂


Published via Towards AI
