Introduction to Web Scraping with Python and Beautiful Soup

Published 2017-01-06
Web scraping is a powerful skill for any data professional. With web scraping, the entire internet becomes your database. In this tutorial, we show you how to parse a web page into a data file (CSV) using a Python package called Beautiful Soup.

In this example, we scrape graphics card listings from Newegg.com.
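
The first steps of the workflow, retrieving a page and parsing it, might look like the sketch below. It assumes the beautifulsoup4 package is installed. The helper names fetch_html and parse_html and the sample markup are illustrative and not taken from the video, and the sketch parses an inline string so that it runs without a network connection.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw HTML bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def parse_html(html):
    """Parse raw HTML using Python's built-in HTML parser."""
    return BeautifulSoup(html, "html.parser")


# Inline stand-in for a downloaded page (made-up markup, not Newegg's).
sample_html = "<html><body><h1>Graphics Cards</h1></body></html>"
page_soup = parse_html(sample_html)

# Dot access returns the first matching tag; .text gives its text content.
heading = page_soup.h1.text
print(heading)
```

For a live page, parse_html(fetch_html(url)) would take the place of the inline string.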

Find the updated version of this tutorial here: Web Scraping Tutorial with Python and...

Python Code:
code.datasciencedojo.com/datasciencedojo/tutorials…

Sublime:
www.sublimetext.com/3

Anaconda:
www.anaconda.com/distribution/#download-section

JavaScript beautifier:
beautifier.io/

If you are not seeing the command line, follow this tutorial:
www.tenforums.com/tutorials/72024-open-command-win…

Table of Contents:
0:00 - Introduction
1:28 - Setting up Anaconda
3:00 - Installing Beautiful Soup
3:43 - Setting up urllib
6:07 - Retrieving the Web Page
10:47 - Evaluating Web Page
11:27 - Converting Listings into Line Items
16:13 - Using jsbeautiful
16:31 - Reading Raw HTML for Items to Scrape
18:34 - Building the Scraper
22:11 - Using the "findAll" Function
27:26 - Testing the Scraper
29:07 - Creating the .csv File
32:18 - End Result
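
The later steps in the list above (finding every listing, pulling fields out of each one, and writing a .csv file) can be sketched roughly as follows. This is a minimal illustration that assumes the beautifulsoup4 package is installed. The markup, class names, and column headers are made up for the example and are not Newegg's actual page structure, and find_all is the current spelling of the "findAll" function named in the video.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up listing markup; the class names are assumptions for illustration.
sample_html = """
<div class="item-container">
  <a class="item-title">Card A, 8GB</a>
  <span class="price-ship">Free Shipping</span>
</div>
<div class="item-container">
  <a class="item-title">Card B, 4GB</a>
  <span class="price-ship">$4.99 Shipping</span>
</div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every tag that matches the name and attribute filter.
containers = page_soup.find_all("div", {"class": "item-container"})

rows = []
for container in containers:
    title = container.find("a", {"class": "item-title"}).text.strip()
    shipping = container.find("span", {"class": "price-ship"}).text.strip()
    rows.append([title, shipping])

# csv.writer quotes fields that contain commas, so product names stay intact.
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["product_name", "shipping"])
writer.writerows(rows)
csv_text = output.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing a file opened with open("products.csv", "w", newline="") to csv.writer would produce a file on disk instead (the file name is hypothetical).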

--

At Data Science Dojo, we believe data science is for everyone. Our data science trainings have been attended by more than 10,000 employees from over 2,500 companies globally, including many leaders in tech like Microsoft, Google, and Facebook. For more information please visit: hubs.la/Q01Z-13k0

#webscraping #python #beautifulsoup

All Comments (21)
  • I was able to make a program for my client that I never thought was possible. I got paid real money for this. Blessings, so much learned. This is like magic.
  • It's weird to think about it like that, but this video started my whole Python learning back in 2017 and I am SO SO SO thankful for it.
  • @harsh3305
    MINOR SUGGESTION: As of 10/03/2019, if you are following along with this tutorial, "container.div" won't give you the div with the "item-info" class. Instead, it will give you the div with the "item-badges" class, because the latter occurs before the former. When you access a tag with the dot (.) operator, it returns only the first instance of that tag. I had a problem following along until I figured this out. To solve it, use the "find()" method to find exactly the div that contains the information you want, e.g. divWithInfo = containers[0].find("div", "item-info")
  • @delt19
    Coming from an R user, this is a very well done introductory tutorial into web scraping in Python. I like the real world example with Newegg and troubleshooting along the way.
  • @arjoon
    This was really good content, definitely the best intro to web scraping I've seen. You don't go through it as though you're reading from the documentation, there's more of a flow.
  • @saadiyafourie
    Absolute champion, quite possibly the best code tutorial I've ever watched. Oh the possibilities! Thank you :)
  • You look like a god when you're writing multiple lines at the same time.
  • @YasarHabib
    This was by far the best introduction to web scraping I've found online. Clear, concise, and easy to digest. Thank YOU!
  • A BIG BIG THANK YOU: the most understandable tutorial I've ever seen on how to scrape a web page (and I have watched like 100 of them)
  • @pdubocho
    The man, the myth, the legend. You have no idea how much stress and lost time you have prevented. THANK YOU!
  • @andreabtahi9519
    I am just starting web scraping and I can honestly say that this video clearly explained everything. I watched this at 1.5 speed and it made sense. I would love more videos like this. I loved how you made it generic so it can apply to more than one website!
  • @EustaceKirstein
    32:30, I started cheesing at how awesome the end result of this whole project was. Definitely inspiring - thank you for the excellent guide!
  • @evanzhao3887
    If you have some prior experience with web crawling, this video can take your crawling skills to a whole new level. It allows you to crawl websites containing complicated info about multiple items into a very organized dataset. The various tools introduced in the video are also fantastically helpful. A BIG THANK YOU
  • @frozy3155
    wow even almost 3 years later this video helped me so much and helped me to make a program that picks a random steam game, this was so hard, but i figured it out, big props to you and this video <3
  • @brendanp9415
    This is the best web scraping tutorial that I’ve found. I’ve been frustrated for hours trying to use other resources. Thank you for making this, your explanations are thorough and great!
  • This is actually the coolest thing I've seen in my entire life. Wow. Thank you so much I love you man.
  • Truly enjoyed your simple step-by-step explanation of why each command or function is needed and what it does. Your Python knowledge and skills are evident, as you are able to provide immediate solutions to errors and/or challenges in the problem you are attempting to solve. Followed along with the tools and enjoyed the session. Thank you.
  • One of the best teachers I have come across on YouTube. Web scraping explained so well that even a layman can follow and understand the basic concepts. I wish that in life I had a teacher/mentor/friend like the one teaching in this video.
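
The suggestion from @harsh3305 above, that dot access returns the first matching tag while find() can target a specific class, can be checked with a small sketch. It assumes the beautifulsoup4 package is installed, and the markup is made up to mirror the ordering the comment describes; it is not Newegg's real page.

```python
from bs4 import BeautifulSoup

# Made-up markup in which "item-badges" comes before "item-info".
sample_html = """
<div class="item-container">
  <div class="item-badges">Sale</div>
  <div class="item-info">Brand X Graphics Card</div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
container = soup.find("div", "item-container")

# Dot access returns the first <div> inside the container, whatever its class.
first_div = container.div

# find() with a class name returns the first <div> that carries that class.
div_with_info = container.find("div", "item-info")

print(first_div["class"], div_with_info.text)
```

Passing a string as the second argument to find() is treated as a CSS class filter, which is why the call in the comment works without a dictionary.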