Notes on BeautifulSoup and Flask

Scraping prices from NTUC and Cold Storage

To help our users get the best prices for their ingredients, we wrote a script that returns a list of products offered by NTUC and Cold Storage that are close matches for the ingredient needed.

This was achieved using two packages:

Flask

We built a basic web service with Flask. It accepts POST requests carrying a single attribute, query, and returns an array of food products, each with the following attributes:

  • title

  • measurement

  • price

  • supermarket

  • link
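A minimal sketch of such an endpoint is shown here. The route name /search, the JSON request body, and the helper search_supermarkets are assumptions made for illustration; the helper returns placeholder data in place of the real scraping step.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def search_supermarkets(query):
    """Hypothetical stand-in for the scraping step; returns placeholder data."""
    return [
        {
            "title": f"Sample product matching '{query}'",
            "measurement": "1 kg",
            "price": "0.00",
            "supermarket": "NTUC",
            "link": "https://example.com/product",
        }
    ]


@app.route("/search", methods=["POST"])  # route name is an assumption
def search():
    # Assumes the client sends JSON such as {"query": "chicken breast"}.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    query = str(payload.get("query", "")).strip()
    if not query:
        return jsonify({"error": "missing 'query'"}), 400
    # Respond with an array of product objects.
    return jsonify(search_supermarkets(query))
```

Flask's built-in test client (app.test_client()) can post to this route without starting a server, which is handy for checking the shape of the response.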

BeautifulSoup4

We noticed that we could search the websites of online supermarkets simply by appending the search query to the end of the URL (an example of this is shown below).

Then, all we had to do was:

  • take in the query string

  • replace the spaces in the string with filler characters (Cold Storage uses "+", NTUC uses "%20")

  • append this string to the URL of the website

  • fetch the HTML page at this URL using a simulated browser
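The steps above can be sketched roughly as follows. The base URLs are placeholders, since the real search endpoints are not given in these notes, and sending a browser-like User-Agent header through urllib is only one possible way to simulate a browser; the original script may have used a different approach.

```python
from urllib.parse import quote, quote_plus
from urllib.request import Request, urlopen

# Placeholder base URLs (assumptions): substitute the real search endpoints.
SEARCH_URLS = {
    "NTUC": "https://ntuc.example/search?query=",
    "Cold Storage": "https://coldstorage.example/search?q=",
}


def build_search_url(supermarket, query):
    """Encode the query and append it to the supermarket's search URL."""
    # Cold Storage expects "+" for spaces, NTUC expects "%20".
    # Both functions also escape other reserved characters.
    encode = quote_plus if supermarket == "Cold Storage" else quote
    return SEARCH_URLS[supermarket] + encode(query.strip())


def fetch_html(url, timeout=10):
    """Download the page while presenting a browser-like User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


ntuc_url = build_search_url("NTUC", "chicken breast")
cold_storage_url = build_search_url("Cold Storage", "chicken breast")
```

With these placeholders, ntuc_url ends in chicken%20breast and cold_storage_url ends in chicken+breast; fetch_html would then be called on each URL to obtain the page to parse.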

After this was done, we inspected the elements in the returned HTML document and extracted the attributes we wanted (e.g. title, measurement, price) from the elements that contained them.

For example, we got the title of each food product from the title attribute of its img element.
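As an illustration, the title lookup might look like this with BeautifulSoup. The sample HTML and the product class name are invented for the sketch; the real supermarket pages use their own markup, which has to be inspected first.

```python
from bs4 import BeautifulSoup

# Invented sample markup (assumption); real pages will differ.
SAMPLE_HTML = """
<div class="product">
  <a href="/p/1"><img src="a.jpg" title="Fresh Milk"></a>
</div>
<div class="product">
  <a href="/p/2"><img src="b.jpg" title="Soy Milk"></a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

titles = []
for card in soup.find_all("div", class_="product"):
    # title=True matches only img tags that actually carry a title attribute.
    img = card.find("img", title=True)
    if img is not None:
        titles.append(img["title"])
```

Skipping cards without a matching img keeps one malformed listing from breaking the whole scrape.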

In the final step, we wrapped these attributes in an object for each product and appended it to an array, which is passed to the front-end to be rendered.
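One way to sketch this final step is with a small dataclass. The Product class, the collect_products helper, and the sample rows are all hypothetical; the original script may simply have used plain dictionaries.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    title: str
    measurement: str
    price: str
    supermarket: str
    link: str


def collect_products(rows, supermarket):
    """Wrap each scraped row in a Product and append it to a list."""
    products = []
    for title, measurement, price, link in rows:
        products.append(Product(title, measurement, price, supermarket, link))
    return products


# Sample rows standing in for values extracted from the HTML.
rows = [
    ("Fresh Milk", "1 L", "$3.20", "/p/1"),
    ("Soy Milk", "946 ml", "$2.50", "/p/2"),
]
products = collect_products(rows, "NTUC")

# Convert to plain dicts so the array can be serialized for the front-end.
payload = json.dumps([asdict(p) for p in products])
```

Converting with asdict keeps the attribute names identical to the ones listed for the Flask response.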

The full implementation can be found in the following GitHub repo.
