Websites are regarded as domains of limitless information that anyone can access. New technology trends have changed the way we do business. The Internet is rapidly becoming a new marketplace, and advances in this technology have increased the number of e-commerce websites. This has made life easier for marketers/vendors, retailers and consumers (collectively referred to as users in this paper), because these websites provide easy platforms to sell and order items over the Internet. However, users must also spend considerable time and effort searching e-commerce websites for the best product deals, product updates and offers. They have to filter and compare search results themselves, which is time-consuming and can lead to ambiguous results. In this paper, we apply web crawling and scraping methods to an e-commerce website to obtain HTML data for identifying product updates based on the current time. The HTML data is preprocessed to extract product details such as name, price, and posting date and time, which serve as useful information for users.
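The extraction and time-based filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It uses only Python's standard-library `html.parser` and assumes a hypothetical page layout in which each listing is a `<div class="product">` containing elements with the classes `product-name`, `product-price` and `product-posted`. It also assumes a `YYYY-MM-DD HH:MM` date format and treats anything posted within the last 24 hours as an "update". The helper names and the sample HTML are illustrative only. Fetching pages (the crawling step) is left out so the example runs offline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from html.parser import HTMLParser


@dataclass
class Product:
    name: str
    price: float
    posted: datetime


class ListingParser(HTMLParser):
    """Collects field text from hypothetical <div class="product"> blocks."""

    FIELDS = {"product-name", "product-price", "product-posted"}

    def __init__(self):
        super().__init__()
        self.records = []   # one dict of raw field text per product block
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "product" in classes:
            self.records.append({})  # a new product listing starts here
        for cls in classes:
            if cls in self.FIELDS and self.records:
                self._field = cls

    def handle_endtag(self, tag):
        self._field = None  # assumes field elements contain no nested tags

    def handle_data(self, data):
        if self._field and data.strip():
            rec = self.records[-1]
            rec[self._field] = rec.get(self._field, "") + data.strip()


def parse_price(text):
    # Keep digits and the decimal point, e.g. "$1,249.50" -> 1249.5
    return float("".join(ch for ch in text if ch.isdigit() or ch == "."))


def extract_products(html, fmt="%Y-%m-%d %H:%M"):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    products = []
    for rec in parser.records:
        if not ListingParser.FIELDS.issubset(rec):
            continue  # skip incomplete listings
        products.append(Product(rec["product-name"],
                                parse_price(rec["product-price"]),
                                datetime.strptime(rec["product-posted"], fmt)))
    return products


def recent_updates(products, now, window=timedelta(hours=24)):
    # Listings posted within `window` before `now`, newest first
    fresh = (p for p in products if now - window <= p.posted <= now)
    return sorted(fresh, key=lambda p: p.posted, reverse=True)


# Made-up sample page for illustration
SAMPLE_HTML = """
<div class="product">
  <span class="product-name">Wireless Mouse</span>
  <span class="product-price">$19.99</span>
  <span class="product-posted">2024-05-01 09:30</span>
</div>
<div class="product">
  <span class="product-name">4K Monitor</span>
  <span class="product-price">$1,249.50</span>
  <span class="product-posted">2024-04-28 14:00</span>
</div>
"""

if __name__ == "__main__":
    products = extract_products(SAMPLE_HTML)
    for p in recent_updates(products, now=datetime(2024, 5, 1, 12, 0)):
        print(p.name, p.price, p.posted)  # prints "Wireless Mouse 19.99 2024-05-01 09:30:00"
```

In practice, `now` would be `datetime.now()`, and the class names and date format would have to be adapted to the markup of the target website.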