

Web-scrape data for 120,929 companies, including 7,000+ email addresses, from hktdc.com supplier reference pages

Under $250
Web Scraping, OCR algorithms, VPN
Posted 1 year ago


I work as a salesperson at a surveying firm in Hong Kong. I need to find clients on a daily basis, but my employer provides virtually no support for lead generation.

I have decided to build an Excel spreadsheet based on this Trade Development Council (TDC) link: http://www.hktdc.com/tc-supplier/
If you type "limited" into the company name search on the website, data for 120,929 companies is available as of yesterday, 16/6/2018, across 2,016 pages at 60 results per page.
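
The page count quoted above is a ceiling division of the record count by the page size. The sketch below checks that arithmetic and shows one way the page URLs could be enumerated. The query parameter names (`keyword`, `page`, `size`) are placeholders for illustration, not the site's confirmed parameters.

```python
import math
from urllib.parse import urlencode

TOTAL_COMPANIES = 120929   # figure quoted in this post
RESULTS_PER_PAGE = 60      # page size quoted in this post

def page_count(total, per_page):
    """Number of result pages needed to list `total` records."""
    return math.ceil(total / per_page)

def page_urls(base_url, total, per_page):
    """Yield one URL per result page (parameter names are hypothetical)."""
    for page in range(1, page_count(total, per_page) + 1):
        query = urlencode({"keyword": "limited", "page": page, "size": per_page})
        yield f"{base_url}?{query}"

print(page_count(TOTAL_COMPANIES, RESULTS_PER_PAGE))  # 2016
```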

The first problem: both the fax and the telephone numbers are stored as JPEG images. I suggest using OCR to convert the two sets of numbers to text.
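
Whatever OCR engine is used, its raw output for a number image usually needs cleaning before it goes into the spreadsheet. The sketch below covers only that clean-up step and assumes the OCR text is already available as a string. The lookalike table is an assumed, non-exhaustive list, and the 8-digit check relies on Hong Kong local numbers being 8 digits long.

```python
import re

# Letters that OCR engines commonly confuse with digits (assumed, non-exhaustive).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})

def clean_number(ocr_text):
    """Return the digits of an OCR-read phone or fax number, or None if implausible."""
    digits = re.sub(r"\D", "", ocr_text.translate(LOOKALIKES))
    # Drop a leading Hong Kong country code (852) if present.
    if len(digits) == 11 and digits.startswith("852"):
        digits = digits[3:]
    # Anything that is not 8 digits is flagged (None) for manual review.
    return digits if len(digits) == 8 else None
```

Returning `None` rather than a best guess keeps doubtful readings out of the spreadsheet, so they can be checked by hand against the original image.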

The second problem: downloading the first 12 pages goes smoothly, but the website blocks the download of the 13th page.
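
A block after a fixed number of pages may point to rate limiting, although the cause is not confirmed here. Under that assumption, a minimal sketch of pacing the requests and backing off after a refusal could look like this. The `fetch` argument is a hypothetical caller-supplied function that returns the page body or raises an exception when the request is refused; the delay values are illustrative.

```python
import time

def fetch_all(urls, fetch, delay=2.0, max_retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and backing off on failure.

    `fetch` is a caller-supplied function (hypothetical) that returns the page
    body, or raises an exception when the request is refused.
    """
    results = {}
    for url in urls:
        for attempt in range(max_retries):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                # Wait longer after each refusal: delay, 2*delay, 4*delay, ...
                sleep(delay * 2 ** attempt)
        else:
            results[url] = None  # gave up; record the gap for a later rerun
        sleep(delay)  # fixed pause between pages
    return results
```

Pages recorded as `None` can be collected afterwards and retried in a separate run, so one refused page does not stop the whole job.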

The third problem: the email addresses are accessible on the TDC pages of suppliers that carry the "supplier reference" logo with double ticks.
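
Once a supplier page has been downloaded, the email addresses can be pulled out of its text with a regular expression. The sketch below assumes the addresses appear as plain text or `mailto:` links in the page source, which has not been verified for this site, and the pattern is deliberately simple.

```python
import re

# A deliberately simple pattern; it will not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(page_text):
    """Return the unique email addresses found in a page, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(page_text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen
```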

The project is expected to take 3 days. The deadline is 21 June 2018.

Language: English.

A total budget of US $50 is offered for the project: $19 for the NDA fee, with the remaining dollars going to currency conversion and handling fees.
