파이썬 selenium 으로 다나와 제품 검색 페이지 전환 크롤링

Web 크롤링/Python Crawling 2021. 6. 26. 18:17

이전 예제에서는 1페이지에 있는 내용만 크롤링했다면, 이번 예제에서는 모든 페이지를 전부 크롤링하는 방법이다.

# pip install selenium
# pip install chromedriver-autoinstaller 
# pip install bs4
 
from selenium import webdriver
import chromedriver_autoinstaller
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
 
# 다나와 사이트 검색
 
options = Options()
options.add_argument('headless'); # headless는 화면이나 페이지 이동을 표시하지 않고 동작하는 모드
 
# webdirver 설정(Chrome, Firefox 등)
chromedriver_autoinstaller.install()
driver = webdriver.Chrome(options=options) # 브라우저 창 안보이기
# driver = webdriver.Chrome() # 브라우저 창 보이기
 
# 크롬 브라우저 내부 대기 (암묵적 대기)
driver.implicitly_wait(5)
 
# 브라우저 사이즈
driver.set_window_size(1920,1280)
 
# 페이지 이동(열고 싶은 URL)
driver.get('http://prod.danawa.com/list/?cate=112758&15main_11_02')
 
# 페이지 내용
# print('Page Contents : {}'.format(driver.page_source))
 
# 제조사별 검색 (XPATH 경로 찾는 방법은 이미지 참조)
mft_xpath = '//*[@id="dlMaker_simple"]/dd/div[2]/button[1]'
WebDriverWait(driver,3).until(EC.presence_of_element_located((By.XPATH,mft_xpath))).click()
 
# 원하는 모델 카테고리 클릭 (XPATH 경로 찾는 방법은 이미지 참조)
model_xpath = '//*[@id="selectMaker_simple_priceCompare_A"]/li[16]/label'
WebDriverWait(driver,3).until(EC.presence_of_element_located((By.XPATH,model_xpath))).click()
 
# 2차 페이지 내용
# print('After Page Contents : {}'.format(driver.page_source))
 
# 검색 결과가 렌더링 될 때까지 잠시 대기
time.sleep(2)
 
# 현재 페이지
curPage = 1
 
# 크롤링할 전체 페이지수
totalPage = 6
 
while curPage <= totalPage:
    #bs4 초기화
    soup = BeautifulSoup(driver.page_source, 'html.parser')
 
    # 상품 리스트 선택
    goods_list = soup.select('li.prod_item.prod_layer')
 
    # 페이지 번호 출력
    print('----- Current Page : {}'.format(curPage), '------')
 
    for v in goods_list:
        # 상품명, 가격, 이미지
        name = v.select_one('p.prod_name > a').text.strip()
        if not 'APPLE' in name:
            continue
        price = v.select_one('p.price_sect > a').text.strip()
        img_link = v.select_one('div.thumb_image > a > img').get('data-original')
        if img_link == None:
            img_link = v.select_one('div.thumb_image > a > img').get('src')
        print(name,', ', price,', ', img_link)
    print()
 
    # 페이지 수 증가
    curPage += 1
 
    if curPage > totalPage:
        print('Crawling succeed!')
        break
 
    # 페이지 이동 클릭
    cur_css = 'div.number_wrap > a:nth-child({})'.format(curPage)
    WebDriverWait(driver,3).until(EC.presence_of_element_located((By.CSS_SELECTOR,cur_css))).click()
 
    # BeautifulSoup 인스턴스 삭제
    del soup
 
    # 3초간 대기
    time.sleep(3)
 
# 브라우저 종료
driver.close()    
 

728x90

저작자표시 비영리 (새창열림)

'Web 크롤링 > Python Crawling' 카테고리의 다른 글

파이썬 selenium 활용 다나와 제품 검색 모든 페이지 크롤링 (0)	2021.06.28
파이썬 selenium 으로 다나와 제품 검색 페이지 전환 엑셀 저장 (0)	2021.06.27
파이썬 selenium 으로 다나와 제품 리스트 크롤링 (4)	2021.06.26
파이썬 selenium driver 설치 및 스크래핑 기초 학습 (0)	2021.06.26
[크롤링기초] 다나와 로그인 스크래핑 예제 (2)	2021.06.25

Link2Me

250x250

파이썬 selenium 으로 다나와 제품 검색 페이지 전환 크롤링

'Web 크롤링 > Python Crawling' 카테고리의 다른 글

카테고리

태그목록

글 보관함

달력

공지사항

링크

Link2Me

LATEST FROM OUR BLOG

LATEST COMMENTS

BLOG VISITORS

티스토리툴바

« 2026/06 »
일	월	화	수	목	금	토
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30