# coding=utf-8
'''
Created on March 4, 2016
@author: ChenHao
'''
from pymongo.mongo_client import MongoClient
from util_common import Constant, html_downloader
from math import ceil
from util_common.html_downloader import HtmlDownloader
import urllib
import pymongo
cli = MongoClient(Constant.MONGODB_URL)
# cli = MongoClient("mongodb://localhost:27017/")
# cli = MongoClient("mongodb://113.105.74.140:27017/")
db = cli.spider
'''
Inspect propertyvalue
'''
# rs = db.propertyvalue.find({"propertyid" : 49})
# for r in rs:
#     print r
#
# user = [{"name":"chenhao", "age":12}, {"name":"xiaohaohao", "age":14}]
#
# results = db.restaurant.insert_many(user)
#
# results.inserted_ids
#
# for i in range(100):
#     cursor = db.restaurant.find_one({"name":"chenhao"})
#     print cursor
# db.user.insert({"name": "test", "starttime": 0})
# rs = db.user.find({"name": "chen"})
# print rs
#
# rs_one = db.user.find_one({"name": "chenh"})
# print rs_one
# rs_user = db.user.find()
# for r in rs_user:
#     print r
# rs = db.kindlist_todo.find_one({"url": "http://www.mouser.cn/Semiconductors/RF-Semiconductors/RF-Integrated-Circuits/_/N-az8go/"})
# db.kindlist_todo.delete_one({"url": "http://www.mouser.cn/Semiconductors/Discrete-Semiconductors/Diodes-Rectifiers/_/N-ax1ma/?No=1900"})
# rs = db.kindlist_todo.find({"status": Constant.DISTINCT})
'''
Check the deduplication results for component_original
'''
# rs = db.component_original.find({})
# print (rs.count())
#
# rs = db.component_original.find({"status": Constant.DONE})
# print (rs.count())
#
# rs = db.component_original.find({"status": Constant.DISTINCT})
# print (rs.count())
# rs = db.brand_temp.find({})
# brand_set = set()
# print (rs.count())
# for r in rs:
#     brand_set.add(r["nameCn"])
# print (len(brand_set))

# rs = db.component_original.find({"status": Constant.DONE,"imgTask": Constant.TODO})
# print ("Image download not finished", rs.count())
#
# rs = db.component_original.find({"status": Constant.DONE,"imgTask": Constant.DONE})
# print ("Image download finished", rs.count())
#
# rs = db.component_original.find({"imgTask": Constant.DONE})
# print ("Image download finished", rs.count())
#
# rs = db.component_original.find({"status": Constant.DONE, "imgTask": None})
# print ("Image download failed" ,rs.count())
# db.user.create_index([("starttime", pymongo.ASCENDING)])
# print (db.user.find_one())
# rs = db.component_original.find({"imgTask": Constant.DONE}, {"img_url_uu": True}).limit(10000)
# ss = set()
# for r in rs:
#     ss.add(r["img_url_uu"])
# print (len(ss))
# temp_list = list()
# rs = db.propertyvalue_temp.find({})
# for index, r in enumerate(rs):
#     if index < 10000:
#         temp_list.append(r)
#     else:
#         break
#
# print (temp_list)
# print(rs["str_html"])
# # for i in rs:
# #     print i
# for ind, i in enumerate(rs):
#     if (cou - ind) > 10:
#         # if ind < 30:
#         i["status"] = Constant.DONE
#         db.kindlist_todo.save(i)
#         print ind
# componentid_list = list(i for i in range(1, 10000000))
# print (len(componentid_list))
# rs = db.propertyvalue.find({"componentid": {"$in": componentid_list}}, {"_id": False}, no_cursor_timeout=True)
# for r in rs:
#     pass
# url_set = set()
# rs = db.component_temp.find({"kindid_mouser": 568})
# rs = db.component_temp.find({"kindid_uu": 214})
# print (rs.count())
# for r in rs:
#     print (r)
# print (rs.count())
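# Count all kindlist_todo documents, then those still TODO and those DONE.
# count_documents() replaces Cursor.count(), which was removed in PyMongo 4.0;
# it requires PyMongo >= 3.7.
print (db.kindlist_todo.count_documents({}))
print (db.kindlist_todo.count_documents({"status": Constant.TODO}))
print (db.kindlist_todo.count_documents({"status": Constant.DONE}))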
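# The status counts above can also be gathered with a small helper. This is a
# minimal sketch, not part of the original script: `count_by_status` and the
# `statuses` mapping are hypothetical names, and `collection` is assumed to
# expose PyMongo's `count_documents(filter)` method (PyMongo >= 3.7).

```python
def count_by_status(collection, statuses):
    """Return the total document count plus one count per status value.

    `statuses` maps a readable label to the stored status value,
    for example {"todo": Constant.TODO, "done": Constant.DONE}.
    """
    # An empty filter matches every document in the collection.
    counts = {"total": collection.count_documents({})}
    for label, value in statuses.items():
        counts[label] = collection.count_documents({"status": value})
    return counts
```

# Possible usage, assuming the `db` handle and `Constant` values defined above:
# print (count_by_status(db.kindlist_todo, {"todo": Constant.TODO, "done": Constant.DONE}))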
'''
# Speed check
# 0.0139999389648
# 0.297000169754
import time
t1 = time.time()
for i in range(1000):
    db.kindlist_todo.find()
t2 = time.time()
print (t2 - t1)
for i in range(1000):
    db.kindlist_todo.find_one()
t3 = time.time()
print (t3 - t2)
'''
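# The disabled speed check above repeats the same timing pattern twice. A minimal
# sketch of a reusable timer follows; `time_calls` is a hypothetical helper, not
# part of the original script, and it assumes Python 3.3+ for time.perf_counter().
# Note that find() only builds a lazy Cursor and sends no query until it is
# iterated, while find_one() does a server round trip, which likely accounts for
# the gap between the two recorded timings.

```python
import time


def time_calls(func, repeat=1000):
    """Call `func` `repeat` times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    return time.perf_counter() - start
```

# Possible usage, assuming the `db` handle defined above:
# print (time_calls(db.kindlist_todo.find))
# print (time_calls(db.kindlist_todo.find_one))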
cli.close()