Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
360 views
in Technique[技术] by (71.8m points)

Scrapy pipeline: the code below stores scraped items in sqlite3. What has to change to store them in MySQL instead?

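Below is a pipeline I found online that writes to sqlite3. I want to port it as-is to MySQL.
Its job is to insert an article. Before the insert it needs the id of the article's category: if the category already exists, look up its id by category name; if it does not, create the category from the scraped data and then take its id.
The sqlite3 version below works.

The problem is that my MySQL version does not. See the code further down.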

import os
import sqlite3


class BookspiderPipeline(object):
    def __init__(self):
        DBpath = os.getcwd() + '/db.sqlite3'
        self.con = sqlite3.connect(DBpath)
        self.cur = self.con.cursor()

    def process_item(self, item, spider):
        self.cur.execute("SELECT id FROM books_tag WHERE tagname = ?", (item['categoryName'],))
        tagID = self.cur.fetchone()
        if not tagID:
            self.cur.execute("INSERT INTO books_tag (tagname) VALUES (?)", (item['categoryName'],))
            self.con.commit()
            self.cur.execute("SELECT id FROM books_tag WHERE tagname = ?", (item['categoryName'],))
            tagID = self.cur.fetchone()
        tagID = tagID[0]
        print(tagID)

        self.cur.execute("SELECT id FROM books_book WHERE title = ?", (item['bookName'],))
        bookID = self.cur.fetchone()

        if not bookID:
            self.cur.execute('''
            INSERT INTO books_book (title, cover, author, intro, tag_id) VALUES (?,?,?,?,?)
            ''', (item['bookName'], item['cover'], item['author'], item['intro'], tagID))
            self.con.commit()
            self.cur.execute("SELECT id FROM books_book WHERE title = ?", (item['bookName'],))
            bookID = self.cur.fetchone()

        bookID = bookID[0]
        print(bookID)

        self.cur.execute('''INSERT INTO books_chapter (number, title, content, book_id) 
                        VALUES (?,?,?,?)''', (int(item['number']), item['chapterName'], item['chapterContent'], bookID))
        self.con.commit()
        return item
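
The look-up-or-create step that this pipeline repeats for tags and books can be sketched as a small helper. This is a minimal sketch, not the original author's code: the helper name `get_or_create_tag` and the `CREATE TABLE` schema are assumptions made for illustration. It uses the standard DB-API `cursor.lastrowid` attribute to read the new id directly instead of running a second SELECT.

```python
import sqlite3


def get_or_create_tag(cur, tagname):
    """Return the id of the tag named `tagname`, inserting the row if it is missing."""
    cur.execute("SELECT id FROM books_tag WHERE tagname = ?", (tagname,))
    row = cur.fetchone()
    if row:
        return row[0]  # default sqlite3 rows are tuples, so index by position
    cur.execute("INSERT INTO books_tag (tagname) VALUES (?)", (tagname,))
    return cur.lastrowid  # id generated by the INSERT above


# Demo on an in-memory database; the schema below is assumed, not taken from the post.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE books_tag (id INTEGER PRIMARY KEY AUTOINCREMENT, tagname TEXT)"
)

first = get_or_create_tag(cur, "fantasy")   # inserts the row
second = get_or_create_tag(cur, "fantasy")  # finds the existing row
other = get_or_create_tag(cur, "history")   # inserts a second row
con.commit()
print(first, second, other)
```

Calling the helper twice with the same name returns the same id, which is the behavior the pipeline relies on before inserting a book or chapter.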

My code:

import pymysql
from twisted.enterprise import adbapi


class ScMySqlPipeline(object):
    def __init__(self):
        dbargs = dict(
            host = '127.0.0.1',
            user = 'root',
            passwd = 'mysqlpasswd',
            db = 'info2222',
            port = 3306,
            cp_reconnect = True,
            cursorclass = pymysql.cursors.DictCursor,
            charset = 'utf8mb4',  
            use_unicode = True,
        )
        self.dbpool = adbapi.ConnectionPool('pymysql', **dbargs)
        
    def process_item(self, item, spider):
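        # Execution breaks once it reaches this point, so the code below is probably wrong.
        # The idea: first check whether the category exists by querying the database by category name.
        # If it exists, take its id and assign it to category_id for use when inserting the article later.
        # If it does not exist, insert the category and then fetch its id.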
        conn.execute("""select id from aainfo_category where name = %s""", (item['fenlei'], ))
        category_id = conn.fetchone()
        if not category_id:
            conn.execute("INSERT INTO aainfo_category (`name`, `slug`, `chaodai`, `num`, `description`, `jianjie`, `weburl`) VALUES (%s, %s, %s, %s, %s, %s, %s)", (item['category_name'], item['category_slug'], item['category_chaodai'], 0, item['category_description'], item['category_jianjie'], item['category_weburl']))
            #conn.commit()
            conn.execute("""select id from aainfo_category where name = %s""", (item['fenlei'], ))
            category_id = conn.fetchone()
            print (category_id)
        category_id = category_id[0]
        print (category_id)
        if isinstance(item, vodItem):
            query = self.dbpool.runInteraction(self.insert_into_info, item).addErrback(self.handle_error)
            return item

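self.insert_into_info is the function that inserts the article into MySQL. It needs the category id, which is why the code above tries to fetch that id first. I can't tell where the mistake is. Could anyone point it out? Thanks.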
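
Several things in the MySQL version look likely to fail. `conn` is never defined inside `process_item`; with `adbapi.ConnectionPool`, queries normally run inside a function passed to `runInteraction`, which receives a cursor-like transaction object. With `DictCursor`, `fetchone()` returns a dict, so `category_id[0]` would raise; `row['id']` is the matching access. The SELECT uses `item['fenlei']` while the INSERT uses `item['category_name']`, so the second SELECT can come back empty if the two differ. The following is a minimal sketch under stated assumptions, not a confirmed fix: `get_or_create_category` and `save_item` are hypothetical names, `insert_into_info` is assumed to be changed to accept the category id, the `id` column is assumed to be AUTO_INCREMENT, and the `isinstance(item, vodItem)` check is left out for brevity.

```python
import logging


def get_or_create_category(txn, item):
    """Return the id of the category named item['fenlei'], inserting it if missing.

    `txn` is the cursor-like object adbapi passes to an interaction. Rows are
    dicts here because the pool is configured with DictCursor.
    """
    txn.execute("SELECT id FROM aainfo_category WHERE name = %s", (item['fenlei'],))
    row = txn.fetchone()
    if row:
        return row['id']  # DictCursor: index by column name, not by position
    # Assumption: the row is stored under the same name the SELECT looks up.
    txn.execute(
        "INSERT INTO aainfo_category "
        "(`name`, `slug`, `chaodai`, `num`, `description`, `jianjie`, `weburl`) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s)",
        (item['fenlei'], item['category_slug'], item['category_chaodai'], 0,
         item['category_description'], item['category_jianjie'],
         item['category_weburl']),
    )
    return txn.lastrowid  # assumes `id` is an AUTO_INCREMENT column


class ScMySqlPipeline(object):
    def open_spider(self, spider):
        # Imported here so the helper above stays usable without Twisted/PyMySQL.
        import pymysql
        from twisted.enterprise import adbapi
        self.dbpool = adbapi.ConnectionPool(
            'pymysql', host='127.0.0.1', user='root', passwd='mysqlpasswd',
            db='info2222', port=3306, charset='utf8mb4', cp_reconnect=True,
            cursorclass=pymysql.cursors.DictCursor,
        )

    def process_item(self, item, spider):
        # Both steps run in one interaction, i.e. one transaction on one connection.
        d = self.dbpool.runInteraction(self.save_item, item)
        d.addErrback(self.handle_error)
        return item  # returned for every item, not only inside a branch

    def save_item(self, txn, item):
        category_id = get_or_create_category(txn, item)
        self.insert_into_info(txn, item, category_id)

    def insert_into_info(self, txn, item, category_id):
        # Placeholder: the article INSERT from the question is not shown there.
        raise NotImplementedError("article INSERT goes here")

    def handle_error(self, failure):
        logging.error("MySQL interaction failed: %s", failure)
```

`runInteraction` commits when the function returns and rolls back if it raises, so no explicit `commit()` call is needed inside `save_item`.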


Fight dragons too long and you become a dragon yourself; gaze into the abyss too long and the abyss gazes back…

1 Reply

0 votes
by (71.8m points)
Waiting for an expert to answer.
