Overview

These datasets were collected in late 2017 from goodreads.com. We scraped only users' public shelves, i.e., shelves that anyone can view on the web without logging in. User IDs and review IDs are anonymized. We collected these datasets for academic use only. Please do not redistribute them or use them for commercial purposes.

We collected three groups of datasets: (1) meta-data of the books, (2) user-book interactions (users' public shelves), and (3) users' detailed book reviews. These datasets can be merged by joining on book, user, and review IDs.
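As an illustration of the join described above, here is a minimal Python sketch that attaches book meta-data to review records through a shared ID. The field name `book_id` and the record shapes are assumptions made for the example; check the actual files for the real field names.

```python
from typing import Dict, Iterable, Iterator


def join_reviews_with_books(
    reviews: Iterable[dict],
    books: Iterable[dict],
    key: str = "book_id",  # assumed name of the shared ID field
) -> Iterator[dict]:
    """Inner-join review records to book records on a shared ID field."""
    # Index the (smaller) book table by ID so each review is a dict lookup.
    books_by_id: Dict[str, dict] = {str(book[key]): book for book in books}
    for review in reviews:
        book = books_by_id.get(str(review[key]))
        if book is not None:
            # Keep the review fields and nest the matched book record.
            yield {**review, "book": book}
```

IDs are compared as strings so that a numeric ID in one file still matches a quoted ID in another. The same pattern would apply to joining on user or review IDs by passing a different `key`.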

Basic Statistics of the Complete Book Graph:
  • 2,360,655 books (1,521,962 works, 400,390 book series, 829,529 authors)
  • 876,145 users; 228,648,342 user-book interactions in users' shelves (including 112,131,203 reads and 104,551,549 ratings)
Download links to these datasets can be found in the Datasets section below.

Note that the complete interaction dataset is very large. We extracted several medium-sized subsets by genre and recommend experimenting with these subsets first (see "By Genre" in the Datasets section for details).

Latest News

  • [May 2023] Our datasets have been moved! Please refer to this webpage on how to download the datasets. The previous Google Drive links will be deprecated soon.

Code Samples

You can find code samples for loading the datasets and doing basic data exploration in our dataset GitHub repository. If you have any questions about these datasets, please open an issue in that repository.
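For a quick first look before consulting the repository, the sketch below streams records from one of the `.json.gz` files without loading the whole file into memory. It assumes each line of the file holds one JSON object; that layout is an assumption here, not something stated on this page. The helper names `iter_records` and `load_head` are hypothetical.

```python
import gzip
import itertools
import json
from typing import Iterator, List, Optional


def iter_records(path: str) -> Iterator[dict]:
    """Lazily yield one dict per non-empty line of a gzipped file.

    Assumes one JSON object per line (not confirmed by this page).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def load_head(path: str, limit: Optional[int] = 1000) -> List[dict]:
    """Return at most `limit` records; pass None to read the whole file."""
    return list(itertools.islice(iter_records(path), limit))
```

For example, `load_head("goodreads_reviews_poetry.json.gz", 5)` would return the first five records of the Poetry reviews subset, assuming the file has been downloaded to the working directory.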

Citations

If you are using our datasets, please kindly cite the following papers:

Datasets

Meta-Data of Books

Book Shelves

Book Reviews

By Genre

Note that in these datasets:
  • Books may overlap across different genres (i.e., one book may belong to multiple genres);
  • The subgraph for each genre may not be self-contained. Each is a subset of the nodes of the complete book graph. Detailed information about authors, works, book series, etc. can be found in the meta-data section.
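Because one book can appear in several genre subsets, combining subsets naively would double-count books. The following Python sketch shows one way to find the overlapping books, given the book IDs already collected from each genre file. The function names and the use of `book_id` values as plain strings are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def genres_per_book(book_ids_by_genre: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each book ID to the set of genre subsets it appears in."""
    genres: Dict[str, Set[str]] = defaultdict(set)
    for genre, book_ids in book_ids_by_genre.items():
        for book_id in book_ids:
            genres[str(book_id)].add(genre)
    return dict(genres)


def overlapping_books(book_ids_by_genre: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Keep only the books that belong to more than one genre subset."""
    return {
        book_id: genres
        for book_id, genres in genres_per_book(book_ids_by_genre).items()
        if len(genres) > 1
    }
```

The keys of `genres_per_book(...)` form a de-duplicated set of book IDs, which is the count to use when merging several genre subsets.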

  • Children (124,082 books, 10,059,349 interactions, 734,640 detailed reviews)
  • goodreads_books_children.json.gz
  • goodreads_interactions_children.json.gz
  • goodreads_reviews_children.json.gz

  • Comics & Graphic (89,411 books, 7,347,630 interactions, 542,338 detailed reviews)
  • goodreads_books_comics_graphic.json.gz
  • goodreads_interactions_comics_graphic.json.gz
  • goodreads_reviews_comics_graphic.json.gz

  • Fantasy & Paranormal (258,585 books, 55,397,550 interactions, 3,424,641 detailed reviews)
  • goodreads_books_fantasy_paranormal.json.gz
  • goodreads_interactions_fantasy_paranormal.json.gz
  • goodreads_reviews_fantasy_paranormal.json.gz

  • History & Biography (302,935 books, 31,479,229 interactions, 2,066,193 detailed reviews)
  • goodreads_books_history_biography.json.gz
  • goodreads_interactions_history_biography.json.gz
  • goodreads_reviews_history_biography.json.gz

  • Mystery, Thriller & Crime (219,235 books, 24,799,896 interactions, 1,849,236 detailed reviews)
  • goodreads_books_mystery_thriller_crime.json.gz
  • goodreads_interactions_mystery_thriller_crime.json.gz
  • goodreads_reviews_mystery_thriller_crime.json.gz

  • Poetry (36,514 books, 2,734,350 interactions, 154,555 detailed reviews)
  • goodreads_books_poetry.json.gz
  • goodreads_interactions_poetry.json.gz
  • goodreads_reviews_poetry.json.gz

  • Romance (335,449 books, 42,792,856 interactions, 3,565,378 detailed reviews)
  • goodreads_books_romance.json.gz
  • goodreads_interactions_romance.json.gz
  • goodreads_reviews_romance.json.gz

  • Young Adult (93,398 books, 34,919,254 interactions, 2,389,900 detailed reviews)
  • goodreads_books_young_adult.json.gz
  • goodreads_interactions_young_adult.json.gz
  • goodreads_reviews_young_adult.json.gz