filename The original PDF filename from the ACM Digital Library
year The year of publication. The CSCW 2018 online-first edition of PACMHCI is coded as 2017.5.
title_from_text The paper title derived from the paper text. This may be incomplete or may also include author names.
lead_author The lead author of the paper, based on the filename from the ACM DL
num_pages Number of pages in the PDF
total_words Total number of words in the paper, defined as tokens delimited by any run of contiguous whitespace.
total_words_nopunct Total number of words after replacing all punctuation with spaces.
body_len_words Number of words in the paper's front matter and body, excluding references and appendices. Calculated with `total_words` method.
body_len_words_nopunct Number of words in the paper's front matter and body, excluding references and appendices. Calculated with `total_words_nopunct` method.
body_len_chars Number of characters in the paper's front matter and body, excluding references and appendices.
ref_len_chars Number of characters in the paper's reference section.
ref_len_words Number of words in the paper's reference section. Calculated with `total_words` method.
ref_len_words_nopunct Number of words in the paper's reference section. Calculated with `total_words_nopunct` method.
appx_len_chars Number of characters in the paper's appendix section. Value is nan if no appendix was found.
appx_len_words Number of words in the paper's appendix section. Calculated with `total_words` method.
appx_len_words_nopunct Number of words in the paper's appendix section. Calculated with `total_words_nopunct` method.
ref_count_approx Approximate number of references cited.
words_per_page Average number of words (`total_words` method) per page.
words_nopunct_per_page Average number of words (`total_words_nopunct` method) per page.
chars_per_word Average number of characters per word (`total_words` method).
chars_per_word_nopunct Average number of characters per word (`total_words_nopunct` method).
body_words_nopunct_per_ref_count Number of body words (`body_len_words_nopunct`) per reference cited (`ref_count_approx`).
title_has_quote 1 if title has a single or double quotation mark, 0 if not.
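
The two counting methods and a few of the derived fields can be sketched in Python as follows. This is a minimal illustration, not the code used to build the dataset. The function bodies are assumptions: whitespace splitting uses `str.split()`, "punctuation" is taken to mean the ASCII set in `string.punctuation`, and only straight ASCII quotation marks are checked for `title_has_quote`. The actual pipeline may differ on any of these points.

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
# Assumption: the dataset's notion of "punctuation" may be broader or narrower.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def total_words(text: str) -> int:
    """Count tokens delimited by runs of whitespace (`total_words` method)."""
    return len(text.split())


def total_words_nopunct(text: str) -> int:
    """Replace punctuation with spaces, then count tokens (`total_words_nopunct` method)."""
    return len(text.translate(_PUNCT_TO_SPACE).split())


def words_per_page(text: str, num_pages: int) -> float:
    """Average number of words per page, using the `total_words` method."""
    return total_words(text) / num_pages


def title_has_quote(title: str) -> int:
    """Return 1 if the title contains a straight single or double quote, else 0."""
    return int("'" in title or '"' in title)


if __name__ == "__main__":
    sample = "Hello, world! It's state-of-the-art."
    print(total_words(sample), total_words_nopunct(sample))
```

Because punctuation is replaced with spaces rather than deleted, hyphenated terms and contractions split into several tokens, so the `_nopunct` counts are generally higher than the plain counts for the same text.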