Finding the “Right Stuff”

A Case Study on the Use of Epi-Search Technology and JStor’s TextAnalyzer

The goal of this study is to illustrate how Epi-Search can be used to find highly relevant related research materials when an article, book chapter, or whitepaper serves as the starting point for further reading.

Because this study was conducted in pursuit of a Sage Ocean Grant, it seemed only natural to make use of the related Sage whitepaper, The Ecosystem of Technologies for Social Science Research, as the initial target.

Three retrieval approaches are compared: the TextAnalyzer service available at JStor; the current Epi-Search approach found at http://FindRelatedBooks.com; and the enhanced version of Epi-Search used to prepare the related-materials back ends of the Warren McCulloch papers found in Issue 1, Volume 21, of the journal E:CO (Emergence: Complexity and Organization). For all three approaches, the retrieved materials are restricted to those found and accessible at JStor; for consistency, all results were also restricted to publications from 2000 to date.

The JStor approach is quite simple: upload the paper and let the software do the work.

The current Epi-Search is almost as simple: copy and paste the paper and let the software do the work. Click the results button labelled “Google Books.” In the resulting Google search box (pre-filled with search terms), add “site:jstor.org” followed by a space at the very beginning, and switch the bottom button from “BOOKS” to “ALL.” This addition limits Google’s results to materials available at JStor.
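
The site restriction described above amounts to a small string manipulation. A minimal Python sketch follows. The function names and the example search terms are hypothetical, and the only external detail assumed is that Google's search page accepts the query in its `q` URL parameter. It does not reproduce anything Epi-Search itself does.

```python
from urllib.parse import urlencode


def restrict_to_site(query: str, site: str = "jstor.org") -> str:
    """Prefix a search query with a site: operator unless it is already there."""
    query = query.strip()
    prefix = f"site:{site}"
    if query.lower().startswith(prefix.lower()):
        return query
    return f"{prefix} {query}"


def google_search_url(query: str) -> str:
    """Build a Google search URL for the given query string."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical stand-in for the terms pre-filled by the search box.
    terms = "research tools software social science"
    restricted = restrict_to_site(terms)
    print(restricted)
    print(google_search_url(restricted))
```

The check for an existing prefix simply avoids doubling the operator if the step is repeated on an already restricted query.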

The enhanced Epi-Search approach is more sophisticated. Before the query is offered to the software, the researcher needs to prepare a “lexical profile” consisting of three elements, as shown below:

Lexical Profile of Sage Whitepaper

The Ecosystem of Technologies for Social Science Research

Element #1

The Author’s Abstract or Introduction

The growth in digitally borne data, combined with increasingly accessible means of developing software, has resulted in a proliferation of software to support the research lifecycle. There is now a range of software and tools custom-built for very specific tasks, and the tools supporting common research methods have improved and expanded. Moreover, progress in machine learning models, especially around natural language processing, speech recognition, and the application of graph and network theory, has led to an explosion in new tools and has enabled social science researchers to borrow tools and technologies from other disciplines. The availability and accessibility of new technologies for research is promising. But how can researchers and educators keep up with the changing landscape of tools and software? This challenge became apparent in a survey we conducted in 2016 with close to ten thousand researchers in the social sciences, who told us that the pace of change was an obstacle to teaching new methods to students (Figure 1; Metzler, Kim, Allum, & Denman, 2016). Moreover, the rapid evolution of tools for big data research in particular was seen as a barrier to researchers looking to move into the new and growing field of computational social science (Figure 2). Through subsequent interviews with researchers and students, we gained an understanding of the challenges facing social scientists who want to prepare themselves for a more data-intensive future in research. In response to this and the 2016 survey results, SAGE Publishing launched the SAGE Ocean initiative, with the mission to support social science by equipping social scientists with the skills, tools, and resources they need to work with big data and new technology.
Over a period of 10 months, SAGE Ocean reviewed 418 tools and software packages used by social science researchers, which we sourced from research papers, tools directories, company databases like Crunchbase, Wikipedia, researcher and lab blogs, and other websites. We were interested to find out more about: how researchers discover tools; how researchers decide which tools to adopt for their research; how tool developers fund and maintain their tools; how developers are recognised for their efforts; and what role software development plays within the academic ecosystem. We explored the various features of these tools and technologies, as well as the key people and organisations that supported their development. We conducted detailed analyses of tools for text annotation, recruiting and surveying research participants, and collecting and analysing social media data. From this work, SAGE Ocean has built a Research Tools Directory to help researchers navigate the landscape of tools and software, and launched a Concept Grant scheme to support the builders of tools and software for social science research. We will continue this research and share our findings as we expand our list of tools for research. We believe this insight and knowledge is vital for a future in which more research is carried out with the help of technology, and in which researchers may increasingly become tool builders themselves.

Element #2

Word Cloud

academic, access, analysis, annotation, available, blog, challenges, code, collected, com, community, companies, computational, data, developed, digital, figure, free, funded, grants, https, media, number, ocean, open, org, organisations, packages, papers, project, research, retrieved, sage, sagepub, science, social, software, source, support, survey, sustainability, teams, technologies, text, tools, university, used, work, www, years

Element #3

Concept Extractions

Research, Researchers, science, Series, students, Creators, Technologies, Tools, Hu, Methods

Tools, Duca, Creators, Research, cleaning, communities, researchers, ecosystem, annotation, software

Tools, Researchers, Software, Tool, Data, Technologies, Science, Annotation, Packages, Developers

The main idea behind the lexical profile is to create a weighted set of “meaning vectors” with just the requisite variety to attract relevant related results when using natural language processing (NLP), latent semantic analysis (LSA), and latent Dirichlet allocation (LDA) technologies.

Element #1, the author’s abstract or introduction, is taken straight from the initial text with figures, boxes, and excess punctuation removed. Element #2, the word cloud, is prepared by uploading the full text of the initial paper into word cloud software and extracting the top fifty terms (cleaning the list of nonsense items if necessary). Element #3, the concept extractions, is prepared by inserting the full text into the software at FindRelatedBooks.com and capturing the keywords suggested for both a Google Scholar query (found by looking at the results marked “from the web”) and a Google Books query. These are supplemented by a list of related keywords produced by a software keyword extractor (in this case, cortical.io).
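
The word cloud step (Element #2) can be approximated with a plain term-frequency count. The Python sketch below is an illustration only: the study does not name the word cloud software it used, so the tokenisation rule, the short stop-word list, and the function name `top_terms` are all assumptions, and a real tool would likely filter differently.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stop-word list (an assumption); real word cloud
# tools ship much larger lists.
STOP_WORDS = {
    "the", "and", "of", "to", "a", "in", "for", "is", "that", "with",
    "on", "as", "by", "are", "has", "have", "we", "this", "from",
}


def top_terms(text: str, n: int = 50, min_length: int = 3) -> List[str]:
    """Return the n most frequent terms, listed alphabetically."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        w for w in words if len(w) >= min_length and w not in STOP_WORDS
    )
    most_common = [term for term, _ in counts.most_common(n)]
    return sorted(most_common)


if __name__ == "__main__":
    sample = "Tools and software: tools for research, research tools."
    print(top_terms(sample, n=2))
```

Terms such as “www” or “https” would survive this count, which is why the manual cleaning pass mentioned above may still be needed.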

When all three elements are combined, the resulting lexical profile is submitted to the query box at FindRelatedBooks.com. As before, click the results button, this time the one labelled “from the web,” and then click the link to Google. In the resulting Google search box (pre-filled with search terms), add “site:jstor.org” followed by a space at the very beginning. This addition limits Google’s results to materials available at JStor.
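
As a rough illustration of how the three elements might be combined before being pasted into the query box, the Python sketch below joins them into one block of text. The class name `LexicalProfile`, its field names, and the choice of plain concatenation are assumptions made for this example. The study does not describe how FindRelatedBooks.com weights or parses the submitted text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LexicalProfile:
    """Hypothetical container for the three elements of a lexical profile."""

    abstract: str                         # Element #1
    word_cloud: List[str]                 # Element #2
    concept_extractions: List[List[str]]  # Element #3, one list per source

    def to_query_text(self) -> str:
        """Join the elements into a single text block, one per paragraph."""
        parts = [self.abstract.strip(), ", ".join(self.word_cloud)]
        parts.extend(", ".join(terms) for terms in self.concept_extractions)
        return "\n\n".join(p for p in parts if p)


if __name__ == "__main__":
    profile = LexicalProfile(
        abstract="The growth in digitally borne data ...",
        word_cloud=["academic", "access", "analysis"],
        concept_extractions=[["Research", "Researchers"], ["Tools", "Software"]],
    )
    print(profile.to_query_text())
```

Keeping each concept extraction as its own list preserves the fact that the keywords come from separate sources (Google Scholar, Google Books, and the keyword extractor).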

Results

All three queries produce relevant material.
The “best” and “most relevant” material is provided by the enhanced Epi-Search process.

JStor TextAnalyzer

The initial result is an HTML page; its results are listed here for easier readability:

Promoting Open Science And Research In Higher Education, Ilkka Väänänen, Kati Peltonen, 2016
Conclusions, Ian N. Gregory, Alistair Geddes, 2014
Introducing Open Source Reference Management Software To A Rural South African Campus, Aliza Le Roux, Diana Breshears, Journal Of Higher Education In Africa / Revue De L'enseignement Supérieur En Afrique, Vol. 14, No. 2 (2016), pp. 49-60
The Emergence Of Open-Source Software For The Weather Radar Community, M. Heistermann, S. Collis, M. J. Dixon et al., Bulletin Of The American Meteorological Society, Vol. 96, No. 1 (January 2015), pp. 117-128
Financing Open Source Biotechnology, Janet Hope, 2008
A Cultural History Of Web 2.0, Alice E. Marwick, 2013
Introduction: The Implications Of Single Sourcing For Writers And Writing, Locke Carter, Technical Communication, Vol. 50, No. 3, Special Issue: Making The Leap To Single Sourcing (August 2003), pp. 317-320
The Impact Of Open Source Software On The Strategic Choices Of Firms Developing Proprietary Software, Jeevan Jaisingh, Eric W. K. See-To, Kar Yan Tam, Journal Of Management Information Systems, Vol. 25, No. 3 (Winter, 2008/2009), pp. 241-275
Testqual: Conceptualizing Software Testing As A Service, Yang Yang, Colin Onita, Xihui Zhang et al., E-Service Journal, Vol. 7, No. 2 (Winter 2011), pp. 46-65
Technically Speaking: Exhibitors Uniting In Their Desire To Provide Federated Searching, David Dorman, American Libraries, Vol. 34, No. 7 (Aug., 2003), pp. 76-78

Current Epi-Search

A Dynamic Framework For Classifying Information Systems Development Methodologies and Approaches, J Iivari - 2000
Between The Chairs. An Interdisciplinary Career, M Thaller - 2017
Bridges I: Interdisciplinary Collaboration As Practice, C Pearce - 2003
Conducting Video Research In The Learning Sciences: Guidance on Selection, Analysis, Technology, and Ethics, SJ Derry - 2010
Defining Computational Thinking For Mathematics and Science Classrooms, D Weintrop - 2016
Digital Humanities And Renaissance Studies In Canada, SM Loose - 2014
Elements Of Scientific Visualization In Basic Neuroscience Research, BC Albensi - 2004
Engaging Digital Scholarship: Thoughts On Evaluating Multimedia Scholarship, S Anderson - 2011
Facilitating Communities Of Practice In Digital Humanities, HE Green - 2014
Gentile Da Foligno And Scholasticism by Roger French, J Ziegler - 2003
Grilichesian Breakthroughs: Inventions Of Methods Of Inventing and Firm Entry in Nanotechnology, MR Dar - 2005
Inducing Expertise In History Doctoral Students Via Information Retrieval Design, C Cole - 2000
Making Learning Fun: Quest Atlantis, A Game Without Guns, S Barab - 2005
Music, Creativity And Scientific Thinking, RS Root-Bernstein - 2001
Past, Present And Future Of Historical Information Science, O Boonstra - 2004
Saving China Through Science: The Science Society of China, Scientific Nationalism, and Civil Society in Republican China, Z Wang - 2002
Student Perceptions Of An Authentic Classroom, M Nicaise - 2000
Supporting E-Business Research With Web Crawler Methodology, A Nemeslaki - 2012
Taking The Long View: From E-Science Humanities To Humanities Digital Ecosystems, S Anderson - 2012
Where We Read from Matters: Disciplinary Literacy In A Ninth-Grade Social Studies Classroom, J Damico - 2009

Enhanced Epi-Search

A Study of the Efficacy of Project-based Learning Integrated with Computer-based Simulation - STELLA, R Eskrootchi - 2010
Beyond Black Boxes: Bringing Transparency and Aesthetics Back to Scientific Investigation, M Resnick - 2000
Beyond Constructivism: A Return to Science-Based Research and Practice in Educational Technology, W Winn - 2003
Citizen Science as an Ecological Research Tool: Challenges and Benefits, JL Dickinson - 2010
Citizen science to foster innovation in open science, society and policy, Aletta Bonn, Anne Bowser, Zen Makuch - 2018
Confronting Analytical Dilemmas for Understanding Complex Human Interactions in Design-Based Research from a Cultural-Historical Activity Theory (CHAT) Framework, LC Yamagata-Lynch - 2007
Curriculum Research: Toward a Framework for "Research-Based Curricula", DH Clements - 2007
Design Research in Education: Yes, but Is It Methodological?, A Kelly - 2004
Design Science in Information Systems Research, AR Hevner - 2004
Design-Based Research and Technology-Enhanced Learning Environments, F Wang - 2005
digitalSTS: A Field Guide for Science & Technology Studies, Janet Vertesi et al. - 2019
Integrating citizen science into university, Daniel Wyler and Muki Haklay - 2018
Interview Research in Political Science, FR Baumgartner - 2013
Reframing research on learning with technology: in search of the meaning of cognitive tools, B Kim - 2007
Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, Eszter Hargittai - 2009
Research Data Management: Practical Strategies for Information Professionals, Joyce M. Ray - 2014
Resuscitating Research in Educational Technology, KD Squire - 2005
Rethinking the Role of Information Technology-Based Research Tools in Students' Development of Scientific Literacy, M van Eijck - 2007
Scientific Communication Before and After Networked Science, J Carey - 2013
Technology infrastructure for citizen science, Peter Brenton, Ella Vogel, and Marie-Elise Lecoq - 2018

Discussion

Of course, a researcher could simply enter the title of the whitepaper into Google and again limit the results to JStor by adding “site:jstor.org” in front of the search terms. This results in:

A basic guide for empirical environmental social science, M Cox - 2015
A guideline to improve qualitative social science publishing in ecology and conservation journals, K Moon - 2016
Ecosystem Services as a Stakeholder-Driven Concept for Conservation Science, S Menzel - 2010
Enlisting the Social Sciences in Decisions about Dam Removal, SE Johnson - 2002
Environmental justice research shows the importance of social feedbacks in ecosystem service trade-offs, NM Dawson - 2017
Evolution of natural and social science interactions in global change research programs, HA Mooney - 2013
Expanding the contribution of the social sciences to social-ecological resilience research, S Stone-Jovicich - 2018
From e-Science Humanities to Humanities Digital Ecosystems, S Anderson - 2012
How Earth Science Has Become a Social Science, N Oreskes - 2015
Integrating Social Science into the Long-Term Ecological Research (LTER) Network: Social Dimensions of Ecological Change and Ecological Dimensions of Social Change, CL Redman - 2004
Making Relevant Science in Biodiversity Studies, C Granjou - 2015
Probing the interfaces between the social sciences and social-ecological resilience: insights from integrative and hybrid perspectives in the social sciences, S Stone-Jovicich - 2015
Review: Ecology and the Social Sciences, P Lowe - 2009
Seeking Common Ground: How Natural and Social Scientists Might Jointly Create an Overlapping Worldview for Sustainable Livelihoods: A South African Perspective, N King - 2007
Social relevance of science and technology, GK Kadekodi - 2009
Social Science Research Needs for the Hurricane Forecast and Warning System, H Gladwin - 2009
Sustainability transformations: a resilience perspective, P Olsson - 2014
The “social” aspect of social-ecological systems: a critique of analytical frameworks and findings from a multisite study of coastal sustainability, T Stojanovic - 2016
Using Administrative Data for Social Science and Policy, AM Penner - 2019

These results seem rather biased towards “ecology,” most likely due to the use of “Ecosystem” in the whitepaper title. One of the serious disadvantages of general search engines such as Google is that the entailments of words are drawn from general language use and not from the specific interests of the querying researcher. By contrast, the lexical profile approach used in the enhanced Epi-Search at FindRelatedBooks.com adds just enough context so that the entailments better reflect the paper from which the query words are drawn.

Using JStor as the target corpus does not mean the researcher is stuck using the internal JStor search engine. Google allows searches of JStor material. The JStor results can be improved by taking the concepts which TextAnalyzer has extracted and using them in a Google search (restricted with the term “site:jstor.org”). This yields:

Can History Be Open Source? Wikipedia and the Future of the Past, R Rosenzweig - 2006
Code Reuse in Open Source Software, S Haefliger - 2008
Computer-Assisted Analysis of Textual Data in Social Sciences, G Wiedemann - 2013
Design Science in Information Systems Research, AR Hevner - 2004
Desperately Seeking the “IT” in IT Research: A Call to Theorizing the IT Artifact, WJ Orlikowski - 2001
Digital Open Access to Publicly-Funded Research and National Security: A Review of the Status of Access to and a Framework for Evaluation of Security-Related Research Results, MR Sanfilippo - 2014
Emergence of New Project Teams from Open Source Software ..., J Hahn - 2008
Geography and computational social science, PM Torrens - 2010
Network Effects: The Influence of Structural Capital on Open Source Project Success, PV Singh - 2011
Open Source Software and the "Private-Collective" Innovation Model: Issues for Organization Science, E von Hippel - 2003
'Openness' in a Global Information Society, C May - 2008
Past, Present and Future of Historical Information Science, O Boonstra - 2004
Principles for Conducting Critical Realist Study Research in Information Systems, D Wynn Jr - 2012
Technology and Institutions: What Can Research on Information Technology and Research on Organizations Learn from Each Other?, WJ Orlikowski - 2001
The Economics of Technology Sharing: Open Source and Beyond, J Lerner - 2005
The Emergence of Governance in an Open Source Community, S O'Mahony - 2007
The Promise of Research on Open Source Software, G von Krogh - 2006
Understanding Sustained Participation in Open Source Software Projects, Y Fang - 2009

This suggests that there is merit in the TextAnalyzer concept extractions but less merit in the search engine used internally by JStor.

We note that both versions of Epi-Search also give results from the ISCE Library (4000+ books centered on systems, complexity, philosophy, and organizations). These too seem highly relevant (though some are pre-2000):

Writing the Doctoral Dissertation, Davis, Gordon B.
Working with Sensitizing Concepts: Analytical Field Research (Qualitative Research Methods), Hoonaard, Will van den
Writing and Presenting Research (SAGE Study Skills Series), Thody, Angela
Ground Rules for Good Research: A 10 Point Guide for Social Researchers, Denscombe, Martyn
Doing Your Research Project 4/e: A guide for first-time researchers in social science, education and health, Bell, Judith
Judgment Calls in Research (Studying Organizations), McGrath, Joseph E.
What Is Research?: Methodological Practices and New Approaches, Rhedding-Jones, Jeanette
Management Research: An Introduction, Easterby-Smith, Mark
Qualitative Analysis for Social Scientists, Strauss, Anselm L.
Investigating the Social World: The Process and Practice of Research (Sage Studies in Discourse), Schutt, Russell K.

By contrast, results from Google Books are more tangential and are best used by a researcher who is writing about the topics in the whitepaper and wants to situate them within a context of current book-level work. This suggests (and we have other evidence to support this) that aiming Epi-Search at a restricted corpus (e.g., JStor or the ISCE Library) works well for getting focused results, while aiming it at large corpora yields more tangentially relevant retrievals. Both methods suggest two additional uses for the software: 1) researchers can input their own work and discover related material of which they were previously unaware but which perhaps should be addressed, and 2) that same input can reveal related but tangential areas of exploration for query expansion and “added readings.”

FindRelatedBooks.com in its various forms is a powerful tool that far too few researchers know of or use.

For more information about Epi-Search and FindRelatedBooks.com, please contact Michael Lissack, President, American Society for Cybernetics (michael.lissack@gmail.com or 617-710-9565).