
Showing 1–15 of 15 results for author: Jones, S M

  1. DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding

    Authors: Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M. Jones, Michal Kucer, Diane Oyen

    Abstract: Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data on practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks, such as image captioning, which has primarily been carried out on natural images, still struggle to produce accurate and meanin…

    Submitted 7 November, 2023; originally announced November 2023.

  2. arXiv:2307.06458  [pdf, other]

    cs.SI cs.CV cs.DL

    Discovering Image Usage Online: A Case Study With "Flatten the Curve"

    Authors: Shawn M. Jones, Diane Oyen

    Abstract: Understanding the spread of images across the web helps us understand the reuse of scientific visualizations and their relationship with the public. The "Flatten the Curve" graphic was heavily used during the COVID-19 pandemic to convey a complex concept in a simple form. It displays two curves comparing the impact on case loads for medical facilities if the populace either adopts or fails to adop…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: 6 pages, 5 figures, Presented as poster at JCDL 2023

    ACM Class: I.4.9; H.3.3; H.4.3; H.3.7

  3. arXiv:2211.02115  [pdf, other]

    cs.CV cs.IR

    Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

    Authors: Shawn M. Jones, Diane Oyen

    Abstract: Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where conventional search engines accept a tex…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: 20 pages; 7 figures; to be published in the proceedings of the Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop from ECCV 2022

    ACM Class: H.3.3; H.3.7; H.3.5; I.4.9

  4. arXiv:2209.08649  [pdf, other]

    cs.DL

    Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists

    Authors: Himarsha R. Jayanetti, Shawn M. Jones, Martin Klein, Alex Osbourne, Paul Koerbin, Michael L. Nelson, Michele C. Weigle

    Abstract: As web archives' holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms: Archive-It, Conifer, the Croatian Web Archive (HAW), the Internet Archive's user account web archives, Library of Congress (LC), PANDORA, Trove, and the UK Web Archive (UKWA). We note a plethora o…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 5 figures, 16 pages, accepted for publication at TPDL 2022

  5. It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

    Authors: Shawn M. Jones, Valentina Neblitt-Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

    Abstract: In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their tim…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 10 pages, 10 figures, 3 tables

  6. Automatically Selecting Striking Images for Social Cards

    Authors: Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

    Abstract: To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource's title, text summary, striking image, and domain name. News and scholarly articles on the web are frequently subject to social card creation when being shared on…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 10 pages, 5 figures, 10 tables

  7. arXiv:2008.00139  [pdf, other]

    cs.DL cs.HC cs.IR

    SHARI -- An Integration of Tools to Visualize the Story of the Day

    Authors: Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson

    Abstract: Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the "biggest story" for a given date. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: 19 pages, 16 figures, 1 table

    ACM Class: H.3.7; H.3.6; H.3.4

    Journal ref: Presented at the Web Archiving and Digital Libraries 2020 Workshop

  8. arXiv:2008.00137  [pdf, other]

    cs.DL cs.HC cs.IR

    MementoEmbed and Raintale for Web Archive Storytelling

    Authors: Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson

    Abstract: For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display this sample to drive visitors to their collection? Search engines and social media platforms often represent web pages as…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: 54 pages, 5 tables, 46 figures

    ACM Class: H.3.7; H.3.6; H.3.4

    Journal ref: Presented at the Web Archiving and Digital Libraries 2020 Workshop

  9. arXiv:1905.11342  [pdf, other]

    cs.DL cs.HC cs.SI

    Social Cards Probably Provide For Better Understanding Of Web Archive Collections

    Authors: Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

    Abstract: Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small easily digestible summaries of the underlying page.…

    Submitted 29 May, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: 58 pages, 53 figures

    ACM Class: H.3.7; H.3.6; H.3.5; H.5.2

  10. The Many Shapes of Archive-It

    Authors: Shawn M. Jones, Alexander Nwala, Michele C. Weigle, Michael L. Nelson

    Abstract: Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources, or seeds, and creating their own web archive collections. We focus on the collections within Archive-It, a subscription ser…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: 10 pages, 12 figures, to appear in the proceedings of the 15th International Conference on Digital Preservation (iPres 2018)

    ACM Class: H.3.7; H.3.1

  11. The Off-Topic Memento Toolkit

    Authors: Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

    Abstract: Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news sto…

    Submitted 17 September, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: 10 pages, 14 figures, to appear in the proceedings of the 15th International Conference on Digital Preservation (iPres 2018)

    ACM Class: H.3.7; H.3.6; H.3.4

  12. arXiv:1602.09102  [pdf, other]

    cs.DL

    Persistent URIs Must Be Used To Be Persistent

    Authors: Herbert Van de Sompel, Martin Klein, Shawn M. Jones

    Abstract: We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persistent ones are available, and propose an approach to alleviate the problem.

    Submitted 29 February, 2016; originally announced February 2016.

    Comments: 2 pages, 2 figures, accepted for publication at WWW 2016 (poster track)

  13. arXiv:1602.06223  [pdf, other]

    cs.DL

    Rules of Acquisition for Mementos and Their Content

    Authors: Shawn M. Jones, Harihar Shankar

    Abstract: Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relatively new enterprise. In the course of conducting a study with almost 700,000 web pages, we encountered issues acquiring meme…

    Submitted 22 February, 2016; v1 submitted 19 February, 2016; originally announced February 2016.

    Comments: 16 pages, 6 figures, 13 listings

    ACM Class: H.3.7

  14. arXiv:1506.06279  [pdf, other]

    cs.DL

    Avoiding Spoilers in Fan Wikis of Episodic Fiction

    Authors: Shawn M. Jones, Michael L. Nelson

    Abstract: A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering "spoilers" -- information that gives away key plot points before the intended time of the show's writers. Enterprising readers might b…

    Submitted 20 June, 2015; originally announced June 2015.

    Comments: 18 pages, 31 figures, 3 tables, 2 algorithms

    ACM Class: H.3.7

  15. arXiv:1406.3876  [pdf, other]

    cs.DL

    Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension

    Authors: Shawn M. Jones, Michael L. Nelson, Harihar Shankar, Herbert Van de Sompel

    Abstract: We have implemented the Memento MediaWiki Extension Version 2.0, which brings the Memento Protocol to MediaWiki, used by Wikipedia and the Wikimedia Foundation. Test results show that the extension has a negligible impact on performance. Two 302 status code datetime negotiation patterns, as defined by Memento, have been examined for the extension: Pattern 1.1, which requires 2 requests, versus Pat…

    Submitted 15 June, 2014; originally announced June 2014.

    Comments: 23 pages, 18 figures, 9 tables, 17 listings

    ACM Class: H.3.7