Long and Short Papers

The following long and short papers were selected through a rigorous peer-review process. Long and short papers each receive a 20-minute time slot (including 5 minutes for Q&A) for presentation at the conference. All paper sessions will be held in the Quays Theatre at The Lowry.

Session 1: 360 Media

  • 5th June, 16:00-17:00
  • Session Chair: David Green (UWE, Bristol)

Exploring Visual Guidance in 360-degree Videos

  • Marco Speicher – DFKI, Saarland Informatics Campus, Saarbrücken, Saarland, Germany
  • Christoph Rosenberg – Saarland University, Saarland Informatics Campus, Saarbrücken, Saarland, Germany
  • Donald Degraen – Intel Visual Computing Institute (IVCI), Saarland Informatics Campus, Saarbrücken, Germany
  • Florian Daiber – DFKI, Saarland Informatics Campus, Saarbrücken, Germany
  • Antonio Krüger – DFKI, Saarland Informatics Campus, Saarbrücken, Germany

Abstract: 360-degree videos offer a novel viewing experience with the ability to explore virtual environments much more freely than before. The technologies and aesthetics behind this approach to film-making are not yet fully developed. The newly gained freedom creates challenges, and new methods have to be established to guide users through narratives. This work provides an overview of methods for guiding users visually and contributes insights from an experiment exploring visual guidance in 360-degree videos with regard to task performance and user preferences. In addition, a smartphone and an HMD are used as output devices to examine possible differences. The results show that viewers preferred the HMD over the smartphone and visual guidance over its absence. Overall, the Object to Follow method performed best, followed by the Person to Follow method. Based on the results, we defined a set of guidelines for drawing viewers’ attention in 360-degree videos.

What Are Others Looking at? Exploring 360° Videos on HMDs with Visual Cues about Other Viewers

  • Ville Mäkelä – LMU Munich, Munich, Germany, Tampere University, Tampere, Finland
  • Tuuli Keskinen – Tampere University, Tampere, Finland
  • John Mäkelä – Tampere University, Tampere, Finland
  • Pekka Kallioniemi – Tampere University, Tampere, Finland
  • Jussi Karhu – Tampere University, Tampere, Finland
  • Kimmo Ronkainen – Tampere University, Tampere, Finland
  • Alisa Burova – Tampere University, Tampere, Finland
  • Jaakko Hakulinen – Tampere University, Tampere, Finland
  • Markku Turunen – Tampere University, Tampere, Finland

Abstract: Viewing 360° videos can be an immersive experience, especially when using a head-mounted display (HMD). Current research identifies the need to guide viewers, as the freedom to rotate the view may make them miss things. We explore a unique, automatic approach to this problem with dynamic guidance methods called social indicators. They utilize gaze data collected from viewers to recognize popular areas in 360° videos, which are then visualized to subsequent viewers. We developed two different social indicators and evaluated them in a 30-participant user study. We found that although the indicators show great potential in subtly guiding users and improving the experience, finding the balance between guidance and self-exploration is vital. Also, users had varying levels of interest in indicators that represented a larger audience, but reported a clear desire to use the indicators with their friends. We also present guidelines for providing dynamic guidance for 360° videos.

Camera Heights in Cinematic Virtual Reality: How Viewers Perceive Mismatches Between Camera and Eye Height

  • Sylvia Rothe – LMU Munich, Munich, Germany
  • Boris Kegeles – LMU Munich, Munich, Germany
  • Heinrich Hussmann – LMU Munich, Munich, Germany

Abstract: When watching a 360° movie with a head-mounted display (HMD), the viewer feels as if they are inside the movie and can experience it in an immersive way. The viewer’s head is in exactly the same place as the camera was when the scene was recorded. In traditional movies, the viewer watches the movie from outside, and a difference between eye height and camera height does not create a problem. However, viewing a movie from the perspective of the camera through an HMD can raise some challenges; e.g. the heights of well-known objects can irritate the viewer if the camera height does not correspond to the physical eye height. The aim of this work is to study how the position of the camera influences the viewer’s presence, sickness and user experience. We considered several watching postures as well as various eye heights. The results of our experiments suggest that differences between camera and eye heights are better accepted if the camera position is lower than the viewer’s own eye height. Additionally, sitting postures are preferred and can be adapted to more easily than standing postures. These results can be applied to improve guidelines for 360° filmmakers.

Session 2: Immersion and Content

  • 6th June, 09:30-10:30
  • Session Chair: Guy Schofield (University of York)

Development of a Questionnaire to Measure Immersion in Video Media: The Film IEQ

  • Jacob M. Rigby – UCL Interaction Centre, University College London, London, United Kingdom
  • Duncan P Brumby – UCL Interaction Centre, University College London, London, United Kingdom
  • Sandy J. J. Gould – School of Computer Science, University of Birmingham, Birmingham, United Kingdom
  • Anna L Cox – UCL Interaction Centre, University College London, London, United Kingdom

Abstract: Researchers and practitioners are keen to understand how new video viewing practices driven by technological developments impact viewers’ experiences. We detail the development of the Immersive Experience Questionnaire for Film and TV (Film IEQ). An exploratory factor analysis based on responses from 414 participants revealed a four-factor structure of (1) captivation, (2) real-world dissociation, (3) comprehension, and (4) transportation. We validated the Film IEQ in an experiment that replicated prior research into the effect of viewing on screens of varying size. Responses captured by the Film IEQ indicate that watching on a small phone screen reduces the viewer’s level of comprehension, and that this negatively impacts the viewing experience, compared to watching on a larger screen. The Film IEQ allows researchers and practitioners to assess video viewing experiences using a questionnaire that is easy to administer, and that has been empirically validated.

Deb8: A Tool for Collaborative Analysis of Video

  • Guilherme Carneiro – School of Computer Science, University of St Andrews, St Andrews, United Kingdom
  • Miguel Nacenta – School of Computer Science, University of St Andrews, St Andrews, United Kingdom
  • Alice Toniolo – School of Computer Science, University of St Andrews, St Andrews, United Kingdom
  • Gonzalo Mendez – Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador; Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
  • Aaron J Quigley – SACHI, School of Computer Science, University of St Andrews, St Andrews, Fife, United Kingdom

Abstract: Public, parliamentary and television debates are commonplace in modern democracies. However, developing an understanding and communicating with others is often limited to passive viewing or, at best, textual discussion on social media. To address this, we present the design and implementation of Deb8, a tool that allows collaborative analysis of video-based TV debates. The tool provides a novel UI designed to enable and capture rich synchronous collaborative discussion of videos based on argumentation graphs that link quotes of the video, opinions, questions, and external evidence. Deb8 supports the creation of rich idea structures based on argumentation theory as well as collaborative tagging of the relevance, support and trustworthiness of the different elements. We report an evaluation of the tool design and a reflection on the challenges involved.

Viewers’ Visions of the Future: Co-Creating Hyper-Personalized and Immersive TV and Video Experiences

  • David Geerts – Meaningful Interactions Lab (mintlab), KU Leuven, Leuven, Belgium; imec, Leuven, Belgium
  • Evert van Beek – Department of Human Information and Communication Design, TU Delft, Delft, Netherlands
  • Fernanda Chocron Miranda – Graduate Programme in Communication and Information, Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil

Abstract: The past decade has shown that new technologies can have a profound impact on how we consume television and online video content. As technologies such as VR/AR, sensors, and smart voice assistants are maturing, it is becoming pertinent to study how they could influence the next generation of TV and video experiences. While some experiments already incorporate one or more of these technologies, a systematic study into user expectations for these new technologies has not yet been conducted. In this paper, we present the results of a co-creation session resulting in two future video watching scenarios visualized using storyboards: one presenting a hyper-personalized experience based on the automatic recognition of emotions, and the other presenting an immersive experience using Virtual and Augmented Reality. We conclude with user evaluations of both concepts, offering insights into the opportunities and challenges these concepts could bring for the future of television and video experiences.

Session 3: Video in Society

  • 6th June, 11:00-12:00
  • Session Chair: Shauna Concannon (University of York)

Audience and Expert Perspectives on Second Screen Engagement with Political Debates

  • Katerina Gorkovenko – DJCAD, University of Dundee, Dundee, United Kingdom
  • Nick Taylor – DJCAD, University of Dundee, Dundee, United Kingdom

Abstract: Televised debates remain a key point in elections, during which there are vast amounts of online activity, much of it conducted through personal devices or second screens. Amidst growing recognition of the influence of online political discourse, we explore the issues and opportunities arising at this specific point in election cycles, using a design-led multi-stakeholder approach to understand both the audience and expert perspectives. Workshops with debate viewers highlighted six key issues and possible solutions, which were encapsulated in four speculative design concepts. These were used to prompt further discussion with political and media experts, who were able to identify the implications and challenges of addressing the opportunities identified by the participants. Together, these perspectives allow us to unravel some of the complexities of designing for this multifaceted problem.

Exploring the Limits of Linear Video in a Participatory Mental Health Film

  • Simona Manni – Digital Creativity Labs, University of York, York, United Kingdom
  • Marian F Ursu – Digital Creativity Labs, Dept. of Theatre, Film and Television, University of York, York, North Yorkshire, United Kingdom
  • Jonathan Hook – Digital Creativity Labs, Dept. of Theatre, Film and Television, University of York, York, United Kingdom

Abstract: Participatory filmmaking offers opportunities to counterbalance stereotypes about mental health often endorsed by the mainstream media, by involving participants who have a lived experience of mental health problems in production. It is our experience, however, that the linear videos traditionally resulting from such processes can fail to fully accommodate and represent a plurality of participant voices and viewpoints and, as a consequence, may lead to oversimplified accounts of mental health. Interactive film, on the other hand, could open up a space of opportunities for participatory films that allow multiple voices and complex representations to coexist. In this paper, we explore this opportunity by reviewing a linear film produced by five men with mental health problems in 2016 about isolation and recovery. Through a series of creative workshops, the film was deconstructed by its participants, who analysed which additional possibilities of both form and content could be revealed if the film were transformed into a non-linear interactive film. Our findings reveal several expressive needs that a non-linear interactive film could more easily accommodate, and opportunities for making participatory filmmaking truly dialogic by allowing an active exchange with audiences that preserves, rather than streamlines, the tension between collective views and personal accounts.

The Living Room of the Future

  • Neelima Sailaja – Mixed Reality Laboratory, University of Nottingham, Nottingham, United Kingdom
  • Andy Crabtree – School of Computer Science, University of Nottingham, Nottingham, United Kingdom
  • James Colley – Mixed Reality Laboratory, University of Nottingham, Nottingham, United Kingdom
  • Adrian Gradinar – Imagination Lancaster, Lancaster University, Lancaster, Lancashire, United Kingdom
  • Paul Coulton – LICA, Lancaster University, Lancaster, United Kingdom
  • Ian Forrester – BBC R&D, Manchester, United Kingdom
  • Lianne Kerlin – BBC R&D, BBC, Salford, Manchester, United Kingdom
  • Phil Stenton – BBC Research & Development, BBC, Salford, Manchester, United Kingdom

Abstract: Emergent media services are turning towards the use of audience data to deliver more personalised and immersive experiences. We present the Living Room of the Future (LRoTF), an embodied design fiction built to both showcase future adaptive physically immersive media experiences exploiting the Internet of Things (IoT) and to probe the adoption challenges confronting their uptake in everyday life. Our results show that audiences have a predominantly positive response to the LRoTF but nevertheless entertain significant reservations about adopting adaptive physically immersive media experiences that exploit their personal data. We examine ‘user’ reasoning to elaborate a spectrum of adoption challenges that confront the uptake of adaptive physically immersive media experiences in everyday life. These challenges include data legibility, privacy concerns and potential dystopias, concerns over agency and control, the social need for customisation, value trade-off, and lack of trust.

Session 4: Users and Devices

  • 6th June, 14:30-15:30
  • Session Chair: David Zendle (York St John University)

Exploring Online Video Watching Behaviors

  • Frank Bentley – Yahoo, Sunnyvale, California, United States
  • Max Silverman – Oath (Formerly Yahoo), Sunnyvale, California, United States
  • Melissa Bica – Computer Science, Human-Centered Computing, University of Colorado Boulder, Boulder, Colorado, United States

Abstract: Laptop and desktop computers are frequently used to watch online videos from a wide variety of services. From short YouTube clips, to television programming, to full-length films, users are increasingly moving much of their video viewing away from television sets towards computers. But what are they watching, and when? We set out to understand current video use on computers through analyzing full browsing histories from a diverse set of online Americans, finding some temporal differences in genres watched, yet few differences in the length of videos watched by hour. We also explore topics of videos, how users arrive at online videos through referral links, and conclude with several implications for the design of online video services that focus on the types of content people are actually watching online.

Methods for Device Characterisation in Media Services

  • Ana Dominguez – Vicomtech, San Sebastian, Spain
  • Julian Florez – TECNUN, University of Navarra, Donostia/San Sebastián, Spain
  • Alberto Lafuente – ATC, University of the Basque Country, San Sebastián, Spain
  • Stefano Masneri – Vicomtech, Donostia, Spain
  • Iñigo Tamayo – Vicomtech, Donostia, Spain
  • Mikel Zorrilla – Digital Media, Vicomtech, Donostia-San Sebastian, Gipuzkoa, Spain

Abstract: Content distribution in video streaming services and interactive TV multi-screen experiences could be improved through the characterisation of each involved device. These applications could provide a much more coherent user experience if the underlying system is able to detect the type of each device, since a device type exhibits specific features that should be considered in the adaptation process. Some of these features are attainable in the browser through different libraries and APIs, but their limited reliability forces developers to turn to more abstract levels of characterisation. The device type property (smartphone, tablet, desktop, TV) allows devices to be classified according to a set of common specific features, and its detection at runtime is very helpful for the adaptation process. In the context of more extensive research, this paper compares three different methods of Web-based device type detection, improving upon the existing state of the art, which have been validated through a hybrid broadcast broadband interactive multi-screen service.
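(The paper above does not publish its detection methods here; as a rough illustration of what runtime device type classification can look like, the sketch below assumes a hypothetical `detectDeviceType` function that maps a browser user-agent string onto the four device type classes named in the abstract. Real systems combine this with feature and screen-size heuristics, since user-agent strings alone are of limited reliability, as the abstract notes.)

```typescript
// Hypothetical sketch: classify a device into the four types from the
// abstract (smartphone, tablet, desktop, TV) using user-agent heuristics.
type DeviceType = "smartphone" | "tablet" | "desktop" | "tv";

function detectDeviceType(userAgent: string): DeviceType {
  const ua = userAgent.toLowerCase();
  // TV platforms advertise themselves explicitly (e.g. HbbTV, Tizen Smart TV).
  if (/smart-?tv|hbbtv|appletv|googletv|tizen.*tv/.test(ua)) return "tv";
  // Tablets: iPads, or Android devices without the "Mobile" token.
  if (/ipad|tablet/.test(ua)) return "tablet";
  // Phones: the "Mobi"/"Mobile" token or an iPhone identifier.
  if (/mobi|iphone/.test(ua)) return "smartphone";
  // Everything else falls back to desktop.
  return "desktop";
}
```

In a browser, this would typically be called as `detectDeviceType(navigator.userAgent)` at application start-up, before the multi-screen adaptation logic runs.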

Browsing for Content Across Pay TV and Video On Demand Options

  • Jennifer McNally – Verizon, San Jose, California, United States
  • Elizabeth Harrington Diederich – Verizon, Waltham, Massachusetts, United States

Abstract: Viewers have more content available to them than ever across multiple services, but oftentimes they do not have a specific program in mind and spend a lot of time just browsing. To better understand and identify opportunities to improve the experience, we set out to understand how viewers look for content when they have multiple choices but nothing specific in mind, and to identify issues they experience during the browsing process. Twenty-eight individuals submitted diary entries every time they found themselves looking for content to watch without specific content in mind. Findings describe scenarios, viewer criteria, where viewers look and why, and pain points.

Session 5: Interactive and Responsive

  • 6th June, 16:00-17:00
  • Session Chair: Sha Li (University of York)

Interacting with Smart Consumer Cameras: Exploring Gesture, Voice, and AI Control in Video Streaming

  • David A. Shamma – FXPAL, Palo Alto, California, United States
  • Jennifer Marlow – FXPAL, Palo Alto, California, United States
  • Laurent Denoue – FXPAL, Palo Alto, California, United States

Abstract: Livestreaming and video calls have grown in popularity due to increased connectivity and advancements in mobile devices. Our interactions with these cameras are limited, as the cameras are either fixed or manually remote controlled. Here we present a Wizard-of-Oz elicitation study to inform the design of interactions with smart 360° cameras or robotic mobile desk cameras for use in video-conference and live-streaming situations. There was an overall preference for devices that can minimize distraction, as well as for devices that demonstrate an understanding of video-meeting context. We find that participants dynamically grow in the complexity of their interactions, which illustrates the need for deeper event semantics within the camera AI. Finally, we detail interaction techniques and design insights to inform the next generation of personal video cameras for streaming and collaboration.

BookTubing Across Regions: Examining Differences based on Nonverbal and Verbal Cues

  • Chinchu Thomas – Multimodal Perception Lab, International Institute of Information Technology, Bangalore, Karnataka, India
  • Dinesh Babu Jayagopi – Multimodal Perception Lab, International Institute of Information Technology, Bangalore, Karnataka, India
  • Daniel Gatica-Perez – Idiap-EPFL, Lausanne, Switzerland

Abstract: BookTubers are a rapidly growing community on YouTube who share content related to books. Previous literature has addressed problems related to automatically analyzing the opinions and mood of video logs (vlogs) as a generic category on YouTube. Unfortunately, the population studied is not diverse. In this work, we study and compare some aspects of the geographic/cultural context of BookTube videos, comparing non-Western (Indian) and Western populations. The role played by nonverbal and verbal cues in each of these contexts is analyzed automatically using audio, visual, and text features. The analysis shows that cultural context and popularity can be inferred to some degree using multimodal fusion of these features. The best obtained results are an average precision-recall score of 0.98 with Random Forest in a binary India vs. Western video classification task, and 0.66 in inferring binary popularity levels of BookTube videos.