TVX ’19: Proceedings of the 2019 ACM International Conference on Interactive Experiences for TV and Online Video
SESSION: 360° Media
360-degree videos offer a novel viewing experience with the ability to explore virtual environments much more freely than before. Technologies and aesthetics behind this approach to film-making are not yet fully developed. The newly gained freedom creates challenges, and new methods have to be established to guide users through narratives. This work provides an overview of methods to guide users visually and contributes insights from an experiment exploring visual guidance in 360-degree videos with regard to task performance and user preferences. In addition, smartphone and HMD are used as output devices to examine possible differences. The results show that viewers preferred HMD over smartphone and visual guidance over its absence. Overall, the Object to Follow method performed best, followed by the Person to Follow method. Based on the results, we defined a set of guidelines for drawing the viewers’ attention in 360-degree videos.
Viewing 360° videos on a head-mounted display (HMD) can be an immersive experience. However, viewers must often be guided, as the freedom to rotate the view may make them miss things. We explore a unique, automatic approach to this problem with dynamic guidance methods called social indicators. They use the viewers’ gaze data to recognize popular areas in 360° videos, which are then visualized to subsequent viewers. We developed and evaluated two different social indicators in a 30-participant user study. Although the indicators show great potential in subtly guiding users and improving the experience, finding the balance between guidance and self-exploration is vital. Also, users had varying interest towards indicators that represented a larger audience but reported a clear desire to use the indicators with their friends. We also present guidelines for providing dynamic guidance for 360° videos.
Camera Heights in Cinematic Virtual Reality: How Viewers Perceive Mismatches Between Camera and Eye Height
When watching a 360° movie with a Head Mounted Display (HMD), the viewer feels as if they are inside the movie and can experience it in an immersive way. The head of the viewer is in exactly the same place as the camera was when the scene was recorded. In traditional movies, the viewer is watching the movie from outside, and a difference between eye height and camera height does not create a problem. However, viewing a movie from the perspective of the camera through an HMD can raise some challenges, e.g. the heights of well-known objects can irritate the viewer if the camera height does not correspond to the physical eye height. The aim of this work is to study how the position of the camera influences presence, sickness and the user experience of the viewer. We have considered several watching postures as well as various eye heights. The results of our experiments suggest that differences between camera and eye heights are more readily accepted if the camera position is lower than the viewer’s own body height. Additionally, sitting postures are preferred and can be adapted to more easily than standing postures. These results can be applied to improve guidelines for 360° filmmakers.
SESSION: Immersion and Content
Researchers and practitioners are keen to understand how new video viewing practices driven by technological developments impact viewers’ experiences. We detail the development of the Immersive Experience Questionnaire for Film and TV (Film IEQ). An exploratory factor analysis based on responses from 414 participants revealed a four-factor structure of (1) captivation, (2) real-world dissociation, (3) comprehension, and (4) transportation. We validated the Film IEQ in an experiment that replicated prior research into the effect of viewing on screens of varying size. Responses captured by the Film IEQ indicate that watching on a small phone screen reduces the viewer’s level of comprehension, and that this negatively impacts the viewing experience, compared to watching on a larger screen. The Film IEQ allows researchers and practitioners to assess video viewing experiences using a questionnaire that is easy to administer, and that has been empirically validated.
Public, parliamentary and television debates are commonplace in modern democracies. However, developing an understanding and communicating with others is often limited to passive viewing or, at best, textual discussion on social media. To address this, we present the design and implementation of Deb8, a tool that allows collaborative analysis of video-based TV debates. The tool provides a novel UI designed to enable and capture rich synchronous collaborative discussion of videos based on argumentation graphs that link quotes of the video, opinions, questions, and external evidence. Deb8 supports the creation of rich idea structures based on argumentation theory as well as collaborative tagging of the relevance, support and trustworthiness of the different elements. We report an evaluation of the tool design and a reflection on the challenges involved.
The past decade has shown that new technologies can have a profound impact on how we consume television and online video content. As technologies such as VR/AR, sensors, and smart voice assistants are maturing, it is becoming pertinent to study how they could influence the next generation of TV and video experiences. While some experiments already incorporate one or more of these technologies, a systematic study into user expectations for these new technologies has not yet been conducted. In this paper, we present the results of a co-creation session resulting in two future video watching scenarios visualized using storyboards: one presenting a hyper-personalized experience based on the automatic recognition of emotions, and another one presenting an immersive experience using Virtual and Augmented Reality. We conclude with user evaluations of both concepts, offering insights into the opportunities and challenges these concepts could bring for the future of television and video experiences.
SESSION: Video in Society
Televised debates remain a key point in elections, during which there are vast amounts of online activity, much of it conducted through personal devices or second screens. Amidst growing recognition of the influence of online political discourse, we explore the issues and opportunities arising at this specific point in election cycles, using a design-led multi-stakeholder approach to understand both the audience and expert perspectives. Workshops with debate viewers highlighted six key issues and possible solutions, which were encapsulated in four speculative design concepts. These were used to prompt further discussion with political and media experts, who were able to identify the implications and challenges of addressing the opportunities identified by the participants. Together, these perspectives allow us to unravel some of the complexities of designing for this multifaceted problem.
Stepping Through Remixed: Exploring the Limits of Linear Video in a Participatory Mental Health Film
Participatory filmmaking offers opportunities to counterbalance stereotypes about mental health often endorsed by the mainstream media, by involving participants who have a lived experience of mental health problems in production. It is our experience, however, that the linear videos traditionally resulting from such processes can fail to fully accommodate and represent a plurality of participant voices and viewpoints and, as a consequence, may lead to oversimplified accounts of mental health. Interactive film, on the other hand, could open up a space of opportunities for participatory films that allow multiple voices and complex representations to coexist. In this paper, we explore this opportunity by reviewing Stepping Through, a linear film about isolation and recovery produced in 2016 by five men with mental health problems. Through a series of workshops, the film was deconstructed by its creators, who analysed which additional possibilities of both form and content could be revealed if Stepping Through were transformed into a non-linear interactive film. Our findings reveal several expressive needs that a non-linear interactive film could more easily accommodate and opportunities for making participatory filmmaking truly dialogic by allowing an active exchange with audiences that preserves, rather than streamlines, the tension between collective views and personal accounts.
Emergent media services are turning towards the use of audience data to deliver more personalised and immersive experiences. We present the Living Room of The Future (LRoTF), an embodied design fiction built to both showcase future adaptive physically immersive media experiences exploiting the Internet of Things (IoT) and to probe the adoption challenges confronting their uptake in everyday life. Our results show that audiences have a predominantly positive response to the LRoTF but nevertheless entertain significant reservations about adopting adaptive physically immersive media experiences that exploit their personal data. We examine ‘user’ reasoning to elaborate a spectrum of adoption challenges that confront the uptake of adaptive physically immersive media experiences in everyday life. These challenges include data legibility, privacy concerns and potential dystopias, concerns over agency and control, the social need for customisation, value trade-off and lack of trust.
SESSION: Users and Devices
Laptop and desktop computers are frequently used to watch online videos from a wide variety of services. From short YouTube clips, to television programming, to full-length films, users are increasingly moving much of their video viewing away from television sets towards computers. But what are they watching, and when? We set out to understand current video use on computers through analyzing full browsing histories from a diverse set of online Americans, finding some temporal differences in genres watched, yet few differences in the length of videos watched by hour. We also explore topics of videos, how users arrive at online videos through referral links, and conclude with several implications for the design of online video services that focus on the types of content people are actually watching online.
Content distribution in video streaming services and interactive TV multi-screen experiences could be improved through the characterisation of each involved device. These applications could provide a much more coherent user experience if the underlying system were able to detect the type of each device, since a device type exhibits specific features that should be considered in the adaptation process. Some of these features are attainable in the browser through different libraries and APIs, but their limited reliability forces a turn to more abstract levels of characterisation. The device type property (smartphone, tablet, desktop, tv) allows devices to be classified according to a set of common specific features, and its detection at runtime is very helpful for the adaptation process. As part of more extensive research, this paper compares three different methods of Web-based device type detection that improve upon the existing state of the art and that have been validated through a hybrid broadcast broadband interactive multi-screen service.
Viewers have more content available to them than ever across multiple services but they are also spending a lot of time looking for content to watch. To better understand and identify opportunities to improve the experience for TV watchers, we set out to understand how viewers look for content when they have multiple choices but nothing specific in mind. Twenty-eight individuals submitted diary entries every time they found themselves looking for content to watch without specific content in mind. Findings describe browsing scenarios, viewer criteria, where they browse and why, as well as pain points.
SESSION: Interactive and Responsive
Interacting with Smart Consumer Cameras: Exploring Gesture, Voice, and AI Control in Video Streaming
Livestreaming and video calls have grown in popularity due to increased connectivity and advancements in mobile devices. Our interactions with these cameras are limited, as the cameras are either fixed or manually remote controlled. Here we present a Wizard-of-Oz elicitation study to inform the design of interactions with smart 360° cameras or robotic mobile desk cameras for use in video-conferences and live-streaming situations. There was an overall preference for devices that can minimize distraction, as well as preferences for devices that demonstrate an understanding of video-meeting context. We find that participants' interactions grow dynamically in complexity, which illustrates the need for deeper event semantics within the Camera AI. Finally, we detail interaction techniques and design insights to inform the design of future personal video cameras for streaming and collaboration.
BookTubers are a rapidly growing community on YouTube that shares content related to books. Previous literature has addressed problems related to automatically analyzing opinions and mood of video logs (vlogs) as a generic category on YouTube. Unfortunately, the population studied is not diverse. In this work, we study and compare some aspects of the geographic/cultural context of BookTube videos, comparing non-Western (Indian) and Western populations. The role played by nonverbal and verbal cues in each of these contexts is analyzed automatically using audio, visual, and text features. The analysis shows that cultural context and popularity can be inferred to some degree using multimodal fusion of these features. The best results obtained are an average precision-recall score of 0.98 with Random Forest in a binary India vs. Western video classification task, and 0.75 in inferring binary popularity levels of BookTube videos.
SESSION: Work in Progress
Omnidirectional (360°) video is a novel media format, rapidly becoming adopted in media production and consumption as part of today’s ongoing virtual reality revolution. Due to its novelty, there is a lack of tools for producing highly engaging 360° video for consumption on a multitude of platforms (VR headsets, smartphones or conventional TV sets). In this work, we describe our preliminary work on tools for automating several tasks in the production of 360° video that are tedious and time-consuming when done manually. We propose tools for automated cinematography (generating a lean-back experience without user interaction for conventional TV sets) and automated annotation of 360° video to simplify linking to other resources such as text or 2D images/videos. Both tools employ deep-learning-based methods for extracting information about the objects in the scene. We discuss the current state of these tools and ways to improve them in the future.
Understanding users’ intent plays a pivotal role in developing immersive and personalised media applications. This paper introduces our recent research and user experiments towards interpreting user attention in virtual reality (VR). We designed a gaze-controlled Unity VR game for this study and implemented additional libraries to bridge raw eye-tracking data with game elements and mechanics. The experimental data show distinctive patterns of fixation spans, which are paired with user interviews to help us explore characteristics of user attention.
Accurately quantifying audience appreciation poses significant technical challenges, privacy concerns and difficulties in scaling the results to realistic audience sizes. This paper presents a new approach to appreciation measurement based on the analysis of BBC iPlayer on-demand viewing pattern data, such as the timeline of the user’s interactions with the play button, combined with appreciation scores from traditional feedback surveys. This methodology infers implicit viewer appreciation automatically, without adding significant cost or time overheads and without requiring additional input from the participant or the use of intrusive methods, such as facial recognition. The results obtained, based on data from a sample of over 27,000 iPlayer users, show accuracy scores above 90% for predictions generated using computationally efficient models, including Decision Trees and Random Forests. The analysis suggests that the user’s appreciation of a programme can be predicted based on their online viewing behaviour, potentially improving our understanding of the audience.
Institutional care settings are often described as places where residents suffer from social isolation. Although sharing media preferences, consumption patterns and practices is believed to be effective in triggering communication and developing friendships between older adults, it rarely happens in care homes. Our research explores the potential to promote residents’ social interaction by augmenting public print media. In this work-in-progress, we started with newspapers as an example to understand residents’ information sources, media habits and preferences. We were also interested in their perceptions of the attractiveness and sociability of augmented print media. The findings showed that the participants held positive attitudes towards such technologies. Preliminary design requirements were summarized to inform the future development of related social technologies in public caring environments.
We present a gesture-based user interface for smart TVs that employs deictic gestures to control the content displayed on the TV screen. Our interface implements an instance of the “Smart-Pockets” interaction technique, where links to digital content, in our case to users’ preferred television channels and shows, are stored inside users’ pockets and readily accessible with a mere pointing of the hand to those pockets. Pointing gestures to the pockets and towards the TV screen are detected using the Inertial Measurement Unit embedded in Myo, a smart armband. We discuss the ways in which our prototype opens new opportunities for hybrid, gesture- and pointing-based interactions for smart TVs as well as opportunities for designing interactions that take place at the periphery of user attention.
We present an agenda for the visual augmentation of television watching based on recently booming technology, such as smart wearables and Augmented/Mixed Reality technology. Our agenda goes beyond second-screen viewing trends to explore the opportunities offered by wearable devices and gadgets, such as smartglasses and head-mounted displays, to deliver rich visual experiences to users. While still a work-in-progress, we hope that our contribution will be inspiring to the TVX community and, consequently, foster critical and constructive discussions towards new devices, application opportunities, and tools to visually augment the television watching experience.
Augmented Fast-Forwarding: Can we Improve Advertising Impact by Enriching Fast-forwarded Commercials?
The trend of time-shifted viewing worries television networks and advertisers, as time-shifting viewers often fast-forward through commercials, resulting in lower advertising impact. The present research tests an alternative solution to this problem by augmenting fast-forwarded commercials with brand logos placed in the center of the screen. We tested the potential of augmented fast-forwarding in an experiment in which participants watched a television show interrupted by a commercial break under one of three experimental conditions: commercials played at regular speed, at fast-forwarded speed, or at fast-forwarded speed but enriched with logos. Advertising impact was measured during the commercial break (using eye-tracking glasses), right after the TV show (brand recognition), and the day after the TV show (day-after brand recognition). Interestingly, the results showed that augmented fast-forwarding performed as well as regular-speed viewing on two out of three advertising impact measures.
The Immersive Accessibility Project (ImAc) explores how accessibility services can be integrated with 360° video as well as new methods for enabling universal access to immersive content. ImAc is focused on inclusivity and addresses the needs of all users, including those with sensory or learning disabilities, of all ages and considers language and user preferences. The project focuses on moving away from the constraints of existing technologies and explores new methods for creating a personal experience for each consumer. It is not good enough to simply retrofit subtitles into immersive content: this paper attempts to disrupt the industry with new and often controversial methods.
This paper provides an overview of the ImAc project and proposes guiding methods for subtitling in immersive environments. We discuss the current state-of-the-art for subtitling in immersive environments and the rendering of subtitles in the user interface within the ImAc project. We then discuss new experimental rendering modes that have been implemented including a responsive subtitle approach, which dynamically re-blocks subtitles to fit the available space and explore alternative rendering techniques where the subtitles are attached to the scene.
In this paper, we propose PokeRepo Go++, which is our one-man live reporting system (PokeRepo Go) with an added commentator function that enables outside experts to make comments. As a consequence of the spread of live broadcast streaming services, anybody is now able to broadcast his/her own interests, concerns and everyday occurrences. To support reports made by a single person in the form of a live broadcast, we have developed and actually operated PokeRepo Go. PokeRepo Go only had a function to transmit video to viewers in one direction, i.e. non-interactively. Because an interviewee or reporter therefore received no reaction from the audience, there was uncertainty as to whether the broadcast content was being conveyed to the audience as intended. This exposed the importance of two-directional, i.e. interactive, communication. In PokeRepo Go++, we provide a commentator function by which commentators can seamlessly participate in live broadcast content and communicate naturally with an interviewee. Additionally, we designed the UI so that the commentator function can be used alongside the pre-existing PokeRepo Go functions, such as filming operations (including acquisition of video/audio and control of lighting) and editing operations. At a demo session in a domestic conference, we operated the prototype system of PokeRepo Go++ and evaluated its usefulness.
Virtual Reality (VR) is increasingly a tempting option for creators seeking more immersive audiovisual experiences. In the suspense genre, VR promises to deliver a more impactful experience to the viewers. Nevertheless, it remains to be confirmed whether this promise holds. This study focused on the creation of suspense-genre content to understand the impact on the immersive experience of the viewer when it is presented in stereoscopic VR 360°. An evaluation was conducted with a convenience sample of 36 participants. The differences in immersion were evaluated when viewing the same suspense audiovisual content in different formats: VR 360°; 360°; 2D. The results showed that VR 360° intensifies the perceptual immersion but diminishes the narrative immersion, a consequence of the 360°.
The Netflix production Bandersnatch represents a potentially crucial step for interactive digital narrative videos, due to the platform's reach, popularity, and ability to finance costly experimental productions. Indeed, Netflix has announced that it will invest more into interactive narratives – moving into romance and other genres – which makes Bandersnatch even more important as a first step and harbinger of things yet to come. For us, the question was therefore how audiences react to Bandersnatch. What are the factors driving users' enjoyment, and what factors might mitigate the experience? For example, the novelty value of an interactive experience on Netflix might be a crucial aspect, as might the association with the successful series Black Mirror. We approach these questions from two angles: a critical analysis of the work itself, including audience reactions, and an initial user study using Roth's measurement toolbox (N = 32).
This paper explores the effects of adding augmented reality (AR) artefacts to an existing TV programme. A prototype was implemented augmenting a popular nature documentary. Synchronised content was delivered over a Microsoft HoloLens and a TV. Our preliminary findings suggest that the addition of AR to an existing TV programme can result in creation of engaging experiences. However, presenting content outside the traditional TV window challenges traditional storytelling conventions and viewer expectations. Further research is required to understand the risks and opportunities presented when adding AR artefacts to TV.
This paper explores the potential of media and how it can be leveraged to create a tool to help individuals become more aware of their emotions and promote their psychological wellbeing. It discusses the main motivation and background and presents EmoJar, an interactive application being designed and developed to allow users to collect and review media that have a significant impact on them and remind them of the good things they experience over time. EmoJar is based on the Happiness Jar concept, enriched with media and its emotional impact, as an extension to Media4WellBeing, aligning with the goals and approaches of Positive Psychology and Positive Computing.
Nowadays radio shows are much more than a linear broadcast feed – they are all about user engagement. At the same time, many users are no longer only connected to a radio station brand through the linear broadcast channel, but also through digital platforms, and interaction via social media is becoming ever more important. Additionally, digital services enable broadcasters and users to customise the radio experience. Radio is thus a medium embedded in a context of social media, interaction and personalisation. This workshop thus aims to bring together researchers and practitioners working on tools, services and applications enabling interactive radio experiences.
The first international workshop on Data-driven Personalisation of Television aims to highlight the significantly growing importance of data in the support of new television content consumption experiences. This includes automatic video summarization, dynamic insertion of content into media streams and object based media broadcasting, to serve the recommendation of TV content and personalization in media delivery. The workshop has two keynote talks alongside five paper presentations and several related demos.
In this half-day workshop, we will explore the ethics of Virtual Reality (VR) through conversations framed around design fictions. Affordable head-mounted displays (HMDs) and accessible VR content are now within reach of large audiences, yet many of VR's most urgent challenges remain under-explored. In addition to the many known unknowns (e.g. how do we manage sensory conflicts and spatial limitations in VR?), there are many more unknown unknowns (e.g. what kinds of psychological, social and cultural impact will VR provoke?). By bringing together diverse scenarios from workshop participants, and bespoke design fictions created specifically to explore the ethics of VR, we will facilitate a rich discussion that will inform the development of three high-fidelity design fictions that will be used to explore the ethics of VR in future workshops, including one in Bristol, UK in November 2019, part of the Virtual Realities Immersive Documentary Encounters project.