Spartacus: Spatially-Aware Interaction for Mobile Devices Through Energy-Efficient Audio Sensing

Presenter: Ivan Chiou

Authors • From Electrical and Computer Engineering, Carnegie Mellon University, except where noted • Zheng Sun, Ph.D. student in CyLab Mobility Research Center • Aveek Purohit, Ph.D. candidate • Raja Bose, Microsoft Silicon Valley, KarMode LLC • Pei Zhang, Assistant Research Professor

Abstract • Spartacus • a mobile system that enables spatially-aware interactions between neighboring devices with zero prior configuration. • Uses built-in microphones and speakers • Uses the Doppler effect to enable an interaction through a pointing gesture. • An audio-based low-power listening mechanism triggers the gesture detection service. • Experiments • 90% device selection accuracy within 3m • lower energy consumption than WiFi Direct and Bluetooth 4.0

Introduction • Recent research still requires an initial channel of communication such as Wi-Fi or Bluetooth • Spartacus’ key contributions: • a novel acoustic technique based on the Doppler effect • a novel undersampling audio signal processing pipeline • low-power listening (reduces energy consumption) that needs no manual action from users • Experimental validation

System Overview • How it works: • A user initiates an interaction by quickly pointing her mobile phone towards the target device. • Neighboring devices perform low-power listening using their built-in microphones. • An audio beacon with a short duration serves as the initiator • Implemented on the Android mobile platform without any extra hardware.

Design Challenges • High-Resolution Doppler-Shift Detection • pointing gestures of average users are usually transient (shorter than 0.5s) • Spartacus increases the frequency-domain resolution by 5X over traditional FFT-based approaches • High-Accuracy Device Selection • Accurately estimate the peak frequency shifts • implement a bandpass audio signal processing pipeline to suppress high-frequency acoustic noise • Energy-Efficient Interaction Trigger • a low-power audio listening protocol to trigger incoming interactions

SYSTEM DESCRIPTION • How does Spartacus detect the maximum peak frequency shift among the candidate target devices? • Since the user makes the gesture directly towards the target device, the target device observes the maximum Doppler shift and is therefore selected.

Detect Doppler Shift with High Resolution(1/5) • Deriving Angular Resolution • where fA is the observed tone frequency of DA, f0 the frequency of the original tone, Fs the sampling rate, NFFT the number of FFT points, and the calculated frequency shift expressed in terms of FFT points.

Detect Doppler Shift with High Resolution(2/5) • Assume the target device is stationary during the course of the gesture
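The slide's equation did not survive, so the following is only a minimal sketch of the relationship it describes, not the authors' derivation. It assumes the first-order Doppler approximation for a moving sender and a stationary receiver, a speed of sound of 343 m/s, and defines "angular resolution" as the smallest off-axis angle whose shift differs from the on-axis shift by one FFT bin. The function names and the example values (19 kHz tone, 3.4 m/s peak speed) are illustrative choices.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def doppler_shift_hz(f0_hz, speed_mps, angle_deg):
    """First-order Doppler shift seen by a stationary receiver.

    angle_deg is the angle between the gesture direction and the line
    from the sender to the receiver (0 means pointing straight at it).
    """
    return f0_hz * speed_mps * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND


def shift_in_fft_bins(shift_hz, fs_hz, n_fft):
    """Express a frequency shift in FFT points (bin width is fs / n_fft)."""
    return shift_hz * n_fft / fs_hz


def angular_resolution_deg(f0_hz, speed_mps, fs_hz, n_fft):
    """Smallest off-axis angle that moves the shift by one full FFT bin."""
    bin_hz = fs_hz / n_fft
    peak_hz = doppler_shift_hz(f0_hz, speed_mps, 0.0)
    # Clamp so that a bin wider than the peak shift reports 90 degrees.
    return math.degrees(math.acos(max(0.0, 1.0 - bin_hz / peak_hz)))


if __name__ == "__main__":
    peak = doppler_shift_hz(19000.0, 3.4, 0.0)
    print("peak shift (Hz):", round(peak, 1))
    print("bins at 44.1 kHz:", round(shift_in_fft_bins(peak, 44100.0, 2048), 1))
    print("resolution at 44.1 kHz (deg):", round(angular_resolution_deg(19000.0, 3.4, 44100.0, 2048), 1))
    print("resolution at 6.3 kHz (deg):", round(angular_resolution_deg(19000.0, 3.4, 6300.0, 2048), 1))
```

Under these assumptions the resolution lands in the same range as the figures quoted later in the deck (roughly 27 degrees at 44.1 kHz and roughly 10 degrees at 6.3 kHz), which suggests the sketch is at least close to the intended model.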

Detect Doppler Shift with High Resolution(3/5)

Detect Doppler Shift with High Resolution(4/5) • Improving Resolution using Undersampling • Three options: • increasing the original tone frequency f0: stronger energy degradation • increasing the number of FFT points NFFT: higher computational burden • decreasing the sampling rate Fs • Spartacus operates at a very high frequency (18KHz), which normally requires a high sampling rate • The undersampling technique can significantly reduce the sampling rate

Detect Doppler Shift with High Resolution(5/5) • Determining Undersampling Parameters • A higher n gives a higher fL • Avoided using fL higher than 19KHz since it causes greater energy degradation • Commodity devices support only a limited set of audio sampling rates • these include 8KHz, 16KHz, 32KHz, 44.1KHz, and 48KHz • feasible only when n = 5, 6, or 7 given Fs = 44.1KHz, or when n = 4 given Fs = 48KHz • Angular resolution improved from 26.7 degrees to 10 degrees.
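A short sketch of what undersampling by a factor n does to the numbers above. It relies only on standard sampling theory: keeping every n-th sample is equivalent to sampling at Fs/n, so a high-frequency tone folds (aliases) to a low frequency and the FFT bin width shrinks by n. Treating the lower edge of the Nyquist zone as the slide's fL is an assumption, since the slide does not define fL; the helper names are hypothetical.

```python
def alias_frequency_hz(f_hz, fs_hz):
    """Frequency at which a real tone at f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


def nyquist_zone(f_hz, fs_hz):
    """(low_edge, high_edge) of the fs/2-wide zone that contains f_hz."""
    half = fs_hz / 2.0
    k = int(f_hz // half)
    return k * half, (k + 1) * half


def bin_width_hz(fs_hz, n_fft):
    """Frequency spacing between adjacent FFT points."""
    return fs_hz / n_fft


if __name__ == "__main__":
    fs = 44100.0
    for n in (5, 6, 7):
        new_fs = fs / n
        low, high = nyquist_zone(19000.0, new_fs)
        print(n, new_fs, alias_frequency_hz(19000.0, new_fs), (low, high),
              round(bin_width_hz(new_fs, 2048), 2))
```

For n = 7 the new rate is 6300 Hz, a 19 kHz tone aliases to 100 Hz, and the zone containing it runs from 18900 Hz to 22050 Hz. The zone's lower edge rises with n, which would match the slide's remark that a higher n goes with a higher fL if that interpretation is right.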

Select Target Device with High Accuracy(1/3) • Bandpass Signal Processing Pipeline • Since the new sampling rate is much lower than the Nyquist rate, aliasing arises when the originally sampled audio signals are undersampled.
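The reason for filtering before undersampling can be shown with a small pure-Python experiment. This is a sketch under stated assumptions, not the authors' pipeline: the filter is a 101-tap Hamming-windowed sinc with an 18 kHz cutoff, a high-pass stands in for the band-pass because the band of interest ends at the original Nyquist frequency (22.05 kHz), and the 1 kHz interferer is an invented stand-in for out-of-band noise.

```python
import cmath
import math


def highpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc high-pass built by spectral inversion (odd num_taps)."""
    fc = cutoff_hz / fs_hz                      # normalized cutoff, cycles/sample
    mid = (num_taps - 1) // 2
    low = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        low.append(ideal * window)
    total = sum(low)
    high = [-c / total for c in low]            # negated unity-DC-gain low-pass
    high[mid] += 1.0                            # unit impulse minus low-pass
    return high


def filter_and_undersample(samples, taps, factor):
    """Convolve with taps, computing only every factor-th output sample."""
    last = len(taps) - 1
    out = []
    for start in range(0, len(samples) - last, factor):
        out.append(sum(t * samples[start + last - k] for k, t in enumerate(taps)))
    return out


def tone_power(samples, f_hz, fs_hz):
    """Normalized power of a single frequency (one-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * f_hz * i / fs_hz)
              for i, s in enumerate(samples))
    return abs(acc) ** 2 / len(samples) ** 2


FS_HZ, FACTOR = 44100, 7
NEW_FS_HZ = FS_HZ / FACTOR                      # 6300 Hz
# Gesture-like tone at 19.05 kHz plus an out-of-band interferer at 1 kHz.
raw = [math.sin(2 * math.pi * 19050 * i / FS_HZ) + math.sin(2 * math.pi * 1000 * i / FS_HZ)
       for i in range(4410)]
naive = raw[::FACTOR]                           # undersample with no filtering
clean = filter_and_undersample(raw, highpass_fir(18000, FS_HZ), FACTOR)

if __name__ == "__main__":
    # 19.05 kHz aliases to 150 Hz at 6300 Hz; the interferer stays at 1 kHz.
    print("naive 150 Hz vs 1 kHz:", tone_power(naive, 150, NEW_FS_HZ), tone_power(naive, 1000, NEW_FS_HZ))
    print("clean 150 Hz vs 1 kHz:", tone_power(clean, 150, NEW_FS_HZ), tone_power(clean, 1000, NEW_FS_HZ))
```

Without the filter, the interferer lands inside the 0 to 3150 Hz band of the undersampled signal alongside the aliased tone. With the filter applied first, only the band around the tone survives the undersampling.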

Select Target Device with High Accuracy(2/3) • We found that M = 1.5 led to robust performance in various indoor environments. • After each device detects the Doppler frequency shifts, all the devices report their frequency shift to the sender device, along with the device’s ID information. • The sender device then compares all the received Doppler shifts and determines the target device.
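The selection step on the sender side can be sketched in a few lines. The report format (a mapping from device ID to peak Doppler shift in Hz) and the optional minimum-shift cutoff are assumptions made for illustration; the slides do not specify the message layout.

```python
def select_target(reports, min_shift_hz=0.0):
    """Pick the device that reported the largest Doppler shift.

    reports maps a device ID to its reported peak shift in Hz.
    Returns None when there are no reports or no shift exceeds min_shift_hz.
    """
    if not reports:
        return None
    device_id, shift_hz = max(reports.items(), key=lambda item: item[1])
    return device_id if shift_hz > min_shift_hz else None


if __name__ == "__main__":
    print(select_target({"tablet": 120.0, "phone": 165.5, "laptop": 80.2}))
```

The cutoff is a hypothetical safeguard so that a set of near-zero reports (for example, when no gesture was actually made) does not still produce a target.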

Select Target Device with High Accuracy(3/3) • Angular Gain through Pointing Gestures • When the number of FFT points is 2048, the smallest angular resolution is 10 degrees, reached when the undersampling factor n is equal to 7. • When candidate devices are close to the user (i.e. within 3m), the device selection accuracy is better than the analysis predicts. • This angular change is significant when the candidate devices DA and DB are close to D0. Assuming the user’s arm is 60cm, the effective angular difference is increased to 55°, which makes the two devices much easier to differentiate.
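The angular gain can be illustrated with plane geometry. This is a sketch under assumptions the slide does not state: the user D0 stands at the origin, DA and DB are at the same distance from D0, and the phone sits at arm's length from D0 in the direction of DA. The distances in the example are hypothetical and are not meant to reproduce the slide's figure.

```python
import math


def effective_angle_deg(distance_m, angle_deg, arm_m):
    """Angle between DA and DB as seen from the phone at arm's length.

    distance_m: distance from the user to each device.
    angle_deg:  angle between DA and DB as seen from the user's body.
    arm_m:      how far the phone is held out towards DA.
    """
    if arm_m >= distance_m:
        raise ValueError("arm length must be shorter than the device distance")
    theta = math.radians(angle_deg)
    # DA lies on the x axis, so the direction from the phone to DA is 0 rad.
    to_b_x = distance_m * math.cos(theta) - arm_m
    to_b_y = distance_m * math.sin(theta)
    return math.degrees(abs(math.atan2(to_b_y, to_b_x)))


if __name__ == "__main__":
    for d in (1.0, 2.0, 3.0):
        print(d, round(effective_angle_deg(d, 30.0, 0.6), 1))
```

The gain shrinks as the devices move farther away, which is consistent with the slide's point that the effect matters most when DA and DB are close to D0.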

Energy-efficient Interaction Triggering(1/3) • How is Spartacus designed to save energy? • Low-Power Audio Listening • Advantages • Ubiquitous Hardware Support • No extra hardware; only microphones and speakers are needed • Limited Range • Easy to detect neighboring devices within the same space • Energy Efficient • designed for continuous discovery. • Protocol: two major modes • Periodic Listening • wake up every Trx • record sound for a duration drx • Beaconing • After receiving the beacon, a device switches to continuous listening mode to record the gesture • a short beacon duration consumes more energy • Tradeoff between energy consumption and duty cycle • Encodes the device ID using Reed-Solomon coding • Uses a 16 Frequency Shift-Keying (FSK) scheme with a central frequency at 19KHz • Each key uses 50Hz • The device ID is transmitted at least 200Hz below the gesture tone, so there are no ambiguities
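A minimal sketch of how a 16-FSK scheme could map a device ID to tone frequencies. The slide gives only the number of keys, the 19KHz central frequency, and the 50Hz figure; laying the keys out symmetrically around the center at 50 Hz spacing, and splitting the ID into 4-bit symbols most-significant first, are assumptions. Reed-Solomon coding is omitted here.

```python
CENTER_HZ = 19000.0   # central frequency from the slide
SPACING_HZ = 50.0     # assumed spacing between adjacent keys
NUM_KEYS = 16         # 16-FSK: each key carries one 4-bit symbol


def key_frequency_hz(symbol):
    """Frequency of the key for a 4-bit symbol (assumed symmetric layout)."""
    if not 0 <= symbol < NUM_KEYS:
        raise ValueError("symbol must be in 0..15")
    return CENTER_HZ + (symbol - (NUM_KEYS - 1) / 2) * SPACING_HZ


def id_to_symbols(device_id, num_symbols):
    """Split an integer ID into 4-bit symbols, most significant first."""
    return [(device_id >> (4 * i)) & 0xF for i in reversed(range(num_symbols))]


def id_to_frequencies(device_id, num_symbols):
    """Tone sequence that would carry the (uncoded) device ID."""
    return [key_frequency_hz(s) for s in id_to_symbols(device_id, num_symbols)]


if __name__ == "__main__":
    print(id_to_frequencies(0xA3, 2))
```

With this layout the keys span 18625 Hz to 19375 Hz, so a gesture tone would need to sit at 19575 Hz or higher to keep the 200Hz separation the slide mentions.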

Energy-efficient Interaction Triggering(2/3) • Dealing with Wakeup Jitter • Jitter is observed between the moment the API call to start recording is made and the moment the system actually begins recording. • average jitter: 70ms, standard deviation: 15ms • empirical measurements are used to address this problem

Energy-efficient Interaction Triggering(3/3) • Dealing with Wakeup Jitter • Due to the wakeup jitter, an additional guard band is used in the beacons.
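The timing tradeoff can be made concrete with a small calculation. This is a sketch, not the authors' protocol definition: it assumes the beacon must fully cover at least one listening window in the worst case (beacon starting just after a window opens), and that the guard band is the mean jitter plus k standard deviations. The 2 s wakeup period is a hypothetical value; the 200 ms session and the jitter statistics come from the slides.

```python
def duty_cycle(d_rx_s, t_rx_s):
    """Fraction of time the microphone is recording."""
    return d_rx_s / t_rx_s


def guard_band_s(mean_jitter_s, std_jitter_s, k=3.0):
    """Assumed guard band: mean wakeup jitter plus k standard deviations."""
    return mean_jitter_s + k * std_jitter_s


def min_beacon_duration_s(t_rx_s, d_rx_s, guard_s):
    """Shortest beacon that always spans one whole listening window."""
    return t_rx_s + d_rx_s + guard_s


if __name__ == "__main__":
    t_rx, d_rx = 2.0, 0.2                   # hypothetical period, 200 ms session
    guard = guard_band_s(0.070, 0.015)      # 70 ms mean, 15 ms std from the slide
    print("duty cycle:", duty_cycle(d_rx, t_rx))
    print("guard band (s):", guard)
    print("minimum beacon (s):", min_beacon_duration_s(t_rx, d_rx, guard))
```

Under this model a shorter beacon forces a shorter wakeup period and therefore a higher duty cycle, which is one way to read the energy tradeoff described in the protocol slide.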

IMPLEMENTATION • Hardware • Android platform on Galaxy Tab, Nexus 7, Galaxy Nexus, and HTC One S. • Software implementation • 4 components • GestureSensing • GestureSensing.makeGesture(); • GestureSensing.analyzeGesture(); • LowPowerListening • LPL.start(); • AudioModem • GUI.

Hardware limitation(1/2) • In Spartacus, we use tone frequencies higher than 20KHz: inaudible • Quantify the energy degradation of sound • Devices: • Sennheiser MKE 2P microphone • Yamaha NX-U10 speaker • Energy degrades at frequencies higher than 15KHz • Mobile phones are usually designed for human conversation and music, which lie below 15KHz • For every 1KHz increase in frequency, the degradation of sound energy on speakers increases by 5dB • Sound energy decreases by 3.2dB/m on average from 1m to 6m

Hardware limitation(2/2) • These results indicate that, to reduce energy degradation and increase interaction range, audio tones with lower frequencies should be leveraged.

Evaluation of Pointing Gestures(1/4) • Challenging questions: • How diversely do users point their phones, and how fast can a user point? • If the user points fast enough, how often does the target device observe the highest frequency shift, and thus the highest velocity, of the gesture? • If we want to estimate the frequency shifts, how much frequency- and time-domain resolution do we need to successfully capture the peak frequency shift inside a gesture? • Participants • 12 participants (6 female) • Participants were briefed on the idea of Spartacus before the experiment • Performed 10 gestures towards a target device 2m away, using a Galaxy Nexus phone. • Hand trajectories of the participants were detected using image processing techniques

Evaluation of Pointing Gestures(2/4) • Finding 1 • Three types of gesture • most of the participants fully stretched out their arms • The current design of Spartacus focuses on evaluating this vertically downward gesture trajectory.

Evaluation of Pointing Gestures(3/4) • Finding 2 • Participants faced towards the target device, with an average ±7.5° angular bias. • They pointed the phones precisely towards the target device • This supports selecting the target device using the maximum velocity

Evaluation of Pointing Gestures(4/4) • Finding 3 • The peak velocity of the gestures of all participants was 3.4m/s on average • Most of the gestures lasted less than one second, and the peak velocities appeared and diminished within 25ms. • Spartacus needs a high time-domain resolution to position the peak frequency shifts

Experimental Setup • Pointed a Galaxy Nexus phone 25 times towards the target device • a peak velocity of about 3m/s. • Selected 20 of the 25 gestures for analysis. • Audio was captured at the two candidate devices at 44.1KHz and undersampled 7 times to 6.3KHz

Device Selection Accuracy • Performance with Distances and Angles • As the distances between devices increase, the device selection accuracy drops gradually • since tones and other frequency bands decrease as the distances increase • As the angle decreases, the accuracy of device selection drops.

Performance Under Noisy Conditions • Evaluated with metal sounds • played a piece of rock music (i.e. “Burn It Down” by Linkin Park) • Metal clangs can hardly reach frequencies above 18KHz, so they have a limited effect on Spartacus.

Performance in Different Scenarios • Limited space in these scenarios • Only tested up to 1.5m at 30 degrees. • As distance increases, the performance decreases slightly due to the stronger multi-path effects in the Cubicles and Hallway. • In all three cases, accuracy was higher than 85%.

Interaction Latency • Spartacus: 2048-point FFT processing • takes 1.5s to process a 1s gesture • traditional FFT: 8192-point FFT processing • takes 8.7s

Power Consumption • compare the performance under different duty cycles • fixed each listening session to 200ms • Hardware • Galaxy Nexus mobile phones • Each test time • running low-power listening task for 5min • Result • 4X lower energy consumption than WiFi Direct • 5.5X lower than the latest Bluetooth 4.0 protocols

RELATED WORK(1/2) • Audio Processing in Mobile Sensing • Microphones in mobile sensing • Miluzzo • human conversation snippets for analyzing social activities • SurroundSense • combined with other sensing modalities • accelerometers, cameras, and magnetometers to detect locations of users for social context inferences • Lu • unknown social events can be automatically identified and easily labeled • Microphones in energy-efficient sensing • JigSaw and Darwin Phones • enabling energy-efficient continuous sensing and collaborative learning techniques • MoVi • multiple participants create integrated social event records • SwordFight • provides a distance ranging technique using time difference of sound arrivals

RELATED WORK(2/2) • Spatially-Aware Device Interactions • Point & Connect (P&C) proposed an interaction technique based on time difference of sound arrivals. • Enabling P&C may prevent the users from using their default WiFi networks. • Requires launching the related service and continuously waiting for interaction requests • which consumes significant energy. • SoundWave • Single-device interactions • Since the laptop is both the transmitter and the receiver of the Doppler effect, the generated frequency shift is doubled. • PANDAA • No extra infrastructure and no extra effort from users to initiate interactions • only supports devices in stationary placements • Polaris • Supports spatially-aware indoor device interactions • deals with only absolute directional relationships of devices

DISCUSSION • Energy-Efficient Interaction Triggers • Could be enabled on demand when the energy constraint is not a major concern. • Could be triggered by other traditional communication schemes, such as Bluetooth or WiFi Direct. • These address the issue that the user has to wait a couple of seconds for a “warmup beacon” before doing the gesture in Spartacus • Security Issues • A malicious device standing close by could pretend to have detected a higher Doppler shift than other devices, deceiving the sender into thinking it was the receiver. • Only trusted and authenticated devices could be allowed to report their Doppler shifts. • After the user’s device determines the potential receiver that reported the maximal Doppler shift, the name and identity of the receiver’s owner would be shown on the user’s device. • Contentions Among Interaction Sessions • When used in a crowded scenario (e.g. an airport) • contentions could be an issue for device pairing techniques • A contention coordination mechanism is needed

CONCLUSION • Spartacus, a spatially-aware interaction system • High accuracy • Low latency • Low energy consumption • No extra hardware • Zero prior configuration • Usable in various noisy conditions. • Experimental evaluations of Spartacus performance

My Questions • This paper only documents the initial gesture in its experiments. What about detecting other gestures, so that the receiver can recognize different meanings from senders? • If many children and adults of different heights stand close together in a crowded scenario, how could the system distinguish the tallest from the shortest among all selection targets?

BACKUP Presenter Ivan Chiou
