• ### Invited Talks

• #### Indoor Navigation Services from Mobile Data (September 25, 2017)

###### Location: 21st European Conference on Advances in Databases and Information Systems, Hilton Cyprus, Nicosia, Cyprus

The pervasiveness of smartphones is leading to the uptake of a new class of Internet-based Indoor Navigation (IIN) services, which might soon diminish the need for satellite-based localization technologies in urban environments. These services rely on geo-location databases that store spatial models along with wireless, light and magnetic signals used to localize users, and provide better power efficiency and wider coverage than predominant approaches. In this talk, I will overview the research behind the building blocks of the Anyplace IIN, an open, modular, extensible and scalable navigation architecture that exploits crowdsourced Wi-Fi data to develop a novel navigation service that won several international research awards for its utility and accuracy (i.e., less than 2 meters). Our MIT-licensed open-source software stack has to date been used by hundreds of researchers and practitioners around the globe, with the public Anyplace service reaching over 80,000 real user interactions. In the second part of this talk, I will focus on an algorithm we developed for protecting users from location tracking by the IIN service, without hindering the provisioning of fine-grained location updates on a continuous basis. Our algorithm exploits a k-Anonymity Bloom filter and a generator of camouflaged localization requests, both of which are shown to be resilient to a variety of privacy attacks. My talk will conclude with a summary of other recent research work.
• #### Indoor Data Management in Anyplace (July 11, 2017)

###### Location: Max Planck Institute for Informatics, Saarbrücken, Germany

• #### Indoor Data Management in Anyplace (June 6, 2017)

###### Location: Heidelberg University, Heidelberg, Germany

• #### Indoor Data Management in Anyplace (May 4, 2017)

###### Location: University of Mannheim, Mannheim, Germany

• #### Building All k Nearest Neighbor Social Communities (April 12, 2017)

###### Location: Paris Descartes University, Paris, France

A wide spectrum of Internet-scale mobile applications, ranging from social networking, gaming and entertainment to emergency response and crisis management, all require efficient and scalable All k Nearest Neighbor (AkNN) computations over millions of moving objects every few seconds to be operational. Most traditional techniques for computing AkNN queries are centralized, lacking both scalability and efficiency. Only recently, distributed techniques for shared-nothing cloud infrastructures have been proposed to achieve scalability for large datasets. These batch-oriented algorithms are sub-optimal due to inefficient data space partitioning and data replication among processing units. In this talk, I will present Spitfire, a distributed algorithm that provides a scalable and high-performance AkNN processing framework. The proposed algorithm deploys a fast load-balanced partitioning scheme along with an efficient replication-set selection algorithm, to provide fast main-memory computations of the exact AkNN results in a batch-oriented manner. I will also overview Rayzit, an experimental and open-source mobile AkNN service we established and operate, which has reached over 45,000 real user interactions to date.
• #### IoT-enabled Localization and Navigation (June 16, 2016)

###### Location: 2016 PERCCOM Summer School, Erasmus Mundus Master on Pervasive Computing & COMmunications for sustainable development, Nancy, France

Advances in Internet-of-Things (IoT) technology in recent years are leading to the uptake of a new class of Internet-based navigation services, which might soon diminish the need for satellite-based technologies in urban environments. These services rely on geolocation databases that store spatial models along with wireless, light and magnetic signals used to localize users. Developing IoT-based navigation services creates a new spectrum of information management challenges ranging from crowdsourcing indoor models, acquiring and fusing big-data velocity signals, localization algorithms, location privacy of custodians and others. In this talk, I will overview the current landscape of such academic and industrial services using a multi-dimensional taxonomy of emerging topics in this domain, including location, crowdsourcing, privacy and modeling. I will present the dimensions of our taxonomy through the lens of an open, modular, extensible and scalable navigation architecture, coined Anyplace, concluding with open challenges.
• #### Spatial Big Data Research and Applications at the University of Cyprus (May 16, 2016)

###### Location: The First Europe-China Workshop on Big Data Management, Helsinki, Finland

Spatial big data architectures have transformed the way enterprises collect, store and analyze massive amounts of velocity data that features a spatial extent. In this talk, I will summarize an array of research problems and applications that our laboratory has investigated or aims to investigate in this scope. I will start out by overviewing Anyplace (http://anyplace.cs.ucy.ac.cy/), our in-house Internet-based Indoor Navigation service that won several international research awards for its accuracy and utility. Particularly, I will discuss the challenges in processing and visualizing indoor big data (e.g., Wi-Fi and magnetic signals). My talk will be followed by a summary of Rayzit (http://rayzit.com/), which is our award-winning location-based crowd messaging service that allows a mobile crowd to instantly connect to their k Nearest Neighbors (kNN) as they move in space. I will particularly summarize Spitfire, which is a distributed algorithm that provides a scalable and high-performance All k Nearest Neighbor processing framework to Rayzit. I will then overview Spate, which is a novel spatio-temporal time machine for telecommunication data (e.g., CDR, NMS, PCHR). Spate supports a range of visual analytic cues (heatmaps, POI clusters, etc.) that enable a Telco operator to quickly compare abstract models to real velocity data, but also to monitor and replay network traces using a declarative SQL language that breaks down to Spark jobs. Finally, I will introduce GreenCharge, which is a big data GIS architecture to guide electric vehicles to green energy excess. GreenCharge is expected to maximize self-consumption of electricity by prosumers, contributing in that way to the stability of power grids and a sustainable future. GreenCharge is anticipated to be piloted at the University of Cyprus, a self-sufficient 17 GWh/annum producer of solar electricity.
• #### Anyplace Indoor Information Service (May 3, 2016)

###### Location: Symposium on Challenges of Fingerprinting in Indoor Positioning and Navigation, Open University of Catalonia, Barcelona, Spain

People spend 80-90% of their time in indoor environments such as offices, undergrounds, shopping malls and airports. On the other hand, the uptake of interesting applications in indoor spaces (e.g., navigation, inventory management and elderly support) has so far been hampered by the lack of technologies that can provide indoor location (position) accurately, in real-time, in an energy-efficient manner and without expensive additional hardware. Modern smartphones currently rely on Internet-based Indoor Navigation (IIN) services, which can provide the location of a user upon request. Unfortunately, those technologies are inaccurate and also raise important location privacy concerns, as the IIN can know where the user is at all times. In this talk, I will start out by overviewing the building blocks of Anyplace (http://anyplace.cs.ucy.ac.cy/), our in-house IIN service that recently won several international research awards for its accuracy (i.e., less than 2 meters) and utility. Anyplace deploys a number of innovative concepts, including crowdsourcing, big-data management, energy-aware processing, multi-device optimization and mobile data management, in order to realize a power-efficient and accurate indoor localization and navigation technology. In the second part of this talk, I will focus on an algorithm we developed for protecting users from location tracking by the IIN service, without hindering the provisioning of fine-grained location updates on a continuous basis. Our algorithm exploits a k-Anonymity Bloom filter and a generator of camouflaged localization requests, both of which are shown to be resilient to a variety of privacy attacks. In the third part of this talk, I will focus on an innovative framework for accurate and fast indoor localization over an intermittently connected Wi-Fi network, coined Prefetching Localization (PreLoc).
• #### Prefetching Indoor Navigation Structures in Anyplace (March 23, 2016)

###### Location: University of Nicosia, Nicosia, Cyprus

• #### Internet-based Indoor Navigation Services (February 25, 2016)

###### Location: ACROSS Meeting, Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands

• #### Indoor Data Management in Anyplace (June 24, 2015)

###### Location: Department of Computer Science, University of Pittsburgh, PA, USA

• #### Tutorial: Mobile Data Management in Indoor Spaces (June 17, 2015)

###### Location: The 16th IEEE International Conference on Mobile Data Management (MDM '15), Pittsburgh, PA, USA

This advanced seminar presents the fundamental mobile data management concepts behind the realization of innovative indoor information services that deal with all aspects of handling indoor data as a valuable resource, including data modeling, data acquisition, query processing, privacy and energy consumption. The goal is to provide an overview of the emerging field of indoor data management with a particular emphasis on mobile systems. We tackle the topic from a wide range of perspectives: fundamentals, definitions, current state, academic & industrial perspective, reality & visionary scenarios as well as future challenges. The seminar captures the big picture, such that interested researchers and practitioners can expand their study by following the references. Our presentation will be carried out through the lens of an experimental Indoor Information System we developed at the University of Cyprus, coined Anyplace, which has obtained three international awards and was ranked the second most accurate indoor localization technology by Microsoft Research at IEEE/ACM IPSN '14.
• #### Indoor Data Management: Status and Challenges (February 17, 2015)

###### Location: Department of Computer Science, University of Cyprus, Nicosia, Cyprus

People spend 80-90% of their time in indoor environments such as offices, undergrounds, shopping malls and airports. On the other hand, the uptake of interesting applications in indoor spaces (e.g., navigation, inventory management and elderly support) has so far been hampered by the lack of technologies that can provide indoor location (position) accurately, in real-time, in an energy-efficient manner and without expensive additional hardware. Modern smartphones currently rely on cloud-based Indoor Positioning Services (IPS), which can provide the location of a user upon request. However, those services are inaccurate and also raise important location privacy concerns, as the IPS can know where the user is at all times. In this talk, I will start out by overviewing the building blocks of Anyplace, our in-house IPS that recently won several international research awards for its accuracy (i.e., less than 2 meters) and utility. Anyplace deploys a number of innovative concepts, including crowdsourcing, big-data management, energy-aware processing, multi-device optimization and mobile data management, in order to realize a power-efficient and accurate indoor localization and navigation technology. In the second part of this talk, I will focus on an algorithm we developed for protecting users from location tracking by the IPS, without hindering the provisioning of fine-grained location updates on a continuous basis. Our algorithm exploits a k-Anonymity Bloom filter and a generator of camouflaged localization requests, both of which are shown to be resilient to a variety of privacy attacks.
• #### Smartphone Cloud Testbeds and Applications (December 5, 2014)

###### Location: European Institute of Innovation & Technology (EIT), ICT Labs, Budapest, Hungary

The explosive number of smartphones with ever-growing computing and sensing capabilities has brought a paradigm shift to many traditional domains of the computing field. In this talk, I will present three ongoing testbeds and applications we are developing in-house in the space of smartphone and human-centric systems. I will start out with SmartLab (smartlab.cs.ucy.ac.cy), a Mobile Infrastructure-as-a-Service cloud we have developed and deployed at the University of Cyprus. In SmartLab, an intuitive web-based interface supplies a variety of complex mobile management utilities that provide fine-grained and low-level control over real smartphones, e.g., usage of networking, storage and sensors as well as automated mockup executions. I will present our research experiences from using SmartLab in different research settings as well as our envisioned future scenarios for urban-scale deployment, federation issues and security studies. I will continue with Anyplace (anyplace.cs.ucy.ac.cy), our in-house Indoor Positioning Service that recently won several international research awards for its accuracy (i.e., less than 2 meters) and utility. Anyplace deploys a number of innovative concepts, including crowdsourcing, big-data management, energy-aware processing, multi-device optimization and mobile data management, in order to realize a power-efficient and accurate indoor localization and navigation technology. I will conclude with Rayzit (rayzit.com), which is an award-winning location-based crowd messaging service that addresses big-data velocity with parallel algorithms and distributed NoSQL databases.
• #### Indoor Data Management: Status and Challenges (October 2, 2014)

###### Location: Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia

• #### Managing Smartphone Cloud Testbeds (September 30, 2014)

###### Location: Distributed Systems Group, Technical University of Vienna, Vienna, Austria

The explosive number of smartphones with ever-growing computing and sensing capabilities has brought a paradigm shift to many traditional domains of the computing field. Smartphone users nowadays gain access to unprecedented possibilities, knowledge and power due to a diverse landscape of applications. Developers, on the other hand, are challenged with a fragmented mobile landscape that is extremely dynamic to changes in both hardware and software. Re-programming smartphones and instrumenting them for application testing and data gathering at scale is currently a tedious and time-consuming process that poses significant logistical challenges. In this talk, we present the abstractions comprising SmartLab, a Mobile Infrastructure-as-a-Service cloud we have developed and deployed at the University of Cyprus. In SmartLab, an intuitive web-based interface supplies a variety of complex mobile management utilities that provide fine-grained and low-level control over real smartphones, e.g., usage of networking, storage and sensors as well as automated mockup executions. We present our research experiences from using SmartLab in different research settings as well as our envisioned future scenarios for urban-scale deployment, federation issues and security studies. My talk will be followed by a summary of related mobile data management research efforts, namely Anyplace, which is our in-house indoor localization and navigation service, and Rayzit, which is our award-winning location-based crowd messaging service.
• #### Mobile Crowdsourcing: Challenges and Applications (March 31, 2014)

###### Location: 7th Webdatanet (COST Action IS1004) Conference on Mobile Research, Larnaca, Cyprus

Crowdsourcing refers to a distributed problem-solving model in which a crowd of undefined size is engaged to solve a complex problem through an open call. This novel problem-solving model found its way into numerous applications on the web for voting, fund-raising, micro-works and wisdom-of-the-crowd scenarios. The multi-sensing capabilities and multi-modal connectivity of smartphones offer a great platform for extending and diversifying web-based crowdsourcing applications to a larger contributing crowd, making contribution easier and omnipresent. Unfortunately, numerous new challenges arise in this new context, ranging from big data stream volume and velocity to location privacy, energy consumption and device diversity issues, among others. In this talk, I will summarize mobile crowdsourcing challenges and applications related to Microblogging, Urban Sensing and Indoor Information Systems.
• #### Crowdsourcing Urban Data with Smartphones (March 28, 2014)

###### Location: The 17th International Conference on Extending Database Technology (EDBT/ICDT), Mining Urban Data (MUD) Workshop, Athens, Greece

Crowdsourcing refers to a distributed problem-solving model in which a crowd of undefined size is engaged to solve a complex problem through an open call. This novel problem-solving model found its way into numerous applications on the web for voting, fund-raising, micro-works and wisdom-of-the-crowd scenarios. The multi-sensing capabilities and multi-modal connectivity of smartphones offer a great platform for extending and diversifying web-based crowdsourcing applications to a larger contributing crowd, making contribution easier and omnipresent. Unfortunately, numerous new challenges arise in this new context, ranging from big data stream volume and velocity to location privacy, energy consumption and device diversity issues, among others. In this talk, I will start out with a discussion on how primitive crowdsourcing challenges emerge and evolve into urban data collection scenarios. I will present these new challenges through the lens of some in-house systems we’ve developed over the last few years, particularly: i) a smartphone programming cluster coined SmartLab, which provides means for general-purpose urban-scale sensing scenarios and smart cities; ii) an indoor information system coined Anyplace, which addresses big-data, privacy and energy consumption for building Wi-Fi Radiomaps of indoor places; and iii) a location-based crowd messaging service coined Rayzit, which addresses big data velocity with parallel algorithms and distributed NoSQL databases.
• #### Panel: Large-Scale Participatory Urban Sensing: A Fad or Reality? (July 5, 2013)

###### Location: 14th IEEE International Conference on Mobile Data Management (MDM '13), Milan, Italy

One of the popular research trends at present focuses on the use of sensor data generated/collected by consumer mobile devices to infer the ‘urban state’. There are a fairly large number of research initiatives that view such a citizen-centric distributed and mobile sensing platform as one of the most promising ways to gather data about various aspects of cities, such as environmental parameters & pollution levels, traffic congestion, popularity of events at various public spaces, etc. However, there are many skeptics who doubt that this “decentralized, bottom-up” approach can be an effective & commercially viable approach in the long run, due to open challenges in many aspects, such as resource limitations, privacy, data quality and incentives. This panel will explore how academia and industry are tackling these challenges and debate what types of applications are likely to be sustainable under this crowd-sourced paradigm.
• #### Tutorial: Crowdsourcing for Mobile Data Management (June 4, 2013)

###### Location: The 14th IEEE International Conference on Mobile Data Management (MDM '13), Milan, Italy

Crowdsourcing refers to a distributed problem-solving model in which a crowd of undefined size is engaged to solve a complex problem through an open call. This novel problem-solving model found its way into numerous applications on the web for voting, fund-raising, micro-works and wisdom-of-the-crowd scenarios. On the other hand, the shift of desktop users to mobile platforms in the post-PC era, along with the unique multi-sensing capabilities of modern mobile devices, is expected to eventually unfold the full potential of crowdsourcing. This is because smartphones offer a great platform for extending and diversifying web-based crowdsourcing applications to a larger contributing crowd, making contribution easier and omnipresent. This advanced seminar presents the fundamental concepts behind crowdsourcing and its applications to mobile data management. In the first part of the seminar, we will overview the crowdsourcing landscape from a variety of perspectives, with a particular emphasis on the latest data management trends. In the second and more extended part of the seminar, we will focus on an in-depth coverage of emerging mobile crowdsourcing architectures and systems, through a multi-dimensional taxonomy that will address location, sensing, power, performance, big-data and privacy, among others. Furthermore, we will overview a number of in-house crowdsourcing prototypes we have developed and deployed over the last few years. The seminar concludes with challenges, opportunities and new directions in the field.
• #### Big Data - What is it? (March 12, 2013)

###### Location: 4th Architect Club Meeting, Nicosia, Cyprus

Big data refers to data sets whose size and structure strain the ability of commonly used relational DBMSs to capture, manage, and process the data within a tolerable elapsed time. Big data sizes commonly range from a few dozen terabytes to many petabytes in a single database, and their underlying data model might be anything from structured (relational or tabular) to semi-structured (XML or JSON) or even unstructured (Web text and log files). Big data architectures are highly parallel and distributed in order to cope with the inherent I/O and CPU limitations. Such systems typically run on anything from mid-scale private clouds, which offer higher privacy, to large-scale public clouds, both exposing operational and analytic functionality stand-alone or as-a-Service. This talk aims to overview the current big-data management landscape, the underlying technologies and their provenance, the latest NoSQL and NewSQL trends, and possible applications of big-data management systems for online and offline processing of sensor data, text data, social data and medical data in enterprise environments. The talk will also overview ongoing big-data research and teaching activities at the University of Cyprus.
• #### Smartphone Sensing: Testbeds and Applications (February 11, 2013)

###### Location: Workshop on Social Platforms for Urban Sensing, Dept. of Comp. Science, University of Cyprus, Nicosia, Cyprus

Smartphone devices have emerged as powerful computational platforms equipped with a multitude of sensors that are capable of generating vast amounts of data (geo-location, audio, video, etc.). Collections of such devices connected to the Internet yield Smartphone Networks, which can be utilized for opportunistic and participatory sensing applications in intelligent transportation systems, social networking applications, city planning and others. In this talk, I will present a collection of ongoing testbeds and applications we are developing in-house for this new era of ubiquitous smartphone computing. In particular, I will be presenting SmartLab, which is an innovative open programming cloud of 40+ Android devices deployed at our Department over the last three years. SmartLab is the first open Smartphone IaaS (Infrastructure-as-a-Service) cloud that enables fine-grained, low-level interactions over static or moving smartphones via an intuitive web-based interface. Such a testbed provides means for general-purpose urban-scale sensing scenarios and personal gadget management. This talk will also briefly cover other smartphone sensing testbeds and applications we've developed: a localization engine for fine-grained positioning without GPS, a trajectory comparison framework with privacy guarantees, a P2P smartphone searching framework and a neighborhood sensing framework.
• #### Querying Sensor Data in Smartphone Networks (October 11, 2012)

###### Location: CS Colloquium Series @ UCY, Nicosia, Cyprus

Smartphones have emerged as powerful computational platforms equipped with a multitude of sensors that are capable of generating vast amounts of data (geo-location, audio, video, etc.). Collections of smartphones connected to the Internet are nowadays proposed for opportunistic and participatory sensing applications in intelligent transportation systems, social networking applications, city planning and many other domains, undeniably prompting the post-PC era. In this talk, I will present distributed architectures for querying and managing such sensor data by taking into account energy, data disclosure and networking aspects. I will particularly focus on SmartTrace, a powerful query processing framework for finding similar smartphone trajectories without disclosing the traces of participating users. I will also present SmartLab, a first-of-a-kind programmable cloud of 40+ smartphones deployed at our department, enabling a new line of systems-oriented research on smartphones. Finally, I will overview other related smartphone data management frameworks we've developed for peer-to-peer search, crowdsourcing and indoor positioning, concluding with an outlook to our future research agenda.
• #### Data Management Techniques for Smartphone Networks (June 12, 2011)

###### Location: 10th Intl. ACM Workshop on Data Engineering for Wireless and Mobile Access (MobiDE '11), Athens, Greece

Smartphone devices have emerged as powerful computational platforms equipped with a multitude of sensors that are capable of generating vast amounts of data (geo-location, audio, video, etc.). Collections of such devices connected to the Internet yield Smartphone Networks, which can be utilized for opportunistic and participatory sensing applications in intelligent transportation systems, social networking applications, city planning and others. The uptake of applications in this domain is currently severely hampered by the fact that these devices have: i) a limited energy budget (i.e., smartphone devices still operate on batteries); ii) limited connectivity (i.e., not all regions offer unlimited Internet connectivity at the same cost); and iii) high privacy constraints (i.e., these devices might reveal the identity and habits of their custodians). In this talk, I will present a collection of data management techniques that deal with Smartphone Networks. In particular, I will start out with SmartTrace, a powerful framework for finding similar trajectories in a smartphone network without disclosing the traces of the participating users. SmartTrace relies on an in-situ data storage model, where geo-location data is recorded locally on smartphones for both performance and data-disclosure reasons. SmartTrace then deploys an efficient top-K query-processing algorithm that exploits distributed trajectory similarity measures, resilient to spatial and temporal noise, in order to derive the most relevant answers quickly and efficiently. I will then introduce SmartOpt, a multi-objective query optimizer that enables efficient content searches in smartphone networks. I will also introduce Proximity, a spatial neighborhood computation framework for smartphone networks. My talk will be followed by the presentation of SmartNet, our in-house programming cloud for smartphone networks.
• #### Energy Efficient Data Management in Smartphone Networks (April 2, 2011)

###### Location: US National Science Foundation Workshop on Sustainable Energy Efficient Data Management, Arlington, Virginia, USA

Smartphones are computational platforms equipped with a multitude of sensors and capable of generating vast amounts of data (geo-location, audio, video, etc.). On the other hand, these devices operate on a strict energy budget and thus have a limited lifetime on a single charge. Consequently, we need to identify new energy-aware algorithms and techniques to provide innovative, feature-rich applications and services. In this white paper, we start out by describing recent trends in Smartphone technology and Smartphone networks. Our description is followed by an anatomy of the energy costs associated with data processing in a Smartphone Network. We conclude with prominent research directions in energy-aware data management for Smartphone networks.
• #### Querying Smartphone Networks with SmartTrace (March 29, 2011)

###### Location: Dept. of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA

Smartphone devices have emerged as powerful computational platforms equipped with a multitude of sensors that are capable of generating vast amounts of data (geo-location, audio, video, etc.). Collections of such devices connected to the Internet yield Smartphone Networks, which can be utilized for opportunistic and participatory sensing applications in intelligent transportation systems, social networking applications, city planning and others. The uptake of applications in this domain is currently severely hampered by the fact that these devices have: i) a limited energy budget (i.e., smartphone devices still operate on batteries); ii) limited connectivity (i.e., not all regions offer unlimited Internet connectivity at the same cost); and iii) high privacy constraints (i.e., these devices might reveal the identity and habits of their custodians). In this talk, I will present SmartTrace, a powerful framework for finding similar trajectories in a smartphone network without disclosing the traces of participating users. SmartTrace quickly answers queries of the form: “Report the users that move most similarly to Q, where Q is some query trace.” SmartTrace relies on an in-situ data storage model, where geo-location data is recorded locally on smartphones for both performance and data-disclosure reasons. SmartTrace then deploys an efficient top-K query-processing algorithm that exploits distributed trajectory similarity measures, resilient to spatial and temporal noise, in order to derive the most relevant answers to Q quickly and efficiently. We assess our propositions with realistic and real workloads from Microsoft Research Asia and other sources. Our study reveals that SmartTrace computes the desired results with 74% less energy consumption and 13% faster than its centralized and decentralized counterparts.
My talk will be followed by a summary of related research efforts, namely SmartNet, an innovative programming cloud for smartphone networks, and SmartOpt, a multi-objective query optimizer that enables efficient content searches in smartphone networks.
• #### Query Routing Trees for Wireless Sensor Networks (March 9, 2011)

###### Location: Information Systems, Open University of Cyprus, Nicosia, Cyprus

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor the physical world at an extremely high fidelity. In order to collect the data generated by these tiny-scale devices, sensors are typically organized in structures coined Query Routing Trees (QRTs). Our study reveals that predominant data acquisition systems construct QRTs in an ad-hoc manner, leading to a significant waste of energy. In this talk, I will present MicroPulse+, a framework for minimizing the consumption of energy during data collection in Sensor Networks. MicroPulse+ eliminates a variety of data transmission and data reception inefficiencies using a collection of in-network algorithms. In particular, MicroPulse+ introduces: i) the Workload-Aware Routing Tree (WART) algorithm, which is established on profiling recent data collection activity and on identifying the bottlenecks using an in-network execution of the critical path method; and ii) the Energy-driven Tree Construction (ETC) algorithm, which balances the workload among nodes and minimizes data collisions. The talk will conclude with an outlook into current and future research work.
• #### Ranking Query Results in a Networked WorldMarch 9, 2011

###### Location: Department of Informatics, University of Athens, Greece

In this talk I will present a family of algorithms for Top-K ranking of query results in a distributed environment. A Top-K query focuses on the subset of most relevant answers for two reasons: i) to minimize the cost metric that is associated with the retrieval of all answers; and ii) to improve the quality of the answer set such that the user is not overwhelmed with irrelevant results. I will start out by providing an overview of Top-K query processing algorithms for centralized and middleware systems. I will then highlight the limitations of these algorithms and focus on three novel algorithms we developed for networked environments (i.e., Peer-to-Peer Networks, Wireless Sensor Networks and Smartphone Networks). I will also present evaluation studies of these algorithms on: i) a Wireless Sensor Network testbed of 54 sensor devices; ii) a Peer-to-Peer testbed of 1000 peers deployed on 75 Linux workstations; and iii) a smartphone network deployment on Android-based smartphone devices. The talk will conclude with an overview of related research problems that I am currently working on and an outlook on future work.
• #### Panel: Data Management in Clouds: Research Challenges and OpportunitiesJuly 3, 2010

###### Location: 9th Hellenic Data Management Symposium (HDMS '10), Ayia Napa, Cyprus

Panel Chair: Anastasia Ailamaki. Other Panelists: Marios Dikaiakos, Evangelia Pitoura, Peter Triantafillou and Akrivi Vlachou
• #### Ranking Query Results in a Networked WorldMay 27, 2010

###### Location: IBM T.J. Watson Research Center, Hawthorne, NY, USA

In this talk I will present the fundamental concepts behind distributed Top-K query processing algorithms. A Top-K query focuses on the subset of most relevant answers for two reasons: i) to minimize the cost metric that is associated with the retrieval of all answers; and ii) to improve the quality of the answer set such that the user is not overwhelmed with irrelevant results. I will start out by providing an overview of state-of-the-art Top-K query processing algorithms for centralized and middleware systems. I will then highlight the limitations of these algorithms and focus on two novel algorithms we developed for networked environments (e.g., Wireless Sensor Networks, Peer-to-Peer Networks and Vehicular Networks). I will also present evaluation studies conducted on: i) a Peer-to-Peer testbed of 1000 peers deployed on 75 workstations; ii) a Wireless Sensor Network testbed of 54 sensor devices; and iii) a Smartphone Network deployed on a number of Android-based smartphone devices. The talk will conclude with an overview of related research problems that I am currently working on and an outlook on future applications of the presented ideas.
• #### Spatio-Temporal Query Processing in Smartphone NetworksMay 23, 2010

###### Location: SSPC-WAN Workshop, 11th Intl. Conference on Mobile Data Management (MDM '10), Kansas City, MO, USA

In this talk I will present a powerful, distributed spatio-temporal query processing framework, coined HUB-K. Our framework can be utilized to promptly answer queries of the form: “Report the objects (i.e., trajectories) that follow a similar spatio-temporal motion to Q, where Q is some query trajectory.” HUB-K relies on an in-situ data storage model, where spatio-temporal data remains on the smartphone that generated it, as well as on state-of-the-art top-k query processing algorithms, which exploit distributed trajectory similarity measures in order to identify the correct answers promptly. We present preliminary design choices, an outline of our preliminary implementation and an outlook on future challenges.
• #### Semantic Challenges in (Mobile) Sensor NetworksJanuary 26, 2010

###### Location: Seminar 10042: Semantic Challenges in Sensor Networks, Dagstuhl, Germany

The widespread deployment of mobile phones along with the massive production of sensors for every aspect of modern life provides evidence that Computer Science research and education will evolve dramatically over the next few years. The boundaries of Mobile Devices and Sensor Devices are nowadays blurring as the former devices are already equipped with a multitude of sensing capabilities, including GPS (which enables the derivation of geospatial coordinates), accelerometers (which enable the derivation of orientation, vibration and shock) and an exciting set of other sensors (e.g., proximity sensors, ambient light sensors, while more traditional sensors such as temperature, acoustic, magnetometers and others will be integrated in these devices very soon). That creates the notion of Mobile Sensor Devices that will become even more ubiquitous than their predecessor 'smart-phone' devices. In this talk, I will provide an overview and definitions of Mobile-Sensor-Network (MSN) related platforms and applications. In particular, I will show how applications in environmental monitoring, body sensor networks, vehicular sensor networks and intelligent transportation systems have brought a dramatic shift in how spatio-temporal data is nowadays generated. I will then outline some semantic challenges that arise in this context including: vastness, uncertainty, data integration, query processing and privacy. I will also address some more general challenges that currently hinder the evolution and uptake of semantic MSNs.
• #### Distributed Top-K Ranking AlgorithmsDecember 15, 2008

###### Location: DAMA Group, Polytechnic University of Catalonia (UPC), Barcelona, Spain

In this talk I will present the fundamental concepts of distributed Top-K query processing algorithms. A Top-K query focuses on a subset of most relevant answers for two reasons: i) to minimize the cost metric that is associated with the retrieval of all answers; and ii) to improve the quality of the answer set such that the user is not overwhelmed with irrelevant results. I will start out by providing an overview of state-of-the-art Top-K query processing algorithms for centralized database systems. I will then highlight the limitations of these algorithms and focus on the Threshold Join Algorithm (TJA), our distributed Top-K query processing algorithm designed for distributed computing networks (e.g., Wireless Sensor Networks, Peer-to-Peer Networks and Vehicular Networks). Finally, I will present an evaluation study conducted with our middleware system deployed over a network of 1000 peers on 75 workstations.
• #### MicroHash - An Efficient Index Structure for Flash-Based Sensor DevicesDecember 12, 2008

###### Location: IBM Research, Zurich, Switzerland

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor the physical world at an extremely high fidelity. Research in this area has to this day primarily focused on the trade-off between local computation and communication in order to minimize the transfer of data over the fundamentally expensive wireless link. In contrast, we focus on the challenges of storing sensor readings locally at each node. This In-Situ storage paradigm offers a novel perspective for conserving energy in Wireless Sensor Networks as the communication channel is only accessed for answering on-demand queries rather than for percolating each and every event to a centralized database. Storing large quantities of data locally at each sensor has to be complemented by efficient access methods that will speed up the execution of queries when required. In this talk I will present MicroHash, an external memory index structure that is tailored to the distinct characteristics of the most prevalent type of non-volatile memory used in sensor systems, namely flash memory. MicroHash exploits the asymmetric read/write characteristics of flash memory in order to offer high-performance indexing and searching capabilities in the presence of energy and storage media lifetime constraints.
• #### An Overview of Distributed Top-K Ranking AlgorithmsDecember 12, 2008

###### Location: Communication Systems Group (CSG), ETH Zurich, Switzerland

In this talk I will present the fundamental concepts of distributed Top-K query processing algorithms. A Top-K query focuses on a subset of most relevant answers for two reasons: i) to minimize the cost metric that is associated with the retrieval of all answers; and ii) to improve the quality of the answer set such that the user is not overwhelmed with irrelevant results. I will start out by providing an overview of state-of-the-art Top-K query processing algorithms for centralized database systems. I will then highlight the limitations of these algorithms and focus on the Threshold Join Algorithm (TJA), our distributed Top-K query processing algorithm designed for distributed computing networks (e.g., Wireless Sensor Networks, Peer-to-Peer Networks and Vehicular Networks). Finally, I will present an evaluation study conducted with our middleware system deployed over a network of 1000 peers on 75 workstations.

• #### MicroHash - An Efficient Index Structure for Flash-Based Sensor DevicesJanuary 11, 2008

###### Location: Systems and Networking Group, Microsoft Research Cambridge, Cambridge, UK

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor the physical world at an extremely high fidelity. Research in this area has to this day primarily focused on the trade-off between local computation and communication in order to minimize the transfer of data over the fundamentally expensive wireless link. In contrast, we focus on the challenges of storing sensor readings locally at each node. This In-Situ storage paradigm offers a novel perspective for conserving energy in Wireless Sensor Networks as the communication channel is only accessed for answering on-demand queries rather than for percolating each and every event to a centralized database. Storing large quantities of data locally at each sensor has to be complemented by efficient access methods that will speed up the execution of queries when required. In this talk I will present MicroHash, an external memory index structure that is tailored to the distinct characteristics of the most prevalent type of non-volatile memory used in sensor systems, namely flash memory. MicroHash exploits the asymmetric read/write characteristics of flash memory in order to offer high-performance indexing and searching capabilities in the presence of energy and storage media lifetime constraints.
• #### Content-Based Search in Internet-Scale Peer-to-Peer SystemsDecember 28, 2006

###### Location: Department of Electronic, Computer and Software Systems (ECS), KTH - Royal Institute of Technology, Stockholm, Sweden

The emerging Peer-to-Peer (P2P) model has become a very powerful and attractive paradigm for developing Internet-scale services for sharing resources, including files and documents. The distributed nature of these systems, where nodes are typically located across different networks and domains, inherently hinders the efficient retrieval of information. In this talk I will present techniques to perform content-based search over data repositories that are geographically scattered over peers of different networks. Data repositories in this context contain documents of text, audio, video or other semi-structured data and the task is to locate a certain set of keywords or multimedia features. We present the components of the pFusion architecture, an open source system that builds on work in unstructured P2P systems and topologically-aware overlay construction techniques. Our empirical results using datasets from AKAMAI, NLANR and TREC, show that the architecture we propose is both efficient and practical. In this talk I will also overview other related research activities in Grid, P2P and Sensor systems that we are currently involved in.
• #### Top-K Query Processing Techniques for Distributed EnvironmentsJune 8, 2006

###### Location: Institute of Computer Science (ICS) of the Foundation for Research and Technology Hellas (FORTH), Crete, Greece

Emerging applications in Sensor and Peer-to-Peer networks make the concept of data integration without centralization nowadays more meaningful than ever. In these environments, data is generated continuously and potentially automatically across geographically diverse locations. Organizing data in centralized repositories is becoming prohibitively expensive and on many occasions impractical. Storing data in-situ, however, complicates query processing because data relations are fragmented over a number of remote sites. Furthermore, accessing these fragmented relations is only feasible by traversing a network of other nodes. This makes the execution of a query an even more complex task. We claim that on many occasions it might be more beneficial to find the K highest ranked (or Top-K) answers, for some user-defined parameter K, if this can minimize the query execution cost. In this talk, I will present techniques to efficiently answer Top-K queries in a distributed environment. A Top-K query returns the K highest ranked answers according to a user-defined similarity function. At the same time it also minimizes some cost metric, such as the utilization of the communication medium, which is associated with the retrieval of the desired answer set. I will provide an overview of state-of-the-art algorithms that solve the Top-K problem in a centralized setting and show why these are not applicable to the distributed case. I will then focus on the Threshold Join Algorithm (TJA), which is a novel solution for executing Top-K queries in a distributed environment. I will also present results from our performance study with a real middleware testbed deployed over a network of 75 workstations.
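
The Top-K definition and its cost metric can be illustrated with a minimal sketch of the naive centralized baseline, in which every site ships all of its partial scores to one node that then ranks them. This is not TJA, whose details the abstract does not give. The summed-score ranking function, the name `centralized_top_k` and the sample data are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def centralized_top_k(site_scores, k):
    """Rank objects by their summed partial scores and count shipped tuples.

    site_scores: one dict per remote site, mapping object id -> partial score.
    Returns (top_k, shipped), where top_k is a list of (object id, total score)
    pairs and shipped is the number of (object, score) tuples sent over the network.
    """
    totals = defaultdict(float)
    shipped = 0
    for local in site_scores:
        for obj, score in local.items():
            totals[obj] += score
            shipped += 1  # every tuple crosses the network in this baseline
    top_k = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
    return top_k, shipped


# Hypothetical partial scores held at three sites.
sites = [
    {"a": 0.9, "b": 0.2, "c": 0.5},
    {"a": 0.1, "b": 0.8, "c": 0.6},
    {"a": 0.3, "b": 0.7, "c": 0.4},
]
top, shipped = centralized_top_k(sites, k=2)
```

Here all nine tuples are transmitted to report only two answers. A distributed Top-K algorithm aims to return the same answer set while shipping fewer tuples.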
• ### Other Talks

• #### Towards Real-Time Road Traffic Analytics using Telco Big DataAugust 28, 2017

###### Location: 11th Intl. Workshop on Real-Time Business Intelligence and Analytics, collocated with VLDB 2017 (BIRTE '17), Munich, Germany

A telecommunication company (telco) is traditionally only perceived as the entity that provides telecommunication services, such as telephony and data communication access to users. However, the backbone infrastructure of such entities, spanning densely urban and widely rural areas, provides nowadays a unique opportunity to collect immense amounts of mobility data that can provide valuable insights for road traffic management and avoidance. In this paper we outline the components of the Traffic-TBD (Traffic Telco Big Data) architecture, which aims to become an innovative traffic analytic and prediction system with the following goals: i) provide micro-level traffic modeling and prediction that goes beyond the current state provided by Internet-based enterprises utilizing crowdsourcing; ii) retain the location boundaries of users inside their mobile network operator, to avoid the risks of exposing location data to third-party mobile applications; and iii) be available with minimal costs and using existing infrastructure (i.e., cell towers and TBD data streams are available inside a telco). Road traffic understanding, management and analytics can minimize the number of road accidents, fuel and energy consumption, avoid unexpected delays, and contribute to a macroscopic spatio-temporal understanding of traffic in cities but also to “smart” societies through applications in city planning, public transportation, logistics and fleet management for startups and governmental bodies.
• #### Data-driven Serendipity Navigation in Urban PlacesJune 7, 2017

###### Location: 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, GA, USA.

With the proliferation of mobile computing and the ability to collect detailed data for the urban environment, a number of systems that aim at providing Points of Interest (POIs) and tour recommendations have appeared. The overwhelming majority of these systems aims at providing an optimal recommendation, where optimality refers to objectives of minimizing the distance to be covered or maximizing the quality of the POIs recommended. A major problem is that by focusing on the optimization of these objectives, there remains little room for the user for serendipity. Urban and social scientists have identified serendipity, i.e., the ability to come across unexpected places, as a feature that makes a city livable. In this work, we introduce a prototype of an experimental platform for evaluating venue recommendation algorithms by providing informative tour recommendations based on the suggested venues. Our prototype system integrates the notion of serendipity in urban navigation at both the venue as well as the route recommendation level without compromising the quality and diversity of the recommended POIs. In addition, our system allows users to upload their own algorithms and explore their performance as compared to many well-known algorithms.
• #### Indoor Localization Accuracy Estimation from Fingerprint DataMay 31, 2017

###### Location: 18th IEEE International Conference on Mobile Data Management (MDM '17), KAIST, Daejeon, South Korea.

The demand for indoor localization services has led to the development of techniques that create a Fingerprint Map (FM) of sensor signals (e.g., magnetic, Wi-Fi, Bluetooth) at designated positions in an indoor space and then use the FM as a reference for subsequent localization tasks. With such an approach, it is crucial to assess the quality of the FM before deployment, in a manner disregarding data origin and at any location of interest, so as to provide deployment staff with information on the quality of localization. Even though FM-based localization algorithms usually provide accuracy estimates during system operation (e.g., visualized as an uncertainty circle or ellipse around the user location), they do not provide any information about the expected accuracy before the actual deployment of the localization service. In this paper, we develop a novel framework for quality assessment on arbitrary FMs, coined ACCES. Our framework comprises a generic interpolation method using Gaussian Processes (GP), upon which a navigability score at any location is derived using the Cramer-Rao Lower Bound (CRLB). Our approach does not rely on the underlying physical model of the fingerprint data. Our extensive experimental study with magnetic FMs, comparing empirical localization accuracy against derived bounds, demonstrates that the navigability score closely matches the accuracy variations users experience.
• #### ACCES: Offline Accuracy Estimation for Fingerprint-Based LocalizationMay 30, 2017

###### Location: 18th IEEE International Conference on Mobile Data Management (MDM '17), KAIST, Daejeon, South Korea.

In this demonstration we present ACCES, a novel framework that enables quality assessment of arbitrary fingerprint maps and offline accuracy estimation for the task of fingerprint-based indoor localization. Our framework considers collected fingerprints disregarding the physical origin of the data. First, it applies a widely used statistical instrument, namely Gaussian Process Regression (GPR), for interpolation of the fingerprints. Then, to estimate the best possibly achievable localization accuracy at any location, it utilizes the Cramer-Rao Lower Bound (CRLB) with interpolated data as an input. Our demonstration entails a standalone version of the popular and open-source Anyplace Internet-based indoor navigation service in which the software modules of ACCES are integrated. At the conference, we will present the utility of our method in two modes: (i) Collection Mode, where attendees will be able to use our service directly to collect signal measurements over the venue using an Android smartphone; and (ii) Reflection Mode, where attendees will be able to observe the collected measurements and the respective ACCES accuracy estimations in the form of an overlay heatmap.

• #### SPATE: Compacting and Exploring Telco Big DataApril 20, 2017

###### Location: 33rd IEEE International Conference on Data Engineering (ICDE '17), San Diego, CA, USA.

In this demonstration paper, we present SPATE, an innovative telco big data exploration framework whose objectives are two-fold: (i) minimizing the storage space needed to incrementally retain data over time; and (ii) minimizing the response time for spatiotemporal data exploration queries over recent data. Our framework deploys lossless data compression to ingest recent streams of telco big data in the most compact manner, retaining full resolution for data exploration tasks. We augment our storage structures with decaying principles that lead to the progressive loss of detail as information gets older. Our framework also includes visual and declarative interfaces for a variety of telco-specific data exploration tasks. We demonstrate SPATE in two modes: (i) Visual Mode, where attendees will be able to interactively explore synthetic telco traces we will provide; and (ii) SQL Mode, where attendees can submit custom queries based on a provided schema.
• #### Efficient Exploration of Telco Big Data with Compression and DecayingApril 19, 2017

###### Location: 33rd IEEE International Conference on Data Engineering (ICDE '17), San Diego, CA, USA.

In the realm of smart cities, telecommunication companies (telcos) are expected to play a protagonistic role as these can capture a variety of natural phenomena on an ongoing basis, e.g., traffic in a city, mobility patterns for emergency response or city planning. The key challenge for telcos in this era is to ingest huge amounts of network logs in the most compact manner, and to perform big data exploration and analytics on the generated data within a tolerable elapsed time. This paper introduces SPATE, an innovative telco big data exploration framework whose objectives are two-fold: (i) minimizing the storage space needed to incrementally retain data over time; and (ii) minimizing the response time for spatiotemporal data exploration queries over recent data. The storage layer of our framework uses lossless data compression to ingest recent streams of telco big data in the most compact manner, retaining full resolution for data exploration tasks. The indexing layer of our system then takes care of the progressive loss of detail in information, coined decaying, as data ages with time. The exploration layer provides visual means to explore the generated spatio-temporal information space. We measure the efficiency of the proposed framework using a 5GB anonymized real telco network trace and a variety of telco-specific tasks, such as OLAP and OLTP querying, privacy-aware data sharing, multivariate statistics, clustering and regression. We show that our framework can achieve comparable response times to the state-of-the-art using an order of magnitude less storage space.

• #### Scalable Mockup Experiments on Smartphones using SmartLabApril 16, 2015

###### Location: 16th IEEE International Conference on Mobile Data Management (MDM 2015), Pittsburgh, PA, USA

In this paper we present a comprehensive architecture to carry out experimental repeatability studies on clusters of smartphones. Our architecture is founded on SmartLab (http://smartlab.cs.ucy.ac.cy/), our in-house architecture for managing real and virtual smartphones via an intuitive Web user interface. Our presented architecture consists of several exciting components for re-programming and instrumenting smartphones to perform application testing and data gathering in a facile manner as well as executing mockup experiments by “feeding” the devices with GPS/sensor readings. We will particularly demonstrate the various components of our architecture that encompasses smartphone sensor data collected by mobile users and organized in our distributed NoSQL document store. The given datasets can then be replayed on our testbed comprising real and virtual smartphones accessible to developers through our Web 2.0 user interface. We present the applicability of our architecture through various mockup experiments over different application scenarios.
• #### Anyplace: A Crowdsourced Indoor Information ServiceApril 16, 2015

###### Location: 16th IEEE International Conference on Mobile Data Management (MDM 2015), Pittsburgh, PA, USA

People do most of their activities, business, commerce, entertainment and socializing indoors. As all of these are increasingly aided by online services and indoor spaces are becoming bigger and more complex, there is a growing need for cost-effective indoor localization, mapping, navigation and information services. In this paper, we present a complete Indoor Information Service, coined Anyplace (http://anyplace.cs.ucy.ac.cy/), which has an open, modular, extensible and scalable architecture, making it ideal for a wide range of applications. Our service features three highly desirable properties, namely crowdsourcing, scalability and accuracy. Anyplace implements a set of crowdsourcing-supportive mechanisms to handle the enormous amount of crowd-sensed data, filter incorrect user contributions and exploit Wi-Fi data from heterogeneous mobile devices. Moreover, it uses a big-data architecture for efficient storage and retrieval of localization and mapping data. Finally, our service relies on the abundance of sensory data on smartphones (e.g., Wi-Fi signal strength and inertial measurements) to deliver reliable indoor geolocation information, and it has received several international awards.
• #### Rayzit: An Anonymous and Dynamic Crowd Messaging ArchitectureApril 15, 2015

###### Location: 3rd IEEE International Workshop on Mobile Data Management, Mining, and Computing on Social Networks, collocated with IEEE MDM '15 (Mobisocial '15), Pittsburgh, PA, USA

The smartphone revolution has introduced a new era of social networks where users communicate over anonymous messaging platforms to exchange opinions, ideas and even carry out commerce. These platforms enable individuals to establish social interactions between strangers based on a common interest or attribute. In this paper we present Rayzit, a novel anonymous crowd messaging architecture, which utilizes the location of each user to connect them instantly to their k Nearest Neighbors (kNN) as they move in space. Contrary to the very large body of location-based social networks that suffer from bootstrapping issues, our architecture enables a user to always interact with the geographically closest possible users around. We establish this communication using a fast computation of an All kNN query that generates a dynamic global social graph every few seconds. We present motivating application scenarios and the detailed backend architecture that allows Rayzit to scale. We have collected and analyzed data from the interactions of thousands of active users and confirm our claims.
• #### Crowdsourced Indoor Localization and Navigation with AnyplaceApril 16, 2014

###### Location: 13th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2014), Berlin, Germany

In this demonstration paper, we present the Anyplace system that relies on the abundance of sensory data on smartphones (e.g., WiFi signal strength and inertial measurements) to deliver reliable indoor geolocation information. Our system features two highly desirable properties, namely crowdsourcing and scalability. Anyplace implements a set of crowdsourcing-supportive mechanisms to handle the enormous amount of crowdsensed data, filter incorrect user contributions and exploit WiFi data from heterogeneous mobile devices. Moreover, Anyplace follows a big-data architecture for efficient and scalable storage and retrieval of localization and mapping data.
• #### Sensor Mockup Experiments with SmartLabApril 16, 2014

###### Location: 13th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2014), Berlin, Germany

In this demonstration paper we present SmartLab, an architecture for managing a cluster of both Android Real Devices (ARDs) and Android Virtual Devices (AVDs) via an intuitive web-based interface. Our architecture consists of several exciting components for re-programming and instrumenting smartphones to perform application testing and data gathering in a facile manner as well as executing mockup experiments by “feeding” the devices with GPS/sensor readings. We will particularly demonstrate the various components of our architecture that encompasses smartphone sensor data collected by mobile users and organized in our distributed NoSQL document store. The given datasets can then be replayed on our testbed comprising real and virtual smartphones accessible to developers through our Web 2.0 user interface. We present the applicability of our architecture through various mockup experiments over different application scenarios.
• #### Managing Smartphone Testbeds with SmartLabNovember 7, 2013

###### Location: 27th USENIX Large Installation System Administration Conference (LISA '13), Washington D.C., USA

The explosive growth in the number of smartphones with ever-growing sensing and computing capabilities has brought a paradigm shift to many traditional domains of the computing field. Re-programming smartphones and instrumenting them for application testing and data gathering at scale is currently a tedious and time-consuming process that poses significant logistical challenges. In this paper, we make three major contributions: First, we propose a comprehensive architecture, coined SmartLab, for managing a cluster of both real and virtual smartphones that are either wired to a private cloud or connected over a wireless link. Second, we propose and describe a number of Android management optimizations (e.g., command pipelining, screen-capturing, file management), which can be useful to the community for building similar functionality into their systems. Third, we conduct extensive experiments and microbenchmarks to support our design choices, providing qualitative evidence on the expected performance of each module comprising our architecture. This paper also overviews our experiences of using SmartLab in a research-oriented setting, as well as ongoing and future development efforts.

• #### CLODA: A Crowdsourced Linked Open Data ArchitectureJune 3, 2013

###### Location: 1st IEEE Intl. Workshop on Mobile Data Management, Mining, and Computing on Social Networks (MobiSocial) with IEEE MDM '13, June 3, 2013, Milan, Italy

In this paper we present our Crowdsourced Linked Open Data Architecture (CLODA), a first attempt to combine crowdsourcing, localization and location-based services to generate, collect, validate and relate real-world, geo-spatial and multi-dimensional information using smartphones and other mobile devices. CLODA focuses on the construction of URI addressable, interlinked and semi-structured data following the Linked-Open Data (LOD) paradigm. The validity of the constructed data is then contributed by a participating crowd. We present our prototype implementation on top of Google Maps and a blend of in-house technologies, particularly our indoor positioning framework, coined Airplace, our trajectory similarity framework, coined SmartTrace, our neighborhood detection framework, coined Proximity, and our smartphone testing platform, coined SmartLab.
• #### Big Data - What is it?March 19, 2013

###### Location: EPL671 Course, Department of Computer Science, University of Cyprus, Nicosia, Cyprus

Big data refers to data sets whose size and structure strain the ability of commonly used relational DBMSs to capture, manage, and process the data within a tolerable elapsed time. Big data sizes commonly range from a few dozen terabytes to many petabytes in a single database and their underlying data model might be anything from structured (relational or tabular) to semi-structured (XML or JSON) or even unstructured (Web text and log files). Big data architectures are highly parallel and distributed in order to cope with the inherent I/O and CPU limitations. Such systems typically run on anything from mid-scale private clouds, which offer higher privacy, to large-scale public clouds, both exposing operational and analytic functionality stand-alone or as-a-Service. This talk aims to overview the current big-data management landscape, the underlying technologies and their provenance, the latest NoSQL and NewSQL trends, and possible applications of big-data management systems for online and offline processing of sensor data, text data, social data and medical data in enterprise environments. The talk will also overview ongoing big-data research and teaching activities at the University of Cyprus.
• #### Continuous all k-nearest neighbor querying in smartphone networksJuly 24, 2012

###### Location: The 13th IEEE International Conference on Mobile Data (IEEE MDM '12), Bangalore India

Consider a centralized query operator that identifies to every smartphone user its k geographically nearest neighbors at all times, a query we coin Continuous All k-Nearest Neighbor (CAkNN). Such an operator could be utilized to enhance public emergency services, allowing users to send SOS beacons out to the closest rescuers, or to allow gamers and social networking users to establish ad-hoc overlay communication infrastructures in order to carry out complex interactions. In this paper, we study the problem of efficiently processing a CAkNN query in a cellular or WiFi network, both of which are ubiquitous. We introduce an algorithm, coined Proximity, which answers CAkNN queries in O(n(k+λ)) time, where n denotes the number of users and λ a network-specific parameter (λ << n). Proximity does not require any additional infrastructure or specialized hardware and its efficiency is mainly attributed to a smart search space sharing technique we introduce. Its implementation is based on a novel data structure, coined k+heap, which achieves constant O(1) look-up time and logarithmic O(log(k*λ)) insertion/update time. Proximity, being parameter-free, performs efficiently in the face of high mobility and skewed distribution of users (e.g., the service works equally well in downtown, suburban, or rural areas). We have evaluated Proximity using mobility traces from two sources and concluded that our approach performs at least one order of magnitude faster than adapted existing work.
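To make the CAkNN query semantics concrete, the following is a minimal brute-force sketch of a single snapshot of the query. It is not the Proximity algorithm and does not use the k+heap structure; it is the naive baseline that compares every pair of users. The function name `all_knn` and the sample coordinates are assumptions made for illustration only.

```python
import heapq
import math


def all_knn(positions, k):
    """Return {user: [ids of its k nearest other users]} by brute force.

    positions maps a user id to an (x, y) coordinate. This is the naive
    all-pairs baseline, not the Proximity algorithm from the abstract.
    """
    result = {}
    for user, (ux, uy) in positions.items():
        # Lazily generate (distance, other_user) pairs for every other user.
        candidates = (
            (math.hypot(ux - vx, uy - vy), other)
            for other, (vx, vy) in positions.items()
            if other != user
        )
        # Keep only the k closest candidates, nearest first.
        result[user] = [other for _, other in heapq.nsmallest(k, candidates)]
    return result


# Hypothetical snapshot of four users on a line.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0), "d": (6.0, 0.0)}
neighbors = all_knn(positions, k=2)
```

A continuous version would have to re-run this computation whenever users move, which is the cost that a dedicated CAkNN operator aims to avoid.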
• #### Towards planet-scale localization on smartphones with a partial radiomapJune 25, 2012

###### Location: 4th ACM international workshop on Hot topics in planet-scale measurement (HotPlanet '12), in conjunction with MobiSys '12, Lake District UK

The majority of smartphone localization systems use Assisted GPS for fine-grained localization in outdoor spaces or WiFi-based RSS (Received Signal Strength) technologies for coarse-grained positioning in indoor and outdoor spaces. The former consumes precious energy from mobile devices, is strictly affected by the environment (e.g., cloudy day, forests, etc.) and does not work in indoor spaces. The latter collects RSS from WiFi beams within a user’s vicinity and transfers an RSS vector to the server for localization, in which the position of the user is disclosed possibly violating users’ privacy. In this paper, we present BloomMap, an innovative and efficient algorithm that conducts a localization process without unveiling the user’s location to the localization service, minimizing the energy consumption of the mobile unit and also minimizing the network traffic by not transferring large positioning structures to the client (i.e., known as radiomap). Our framework is designed for planet-scale RSS localization scenarios, which are expected to emerge in the near-future. In particular, a user may localize itself using a subset of a vast data repository of RSS signals that is updated in real time by smartphone wardrivers. Our preliminary evaluation shows that our propositions can localize a device without unveiling its location in approximately 80% less time, energy and network resources than competitive approaches. We also describe our WiFi-based prototype system developed on the Android OS.
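As a point of reference for the RSS-based approach described above, here is a minimal sketch of nearest-neighbor fingerprint matching against a radiomap. It illustrates the generic technique that matches an observed RSS vector against stored fingerprints; it is not BloomMap and offers no privacy protection. The names `rss_distance`, `localize` and `MISSING_RSS`, the -100 dBm default and the sample radiomap are all assumptions.

```python
import math

# Assumed placeholder (in dBm) for an access point that was not heard.
MISSING_RSS = -100.0


def rss_distance(observed, fingerprint):
    """Euclidean distance between two RSS vectors keyed by access point id."""
    access_points = set(observed) | set(fingerprint)
    return math.sqrt(sum(
        (observed.get(ap, MISSING_RSS) - fingerprint.get(ap, MISSING_RSS)) ** 2
        for ap in access_points
    ))


def localize(observed, radiomap):
    """Return the radiomap location whose fingerprint is closest to observed."""
    return min(radiomap, key=lambda loc: rss_distance(observed, radiomap[loc]))


# Hypothetical radiomap: location -> {access point id: RSS in dBm}.
radiomap = {
    (0, 0): {"ap1": -40.0, "ap2": -70.0},
    (10, 0): {"ap1": -70.0, "ap2": -40.0},
}
estimate = localize({"ap1": -45.0, "ap2": -68.0}, radiomap)
```

Running `localize` on the server requires the client to upload its RSS vector, while running it on the client requires downloading the radiomap; these are the two costs the abstract contrasts.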
• #### Disclosure-free GPS Trace Search in Smartphone NetworksJune 7, 2011

###### Location: The 12th IEEE International Conference on Mobile Data (IEEE MDM '11), Lulea Sweden

In this paper we present a powerful distributed framework for finding similar trajectories in a smartphone network, without disclosing the traces of participating users. Our framework, coined SmartTrace, exploits opportunistic and participatory sensing in order to quickly answer queries of the form: 'Report the users that move most similarly to Q, where Q is some query trace'. SmartTrace relies on an in-situ data storage model, where geo-location data is recorded locally on smartphones for both performance and data-disclosure reasons. SmartTrace then deploys an efficient top-K query processing algorithm that exploits distributed trajectory similarity measures, resilient to spatial and temporal noise, in order to derive the most relevant answers to Q quickly and efficiently. We assess our ideas with realistic and real workloads from Microsoft Research Asia and other sources. Our study reveals that SmartTrace computes the desired results with 74% less energy consumption and 13% faster than its centralized and decentralized counterparts. Our experimental results also confirm our analytical study.
• #### Multi-Objective Query Optimization in Smartphone Social NetworksJune 7, 2011

###### Location: The 12th IEEE International Conference on Mobile Data (IEEE MDM '11), Lulea Sweden

The bulk of social network applications for smartphones (e.g., Twitter, Facebook, Foursquare, etc.) currently rely on centralized or cloud-like architectures in order to carry out their data sharing and searching tasks. Unfortunately, the given model introduces both data-disclosure concerns (e.g., disclosing all captured media to a central entity) and performance concerns (e.g., consuming precious smartphone battery and bandwidth during content uploads). In this paper, we present a novel framework, coined SmartOpt, for searching objects (e.g., images, videos, etc.) captured by the users in a mobile social community. Our framework is founded on an in-situ data storage model, where captured objects remain local on their owners' smartphones and searches then take place over a novel lookup structure we compute dynamically, coined the Multi-Objective Query Routing Tree (MO-QRT). Our structure concurrently optimizes several conflicting objectives (i.e., it minimizes energy consumption, minimizes search delay and maximizes query recall), using a Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D) that calculates a diverse set of high quality non-dominated solutions in a single run. We assess our ideas with mobility patterns derived by Microsoft's Geolife project and social patterns derived by DBLP. Our study reveals that SmartOpt can yield query recall rates of 95%, with one order of magnitude less time and two orders of magnitude less energy than its competitors.
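The notion of a non-dominated solution can be illustrated with a small Pareto filter over the three objectives named above. This sketch only shows what "non-dominated" means; it is not MOEA/D and does not construct an MO-QRT. The tuple layout `(energy, delay, recall)` and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates solution b.

    Each solution is an assumed (energy, delay, recall) tuple: energy and
    delay are minimized, recall is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] < b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical candidate routing trees scored on the three objectives.
candidates = [
    (10.0, 2.0, 0.95),
    (12.0, 2.5, 0.90),  # worse than the first candidate on every objective
    (6.0, 4.0, 0.80),
    (15.0, 1.0, 0.97),
]
front = non_dominated(candidates)
```

The surviving candidates each win on at least one objective, so choosing among them is a trade-off rather than a strict improvement.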
• #### Query Routing Trees for Wireless Sensor NetworksFebruary 15, 2011

###### Location: EPL671 Course, Department of Computer Science, University of Cyprus, Nicosia Cyprus

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor the physical world at an extremely high fidelity. In order to collect the data generated by these tiny-scale devices, sensors are typically organized in structures coined Query Routing Trees (QRTs). Our study reveals that predominant data acquisition systems construct QRTs in an ad-hoc manner, leading to a significant waste of energy. In this talk I will present MicroPulse+, a framework for minimizing the consumption of energy during data collection in Sensor Networks. MicroPulse+ eliminates a variety of data transmission and data reception inefficiencies using a collection of in-network algorithms. In particular, MicroPulse+ introduces: i) the Workload-Aware Routing Tree (WART) algorithm, which is established on profiling recent data collection activity and on identifying the bottlenecks using an in-network execution of the critical path method; and ii) the Energy-driven Tree Construction (ETC) algorithm, which balances the workload among nodes and minimizes data collisions. The talk will conclude with an outlook into current and future research work.
• #### MHS: Minimum-Hot-Spot Query Trees for Wireless Sensor NetworksJune 6, 2010

###### Location: The 9th International ACM Workshop on Data Engineering for Wireless and Mobile Access (MobiDE '10), with ACM SIGMOD/PODS10, Indianapolis, Indiana USA

We present a novel distributed algorithm (MHS) that constructs a query routing tree that minimizes collisions during query execution. It was shown in previous work that minimizing collisions during query execution saves a significant amount of energy [1]. In the same paper it is shown that balancing the node degrees of a query routing tree significantly reduces collisions during query execution. We address the inefficiencies of the previously proposed algorithm and propose a simpler, purely distributed, parameter-free, cheaper and more efficient algorithm. Our resulting query trees are optimally balanced, guarantee minimum collisions and minimum latency for query execution and allow for opportunistic in-network processing. MHS poses the minimum possible communication overhead to the network and is parameter-free as opposed to previously proposed algorithms. Our proposed algorithm can be used for acquiring data from the nodes of any distributed system where the main objective is to minimize the communication cost.

• #### FSort: External Sorting on Flash-based Sensor DevicesAugust 24, 2009

###### Location: The 6th Intl. Workshop on Data Management for Sensor Networks (DMSN09), with VLDB09, Lyon France

In long-term deployments of Wireless Sensor Networks, it is often more efficient to store sensor readings locally at each device and transmit those readings to the user only when requested (i.e., in response to a user query). Many of the techniques that collect information from a sensor network require that the data is sorted on some attribute (e.g., range queries, top-k queries, join queries, etc.). Yet, the underlying storage medium of these devices (i.e., Flash media) presents some unique characteristics which render traditional disk-based sorting algorithms inefficient in this context. In this paper we devise the FSort algorithm, an efficient external sorting algorithm for flash-based sensor devices with a small memory footprint. FSort minimizes the expensive write/delete operations of flash memory, minimizing in that way the consumption of energy. In particular, FSort uses a top-down replacement selection algorithm in order to produce sorted runs on flash media in a log-based manner. Sorted runs are then recursively merged in order to yield the sorted result. Our experimentation with real traces from Intel Research Berkeley shows that FSort greatly outperforms the traditional External Mergesort Algorithm both in regards to time and energy consumption. We found similar advantages in regards to the wearability constraints of flash media.
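The two phases mentioned above, run generation by replacement selection followed by merging, can be sketched in memory as follows. This is the textbook heap-based replacement selection, assumed here purely for illustration; it is not FSort's top-down variant and it models neither flash pages nor log-based writes. The function name `replacement_selection` and the sample readings are assumptions.

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking the end of the input stream


def replacement_selection(records, memory_size):
    """Split records into sorted runs while buffering at most memory_size items."""
    stream = iter(records)
    heap = list(islice(stream, memory_size))
    heapq.heapify(heap)
    runs, current, frozen = [], [], []
    while heap:
        smallest = heapq.heappop(heap)
        current.append(smallest)
        incoming = next(stream, _END)
        if incoming is not _END:
            if incoming >= smallest:
                # Still fits into the run that is currently being written.
                heapq.heappush(heap, incoming)
            else:
                # Too small for the current run: hold it back for the next one.
                frozen.append(incoming)
        if not heap:
            # Current run is complete; start the next run from the frozen items.
            runs.append(current)
            current, heap, frozen = [], frozen, []
            heapq.heapify(heap)
    return runs


# Hypothetical sensor readings, sorted with room for three buffered records.
readings = [5, 1, 9, 3, 7, 2, 8]
runs = replacement_selection(readings, memory_size=3)
sorted_readings = list(heapq.merge(*runs))
```

On input that is already partially ordered, replacement selection tends to produce runs longer than the buffer, which reduces the number of runs that the merge phase has to combine.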
• #### Perimeter-based Data Acquisition and Replication in Mobile Sensor NetworksMay 20, 2009

###### Location: The 10th International Conference on Mobile Data Management (MDM '09), Taipei Taiwan

This paper assumes a set of n mobile sensors that move in the Euclidean plane as a swarm. Our objectives are to explore a given geographic region by detecting spatio-temporal events of interest and to store these events in the network until the user requests them. Such a setting finds applications in mobile environments where the user (i.e., the sink) is infrequently within communication range from the field deployment. Our framework, coined SenseSwarm, dynamically partitions the sensing devices into perimeter and core nodes. Data acquisition is scheduled at the perimeter, in order to minimize energy consumption, while storage and replication take place at the core nodes, which are physically and logically shielded from threats and obstacles. To efficiently identify the nodes lying on the perimeter of the swarm we devise the Perimeter Algorithm (PA), an efficient distributed algorithm with a low communication complexity. For storage and fault-tolerance we devise the Data Replication Algorithm (DRA), a voting-based replication scheme that enables the exact retrieval of events from the network in cases of failures. Our trace-driven experimentation shows that our framework can offer significant energy reductions while maintaining high data availability rates. In particular, we found that when failures are below 60%, we can recover over 80% of generated events exactly.
• #### ETC: Energy-driven Tree Construction in Wireless Sensor NetworksMay 20, 2009

###### Location: SenTIE '09 workshop, with IEEE MDM '09, Taipei Taiwan

Continuous queries in Wireless Sensor Networks (WSNs) are founded on the premise of Query Routing Tree structures (denoted as T), which provide sensors with a path to the querying node. Predominant data acquisition systems for WSNs construct such structures in an ad-hoc manner and therefore there is no guarantee that a given query workload will be distributed equally among all sensors. That leads to data collisions which represent a major source of energy waste. In this paper we present the Energy-driven Tree Construction (ETC) algorithm, which balances the workload among nodes and minimizes data collisions, thus reducing energy consumption, during data acquisition in WSNs. We show through real micro-benchmarks on the CC2420 radio chip and trace-driven experimentation with real datasets from Intel Research and UC-Berkeley that ETC can provide significant energy reductions under a variety of conditions prolonging the longevity of a wireless sensor network.
• #### Indexing and Searching in Wireless Sensor NetworksFebruary 14, 2008

###### Location: Department of Computer Science, University of Cyprus, Nicosia Cyprus

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor the physical world at an extremely high fidelity. Research in this area has to this day primarily focused on the trade-off between local computation and communication in order to minimize the transfer of data over the fundamentally expensive wireless link. On the contrary, we focus on the challenges of storing sensor readings locally at each node. This In-Situ storage paradigm offers a novel perspective for conserving energy in Wireless Sensor Networks as the communication channel is only accessed for answering on-demand queries rather than for percolating each and every event to a centralized database. Storing large quantities of data locally at each sensor has to be complemented by efficient access methods that will speed up the execution of queries when required. In this talk I will provide an overview of recent developments in Wireless Sensor Network Technology and highlight some important data indexing and searching challenges that arise in this context. In particular, I will present MicroHash which is an external memory index structure that is tailored to the distinct characteristics of flash memory, the most prevalent type of non-volatile memory used in sensor systems.
• #### ICGrid: Towards a Grid Infrastructure for Intensive Care UnitsJanuary 21, 2008

###### Location: Intensive Care Forum, Hilton, Nicosia Cyprus

ICGrid (Intensive Care Grid) is a distributed platform that enables the seamless integration, correlation and retrieval of clinically interesting episodes across Intensive Care Units, which is currently under development by our group. Such a task requires huge processing and data storage capabilities, which are common attributes of Grid infrastructures. ICGrid is based on a hybrid architecture that combines i) a heterogeneous set of monitors that sense the inpatients and ii) Grid technology that enables the storage, processing and information sharing task between Intensive Care Units.
• #### Grid Failure Monitoring and Ranking using FailRankJanuary 15, 2008

###### Location: Coregrid Network of Excellence, Paris France

The objective of Grid computing is to make processing power as accessible and easy to use as electricity and water. The last decade has seen an unprecedented growth in Grid infrastructures which nowadays enables large-scale deployment of applications in the scientific computation domain. One of the main challenges in realizing the full potential of Grids is to make these systems dependable. In this presentation we present FailRank a nove
• #### SenseSwarm: A Perimeter-based Data Acquisition Framework for Mobile Sensor NetworksSeptember 24, 2007

###### Location: The 4th Intl. Workshop on Data Management for Sensor Networks (DMSN '07), with VLDB '07, Vienna Austria

This paper assumes a set of n mobile sensors that move in the Euclidean plane as a swarm. Our objectives are to explore a given geographic region by detecting and aggregating spatio-temporal events of interest and to store these events in the network until the user requests them. Such a setting finds applications in environments where the user (i.e., the sink) is infrequently within communication range from the field deployment. Our framework, coined SenseSwarm, dynamically partitions the sensing devices into perimeter and core nodes. Data acquisition is scheduled at the perimeter in order to minimize energy consumption, while storage and replication take place at the core nodes, which are physically and logically shielded from threats and obstacles. To efficiently identify the perimeter of the swarm we devise the Perimeter Algorithm (PA), an efficient distributed algorithm with a message complexity of O(p + n), where p denotes the number of nodes on the perimeter and n the overall number of nodes. For storage and replication we devise a spatio-temporal in-network aggregation scheme based on minimum bounding rectangles and minimum bounding cuboids. Our trace-driven experimentation shows that our framework can offer significant energy reductions while maintaining high data availability rates.
• #### Distributed Spatio-Temporal Similarity SearchJuly 4, 2007

###### Location: Cyprus Summer School on Intelligent Systems, Department of Computer Science, Nicosia Cyprus

In this talk I will introduce the distributed spatio-temporal similarity search problem: given a query trajectory Q, we want to find the trajectories that follow a motion similar to Q, when each of the target trajectories is segmented across a number of distributed nodes. We propose two novel algorithms, UB-K and UBLB-K, which combine local computations of lower and upper bounds on the matching between the distributed subsequences and Q. Such an operation generates the desired result without pulling together all the distributed subsequences over the fundamentally expensive communication medium. Our solutions find applications in a wide array of domains, such as cellular networks, wildlife monitoring and video surveillance. Our experimental evaluation using realistic data demonstrates that our framework is both efficient and robust to a variety of conditions. In this talk, I will also present techniques to efficiently answer Top-K queries in a distributed environment. A Top-K query returns the K highest ranked answers to a user defined similarity function. At the same time it also minimizes some cost metric, such as the utilization of the communication medium, which is associated with the retrieval of the desired answer set. I will provide an overview of state-of-the-art algorithms that solve the Top-K problem in a centralized setting and show why these are not applicable to the distributed case. I will then focus on the Threshold Join Algorithm (TJA), which is a novel solution for executing Top-K queries in a distributed environment. I will also present results from our performance study with a real middleware testbed deployed over a network of 75 workstations.
• #### FailRank: Towards a Unified Grid Failure Monitoring and Ranking SystemJune 12, 2007

###### Location: CoreGRID Workshop on Grid Programming Model Grid and P2P Systems Architecture Grid Systems, Tools and Environments, Crete Greece

The objective of Grid computing is to make processing power as accessible and easy to use as electricity and water. The last decade has seen an unprecedented growth in Grid infrastructures which nowadays enables large-scale deployment of applications in the scientific computation domain. One of the main challenges in realizing the full potential of Grids is to make these systems dependable. In this paper we present FailRank a nove
• #### The MicroPulse Framework for Adaptive Waking Windows in Sensor NetworksMay 11, 2007

###### Location: The 1st IEEE International Workshop on Data Intensive Sensor Networks 2007, with MDM 2007, Mannheim Germany.

In this paper we present MicroPulse, a novel framework for adapting the waking window of a sensing device S based on the data workload incurred by a query Q. Assuming a typical tree-based aggregation scenario, the waking window is defined as the time interval t during which S enables its transceiver in order to collect the results from its children. Minimizing the length of t enables S to conserve energy that can be used to prolong the longevity of the network and hence the quality of results. Our method is established on profiling recent data acquisition activity and on identifying the bottlenecks using an in-network execution of the Critical Path Method. We show through trace-driven experimentation with a real dataset that MicroPulse can reduce the energy cost of the waking window by three orders of magnitude.
• #### MINT Views: Materialized In-Network Top-k Views in Sensor NetworksMay 11, 2007

###### Location: The 8th International Conference on Mobile Data Management (MDM '07), Mannheim Germany

In this paper we introduce MINT (Materialized In-Network Top-k) Views, a novel framework for optimizing the execution of continuous monitoring queries in sensor networks. A typical materialized view V maintains the complete results of a query Q in order to minimize the cost of future query executions. In a sensor network context, maintaining consistency between V and the underlying and distributed base relation R is very expensive in terms of communication. Thus, our approach focuses on a subset V' (⊆ V) that unveils only the k highest-ranked answers at the sink for some user-defined parameter k. We additionally provide an elaborate description of energy-conscious algorithms for constructing, pruning and maintaining such recursively-defined in-network views. Our trace-driven experimentation with real datasets shows that MINT offers significant energy reductions compared to other predominant data acquisition models.
• #### Top-K Algorithms: Concepts and ApplicationsMarch 20, 2007

###### Location: Nicosia Cyprus (EPL 671 - Computer Science: Research and Technology)

In this talk, I will present techniques to efficiently answer Top-K queries in a distributed environment. A Top-K query returns the K highest ranked answers to a user defined similarity function. At the same time it also minimizes some cost metric, such as the utilization of the communication medium, which is associated with the retrieval of the desired answer set. I will provide an overview of state-of-the-art algorithms that solve the Top-K problem in a centralized setting and show why these are not applicable to the distributed case. I will then focus on the Threshold Join Algorithm (TJA), which is a novel solution for executing Top-K queries in a distributed environment. I will also present results from our performance study with a real middleware testbed deployed over a network of 75 workstations.
• #### MicroHash: An External Memory Indexing Structure for Wireless Sensor DevicesApril 26, 2007

###### Location: Nicosia, Cyprus (EPL651 - Data Management for Mobile Computing Department of Computer Science (UCY))

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor and understand the physical world at an extremely high fidelity. Research to this day has primarily focused on the trade-off between local computation and communication, in order to offset the expensive transfer of data over the fundamentally unreliable wireless link. On the contrary, we focus on the challenges of storing sensor readings locally at each node. This In-Situ storage paradigm offers a novel perspective for conserving energy, as we access the communication channel to answer on-demand queries rather than for percolating each and every event to a centralized database. Storing large quantities of data locally at each node has to be complemented by efficient index structures that will enable access to data when required. In this talk we present MicroHash, an external memory index structure which is tailored to the distinct characteristics of the most prevalent type of non-volatile memory used in sensor systems, namely flash memory. Our index structure exploits the asymmetric read/write and wear characteristics of flash memory in order to offer high performance indexing and searching capabilities in the presence of a low energy budget.
• #### ICGrid: Intensive Care Grid (Best Demo)December 1, 2006

###### Location: Sophia-Antipolis France (CoreGRID Industrial Conference)

Intensive Care Units (ICUs) at hospitals utilize cutting edge technology in order to acquire the physiological state of inpatients, who are in a critical (life-threatening) physiological state, at an extremely high fidelity. In particular, ICUs utilize a very large number of monitoring and sensing devices that are continuously attached to inpatients in order to uncover the physiological state of the inpatients. Such measurements can then be utilized for i) education, ii) early diagnosis and iii) for defining early warning systems that identify when a human life is in jeopardy. A problem with the current setting is that individual ICUs are limited to the locally acquired measurements. As a result, the number of clinically 'interesting' episodes available to doctors is also very limited. ICGrid (Intensive Care Grid) is a distributed platform that enables the seamless integration, correlation and retrieval of clinically interesting episodes across Intensive Care Units, which is currently under development by our group. Such a task requires huge processing and data storage capabilities, which are common attributes of Grid infrastructures. ICGrid is based on a hybrid architecture that combines i) a heterogeneous set of monitors that sense the inpatients and ii) Grid technology that enables the storage, processing and information sharing task between Intensive Care Units. Our demonstration aims at presenting the first part of the hybrid architecture of ICGrid (i.e. the acquisition of real signals from inpatients and their storage on the Grid). Our demonstration platform operates on a standalone laptop. In a real setting, this software is able to extract the physiological parameters from monitoring devices installed at ICUs.

• #### MicroHash: An External Memory Indexing Structure for Wireless Sensor DevicesMarch 31, 2006

###### Location: Department of Computer Science, University of Cyprus, Nicosia Cyprus

Wireless Sensor Networks offer a non-intrusive and non-disruptive technology that enables users to monitor and understand the physical world at an extremely high fidelity. Research to this day has primarily focused on the trade-off between local computation and communication, in order to offset the expensive transfer of data over the fundamentally unreliable wireless link. On the contrary, we focus on the challenges of storing sensor readings locally at each node. This In-Situ storage paradigm offers a novel perspective for conserving energy, as we access the communication channel to answer on-demand queries rather than for percolating each and every event to a centralized database. Storing large quantities of data locally at each node has to be complemented by efficient index structures that will enable access to data when required. In this talk we present MicroHash, an external memory index structure which is tailored to the distinct characteristics of the most prevalent type of non-volatile memory used in sensor systems, namely flash memory. Our index structure exploits the asymmetric read/write and wear characteristics of flash memory in order to offer high performance indexing and searching capabilities in the presence of a low energy budget.

• #### Distributed Top-K Query ProcessingNovember 16, 2005

###### Location: Department of Computer Science, University of Cyprus, Nicosia Cyprus

Modern Sensor and Peer-to-Peer data management systems have to cope with data that is generated automatically and continuously across distributed and potentially geographically diverse locations. Organizing data in centralized repositories is becoming increasingly expensive and on many occasions impractical. Additionally, users are usually only interested in finding the highest ranked answers to their queries rather than the complete range of answers. In this talk, I will present efficient techniques to answer Top-K queries in a distributed environment. A Top-K query returns the K highest ranked answers to a user defined similarity function. At the same time it also minimizes some cost metric which is associated with the retrieval of the desired answer set. My talk focuses on the Threshold Join Algorithm (TJA), which is a novel distributed Top-K query processing algorithm that combines local similarity scores available at each computing site. I will also present the LB-K and UBLB-K algorithms which utilize lower and upper bounds, when exact scores are not available. An extensive experimental evaluation with our distributed middleware testbed reveals that the proposed methods are orders of magnitude more efficient than their competitors.
• #### On Constructing Internet-Scale P2P Information Retrieval SystemsSeptember 2004

###### Location: Second International Workshop on Databases, Information Systems, and Peer-to-Peer Computing (DBISP2P 2004), Toronto Canada

We initiate a study on the effect of the network topology on the performance of Peer-to-Peer (P2P) information retrieval systems. The emerging P2P model has become a very powerful and attractive paradigm for developing Internet-scale systems for sharing resources, including files, or documents. We show that the performance of Information Retrieval algorithms can be significantly improved through the use of fully distributed topologically aware overlay network construction techniques. Our empirical results, using the Peerware middleware infrastructure, show that the approach we propose is both efficient and practical.
• #### A Local Search Mechanism for Peer-to-Peer NetworksNovember 2002

###### Location: The 11th ACM CIKM International Conference on Information and Knowledge Management, McLean, VA USA.

One important problem in peer-to-peer (P2P) networks is searching and retrieving the correct information. However, existing searching mechanisms in pure peer-to-peer networks are inefficient due to the decentralized nature of such networks. We propose two mechanisms for information retrieval in pure peer-to-peer networks. The first, the modified Breadth-First-Search (BFS) mechanism, is an extension of the current Gnutella protocol, allows searching with keywords, and is designed to minimize the number of messages that are needed to search the network. The second, the Intelligent Search mechanism, uses the past behavior of the P2P network to further improve the scalability of the search procedure. In this algorithm, each peer autonomously decides which of its peers are most likely to answer a given query. The algorithm is entirely distributed, and therefore scales well with the size of the network. We implemented our mechanisms as middleware platforms. To show the advantages of our mechanisms we present experimental results using the middleware implementation.
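For context on why message count matters, the sketch below simulates plain TTL-limited breadth-first flooding over an overlay, the Gnutella-style baseline that the mechanisms above aim to improve on. It is neither the modified BFS nor the Intelligent Search mechanism. The function `flood_search`, the dictionary layout of the overlay and the sample peers are assumptions, and for simplicity a peer also forwards the query back to the neighbor it came from.

```python
from collections import deque


def flood_search(overlay, start, keyword, ttl):
    """Flood a keyword query from start; return (matching peers, messages sent).

    overlay maps a peer id to {"neighbors": [...], "keywords": set(...)}.
    Every forwarded copy of the query counts as one message, including
    copies sent to peers that have already seen the query.
    """
    hits, messages = [], 0
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if keyword in overlay[peer]["keywords"]:
            hits.append(peer)
        if remaining == 0:
            continue  # time-to-live exhausted: do not forward any further
        for neighbor in overlay[peer]["neighbors"]:
            messages += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical four-peer overlay.
overlay = {
    "p1": {"neighbors": ["p2", "p3"], "keywords": set()},
    "p2": {"neighbors": ["p1", "p4"], "keywords": {"jazz"}},
    "p3": {"neighbors": ["p1"], "keywords": set()},
    "p4": {"neighbors": ["p2"], "keywords": {"jazz"}},
}
hits, messages = flood_search(overlay, "p1", "jazz", ttl=2)
```

Raising the TTL reaches more peers but increases the number of messages, which is the trade-off that smarter forwarding decisions try to soften.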