GSOC 2017 project proposal

Ivan Josa, Universitat de Lleida (UdL) student, Spain

Fly Over Your Big Data (FlOY-BD)

My name is Ivan Josa and I am a student from Lleida (Spain). I am currently in my last
year of Master’s Degree in Computer Science specialized in Big Data Analysis.

Previously to these studies, I have made an Engineer Degree also in the University of
Lleida, more concretely in the Polytechnic Faculty.

While I was studying my degree, I have been working full-time as a Java programmer
in Indra.

In 2012 I achieved Java SE 7 Associate Programmer certification and in 2015 I
obtained the FCE from Cambridge English.

I have previous knowledge on Java EE environments and Linux systems management
as I have a Diploma of Higher Education specialized in Managements and
Administration of Computer Systems by UdL.

Other Projects and knowledge:

GSOC 2016 Successful Participant

CoDi P2P. I worked on this project as member of the Distributed Computing

research group in UdL.

Java EE projects using Spring, Struts, JPA, Servlets,...
Android Knowledge
Python Knowledge
Hadoop & Spark Knowledge
Drupal Management and Module Development
Magento Management
Liferay Management and Development

Project Description

The aim this project is to develop a system that, making use of the Liquid Galaxy
capacity to display information over an interactive map, displays data (in this case
meteorological data ) obtained from a big data analytics and mining process.

Firstly, the data will be gathered from public data APIs and, after being cleaned,
stored in a Nosql database to be processed later. This information will be related to
historical weather conditions, water and energy historical reservoirs, earthquakes
and other weather related information that could come to mind during the
development of the project.

Secondly, the data will be analysed under some kind of data analysis algorithm such
clustering with K-Means algorithm or Regression Models making use of a Spark
Uni-Node System running on the cloud and Python as programming language, more
concretely its pySpark library.

Finally, the conclusions obtained from this data analysis will be shown on a external
interface, such a website running in a server connected to the Liquid Galaxy, in a
dashboard format (for example: http://citydashboard.org/london/ ) for each of the
considered cities.

This dashboard (developed with the Django framework), will offer to the end-user,
the possibility to seamlessly display the chosen information into Liquid Galaxy.
Then the system will automatically send the corresponding KML files to Liquid Galaxy
in order to display this information in a descriptive and visual way, for example
percentual polygons, polylines, etc.

Another data source will be the General Transit Feed Specification (GTFS) that
defines a common format for geographic information, usually about public
transportation. GTFS implements a standard used in several data platforms, that
provide great information to the citizens daily life. The idea is to capture the
information of GTFS feeds and cross it with current weather data in order to make
best route predictions.

Finally, an additional functionality to the system is its integration with Google Assistant using actions
SDK and contextual capabilities to make voice requests, for instance display the current weather for a
city or calculate the best route from one city to another.

Some special questions

The data mining will be developed through Spark technology, more concretely

through its implementation in python, called pySpark. Spark is a higher layer of

Hadoop which implements its map-reduce algorithm focused on clustering.

In some specials cases, Hadoop could be suitable as well and therefore, the data

mining could be also done under Hadoop.

The data displayed is tied to the available data sources and their format
The data for each city depends on what they share

The data mining process will be focused on obtaining value from the data,

either from the data itself or from the correlation with other data source

Use Case

Display Actual Weather
Display Historical Weather within a time period
Display Historical Weather at a concrete time
Display lines/figures representing the data
Make a tour for different points of data:

“Show me the temperature change of Barcelona for the last 10 years.“
“And the last 5 years?”
“And the precipitation?”
Calculate the route from Lleida to Sevilla
Google Assistant Context Management:

Linked Technologies

RESTful calls for querying the data APIs
Python

for the data gathering

for the Django web development

for the data cleaning and mining steps via pySpark library

HTML/CSS for the frontend development
KML for the data visualization
Assistant (Actions SDK, Api.AI)
GTFS

Values for Liquid Galaxy community

BigData Integration
Could be used for teaching purposes
Possibility to adapt it to Smart Cities data visualisation in a future

Timeline

Previous to GSoC (before May 4th):

Research on Liquid Galaxy and Google Earth platforms
Research on Smart Data Platforms

Bonding period (from May 4th to May 30, 2017):

Prepare the development environment
Initial Data Gathering

First Working Period (from May 30, 2017 to June 30, 2017):

Complete Data Gathering and Cleaning

Data Mining

Second Working Period (from July 1,2017 to July - 24, 2017):

GTFS data search and processing
Web frontend and backend development

Third Working Period (from July 25,2017 to August - 20, 2017):

Link web to Liquid Galaxy and testing
Provide Google Assistant functionality to the system.
Documentation

Closing and Finalization (from August 21 to 29, 2017):

Finish documentation

GSOC 2017. Fly Over Your Big Data

Project Description

Some special questions

Use Case

Linked Technologies

Values for Liquid Galaxy community

Timeline

Previous to GSoC (before May 4th):

Bonding period (from May 4th to May 30, 2017):

First Working Period (from May 30, 2017 to June 30, 2017):

Second Working Period (from July 1,2017 to July - 24, 2017):

Third Working Period (from July 25,2017 to August - 20, 2017):

Closing and Finalization (from August 21 to 29, 2017):

Acerca de nosotros

Sede en el

Artificial Intelligence LAB

Buscar...

Colaboradores

Posts Populares

Nube de etiquetas

Archivos de Blog

Contribuidores