Thursday, 18 April 2019
The role of big data, analytics, technology and impact on society
Progress Mtshali, Sibusiso Moyo
Durban University of Technology
Presentation overview
The delivery of social services in the Republic of South Africa is a challenge for local municipalities, provincial governments, and the national government. In 2018, there were a number of protests by residents of various municipalities who viewed social service delivery as insufficient or non-existent. In some cases these protests led to the destruction of property and injuries to people. If one assumes that service delivery is indeed insufficient, why are the local, provincial, and national governments unable to deliver adequate social services? Are they not using technology to capture the consumption of social services, monitor service delivery, analyse how services are delivered, and make the adjustments needed to ensure adequacy? Interestingly, this problem is not unique to South Africa. The Tanzanian Government conducted the Afrobarometer survey over a period of three years and reported the results in 2006 [1], which led to better social service delivery. In 2016, Uganda’s Ministry of Finance launched the Social Service Delivery Equity Atlas, which “helps the government and its partners track decentralized allocations from centre across three priority sectors; education, health, and water” [2]. Rapid changes within the higher education landscape, both in South Africa and in Africa at large, also call for more efficient and accurate data acquisition and analytics in understanding the sector and intervening at the right and appropriate intervals.
Technologies such as cloud computing allow us to store and process massive amounts of data. Sensors numbering in the billions (they already outnumber the world’s population) allow us to collect all kinds of data and information with few practical limits. Modern mathematical models, run on high-performance computers, allow us to achieve accuracies never imagined before.
In this paper, the authors propose a system that utilizes modern technologies such as Big Data, the Internet of Things (IoT), Cloud Computing, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning to collect data generated by the current social service delivery mechanisms and extract analytical information that could then be used by South Africa’s local, provincial, and national governments to improve social service delivery. It is hoped that this would have a very positive impact on South African society.
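As a very rough illustration of the kind of analytics such a system could extract, the sketch below aggregates hypothetical service-delivery records and flags slow resolution times. The field names, municipalities and seven-day target are illustrative assumptions, not part of the proposed system.

```python
# Minimal sketch: aggregating hypothetical service-delivery records to flag
# municipality/service combinations with slow resolution times. All field names,
# values and the 7-day target below are illustrative assumptions.
import pandas as pd

# Hypothetical records as they might arrive from IoT meters or call-centre logs.
records = pd.DataFrame({
    "municipality": ["eThekwini", "eThekwini", "Msunduzi", "Msunduzi", "uMhlathuze"],
    "service": ["water", "electricity", "water", "water", "refuse"],
    "reported": pd.to_datetime(["2019-01-03", "2019-01-05", "2019-01-04", "2019-01-10", "2019-01-06"]),
    "resolved": pd.to_datetime(["2019-01-04", "2019-01-12", "2019-01-20", "2019-01-25", "2019-01-07"]),
})

records["days_to_resolve"] = (records["resolved"] - records["reported"]).dt.days
summary = records.groupby(["municipality", "service"])["days_to_resolve"].mean()

# Flag combinations whose average resolution time exceeds an assumed 7-day target.
print(summary[summary > 7])
```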
References
- [1] http://www.eldis.org/document/A23774
- [2] https://www.newvision.co.ug/new_vision/news/1438627/uganda-social-service-delivery-equity-atlas-launched
Presenters
Dr Progress Qhaqhi Thabani Mtshali studied and worked in the United States and, after thirty-four years, decided to return to South Africa in order to contribute to higher education. His education includes a B.S. ChE (Bachelor of Chemical Engineering) from the State University of New York at Buffalo, an A.S. (Computer Technology) from Midlands Technical College in Columbia, South Carolina, and an M.S. and Ph.D. (Information Systems) from Nova Southeastern University in Fort Lauderdale, Florida, in the United States. He has worked for IT companies such as Computer Sciences Corporation (CSC), Duck Creek Technologies, and Accenture Software as a software developer, principal software architect, and technical researcher. He has also worked in academia as a senior lecturer and with instructional designers who developed and deployed course content on LMS systems. His interests include IoT, computer networking, data analytics, database management, cryptography, compiler theory, computer programming, computational number theory and machine learning.
Professor Sibusiso Moyo holds a PhD in Mathematics from the University of Natal, Durban, and a Master’s (with distinction) in Tertiary Education Management from the LH Martin Institute, University of Melbourne, Australia. She has published widely in the mathematical sciences, with a focus on differential equations and optimization problems, in international peer-reviewed journals. She is currently an Associate Editor for the Journal of Higher Education Policy and Management (published by Taylor and Francis) and has served as Guest Editor of Mathematical Methods in the Applied Sciences (published by John Wiley & Sons) and the Journal of Engineering Mathematics (published by Springer). She was appointed DVC (Research, Innovation and Engagement) at the Durban University of Technology from 1 July 2017, responsible for a number of Directorates including Information Technology Support Services. Her current interests include strategic and policy planning to support research, innovation, entrepreneurship and engagement. Her research interests include differential equations and their physical applications, optimization problems, symmetry analysis and group theoretic methods, and issues in higher education including leadership.
Evaluation of clustering techniques for generating household energy consumption patterns in a developing country
Wiebke Toussaint
University of Cape Town
Presentation overview
This work compares and evaluates clustering techniques for generating representative daily load profiles that are characteristic of residential energy consumers in South Africa. The input data captures two decades of metered household consumption, covering 14 945 household years and 3 295 848 daily load patterns of a population with high variability across temporal, geographic, social and economic dimensions. Different algorithms, normalisation and pre-binning techniques are evaluated to determine the best clustering structure. The study shows that normalisation is essential for producing good clusters. Specifically, unit norm produces more usable and more expressive clusters than the zero-one scaler, which is the most common method of normalisation used in the domain. While pre-binning improves clustering results for the dataset, the choice of pre-binning method does not significantly impact the quality of clusters produced. Data representation, and especially the inclusion or removal of zero-valued profiles, is an important consideration in relation to the pre-binning approach selected. Like several previous studies, the k-means algorithm produces the best results. Introducing a qualitative evaluation framework facilitated the evaluation process and helped identify a top clustering structure that is significantly more usable than those that would have been selected based on quantitative metrics alone. The approach demonstrates how explicitly defined qualitative evaluation measures can aid in selecting a clustering structure that is more likely to have real-world application. To our knowledge, this is the first work that uses cluster analysis to generate customer archetypes from representative daily load profiles in a highly variable, developing country context.
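To make the normalisation comparison concrete, the sketch below clusters synthetic 24-hour load profiles with k-means under both a unit-norm and a zero-one (min-max) scaler, scoring each with the Davies-Bouldin index. The synthetic archetypes, cluster count and metric are illustrative assumptions and not the study's actual pipeline or data.

```python
# Minimal sketch of normalisation + k-means on synthetic 24-hour load profiles.
# Not the study's pipeline: the archetypes, cluster count and score are assumptions.
import numpy as np
from sklearn.preprocessing import Normalizer, MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
hours = np.arange(24)
# Two synthetic household archetypes: an evening peak and a daytime peak.
evening = 1.0 + 2.5 * np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
daytime = 1.0 + 1.5 * np.exp(-0.5 * ((hours - 12) / 3.0) ** 2)
profiles = np.vstack([
    evening * rng.uniform(0.5, 3.0, size=(500, 1)) + rng.normal(0, 0.1, (500, 24)),
    daytime * rng.uniform(0.5, 3.0, size=(500, 1)) + rng.normal(0, 0.1, (500, 24)),
])

for name, scaler in [("unit norm", Normalizer(norm="l2")), ("zero-one", MinMaxScaler())]:
    X = scaler.fit_transform(profiles)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # Lower Davies-Bouldin values indicate better-separated clusters.
    print(name, "Davies-Bouldin:", round(davies_bouldin_score(X, labels), 3))
```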
Presenter
Wiebke Toussaint is an applied AI student at the Centre for Artificial Intelligence Research and a data scientist at UCT’s Energy Research Centre. She holds a Bachelor’s degree in Mechanical Engineering and has over seven years’ experience in developing data science, engineering and applied business analysis solutions for clients in the mining, transport, energy and e-commerce sectors. Wiebke has built systems and tools for decision support, data visualisation, information discovery and customer insights, with demonstrated impact on operational performance improvements and efficient resource use. As co-founder of Engineers Without Borders South Africa, Wiebke has spent the past decade advocating for transformation, diversity and the adoption of human-centred design principles in the engineering sector. Today she continues to serve on the board of Engineers Without Borders South Africa and the executive committee of Engineers Without Borders International. A strong advocate for data stewardship and digital transformation, Wiebke works to help organisations understand how strategy, people and technology can interoperate to achieve a successful transition into an intelligence-driven future.
Remote sensing data, machine learning and citizen science for development in sub-Saharan Africa (SSA)
Tawanda Chingozha, Dieter von Fintel
Stellenbosch University
Presentation overview
Most of sub-Saharan Africa’s (SSA) population lives in rural areas and relies on agriculture as a source of livelihood. Against that background, the poverty reduction and inclusive growth efforts of the World Bank and other development partners have largely focused on the agriculture sector. Access to land, security of tenure and land reform are some of the pertinent issues at the core of raising agricultural incomes. Yet Devarajan (2013) refers to a “statistical tragedy” in Africa, wherein governments do not have the technical and financial capacity, or the political will, to provide objective and reliable data for use by researchers and policy makers. We apply a Support Vector Machine (SVM) learning algorithm to Landsat imagery to generate agricultural data that we use to answer two development questions. Firstly, we show the relationship between land titles, access to markets and crop cultivation using the case of Southern Rhodesia. Secondly, agrarian reform has often been seen as an appropriate strategy to restore land rights or ensure the bankability of land, yet incomplete property rights often cause more problems in the post-reform period. We take advantage of Zimbabwe’s year 2000 land reform programme to show the effects of incomplete property rights on crop cultivation and welfare in the post-reform period. In the urban setting, the Sustainable Development Goals (SDGs) emphasise decent work and sustainable cities and communities, among other issues. It may be necessary to view these issues in cognisance of the high levels of housing and business informality in SSA and other developing regions. We employ citizen science techniques to develop a novel informality dataset from Very High Resolution (VHR) satellite images.
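A minimal sketch of the pixel-classification step is given below, training a Support Vector Machine on synthetic multispectral band values rather than real Landsat scenes; in practice the imagery would be read with a raster library and labels would come from ground truth or historical maps. All features, labels and thresholds here are illustrative assumptions.

```python
# Minimal sketch of SVM classification of pixels as cropland or not, using
# synthetic band values in place of real Landsat data. Assumptions throughout.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 2000
# Columns stand in for reflectance in red, near-infrared and shortwave-infrared bands.
red = rng.uniform(0.02, 0.3, n)
nir = rng.uniform(0.1, 0.6, n)
swir = rng.uniform(0.05, 0.4, n)
ndvi = (nir - red) / (nir + red)
# Assumed labelling rule for the toy data: highly vegetated pixels count as "cropland".
labels = (ndvi > 0.45).astype(int)

X = np.column_stack([red, nir, swir])
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("held-out accuracy:", round(clf.score(X_test, y_test), 3))
```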
The Data2Dome initiative at the Iziko Planetarium and the IDIA visualisation lab
Lucia Marchetti, Thomas Jarrett
University of Cape Town and University of the Western Cape
Presentation overview
With the advent of LSST, SKA and other petabyte-scale facilities, data storage and visualisation represent renewed challenges facing the astronomical community. In this context, having access to cutting-edge facilities capable of handling and visualising large amounts of data through innovative projection systems is of crucial importance for the advancement of Big Data science. Inspired by these goals, a consortium of South African universities, led by the University of Cape Town (UCT) and including the University of the Western Cape (UWC) and the Cape Peninsula University of Technology (CPUT), together with the Iziko Museum and the South African Department of Science and Technology, has invested resources in upgrading the existing analogue Iziko Planetarium to transform it into the most advanced digital planetarium on the African continent (https://www.iziko.org.za/museums/planetarium). At the same time, the Inter-University Institute for Data Intensive Astronomy (IDIA), with some of the above partners, has created an advanced Visualisation Lab (http://www.acgc.uct.ac.za/~jarrett/VisLab/) hosted at the University of Cape Town, where scientists, together with a team of software developers, can experiment with data visualisation techniques.
- Technical overview of the Iziko Planetarium and of the IDIA Visualisation Lab
As scientific data sets become larger and more complex, it is necessary to migrate to new technologies to facilitate scientific analysis and exploration. The new Iziko Planetarium Digital Dome was designed with two computer clusters, one for public shows and production, and the other for scientific data visualisation research. The digital full-dome theatre has the following key features: six Sony 4K laser projectors (creating a total of ~8K pixel projection), two computer clusters, 5.1 surround sound, an optimal reflecting dome, a raised floor and a new control centre. The projectors can be driven by either cluster. Each cluster has 12 client computers and one master computer, as well as a sound computer. Each computer has an NVIDIA P6000 GPU, which provides more than enough power to render large data sets on the fly as well as run numerical simulations. The primary software used to ingest data and drive the projectors is Sky-Skan’s DigitalSky Dark Matter (DS-DM), which is capable of traditional planetarium functionality as well as modern data exploration. Researchers use their own cluster, allowing them to optimize setups and save work areas without disrupting the production/show computers. The IDIA Visualisation Lab hosts another set of cutting-edge visualisation tools: the IDIA Wide Area Large Interactive Explorer (or WALIE), an 8K-resolution visualisation wall for data exploration and scientific discovery, which allows researchers to carry out scientific analysis on features of interest by exploring multiple analytics (graphs, statistics, ancillary data, etc.) at the same time.
For a more immersive individual experience, the IDIA Visualisation Lab hosts the Cobra, a 4K curved visual display system. The Cobra Panorama is very well suited to investigating and exploring hyperspectral images and catalogues, using existing software tools as well as custom-developed software that will be used across the IDIA platforms. Finally, the Lab hosts equipment and floor space to develop immersive VR data set visualisations in which the researcher can explore the data from a unique perspective. Research groups working on Big Data visualisation at any partner institution can use the Iziko Planetarium and/or the IDIA Visualisation Lab to develop their studies.
- The Data2Dome initiative (in a nutshell)
The Iziko Planetarium and Digital Dome officially opened in May 2017, and since then a number of activities have been developed. New digital-format shows have been uploaded to the systems, and regular visits by schools and the general public have begun. Along with the more traditional planetarium shows (enriched by novel African content), research activities have also begun within the much broader international effort known as the Data2Dome initiative. The Data2Dome project (http://www.data2dome.org), led by Dr. Mark Subbarao (Adler Planetarium) and the International Planetarium Society, aims to streamline the process of ingesting astronomical data into the dome environment, increasing the potential for scientific communication and storytelling in the planetarium as well as preparing planetaria for the big data streams that will come from next-generation telescopes and numerical simulations. In this context, several multidisciplinary scientific data sets have been ingested into the DS-DM at the Iziko Planetarium, creating new opportunities for 3D data exploration in an immersive 360-degree context.
Presenters
Dr Lucia Marchetti holds a shared SARChI NRF/SKA post-doctoral Research Fellowship at the University of Cape Town and the University of the Western Cape. She obtained a PhD in Astronomy in 2012 from the University of Padova and, before moving to South Africa, was an STFC post-doctoral research associate at the Open University in Milton Keynes (UK). She specialises in extragalactic astronomy and observational cosmology. Her research focuses on statistical studies of galaxy formation and evolution, including via strong gravitational lensing analysis. She now co-leads the Data2Dome initiative and Big Data Visualisation at the Iziko Planetarium and at the University of Cape Town. She is an active researcher as well as a passionate and expert science communicator, a fellow of the UK Royal Astronomical Society and a member of the International Astronomical Union.
Prof Thomas H. Jarrett is the SARChI Chair in Astrophysics and Space Science hosted in the Department of Astronomy at the University of Cape Town. He previously worked at IPAC (Caltech), where he played a major role in the preparation, and subsequent analysis, of the 'extra-galactic' Two Micron All Sky Survey catalogue (2MASX), served as the Project Scientist for the Infrared Science Archive (IRSA), and most recently was the principal lead of the Nearby Galaxy Group of the Wide-field Infrared Survey Explorer (WISE). He also served on the WISE Science Team, which defined and managed the mission. While he is world-renowned for his expertise in the near- and mid-infrared, he also has hands-on experience in all other bands of the electromagnetic spectrum, from radio to X-ray, both Earth- and space-based. He is a passionate researcher whose interest and expertise lie in the extragalactic large-scale structure of the nearby Universe, and its visualisation, the Zone of Avoidance, interacting galaxies, star formation processes and galaxy evolution. He is now the lead scientist of the Visualisation Lab at the UCT Department of Astronomy as well as of the Visualisation Research group at the Iziko Planetarium. More information can be found at: http://www.acgc.uct.ac.za/~jarrett/
A deep reinforcement learning approach to autonomous driving among human drivers
Zenzo Ncube, Nontokozo Mpofu, Armstrong Kadyamatimba
Sol Plaatje University
Presentation overview
Traffic congestion is a worldwide problem that still poses a serious challenge for most road users. Apart from slowing down the global economy, traffic congestion has psychological and physical health implications for individuals. In South Africa, cities like Cape Town and Johannesburg experience the most traffic jams annually. This study proposes autonomous vehicles (that receive Waze data as part of their input) as a potential solution to ongoing traffic congestion on Johannesburg roads. When placed on the road, the driverless car i) uses past and real-time road-user reports on the road surroundings (such as the Waze dataset) to predict an upcoming traffic jam before it occurs, and sends this information to platforms like My i-Traffic so that drivers can take the necessary measures while there is still time; and ii) when placed inside a traffic jam, it improves the flow of traffic by minimizing the number of times human drivers have to apply their brakes, and thus helps reduce fuel consumption. We train our self-driving cars under various weather conditions using a novel deep reinforcement learning algorithm. Evaluating our agent’s performance against several standard agents, our results indicate that our agent is capable of successfully controlling a car to navigate a simulated traffic-congested environment as well as predicting the possibility of traffic congestion.
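For readers unfamiliar with deep reinforcement learning, the sketch below shows the bare bones of a deep Q-learning agent on a toy car-following task in which braking is penalised. The environment dynamics, reward values and network size are illustrative assumptions and do not reflect the authors' simulator or algorithm.

```python
# Minimal sketch of deep Q-learning on a toy car-following task: the state is
# (gap to lead car, own speed), actions are brake/coast/accelerate, and braking
# is penalised. A toy illustration only, not the authors' simulator or method.
import random
import torch
import torch.nn as nn

class ToyTraffic:
    def reset(self):
        self.gap, self.speed = 20.0, 10.0
        return torch.tensor([self.gap, self.speed])
    def step(self, action):  # 0 = brake, 1 = coast, 2 = accelerate
        self.speed = max(0.0, self.speed + (action - 1) * 2.0)
        lead_speed = 10.0 + random.uniform(-2.0, 2.0)
        self.gap += lead_speed - self.speed
        crashed = self.gap <= 0.0
        reward = -10.0 if crashed else (-1.0 if action == 0 else 0.1)  # discourage braking
        return torch.tensor([self.gap, self.speed]), reward, crashed

q_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 3))
optimiser = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, epsilon = 0.99, 0.1

env = ToyTraffic()
for episode in range(200):
    state = env.reset()
    for _ in range(50):
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.randrange(3)
        else:
            action = q_net(state).argmax().item()
        next_state, reward, done = env.step(action)
        # One-step temporal-difference target.
        with torch.no_grad():
            target = reward + gamma * q_net(next_state).max() * (0.0 if done else 1.0)
        loss = (q_net(state)[action] - target) ** 2
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        state = next_state
        if done:
            break
```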
Presenters
Dr Zenzo Polite Ncube is employed by the University of Mpumalanga as a Senior Lecturer in the School of Computing and Mathematical Sciences, where he teaches Computer Programming and other ICT-related courses. He graduated with a degree in Mathematics and Computer Science from Cuba and completed his Master’s Degree in Computer Science at the National University of Science and Technology in Zimbabwe. He has taught Computer Science and Information Systems at a number of universities in South Africa and Zimbabwe and has supervised a number of postgraduate students. He has published several research articles in a number of journals as well as in local and international conferences.
His research interests include ICT for development, security in wireless networks, GPU-related research, speech processing (especially the use of AI techniques such as neural networks in automatic speech recognition and the use of Graphics Processing Units (GPUs) for speech processing in under-resourced languages), image processing, machine learning, and Big Data/Data Science. He is a reviewer for a number of journals as well as a member of a number of professional bodies. He is also involved in a number of community projects in and around the country.
Ms Nontokozo Mpofu is a Data Science educator, a final-year PhD student and a Data Science enthusiast. She occasionally participates in open-source data science competitions, including hackathons and Kaggle competitions. She graduated with a degree in Mathematics and Computer Science from Cuba and completed her Master’s Degree in Computer Science at the National University of Science and Technology in Zimbabwe. She has taught Computer Science and Information Systems at a number of universities in South Africa and Zimbabwe. Her research interests include data analysis and machine learning, image processing and computer vision, e-learning, speech processing (especially the use of AI techniques such as neural networks in automatic speech recognition and the use of Graphics Processing Units (GPUs) for speech processing in under-resourced languages), supercomputing and Big Data/Data Science.
Professor Armstrong Kadyamatimba is a Professor and Dean in the School of Management Sciences at the University of Venda for Science and Technology, in the Limpopo Province, South Africa. He has a Doctorate in Computer Science from the United Kingdom and other qualifications at Master’s and undergraduate level from overseas.
He has taught Computer Science and Information Systems at a number of universities in South Africa and Zimbabwe and has supervised a number of postgraduate students. He has published several research articles in a number of journals as well as in local and international conferences. He has also held several directorship positions in both South Africa and Zimbabwe.
His research interests include ICT for development, security in wireless networks, GPU-related research, speech processing (especially the use of AI techniques such as neural networks in automatic speech recognition and the use of Graphics Processing Units (GPUs) for speech processing in under-resourced languages), image processing, machine learning, and Big Data/Data Science. He is a reviewer for a number of journals as well as a member of a number of professional bodies. He is also involved in a number of community projects in and around the country.
Galaxy for accessible, reproducible bioinformatics on the Ilifu research cloud
Peter van Heusden, Alan Christoffels
SANBI, University of the Western Cape
Presentation overview
Next Generation Sequencing has brought genomic analysis within the reach of a great number of laboratories, while increasing the demand for bioinformatic analysis. Such analyses typically comprise workflows composed of chains of analysis steps, with data flowing between them. The Galaxy platform (https://www.galaxyproject.org) provides an accessible interface for reproducible bioinformatics analysis.
The Ilifu Research Cloud, part of the Ilifu project (http://www.ilifu.ac.za/), provides a powerful data analysis environment for partners in the Ilifu consortium. Galaxy on Ilifu aims to bring bioinformatics tools and workflows within the reach of researchers with limited exposure to command-line interfaces and High Performance Computing environments. At the same time, it poses challenges for a research institute with limited human resources. Addressing these challenges involves addressing the way (e)research work is organised and incentivised.
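Although the Galaxy web interface is the primary way researchers interact with the platform, Galaxy also exposes an API. The sketch below uses BioBlend, the Python client for that API, to upload a dataset and invoke a workflow; the server URL, API key, file name and workflow choice are placeholders rather than details of the Ilifu deployment.

```python
# Minimal sketch of driving a Galaxy server programmatically with BioBlend.
# The endpoint, key, file and workflow below are placeholders, not Ilifu specifics.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")  # placeholder endpoint

# Create a history to hold inputs and results, and upload a FASTQ file into it.
history = gi.histories.create_history(name="tb-analysis")
upload = gi.tools.upload_file("sample.fastq.gz", history["id"])  # hypothetical local file

# List available workflows and invoke the first one against the uploaded dataset.
workflows = gi.workflows.get_workflows()
if workflows:
    inputs = {"0": {"src": "hda", "id": upload["outputs"][0]["id"]}}
    gi.workflows.invoke_workflow(workflows[0]["id"], inputs=inputs, history_id=history["id"])
```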
Presenters
Peter van Heusden is a researcher at the South African National Bioinformatics Institute, where he has been developing research computing infrastructure since the 1990s. His research focus includes scientific workflow languages and workflow management systems, research software and systems engineering and biological sequence analysis with a specific focus on the pathogen Mycobacterium tuberculosis.
Registree: using blockchain technology to connect universities, students and employers
Carolina Ödman, Christopher Maree
University of Cape Town
Presentation overview
Registree is a decentralized and cryptographically secured student database and platform that connects universities, students, and employers. The project is driven by students and staff at the University of Cape Town.
The Registree platform provides a number of valuable data services for its stakeholders. First, Registree offers an advanced analytics toolkit for universities, which allows them to better understand student and employer needs; it also provides universities with a costless tool for the immediate verification of certificates and degrees. Second, Registree allows employers to find suitable applicants for open positions more effectively. Third, students have full access control over their sensitive personal data thanks to Registree’s use of the latest blockchain technology. This provides students with self-sovereign ownership of their information in a way that until now was impossible to achieve. Students also benefit from improved access to job opportunities and advanced analytics.
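To illustrate the general principle behind tamper-evident certificate verification (without claiming to reproduce Registree's actual protocol or data model), the sketch below anchors only the hash of a credential in a registry, simulated here with an in-memory dictionary standing in for a blockchain, and lets a third party verify a presented document against it.

```python
# Minimal sketch of hash-based credential verification. The dict stands in for an
# on-chain registry; this is NOT Registree's actual protocol or data model.
import hashlib
import json

def credential_hash(record: dict) -> str:
    # Canonical JSON encoding so the same record always hashes identically.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

ledger = {}  # stand-in for an on-chain registry of credential hashes

# University side: issue a credential and anchor only its hash.
credential = {"student_id": "S123", "degree": "BSc Computer Science", "year": 2018}
ledger[credential_hash(credential)] = {"issuer": "University of Cape Town"}

# Employer side: verify a presented credential without accessing anything else.
presented = {"student_id": "S123", "degree": "BSc Computer Science", "year": 2018}
print("verified by:", ledger.get(credential_hash(presented), "no matching record"))
```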
In this talk, one of the co-founders will demonstrate how Registree works and highlight its impact on the higher education sector in South Africa.
Presenter
Chris Maree is currently doing his Master’s in Financial Technology at UCT after completing his honours in Information Engineering at Wits University. Throughout his studies, he has been involved with a number of startups. He founded AtomicWeb Development in 2012, a web development house that grew to three employees and ran 15 production corporate websites. Chris became involved in blockchain in 2013, mining Bitcoin. He later took up development on the Ethereum blockchain in 2016 and has been focused on developing open-source decentralized applications ever since. To this end, Chris competed in four international hackathons in 2018, including two in Europe. For his honours project in Electrical Engineering, Chris designed and implemented MeterBlock, a decentralized energy exchange market enabling peer-to-peer trading of energy within open grids. This project placed first in the national SAIEE student competition. He is currently involved in a number of open-source projects within the Ethereum ecosystem aiming to build out user onboarding and drive adoption. Chris is one of the core developers at Registree.rocks, a decentralized and cryptographically secured student database and platform that connects universities, students, and employers.
Starting up an eResearch office: supporting researchers with varying degrees of experience
Jonathan Padavatan, Taariq Surtee
University of the Witwatersrand
Presentation overview
eResearch as a service is a well-established international practice. Key services include data management, computing, and supporting funding proposals with data management plans. Services and eResearch staff need to accommodate various types of data across many disciplines. Coupled with this, researchers’ experience with Big Data and High Performance Computing varies widely. As a result, eResearch staff face many challenges in providing services and managing expectations. Expectations include providing stable remote computing services, seamless storage and IT spend recommendations to researchers, while at times hand-holding researchers who are new to remote, high-performance, IT-based research.
Addressing researcher needs requires eResearch staff to have a broad understanding of research in general, without necessarily being experts in a particular research field. Specialised skills in the high performance computing environment are necessary to provide advanced, non-standard IT support for researchers, from the preparation of proposals, to producing results for their research, to storing their data thereafter.
This non-technical case study presents the University of the Witwatersrand’s (Wits) approach to managing stakeholders (internal and external), defining services, considering IT spend and becoming part of the broader eResearch community. Wits undertook to set up an eResearch Office at the beginning of 2018. The discussion presents our experience to date.
The Wits eResearch Office, as a recent addition to Wits support services, currently hosts the Wits Core HPC Cluster. Academic groups built the cluster and, from 2018, support for it has been in transition to the eResearch Office. The lead partners are Electrical & Information Engineering, Physics and Wits Bioinformatics, with support from Computer Science and the Mathematical Sciences.
Apart from the primary user community, we are actively fielding requests for large-data and HPC jobs in other research areas. A new way of thinking was needed to offer eResearch Software as a Service (SaaS) and encourage researchers to utilise eResearch services. The eResearch strategy was adopted to expand services, create benefits from economies of scale and encourage new data-driven research.
Current challenges in establishing the SaaS offering include the cooling and power capacity requirements of the host data centre, which play a significant role in the funding model of the eResearch Office. Further, offering SaaS involves understanding cluster management software for efficient sharing of computational resources, minimizing wasted CPU cycles and ensuring groups have access to dedicated resources. Managing the clusters is therefore a non-trivial exercise, with a non-standard array of software tools available. Hence, keeping abreast of best practice in supporting research is crucial to maintaining reliability and resilience.
The eResearch Office is mandated to extend Big Data and HPC services to all faculties. The strategy was built around the data life cycle. To maintain efficiencies and prevent the duplication of services, a federated approach was taken in which existing services were not absorbed into the eResearch Office. In this way, the service units retain their independence to look after their current users while contributing towards eResearch. To manage this federation, Wits constituted an eResearch Operations Committee (eROC) to manage resources and training. The paper concludes with the functions of eROC, discusses lessons learnt and proposes the next steps the eResearch Office will take.
Presenters
Taariq Surtee is a Wits graduate, with majors in Mathematics and Computer Science. He has an Honours degree in Applied Mathematics and an MBA. His career started in data mining and applying machine learning at a major financial company. This was followed by work on large data in the government sector, after which he joined Wits University, where he has worked in various roles, including Business Intelligence and IT management. His current position is Head of the Wits eResearch Unit. He has extensive knowledge of working with data, learning algorithms, IT and governance.
Jonathan Padavatan is the Senior Research Computing Specialist at the eResearch office, Wits University. He is also a Wits graduate, with an Honours degree in Physics. He previously worked as a software engineer at iThemba Labs for Accelerator-based Sciences, and in science communication and outreach at the Wits Planetarium. His research interests include fault tolerance and monitoring, and file management systems in cluster computing.