A Schema Extraction of Document-Oriented Database for Data Warehouse

  • A. Nurul Istiqamah Telkom University
  • Kemas Rahmat Saleh Wiharja Telkom University
Keywords: data warehouse, schema extraction, unstructured data

Abstract

The data warehouse is a widely used solution for analyzing business data from heterogeneous sources. However, a data warehouse can only analyze structured data, while the popularity of social media and the ease of creating data on the web have produced a flood of unstructured data. We therefore need an approach that can convert unstructured data into structured data that a data warehouse can process. To this end, we propose a schema extraction approach that uses Google Cloud Platform to create a schema from unstructured data. In our experiment, the approach successfully produced a schema from unstructured data. To the best of our knowledge, we are the first to use Google Cloud Platform for schema extraction. We also show that our approach helps database developers understand unstructured data better.
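
As a rough illustration of what "creating a schema" from schemaless documents can mean, the following minimal Python sketch collects every field path in a set of JSON-like documents together with the value types observed for it. It is not the authors' Google Cloud Platform based method; the function name `infer_schema`, the `[]` suffix for array elements, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def infer_schema(documents):
    """Map each field path to the sorted list of type names observed for it.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Nested objects extend the path with a dot-separated key.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array elements share one path, marked with "[]".
            for item in value:
                walk(item, f"{path}[]")
        else:
            # Scalars record their Python type name at the current path.
            schema[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return {path: sorted(types) for path, types in sorted(schema.items())}


# Made-up sample documents with differing structure.
docs = [
    {"user": {"name": "Ani", "age": 30}, "tags": ["a", "b"]},
    {"user": {"name": "Budi", "age": "unknown"}, "likes": 5},
]
schema = infer_schema(docs)
for path, types in schema.items():
    print(path, types)
```

A path that reports more than one type (here `user.age`) or that appears in only some documents (here `likes` and `tags[]`) is the kind of irregularity a developer would need to resolve before loading the data into a warehouse table.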

Published
2021-12-31
How to Cite
Istiqamah, A. N., & Wiharja, K. R. S. (2021). A Schema Extraction of Document-Oriented Database for Data Warehouse. International Journal on Information and Communication Technology (IJoICT), 7(2), 36-47. https://doi.org/10.21108/ijoict.v7i2.584
Section
Database Systems