The International Conference on Data and Information Science (ICoDIS) 2017


The advancement of today’s computing technology has driven people to generate a vast amount of data, with a size and variety never before experienced in the history of computing. The need to process and analyze big data attracts researchers’ interest in proposing solutions. The International Conference on Data and Information Science (ICoDIS) is organized to gather researchers to disseminate their relevant work on data science, computational linguistics, and information science. This inaugural conference is organized by the School of Computing, Telkom University, and supported by the Indonesia Data Scientist Society (IDSS) and the Indonesia Association of Computational Linguistics (INACL).

ICoDIS 2017 Theme : Bridging Data Science and Internet of Things to Enhance Technology Competitiveness

Date : December 5-6, 2017
Venue : Telkom University, Bandung, Indonesia
Keynote speakers:

  1. Prof. Toru Ishida, Kyoto University
  2. Prof. Dr. Naomie Salim, Universiti Teknologi Malaysia

Prof. Toru Ishida
Department of Social Informatics
Kyoto University

Keynote Talk : The Language Grid for Supporting Intercultural Collaboration
At the beginning of the new millennium, we proposed the concept of intercultural collaboration, where participants with different cultures and languages work together towards shared goals. Because intercultural collaboration is a new area with scarce data, it was necessary to run parallel experiments both in real fields and in research laboratories. In 2002, we conducted a one-year experiment with Japanese, Chinese, Korean and Malaysian colleagues and students to develop open-source software using machine translation. From this experiment, we understood the necessity of language infrastructure on the Internet to create customized multilingual environments for various situations. In 2006, we launched the Language Grid project to realize a federated operation of servers for language services. So far, four servers have been set up in Asia, and more than 200 language services have been registered from 22 countries. Using the Language Grid, we have been working with an international NGO for four years to support communications between rice harvesting experts in Japan and farmers and their children in Vietnam. Through these experiences, we gradually understood the nature of intercultural collaboration and faced different types of difficulties. Problems are “wicked” and not easily defined because of their nested and open networked structure.

Short Biography
Toru Ishida has been a professor at Kyoto University since 1993. He has been a fellow of IEEE and a member of the Science Council of Japan. He is a co-founder of the Department of Social Informatics, Kyoto University and the Kyoto University Design School. His research interests lie in Autonomous Agents and Multi-Agent Systems and in modeling collaboration within human societies. He contributed to the creation of the PRIMA/ICMAS/AAMAS conferences: he was chair of the first PRIMA, program co-chair of the second ICMAS, and general co-chair of the first AAMAS. His projects include Community Computing, Digital City Kyoto, Intercultural Collaboration Experiments, and the Language Grid.

Prof. Dr. Naomie Salim
Faculty of Computing
Universiti Teknologi Malaysia
Research areas:
information retrieval, cheminformatics

Keynote Talk : Methods for mining chemical and document databases to support computer-aided drug design and development process
The vast amount of data in chemical databases and document databases related to drugs offers many opportunities to aid the process of drug design and development. For instance, searching for molecules that are structurally similar to a promising lead compound can help us discover a better lead compound. The bioactivity of unknown compounds can also be predicted based on their structural similarity to known drug compounds. Similarity measures used for these searches can also be used to build focused libraries against specific targets. Traditionally, the Vector Space Model utilizing bit-string representations of compounds and the Tanimoto coefficient has been used to rank molecules based on their structural similarity to query compounds. However, we have proved that the Tanimoto coefficient is not necessarily the best coefficient and that fusion of certain coefficients can result in a higher number of similarly active compounds among the top-ranked compounds. In this talk, a number of approaches to enhance molecular search will be discussed. The approaches include modification of the Simple Matching Similarity Measure with bit-string re-weighting, probabilistic-based compound searching, Bayesian network-based similarity measures, fragment selection, fragment weightings, relevance feedback, fuzzy coefficients, quantum-based similarity searching, Multilevel Neighborhoods of Atoms molecular structure descriptors, shape-based similarity measures and deep learning. A new method for the selection of representative subsets of compounds from chemical databases has also been proposed, based on an improved chemical space representation and alpha shape theory.
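
As a minimal illustration of the traditional approach mentioned in the abstract (and not the speaker's own implementation), the sketch below ranks a tiny library by the Tanimoto coefficient, i.e. the number of bits set in both fingerprints divided by the number of bits set in either. The 8-bit fingerprints, the function names `tanimoto` and `rank_by_similarity`, and the `mol_*` entries are hypothetical; real fingerprints are much longer and come from a cheminformatics toolkit.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints stored as ints."""
    common = bin(a & b).count("1")   # bits set in both fingerprints
    total = bin(a | b).count("1")    # bits set in at least one fingerprint
    return common / total if total else 0.0


def rank_by_similarity(query: int, library: dict) -> list:
    """Return library keys ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]),
                  reverse=True)


# Toy fingerprints, assumed purely for illustration.
library = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110000,
    "mol_c": 0b01001101,
}
query = 0b10110011

ranking = rank_by_similarity(query, library)
for name in ranking:
    print(name, round(tanimoto(query, library[name]), 3))
```

Storing each fingerprint as a single integer keeps the example dependency-free; a toolkit-specific bit-vector type would normally take its place.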

On the other hand, screening of compounds in a virtual library can also be made more efficient if they are first clustered before selecting representatives from each cluster. The talk will present results from a number of clustering techniques for clustering chemical compounds databases and how consensus clustering is used to improve the clustering results.

Finally, instead of relying on manual inspection, the automatic detection of adverse drug effects from medical reports can help regulatory authorities in rapid information screening and extraction to accelerate the generation of medical decision support and safety alerts. In this talk, we will share two extraction methods for this purpose. The first method is based on mining rules augmented with lexical information, i.e. cue words for mining the syntactic dependency paths connecting the drug and medical condition entities, and then extracting the corresponding relation. The second method is a case-based reasoning model that identifies the relations using linguistic patterns learned automatically from the dependency paths linking the drug and medical condition entities. A classification model based on automatically learned and manually curated linguistic patterns will also be described; it detects sentences holding drug-adverse effect information without relying on a named entity recognition module to identify the entities in the input sentences.

Short Biography
Professor Salim’s main research goal is to design new algorithms to improve the effectiveness of searching and mining new knowledge from various kinds of datasets, including unstructured, semi-structured and structured databases. The current focus of her research is on chemical databases and text databases to support the process of computer-aided drug design, text summarisation, plagiarism detection, automatic information extraction, sentiment analysis and recommendation systems. The output of the research has been incorporated into a number of software systems, such as UTMChem Workbench and NADI Natural Products Database System to support the drug design and drug optimisation process, UTMCLPD Cross Language Plagiarism Detection System to summarise documents and check for plagiarism, and Oricheck for cross-language idea similarity checking and plagiarism detection. The systems can be used by pharmaceutical scientists to search, retrieve, optimize and discover new drug compounds from chemical and natural product databases, and can help academic institutions preserve academic integrity by providing support to detect intelligent idea plagiarism across different languages.

Professor Salim has been involved in 53 research projects, of which she heads 21. She has authored over 170 journal articles, of which 160 are indexed in Scopus and 84 in Web of Science. Her Google Scholar h-index is 21, with 1931 citations to date. Her Scopus h-index is 14, with 733 Scopus-indexed citations.

Among the research and innovation awards received by Professor Salim are the PECIPTA 2011 Gold Medal award for her UTMCLP cross-language semantic plagiarism detection system, the I-inova 2010 Gold Medal award for her Islamic Ontology-based Quran search engine, BioInnovation 2011 Bronze Award for UTMChem Workbench Molecular Database System, iPhex Gold Medal Award for innovation in teaching and learning, UTM 2011 Best Research Award, UTM 2014 Best Research Award and the INATEX Distinction Award (1998). She has also won the UTM Citra Karisma Indexed Journal Paper Award for 2009, 2011, 2012, 2013 and 2014.

She is a fellow of the Japan Society for the Promotion of Science (JSPS), the head of the Soft Computing Research Group UTM, an Associate Member of the UTM Big Data Centre and a UTM Senate Member.


The conference welcomes all topics relevant to data science, computational linguistics, and information science. Topics of interest include, but are not limited to, the following:

Data Science

  • Data clustering and classifications
  • Statistical models in data science
  • Artificial intelligence and machine learning in data science
  • Data visualization
  • Data mining
  • Data intelligence
  • Business intelligence and data warehousing
  • Cloud computing for Big Data
  • Data processing and analytics in IoT
  • Tools and applications in data science
  • Vision and future directions of data science

Computational linguistics

  • Text mining
  • Text classification
  • Language resources
  • Information retrieval
  • Information extraction
  • Machine translation
  • Sentiment analysis
  • Semantics
  • Summarization
  • Syntactic parser
  • Question answering
  • Speech processing
  • Mathematical linguistics
  • NLP applications

Information Science

  • Cryptography and steganography
  • Digital forensics
  • Social media and social network
  • Crowdsourcing
  • Artificial intelligence
  • Computational intelligence
  • Collective intelligence
  • Graph theory and computation
  • Network science
  • Modeling and simulation
  • Parallel and distributed computing
  • High performance computing

Important Dates

Full paper submission : July 12, 2017
Notification of acceptance : September 12, 2017
Camera-ready paper submission : October 1, 2017
Conference Date : December 5-6, 2017