Data science is a relatively new, multidisciplinary field concerned with the collection, preparation, analysis, visualization, management and preservation of large collections of information, with the aim of generating value from the data itself. The term data science (originally used interchangeably with datalogy) was first used as a substitute for computer science by Peter Naur in 1960. In industry, data science is often considered the New Kid on the Block [5], even though data-intensive sciences such as bioinformatics and high-energy physics have been practicing some form of data science for more than a decade. The Committee on Data for Science and Technology (CODATA) defined data science as the methods and technologies used to conduct scientific research through the management and utilization of scientific data. Because scientific data are comparatively well structured, it is easier to extract knowledge from them and to make analyses more precise and accurate. Many research areas, such as medicine and astrophysics, rely heavily on data science.
Data science has two components. The first is the study of the nature of data and of the scientific issues related to the data itself. The second is its potential usefulness and real-world application. Some views and applications of data science are [6]:
• Data science is the science of studying scientific data: disciplines that use data technology to deal with scientific data include bioinformatics, neuroinformatics and social informatics.
• Data science is the science of studying business data: extracting knowledge from data to solve business problems is one of the prime objectives of data science. Recent trends in information technology allow end users to generate large-scale data on platforms run by companies such as Amazon, eBay, Google and Facebook. This data can be used to inform new business strategies. For example, Amazon uses collaborative filtering to generate high-quality product recommendations for online customers, and Facebook uses its People You May Know feature to recommend friend connections.
• Data science is an integration of statistics, computing technology and artificial intelligence (AI): together, these fields have created a new position at companies such as Google, called the data scientist. A data science team consists of statisticians, computer scientists, AI scientists and experts in other relevant fields.
Data-driven scientific discovery is an important emerging paradigm for computing in areas including social computing, services, the Internet of Things, sensor networks, telecommunications, biology, health care and the cloud. Under this paradigm, data science is the core that drives new research, from environmental to social. The paradigm brings scientific challenges ranging from data capture, creation, storage, search and sharing to modeling, analysis and visualization. Some of the complex problems that remain open are:
- integrating heterogeneous, interdependent and complex data resources for real-time decision making;
- handling streaming data;
- collaboration;
- ultimately, value co-creation.
Data science encompasses statistics, mathematics, computer science, information theory, information technology, machine learning and optimization. It has become essential for gaining an overall understanding of large data sets and for converting data into actionable intelligence, whether that data is held by enterprises or governments or is available on the Web [6].
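The business-data bullet mentions two techniques: Amazon's collaborative filtering and Facebook's People You May Know feature. The toy Python sketch below illustrates the general ideas. All data (`PURCHASES`, `FRIENDS`) and function names are hypothetical.
- `recommend_items` is plain item-based collaborative filtering. It uses cosine similarity over binary purchase vectors.
- `suggest_friends` ranks non-friends by their number of mutual friends.

Neither function is Amazon's or Facebook's actual production method, which the text does not describe.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: user -> set of purchased items.
PURCHASES = {
    "alice": {"book", "pen", "lamp"},
    "bob": {"book", "pen"},
    "carol": {"mug"},
    "dave": {"book", "lamp"},
}

# Hypothetical undirected friendship graph stored as adjacency sets.
FRIENDS = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "cat", "dan"},
    "cat": {"ann", "ben", "dan", "eve"},
    "dan": {"ben", "cat"},
    "eve": {"cat"},
}


def buyers(purchases, item):
    """Return the set of users who purchased `item`."""
    return {user for user, items in purchases.items() if item in items}


def item_similarity(purchases, a, b):
    """Cosine similarity of two items' binary user-purchase vectors."""
    buyers_a, buyers_b = buyers(purchases, a), buyers(purchases, b)
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))


def recommend_items(purchases, user, k=3):
    """Item-based CF: score each unseen item by its summed similarity
    to the items the user already owns; drop items scoring zero."""
    owned = purchases[user]
    catalog = set().union(*purchases.values())
    scores = {
        item: sum(item_similarity(purchases, item, o) for o in owned)
        for item in catalog - owned
    }
    ranked = sorted((i for i in scores if scores[i] > 0),
                    key=lambda i: (-scores[i], i))
    return ranked[:k]


def suggest_friends(graph, user, k=3):
    """Rank non-friends of `user` by how many mutual friends they share."""
    mutual = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    # Sort by descending mutual-friend count, then by name for stable ties.
    return sorted(mutual, key=lambda p: (-mutual[p], p))[:k]


print(recommend_items(PURCHASES, "bob"))  # ['lamp']
print(suggest_friends(FRIENDS, "ann"))    # ['dan', 'eve']
```

Binary cosine similarity is used here only because it is simple to follow. Practical recommenders commonly weight ratings or implicit feedback and precompute item-similarity tables offline.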
Broad areas of research
1. Mathematical Foundations
1) Information theory and models
2) Mathematical, probabilistic and statistical models and theories
3) Machine learning theories, models and systems
4) Knowledge discovery theories, models and systems
5) Manifold learning, metric learning and deep learning
6) Scalable analysis and learning
7) Data curation and heterogeneous data/information integration
8) Data pre-processing, sampling and reduction
9) Data dimensionality reduction
10) Feature selection, transformation and construction
11) Large scale optimization
12) Architecture, management and process for data science
2. Machine learning and knowledge discovery
1. Learning for streaming data
2. Learning for structured and relational data
3. Latent semantics and insight learning
4. Mining multi-source and mixed-source information
5. Mixed-type and structured data analytics
6. Cross-media data analytics
7. Data visualization, modeling and analytics
8. Multimedia/stream/text/visual analytics
9. Relation, coupling, link and graph mining
10. Personalization analytics and learning
11. Web/online/social/network mining and learning
12. Structure/group/community/network mining
13. Cloud computing and service data analysis
3. Storage, retrieval and search
1. Data warehouses and cloud architectures
2. Large-scale databases
3. Information and knowledge retrieval and semantic search
4. Web/social/databases query and search
5. Personalized search and recommendation
6. Human-Machine Interaction (HMI) and interfaces
7. Crowdsourcing and collective intelligence
4. Privacy and security
1. Security, trust and risk in data
2. Data integrity, matching and sharing
3. Privacy and protection standards and policies
4. Privacy preserving data access/analytics
5. Social impact
5. Data Science applications
R&D proposals in application areas should motivate, describe and analyze the use of data science and technology in practical applications, and should demonstrate or illustrate their actual impact. Proposals may address topics such as (but not limited to) the following:
1. Best practices and lessons learned from both success and failure
2. Data-intensive organizations, business and economy
3. Quality assessment and metrics
4. Complexity, efficiency and scalability
5. Data representation and visualization
6. Large-scale application case studies and domain-specific applications, such as (but not limited to):
   • Online/social/living/environment data analysis
   • Mobile analytics for hand-held devices
   • Anomaly/fraud/exception/change/drift/event/crisis analysis
   • Large-scale recommender and search systems
   • Data analytics applications in cognitive systems, planning and decision support
   • End-user analytics, data visualization, human-in-the-loop and prescriptive analytics
   • Government/business data, such as for financial services, manufacturing, retail, utilities, telecom, national security, e-governance, etc.