I and C Architecture Design
L1: Introduction
L2: Overview architecture frameworks
L3: Meta architecture and design framework
E1: Guest lecture; Digital Sandbox Bharosa
L4: Modular architectures and technology (!)
L6: Big data and data quality
L7: From BPMN to orchestration
L8 = E2: Guest lecture; CRANIUM
L9: Middleware (!!)
L10: Transactions concurrency and blockchain
L11 = E3: Guest Lecture; VKA (missed this one, check effect on exam grade)
L12: Project presentation
L14: Exam questions
Lecture 1
• Course objectives
• History and developments of ICT in public and private organizations
• EDI, XML, XSLT (S2S data exchange)
• Need for ICT-systems architecting
Path dependency: explains how the set of decisions one faces in any given circumstance is
limited by the decisions one has made in the past.
→ Within ICT, this could be the installed base of systems, chosen standards,
procedures and routines that influence future behavior.
→ The first mover's advantage is temporary: first movers are locked in by their own
progress, which will ultimately cause them to lag behind ('Wet van remmende voorsprong', the law of the handicap of a head start).
Coherency Management: Architecting the Enterprise for Alignment, Agility, and Assurance.
Starting points for I&C design:
• Multi actor situation
• Limited influence/authority of stakeholder
• All kinds and types of systems are already available
• Need for understanding the big picture
• Creating a shared understanding
• No 'optimal' but negotiated solution
• Strategic fit: interrelation of all internal/external components
• Translation from strategy to ICT and vice versa
• Switching between views: technological, economical, organizational, psychological, user
• Attention to issues like security and privacy, scalability, robustness, flexibility, and standards
Insurance companies have many different products as well: B2B, B2C, Direct/Indirect.
Middleware technology:
• Hides the complexity of source and target systems.
• Makes systems even more complex.
• Deals with protocols.
• Focus on sharing data between heterogeneous information systems.
→ use architecture and modularity to implement middleware technology to simplify the
system! (Need to gain an overview in the mess / information redundancy / understandable).
Namespaces: a collection of all element types and attribute names for a certain domain.
• Prevent naming conflicts
• Easier to assemble large schemata from smaller ones
• Each namespace is tied to a uniform resource identifier (URI) (similar in form to a URL)
• The namespace name and the local name of the element together form a globally
unique name known as a qualified name
→ Always use namespaces to avoid collisions
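The qualified-name idea above can be seen with Python's standard library XML parser; a minimal sketch, where the two namespace URIs are made-up examples:

```python
# Two elements share the local name <name>, but their namespaces keep
# them distinct. The example.com URIs are illustrative assumptions.
import xml.etree.ElementTree as ET

XML = """<order xmlns="http://example.com/sales"
               xmlns:cust="http://example.com/customers">
  <cust:name>Acme BV</cust:name>
  <name>Order 42</name>
</order>"""

root = ET.fromstring(XML)
# ElementTree expands each qualified name to {namespace-URI}local-name,
# so the two <name> elements no longer collide.
tags = [child.tag for child in root]
print(tags)
# ['{http://example.com/customers}name', '{http://example.com/sales}name']
```

Without the namespaces, a schema assembled from both domains could not tell the two `name` elements apart.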
What is architecture?
All the parts are connected to one another, and architecture ensures that you have an
overview of what is going on. In ICT, the architecture is not tangible… Architecture can
refer to the structure, the process, or a profession. Architecting is a process.
Levels of design
• Conceptual design
• Implementation design
• Implementation
*Often a maximum of three months is given for an architecture project (due to changes over
time).
Business context: the information, the business parts, and responsibilities; shows the
actual work processes (requirements and needs, the demand side).
IT context: the applications, the software, and the infrastructure that support the actions
and processes (the IT solution, the supply side).
Resource-based view
• Resources as organizational assets
• Resource attributes: Valuable, Rare, Inimitable, Non-substitutable (VRIN)
• → Human resources, budget, …
Dynamic capabilities
• To change the resources to comply with new environment
• Aspects: path dependencies, …
A business event is something that happens (externally) and may influence business
processes, functions, or interactions.
A business process represents a sequence of business behaviors that achieves a specific
outcome such as a defined set of products or business services.
Lecture 3
Enterprise ICT-architecture = to support the design (not the actual design itself)
(Meta) Framework:
Know the environment, drivers and developments; market, customers and segments; available
resources and expertise; distribution channels; products. → These are situational factors
influencing the architecture. Thereafter, make a set of business requirements.
MoSCoW analysis:
Must have, Should have, Could have, Would/Won't have.
→ what kind of trade-offs are you expecting?
Layered-based engineering:
• Each layer can be used to represent one type of entities
• Reduce complexity and scope or understand relationships
• Design each layer independent of other layers
• Use of different views and objectives
• Reduce complexity
• One layer can be designed relatively independently of others
How are the layers connected to each other?
Layers can be split or merged, depends on what you want to show.
Grouping: element aggregates or composes concepts that belong together based on some
common characteristics.
Information architecture
Describes the relationship between the business processes, applications and information
sources aimed at storing, processing, reusing and distribution of information across
information resources.
→ Information architecture is the organization of information to aid information sharing
among actors, e.g. a vital records registry.
Application architecture
• Describes the software applications, components and objects, and the relationship
between these parts.
• Best-of-breed vs. frameworks
• Integration and middleware
• IT systems vs. enterprise architecture
o IT systems architecture: decompose into individual functional software
components
o Enterprise architecture: decompose into manageable parts
Technical architecture
• Technical architecture is about generic facilities, used by many application systems: it
is about functionality that is a common need of many different systems.
• Topics include Next Generation Infrastructure (NGI), grids, wireless networks
• In a NGI no new hardware is bought for each new system, but a standard
infrastructure is provided.
→ changes due to the cloud!
The government is there to help and support the citizens. Yearly, 125 billion euros is spent
on public services in the Netherlands, and these costs are rising, while citizens expect
these services to be free, available, and well-established (personalized services, digital
inclusion, proactive services, responsible data sharing, life event support). To provide
these services, data exchange takes place. At the moment, most agencies use portals, with
one-way vs. two-way information flows. Companies do not have such portals yet for Standard
Business Reporting. There are also many calls for improvement; it is too difficult and vague
to apply for subsidies or services. Unfortunately, many recent big ICT projects have failed
to improve this situation. What to do?
What to do?
GovTech: Startups and SMEs want to provide public services directly to citizens,
(HuurPaspoort, Cleverbase).
Way of working: Innovation pipeline. Connecting research with policy making; the policy
cycle.
1. What is Digicampus?
The quadruple helix approach to public service innovation.
Combines: Government Agencies, User groups, Software providers (corporate & startups),
and Academia. The Digicampus is the Digital Sandbox for learning and experimentation.
Digicampus is helping to formulate the right agenda and policies for services, combining
research with policy making.
What is a module?
Objects and components
Whether something is developed and used as an object or a component depends on your viewpoint.
A component
• Is language independent
• A way of organizing and thinking about the runtime structures of a system
• Loose coupling: with components, you loosen the coupling between classes and between
the developers responsible for them
• Stateless: components can be replaced and substituted in near real time, dependent
only on the interface
• Self-contained: enabling black-box reuse
• Component-and-connector model
• Combinations of components can be new components
An object
• Can be viewed as a type of component
• Is an abstraction and needs to be given context
• An object’s class instance has specific attributes and behaviors
• Encapsulation: implementation details are hidden, and only methods are exposed
• Inheritance as a way to reuse – this requires knowledge about the implementation
details of the base class
• Polymorphism (many forms): a subclass can define its own behaviors and attributes while
retaining some of the functionality of its parent class
• If multiple developers work on the same code base, they have to share source files.
In such an application, a change made to one class can trigger a massive re-linking of
the entire application and necessitate retesting and redeployment of all the other
classes (White-box reuse)
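The three object concepts above can be sketched in a few lines; the shape classes are made-up illustrations, not from the lecture:

```python
# Encapsulation, inheritance, and polymorphism in one small example.
class Shape:
    def __init__(self, name):
        self._name = name          # encapsulation: the underscore marks an
                                   # implementation detail, not the public API

    def area(self):
        raise NotImplementedError  # each subclass supplies its own version

    def describe(self):            # the exposed method callers rely on
        return f"{self._name}: area {self.area()}"

class Square(Shape):               # inheritance: reuses Shape's describe()
    def __init__(self, side):
        super().__init__("square")
        self._side = side

    def area(self):                # polymorphism: subclass-specific behavior
        return self._side ** 2

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return 3.14159 * self._radius ** 2

# The caller uses only the shared interface (black-box style):
shapes = [Square(3), Circle(1)]
print([s.describe() for s in shapes])
```

Note that `Square` and `Circle` retain `describe()` from the parent while overriding `area()`, which is exactly the reuse-with-variation the bullet list describes.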
Protocols
HTTP = Hypertext Transfer Protocol.
Web services
• Convergence of technology streams
o Ubiquitous infrastructure (IP, HTTP)
o Proven approaches (CORBA, RPC)
o XML
o Business standards (EDIFACT, X.12)
• Middleware for middleware
• Middleware agnostic
• RPC or messaging-based
• Access remote applications
• Accepted by most software vendors
Goal: To abstract business logic from implementation
• Web services perform encapsulated business functions
• Loosely coupled, self-contained, stateless properties (independent)
SOAP:
• Not bound to HTTP
• WSDL interface and contracting support
• Performance
• A POST statement is needed; a plain URL cannot be used
REST:
• Simple
• Suitable for simple CRUD (Create, Read, Update, Delete) applications
• High performance
Definitions
Information quality (IQ) is the characteristic of information to meet the functional,
technical, cognitive, and aesthetic requirements of information producers, administrators,
consumers and experts.
Quality information is information that meets specifications or requirements.
Information quality: a set of dimensions describing the quality of the information produced
by the information system. Information quality is one of the six factors that are used to
measure information systems success.
→ Information quality is fitness for use.
Not all information is always needed; think of the dimensions that are needed, e.g.
accuracy, completeness, timeliness.
It is subjective. Think of what is quality?
• Depends on the stakeholders’ view and context
• The dimensions (has many aspects)
• ‘how’ it has been measured.
IQ framework
Perspective Criteria
Content Relevance, obtainability, clarity of definition
Scope Comprehensiveness, essentialness
Level of detail Attribute granularity, precision of domains
Composition Naturalness, identifiability, homogeneity, minimum unnecessary redundancy
View consistency Semantic consistency, structural consistency, conceptual view
Reaction to change Robustness, flexibility
Values Accuracy, completeness, consistency, currency/ cycle time
IQ dimensions
• Accuracy
• Timeliness
• Relevance
• Quantity
• Completeness
IQ issues (system)
• Format
• Security
• Consistency
• Availability
Chinese walls = not being allowed to move data from one part (of an organization) to another
Always think about the system and information quality, it differs per context!
Opposing and complementary approaches
• The information systems solution
o Brings the information required to the persons using information systems
o Technology is used to solve the problem
o e.g. create a multi-purpose portal
• Organizational redesign solution
o Requires the redesign of organization structures
o Decision-making is where the pertinent information is
o e.g. decision-making as much as possible in the front office
• “information management” and “knowledge management”
o Information management: focuses on the business processes and functions
that create, manipulate, and manage information
o Knowledge management: focuses on how organizational units interact and
how organizational units add to the store of information
Information architecture
Where to put the "needed information" (user, case, product?) in the overall process.
“True Business Process Management is an amalgam of traditional workflow and the ‘new’
BPM technology. It then follows that as BPM is a natural extension of – and not separate
technology to – workflow, BPM is in fact the merging of process technology covering 3
process categories: interactions between (i) people-to-people; (ii) systems-to-systems (iii)
systems-to-people – all from a process-centric perspective. This is what true BPM is all
about.” – Jon Pyke, CTO Staffware.
In combination with Lean Six Sigma – Who is in control à process owner and authority?
Basic idea of Workflow Management Systems (WFMS)
• Separation of processes, resources and applications
• Focus on the logistics of work processes, not on the contents of individual tasks
• Process perspective (tasks and the routing of cases)
• Resource perspective (workers, roles)
• Case/data perspective (process instances and their attributes)
• Operation/application perspective (forms, application integration)
• Control perspective (progress, tracking and tracing)
Users in BPM
Who is the orchestrator (the one in charge)? → there can be more than one per process!
The one who is in charge is also responsible. Who manages the flow? To whom?
Granularity of services:
Business units: business case, prioritization, service levels, change board, defining standards,
services, processes, policies.
Decision tables / business rules are more applicable when there are many 'if-then'
statements; a table is then simpler than a flow diagram.
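A decision table can be written directly as data instead of drawing a flow diagram; a minimal sketch, where the discount rules are made-up examples:

```python
# Each row of the decision table: (condition on the case, resulting action).
# The first matching row wins, and the last row is the default.
RULES = [
    (lambda c: c["amount"] > 1000 and c["loyal"], "20% discount"),
    (lambda c: c["amount"] > 1000,                "10% discount"),
    (lambda c: c["loyal"],                        "5% discount"),
    (lambda c: True,                              "no discount"),  # default row
]

def decide(case):
    """Return the action of the first rule whose condition matches."""
    for condition, action in RULES:
        if condition(case):
            return action

print(decide({"amount": 1500, "loyal": False}))  # 10% discount
```

Adding a new 'if-then' statement is one new row, whereas a flow diagram would need rerouted arrows.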
Data management is the practice of organizing and maintaining data processes to meet
ongoing information lifecycle needs.
Data privacy focuses on the identification and mitigation of the risk of non-compliance with
the requirements of the GDPR in Europe, ensuring proper handling of personally identifiable
information.
Data security focuses on the implementation of the technical measures of the GDPR and all
other activities necessary to safeguard the confidentiality, integrity, and availability of
information.
1. CRANIUM
Belgium-based company, mainly focusing on data management in relation to data privacy
and data security (consultancy). Active in projects in 12 countries worldwide. KPMG did
many assessments, gathering the situation and giving advice; CRANIUM is more involved in
the implementation of that advice (data management etc.).
Topics: GDPR compliance, information security, data strategy, test data management, data
governance, big data and advanced analytics, privacy and security retainer, data retention
and deletion, internet of things. → the data-driven enterprise.
2. Enterprise Data
Complexity due to organization.
What data does the company have? What external data sources does it have? How does it
need to manage, secure, and protect that data?
Structured vs unstructured
Internal vs external
Static vs non-static
Growth: innovative revenue models, improved insights into the requirements of the customer.
→ artificial intelligence, big data / data lakes, data-driven business models, advanced
data analytics, internet of things. (CEO perspective)
3. Data Management
Two parts: Data Life Cycle & Data Management.
Data governance: who is the owner of the data? How do the stakeholders within a company
work with the same data? → everyone is using the data, but nobody feels responsible.
Rules and roles need to be established to safeguard the privacy and security of the system.
4. Data Privacy
Start with Data Minimization.
Main concepts: principles, legal ground, subject rights, processing, data breaches, DPO.
→ Principles: transparency, accuracy, purpose limitation, data minimization, integrity and
confidentiality.
Privacy by default (and design) is the new advancement that is being considered.
Full backups vs. incremental backups
5. Data Security
How to get a grip on the risk that a company has.
Who can access what data?
Data integrity
• Means that the information stored in a system corresponds to what is
represented in reality.
• Refers to aspects like consistency, security, reliability, timeliness, non-repudiation,
and non-manipulation, which need to be warranted
• CIA Triad – the conditions for information security:
o Confidentiality – Data should only be accessed by authorized persons
o Integrity – ensures that data is accurate and consistent. In other words, data
stored in a system should correspond to what is being presented in reality.
o Availability – authorization to make data available at the right time to fulfill a
need
• Ensuring data integrity requires data management/governance and middleware
Almost all data and system quality dimensions
Batch-oriented systems are often not available all the time (maintenance, back-ups, updates)
Information-oriented integration
1. Interface processing: App A → API → App B
2. Data replication: Database A ↔ Database B
a. Very easy to replicate data; minimizes risks and increases speed
3. Data federation: Virtual database ↔ Database A, Database B, Database C, Database D
a. Often older databases; the virtual database is set up so that it feels like 'one' database
4. Semantic integration: a network of connected nodes
a. Bottom-up approach (no standardization…)
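The data federation pattern above can be sketched as one virtual query interface over several separate stores; the customer records and class names are made-up examples:

```python
# Two independent "databases" (here plain dicts standing in for old systems).
db_a = {"c1": {"name": "Alice", "city": "Delft"}}
db_b = {"c2": {"name": "Bob",   "city": "Utrecht"}}

class VirtualDatabase:
    """Feels like 'one' database, but forwards lookups to the sources."""
    def __init__(self, *sources):
        self.sources = sources

    def get(self, key):
        for source in self.sources:   # query each underlying database in turn
            if key in source:
                return source[key]
        return None                   # not found in any source

federated = VirtualDatabase(db_a, db_b)
print(federated.get("c2"))  # served from db_b, invisible to the caller
```

The caller never learns which underlying database answered, which is the point of federation over older systems.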
Service-oriented integration
• Using web-services
• Reusability
• Need for transaction support (if one of the apps is not working, this causes
inconsistencies)
• Limited view on Service-Oriented Architectures (SOA)
→ a composite application is made that consists of, for example, App A, App B, and App C.
Middleware classification
• Interaction patterns (1-1, 1-n, n-n)
• Synchronous or asynchronous
o Directly (in real time = the internet) or not directly (example = email)
• Connection-oriented or connectionless
• Language specific or independent
• Proprietary or standard-based
o Vendor-dependent or Object Management Group (OMG) standards
• Embedded vs. Enterprise
o Embedded = "hidden"; increasingly common (all IoT devices) – often lacking
security and processing power
Vree’s model on middleware
Asymmetry is very important for personal data – not all data in the same place.
Synchronous: an immediate reaction is expected (seconds); the sender does nothing else until
a response is received, and errors are seen immediately (like calling).
Asynchronous: like emailing.
Middleware language
Standard (e.g. Windows):
• Interoperability
• Easy replacement
• Economies of scale resulting in low costs
• Longer life-cycle
• Long-lasting standardization processes
• Support not dependent on one vendor
Proprietary (e.g. stock market):
• Cover areas standards do not address (yet)
• Differentiate from competitors
• Customer lock-in
• Support can be part of the buying process
Approaches to integration
Integration at the data level is most preferred, yet most difficult. The internet, for
example, is only integrated at the UI level.
User-interface integration
• Sometimes only way an application logic can be called
• Screen scraping (green screen)
o Application thinks it is interacting with users
• Primitive but often very necessary
o Not always efficient navigation
o Not generally scalable
o Must follow the interface format, extract results and ignore formatting returned
from applications
o High maintenance costs
• Extremely relevant for internet (HTML)
7 types of middleware
• Remote Procedure Call (RPC)
o Client-server interaction that makes it possible for the functionality of an
application to be distributed across multiple platforms. Local program
requests a service from a program located on a remote computer, without
having network details. Used for synchronous data transfers, where client and
server need to be online for the communication.
• Message Oriented Middleware (MOM)
o Makes it less complicated to use applications spread over various platforms.
Enables the transmission of messages across distributed applications. Also has
a queuing mechanism that allows the interaction between the server and the
client to happen asynchronously. (Overlaps with message brokers)
• Message brokers
o Communication by using queues supporting asynchronous and synchronous
message passing. Validity check on data structures and completeness.
Database for supporting publish-subscribe models. (is the central / receiving
system of messages)
• Database middleware
o Between the databases and the applications. Call-level interface (CLI)
between databases (drivers) and applications.
• Transaction Processing (TP)
o Two major types: TP monitors and application servers. A transaction is a unit of
work consisting of a number of interactions with a beginning and an end.
Generally: tightly coupled, method sharing, need to change source and
destination IS for transactions → strict monitoring. Two-phase commit:
prepare and commit.
• Application servers (wrappers)
o A TP monitor becomes an application server by incorporating application logic;
typically web-enabled, with more and more message-broker functionality
included: messaging, transformation, intelligent routing.
• Distributed objects (very complicated)
o Middleware or application development? Creating distributed applications
for cross-enterprise method sharing, e.g. CORBA and DCOM. Client = stub,
server = skeleton.
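The message-oriented pattern above can be sketched with a plain in-process queue; a minimal illustration of the asynchronous client/server decoupling, not a real MOM product:

```python
# A queue sits between client and server so they interact asynchronously.
import queue
import threading

inbox = queue.Queue()          # the "middleware" message queue
results = []

def server():
    """Consumes messages whenever it gets around to them."""
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is None:    # shutdown signal, an assumption of this sketch
            break
        results.append(f"processed {message}")

worker = threading.Thread(target=server)
worker.start()

# The client just enqueues and moves on -- it does not wait for a reply.
inbox.put("order-1")
inbox.put("order-2")
inbox.put(None)                # tell the server to stop
worker.join()
print(results)                 # ['processed order-1', 'processed order-2']
```

The client and server never call each other directly; the queue absorbs timing differences, which is what MOM's queuing mechanism provides across platforms.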
Properties (ACID):
• A: Atomicity = ‘All or nothing’ property
• C: Consistency = Must transform database from one consistent state to another
• I: Isolation = Partial effects of incomplete transactions should not be visible to other
transactions
• D: Durability = the effects of a committed transaction are permanent and must not be
lost because of a later failure
Clearing transactions
A centralized ledger tracks asset movement within the financial system between institutions.
Users have a key to make transactions, which contain a timestamp. The ledger stores all
transactions as a list. Ledgers are maintained by banks or an intermediary and need to be
secured. The key issue is how to secure the ledgers so that they cannot be manipulated.
The solution: distributed ledger technology (= blockchain)
• Distributed autonomous ledger
o Timestamped blocks that hold batches of valid transactions
o Each block includes the hash of the prior block
o The linked blocks form a chain
o The blocks are distributed and synchronized (=distributed ledger)
o Creating new blocks is known as mining
• Integrity is created by distributed consent (majority voting)
• Every node in a decentralized system has a copy of the block chain
• Longest chain represents the truth
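The chain structure above (each block storing the hash of the prior block) can be sketched in a few lines; the transactions are made-up examples and no mining or distribution is modeled:

```python
# A minimal hash chain: tampering with an old block breaks every later link.
import hashlib
import json

def block_hash(block):
    """Deterministic hash of a block's contents."""
    data = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def add_block(chain, transactions):
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis case
    chain.append({"prev_hash": prev, "transactions": transactions})

def is_valid(chain):
    """Check every block's stored hash against the actual prior block."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
add_block(chain, ["A pays B 5"])
add_block(chain, ["B pays C 2"])
print(is_valid(chain))               # True

chain[0]["transactions"] = ["A pays B 500"]   # tamper with history
print(is_valid(chain))               # False: the link to block 1 breaks
```

This is why integrity comes from the linking plus distributed consent: a manipulator would have to rebuild every later block on a majority of nodes.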
Opinion: not suitable for personal information, because the information is open and can be
accessed by everyone.
Problem: the blockchain gets longer and longer, so more processing power is needed.
Therefore, an older part of the blockchain is sometimes stored elsewhere. 'Tangles' can
also be an option.
Possibilities of blockchain:
• Enabling tokenization
• Time proof sealing
• Data record immutability
• Viewing history
• Automatic execution of transactions (smart contracts)
• Various blockchain infrastructures have different properties
Properties:
Soundness – everything that is provable is true; no cheating
Completeness – everything that is true has a proof
Zero-knowledge – only the statement being proven is revealed
→ e.g. a client password when logging in to a system
The collection of technologies contributes to benefits; not only blockchain or only smart
contracts.
Governance OF blockchain (programmers) // Governance BY blockchain (zero-knowledge)
Concurrency Control
• Process of managing simultaneous operations on systems without having them
interfere with one another
• Prevents interference when two or more users are accessing a database or shared
object simultaneously and at least one is updating data
• Although two transactions may be correct in themselves, interleaving of operations
may produce an incorrect result
• Potential problems:
o Lost update problem
§ Successfully completed update is overridden
o Uncommitted dependency problem
§ Occurs when one transaction can see intermediate results of another
transaction before it has committed (might be rolled back)
o Inconsistent analysis problem
§ Occurs when transaction reads several values but second transaction
updates some of them during execution of first transaction
Summary Concurrency
Locking can be used to deny access to other transactions and so prevent incorrect updates
• The most widely used approach to ensure serializability
• Generally, a transaction must claim a shared (read) or exclusive (write) lock on data
item before read or write
• Lock prevents another transaction from modifying item or even reading it, in case of
write lock
Locking rules:
1. If transaction has shared lock on item, can read but not update item
2. If transaction has exclusive lock on item, can both read and update item
3. Reads cannot conflict, so more than one transaction can hold shared locks
simultaneously on same item
4. Exclusive lock gives transaction exclusive access to that item
5. Some systems allow transaction to upgrade read lock to an exclusive lock, or
downgrade exclusive lock to a shared lock
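The locking rules above can be sketched as a small lock table with shared (S) and exclusive (X) modes; transaction IDs and item names are made-up examples:

```python
# Rules encoded: S locks coexist (rule 3), X is exclusive (rule 4),
# and a sole holder may upgrade or downgrade its lock (rule 5).
locks = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

def acquire(txn, item, mode):
    """Grant the lock if compatible with current holders, else refuse."""
    entry = locks.get(item)
    if entry is None:                         # item unlocked: always grant
        locks[item] = {"mode": mode, "holders": {txn}}
        return True
    if entry["mode"] == "S" and mode == "S":  # reads do not conflict
        entry["holders"].add(txn)
        return True
    if entry["holders"] == {txn}:             # sole holder: upgrade/downgrade
        entry["mode"] = mode
        return True
    return False                              # X conflicts with everything else

print(acquire("T1", "row42", "S"))  # True: first shared lock
print(acquire("T2", "row42", "S"))  # True: shared locks coexist
print(acquire("T2", "row42", "X"))  # False: T1 still reads the item
```

A refused request is where a real lock manager would make the transaction wait, which is exactly what sets up the deadlocks discussed next.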
• Deadlock is an impasse that may result when two (or more) transactions are each
waiting for locks held by the other to be released
• Deadlocks should be transparent to the user, so DBMS should restart transactions
• Three general techniques for handling deadlock:
o Timeouts (monitoring)
§ Abort one (or both) transaction(s); commonly used
§ Disadvantage: a transaction may be aborted without a deadlock, and
long-running transactions are penalized
o Deadlock prevention
§ Looks ahead to see if transaction would cause deadlock and never
allows deadlock to occur
• Wait-die: only an older transaction can wait for a younger one;
otherwise the transaction is aborted (dies) and restarted with the
same timestamp
• Wound-wait: only a younger transaction can wait for an older
one. If an older transaction requests a lock held by a younger one,
the younger one is aborted (wounded)
o Deadlock detection and recovery
§ Monitors for deadlocks and breaks them
§ The monitor constructs a wait-for graph (WFG) showing transaction
dependencies
• Create a node for each transaction
• Create edge T1 → T2 if T1 is waiting to lock an item locked by T2
§ Deadlock exists if and only if WFG contains cycle
§ WFG is created at regular intervals
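Deadlock detection on the WFG reduces to cycle detection; a minimal sketch with made-up transaction names, where an edge T1 → T2 means T1 waits for a lock held by T2:

```python
# Deadlock exists if and only if the wait-for graph contains a cycle.
def has_cycle(wfg):
    """Depth-first search over {txn: [txns it waits for]}."""
    visited, on_path = set(), set()

    def dfs(node):
        if node in on_path:          # back to a node on the current path: cycle
            return True
        if node in visited:
            return False
        visited.add(node)
        on_path.add(node)
        if any(dfs(nxt) for nxt in wfg.get(node, [])):
            return True
        on_path.remove(node)
        return False

    return any(dfs(txn) for txn in wfg)

# T1 waits for T2, T2 waits for T1: the classic two-transaction deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": []}))      # False
```

Running this check at regular intervals, as the notes say, and aborting one transaction on the cycle is the recovery step.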
Data is subjective since it has been perceived; the same 'outside world' data can be
perceived differently. How does this affect AI? In the same way…
Data governance
1. Data is an asset and production factor
2. Data has an owner and a steward (accountable+responsible)
3. Data has a lifecycle (metadata model)
4. Data governance is a responsibility of the CIO/CDO
5. Data is documented in a data dictionary
6. Data access is through authorization (need to know) OR data access is open (need to
share)
7. Master data is only altered at the source
8. Data is validated on CREATE and UPDATE
9. Data addition/ enrichment leaves master data intact
10. Data is “Open by design”
Employees
• Chief Data Officer (CDO): oversees a range of data-related functions to ensure your
organization is getting the most from what could be its most valuable asset.
• Data Stewards: are accountable for the day-to-day management of data.
• Business Analyst / Data Translators: play a critical role in bridging the technical
expertise of data engineers and data scientists with the ‘business’.
• Data scientists: are analytical data experts who have the technical skills to solve
complex problems.
• Data architects: conceptualize and visualize data frameworks; data engineers build
and maintain them.
Types of data
Structured:
• Master data: the company's own data, such as client info and personal info
• Reference data: common data, such as NLD = Netherlands, from an external silo
• Transaction data: the reflection of a transaction
• Reporting data: aggregated data; can be refined from transaction data
Unstructured: (become increasingly important)
• Documents
• Media
• Photographs
Technology changes rapidly, but data is relatively stable. Master data forms the collective
memory of an organization. The focus shifts from internal → external data.
Master data Management
A wheel within DMBOK scheme
• External codes
• Internal codes
• Customer data
• Product data
• Dimension Mgmt
Data dictionary
Business object model
Data models
1. Semantic (dictionary) – what is the meaning of a certain word/object?
2. Conceptual (business objects) – shows the relationships between, e.g., products and actors
3. Logical (object relations and attributes) – a customer has an ID, address, account, IBAN
4. Technical (database design) – how can we put all these things in a database?
The 4th (technical) layer is the only one that differs per design → there you choose which
database to use (Oracle, for example).
Data architecture patterns
How to create value from data (data in itself is ‘dumb’)
Is also in the DMBOK scheme
3 patterns:
• Business intelligence (BI) – How you organize it into a systematic architecture
• Data science – analytics
• Data sharing – between organizations or departments
BI Architecture
Crisp-DM model
• Cross-Industry Standard Process for Data Mining
• Widely used and standard for DS projects, Finished: no longer maintained
• Non-proprietary
• Application/ Industry neutral
• Tool neutral
• Focus on Business issues
o As well as technical analysis
• Framework for guidance
• Experience base
o Templates for analysis
• Focus on continuous evaluation
Damhof Model: 1&2 Combined
AI and algorithms
NLP = Natural language processing
Don’t:
Start BIG (Think BIG, but act small)
Think you know what the business wants → analyze!
Look at the data without context to see if the correlation makes sense
Forget to assess quality; garbage in = garbage out
Rules of thumb:
Always start with a clear business question
Know and engage the business domain
Semantics count! Understand the data!
Solve something quick, harvest small success
Remember the goal of the operation
Shared data must be used according to GDPR
Lecture 12
Project presentations
Lecture 14
Separate file on Brightspace with more Example Exam Questions.
Remarks: Can be a burden to business while also still too vague to be applicable.
Web services can send requests in the form of JSON, XML, an HTML file, images, Audio, etc.
Not all APIs are web services, but all web services are APIs.
API: lightweight architecture, good for devices which have limited bandwidth.
Web service: no lightweight architecture; requires the SOAP protocol.
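The weight difference can be seen by building the same request as a JSON body versus a SOAP envelope; the order fields and the `GetOrder` element are made-up examples (the `soap` namespace URI is the standard SOAP 1.1 one):

```python
# Same data, two serializations: lightweight JSON vs. a SOAP XML envelope.
import json

order = {"orderId": 42, "item": "cable", "qty": 3}

json_body = json.dumps(order)

soap_body = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><GetOrder>"
    "<orderId>42</orderId><item>cable</item><qty>3</qty>"
    "</GetOrder></soap:Body></soap:Envelope>"
)

# The SOAP envelope carries the same data plus protocol wrapping,
# which is part of why REST/JSON suits low-bandwidth devices.
print(len(json_body), len(soap_body))
```

The payload is identical in content; only the protocol overhead differs.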
API functions:
1. Access to data
2. Hide complexity
3. Extend functionality
4. Security (gatekeepers)
Middleware
“Software-glue”. Between operating system and user application for example.
Database – JDBC – Java application.
Meta information
Meta information is information about information. For example, if a document is
considered to be information, its title, location, and subject are examples of meta
information. This term is sometimes used interchangeably with the term metadata.
BPEL4WS
Is a standard executable language for specifying actions within business processes with web
services. Processes in BPEL export and import information by using web service interfaces
exclusively.
Data Lineage
Refers to the traceable path of a specific critical data element (CDE) from an end-user
report upstream to the ultimate source (that path includes aggregated sources such as data
warehouses and data marts, operational data stores, staging areas, and transactional
systems).
Data Lake
A system or repository of data stored in its natural/raw format. Usually a single store of data
including raw copies of source system data, sensor data, social data, and transformed data
used for tasks such as reporting, visualization, advanced analytics and machine learning. Can
include structured, semi-structured, unstructured, and binary data. Can be 'on premises' or
'in the cloud'.
A “data swamp” is a deteriorated and unmanaged data lake that is either inaccessible to its
intended users or is providing little value.
Code of Conduct
Is a set of rules outlining the norms, rules, and responsibilities of proper practices of an
individual party or an organization.
Data marts
Subject oriented data assets
Example Exam Questions
Q1: What is the best definition of enterprise IT-architecture according to Ross (2003)?
A) Policies for using IT in the organization
B) The organizing logic for data, applications, information and business processes
C) A destination plan for the IT-landscape
D) A description of the relationship between business and IT
Q2: Which of the following is the reason why a layered approach is preferred in architecting?
a) Each layer can be used to represent similar types of entities
b) Layers create greater complexity and scope
c) Each layer can be designed dependent on each other
d) Layers avoid different views and objectives
e) None of above
Q3: What are the four types of communication presented on the figure below? (from top to
bottom)
A) A: User interface, B: Application method level, C: Application interface level, D: Data level
B) A: Software and operational systems, B: System applications, C: Automated data collection, D: Database exchange
C) A: User interface applications, B: Logical operational systems, C: Application integration level, D: Data exchange level
D) None of them
Q4: Which of the following answer(s) is (are) NOT TRUE about the differences between
architecting and engineering? (more than one answer can be correct)
a) Architecting takes place in ill-structured situation, meanwhile engineering takes place in
better defined environment
b) Engineering serves the client, whereas architecting serves the builder
c) Heuristics/synthesis is mostly used in architecting, whereas engineering uses equations
and analysis
d) Architecting focuses on components, whereas engineering focuses on misfits/interfaces
Q5: What are characteristics of block chain technology? (more than one answer can be
correct)
a) Ledger for storing transactions
b) Users have a (public/private) key to make transactions
c) Timestamped blocks that hold batches of valid transactions
d) Each block includes the hash of the prior block
e) Every node in a decentralized system has a copy of the block chain
f) Longest chain represents the truth
g) None of them
Q6: What characteristics belong to information stewardship? (more than one answer can be
correct)
a) Third parties can make changes
b) Third parties report changes and mistakes to the information steward
c) All (third) parties should reuse information from the steward
d) Third parties have the obligation to keep data actual
e) Information stewards have the obligation to keep data actual
f) None of them
For example: in a restaurant, the waiter is the means of communication between ‘you’ and ‘the kitchen’. The waiter is the web service / API: he communicates between two applications and makes sure the communication is successful.
Service Provider: develops and implements the application (web service) and makes it available over the web (internet). There must be a client (service consumer).
When the Service Provider and Service Consumer do not know each other, how is the WSDL shared? A web Service Provider publishes its web service (through WSDL) in an online directory, where consumers can query and search for web services. This online registry/directory is called Universal Description, Discovery and Integration (UDDI).
Constraints / Principles:
• Uniform interface
o Resource (nouns): everything is a resource (all modules/ databases etc are
available as resource when defined)
o URI: any resource/data can be accessed by a URI (=URL)
o HTTP (verbs): make explicit use of the HTTP methods (CRUD: Create = POST, Read = GET, Update = PUT, Delete = DELETE)
• Stateless
o All client-server communications are stateless (Server = stateless, request
from client must contain all of the necessary data to handle the request)
(Improves the web service performance)
• Cacheable
o Happens at client side (Cache-control and Last-modified, What information
should be saved?)
• Layered System
o Layers can exist between server and client (proxies / gateways)
• Code on Demand (optional)
o Ability to download and execute code on client side
“The key abstraction of information in REST is a resource. Any information that can be
named can be a resource: a document or image and so on… “ – Roy Fielding
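The uniform-interface and stateless constraints can be sketched without a network: a hypothetical in-memory resource store keyed by URI, with the four HTTP verbs mapped onto CRUD (all names below are illustrative, not a real framework):

```python
# Every resource is addressed by a URI; the server keeps no client state,
# so each request carries everything needed to handle it.
resources = {}   # URI -> representation

def handle(verb, uri, body=None):
    """Dispatch one stateless request: (status code, representation)."""
    if verb == "POST":                     # Create
        resources[uri] = body
        return 201, body
    if verb == "GET":                      # Read
        return (200, resources[uri]) if uri in resources else (404, None)
    if verb == "PUT":                      # Update
        resources[uri] = body
        return 200, body
    if verb == "DELETE":                   # Delete
        resources.pop(uri, None)
        return 204, None
    return 405, None                       # verb not allowed

print(handle("POST", "/orders/1", {"amount": 99.5}))   # (201, {'amount': 99.5})
print(handle("GET", "/orders/1"))                      # (200, {'amount': 99.5})
```

Note that nothing about a previous request is needed to serve the next one, which is what makes the server side stateless and easy to scale.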
Authorization vs Authentication
Authentication = who you are (verifying identity).
Authorization = what authority you have (what you are allowed to do).
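A minimal sketch of the difference, with a hypothetical token store:

```python
# Authentication maps a credential to an identity; authorization checks
# what that identity is allowed to do. The token store is an assumed example.
users = {"alice-token": {"name": "alice", "roles": {"admin"}}}

def authenticate(token):
    """Who you are: map a credential to an identity (or fail)."""
    if token not in users:
        raise PermissionError("authentication failed: unknown token")
    return users[token]

def authorize(identity, required_role):
    """What authority you have: check the identity's permissions."""
    if required_role not in identity["roles"]:
        raise PermissionError("authorization failed: missing role")

identity = authenticate("alice-token")    # step 1: who is this?
authorize(identity, "admin")              # step 2: may they do this?
print(identity["name"], "may act as admin")
```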
Recap
A protocol can be considered a common agreement (a kind of law) between two or more parties (components) used for communicating with each other. Most of the time, a protocol includes the steps and/or procedures that should be used when communicating.
An API allows and defines how two applications can communicate with each other, using the methods defined by the service-providing application. Compared to a protocol, an API describes the programmatic ways to communicate between applications. The calling application must properly adhere to these standards in order to get the required service.
Web services are very similar to APIs. The notable thing about a web service is that it expects users to access it over the internet; a web service can therefore be considered an online API.
Middleware allows distributed application components located on several computers to communicate (it simply links components on various machines to obtain the full application's capabilities). Middleware minimizes development effort by overcoming heterogeneous factors (OS, hardware, network equipment, etc.). Middleware sits between the application (application components) and the OS.
Cloud Computing Architecture
Why cloud computing?
Previous situation:
• On-premises systems are expensive
• Less scalability
• Huge space must be allotted for servers
• Less chance of data recovery
• Long deployment times
• Lack of flexibility
• Poor data security
• Less collaboration
• Data cannot be accessed remotely
Benefits
Easily upgraded
Cost-efficient
Scalability
Automated
Highly available
Flexible
Better security
Customization
Cloud computing architecture
Front end;
• Cloud infrastructure consists of hardware and software components such as data storage, servers, virtualization software, etc.
• It also provides a Graphical User Interface (GUI) to end users so they can perform their respective tasks
Back end;
• Manages all the programs that run the application on the front end
• It has a large number of data storage systems and servers
• It can be software or a platform
• “Its task is to provide utility in the architecture”
• E.g. Amazon S3, Oracle Cloud Storage, Microsoft Azure Storage
Components:
1. Hypervisor
a. Virtual Operating Platform, for every user
b. Divide and allocate resources
2. Management software
a. Manage and monitor the cloud operations
b. Improving the performance of the cloud
3. Deployment software
a. SaaS (Gmail)
b. PaaS (Microsoft Azure)
c. IaaS (pay-as-you-go pricing model)
4. Network
5. Cloud server
6. Cloud storage
Data Management (Online-course)
Course objectives:
• Understand data management capabilities from the people, process and technology
perspective.
• Understand how each capability fits into overall Data Management Framework.
Introduction
Data management refers to the development and execution of architectures, policies,
practices, and procedures in order to manage the information lifecycle of an enterprise in an
effective manner.
>> All lecture titles are the capabilities of data management. Each capability has three aspects: People, Process, and Technology.
Metadata management involves managing data about other data, whereby ‘other data’ generally refers to data models and structures, not the content (e.g. business terms in a glossary, attributes in a logical data model, or tables and columns in the database). It makes visible how data is being managed by, and moves through, the organization.
Operational metadata includes information about application runs: their frequency, record counts, and component-by-component analysis.
>> System Development Lifecycle (SDL) = Plan > Create > Test > Deploy > Plan
Data quality dimensions refer to aspects or features of information that can be assessed and used to determine the quality of data.
Data quality rules refer to business rules that are set up to protect the data quality.
Data quality process: Define DQ requirements > Conduct DQ assessment > Resolve DQ
issues > Monitor and control.
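The “define DQ requirements” and “conduct DQ assessment” steps can be sketched as rules (predicates) checked per record; the records, dimensions, and rules below are assumed examples:

```python
# Each data quality rule protects one DQ dimension; the assessment step
# reports which records violate which rules.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "",              "age": -5},
]

rules = {  # dimension -> rule protecting it
    "completeness": lambda r: bool(r["email"]),        # email must be filled
    "validity":     lambda r: 0 <= r["age"] <= 120,    # age in plausible range
}

def assess(records, rules):
    """Conduct the DQ assessment: ids of records violating each rule."""
    return {dim: [r["id"] for r in records if not rule(r)]
            for dim, rule in rules.items()}

print(assess(records, rules))   # {'completeness': [2], 'validity': [2]}
```

Resolving the issues and re-running the assessment would then close the monitor-and-control loop described above.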