
I and C Architecture Design 2020

L1: Introduction
L2: Overview architecture frameworks
L3: Meta architecture and design framework
E1: Guest lecture; Digital Sandbox Bharosa
L4: Modular architectures and technology (!)
L6: Big data and data quality
L7: From BPMN to orchestration
L8 = E2: Guest lecture; CRANIUM
L9: Middleware (!!)
L10: Transactions concurrency and blockchain
L11 = E3: Guest Lecture; VKA (missed this one, check effect on exam grade)
L12: Project presentation
L14: Exam questions

Lecture 1
• Course objectives
• History and developments of ICT in public and private organizations
• EDI, XML, XSLT (S2S data exchange)
• Need for ICT-systems architecting

Path dependencies: explains how the set of decisions one faces for any given circumstance is
limited by the decisions one has made in the past.
→ Within ICT, this could be the installed base of systems, chosen standards,
procedures and routines that influence future behavior.
→ First movers' advantage is temporary: first movers are blocked by their earlier
progress, which ultimately causes them to lag behind. 'Wet van remmende voorsprong' (the law of the handicap of a head start).

Coherency Management: Architecting the Enterprise for Alignment, Agility, and Assurance.
Starting points for I&C design:
• Multi-actor situation
• Limited influence/authority of stakeholders
• All kinds and types of systems are already available
• Need for understanding the big picture
• Creating a shared understanding
• No 'optimal' but a negotiated solution
• Strategic fit: interrelation of all internal/external components
• Translation from strategy to ICT and vice versa
• Switching between views: technological, economical, organizational, psychological, user
• Attention to issues like security and privacy, scalability, robustness, flexibility, and standards

History of I&C development


Case: Ohra
Insurance company, direct writer, product-oriented information systems, large transaction
processing systems, large number of players in the market, no market transparency, large
number of insurance and banking products.

‘MainFrame’ situation in the 1980’s; Characteristics:


• Fully centralized architecture
• One mainframe built as a monolithic entity
• All applications reside on the mainframe
• Dumb terminals
• Central control and maintenance
• Only employees have access
• Simple client/server architecture

Towards Distributed networks in 1990; Characteristics:


• Multiple applications on various geographic locations
• More complex architectures
• Basic interactions
o File transfer
o Remote printing
o Terminal transfer
o Remote file access

Functional applications 1995:


• For each product (car, medical, .. insurance) a separate information system
• For each department (accounting, human resources, ..) a separate information
system
• Each department selects technology and solutions independent of other
departments
• ‘Management by magazine’
• No communication between applications
• However,
– Similar data in applications
– Similar functionality used

Insurance companies have many different products as well: B2B, B2C, Direct/Indirect.

Mission critical legacy systems:


Built in COBOL; still running and reliable, yet very hard to change. Organizations wait until
the RoI has been earned back.

Need for integration and architectures.


Electronic Data Interchange (EDI): a way of communicating in which formatted
business documents are sent electronically from one organization's computer to another
organization's computer. Characteristics are:
- Data standards
- Transfer of structured data
- Between organizations
- Application to application
- Across heterogeneous computer platforms
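The "structured data, application to application" idea can be made concrete with a small sketch. This is a minimal, illustrative parser for a simplified EDIFACT-like message; the segment tags and separators mimic the EDI style but do not implement any real standard.

```python
# Minimal sketch of parsing a simplified EDIFACT-like EDI message.
# Segment tags and separators here are illustrative, not a real standard mapping.

def parse_edi(message: str) -> list[dict]:
    """Split an EDI message into segments; each segment is a tag plus data elements."""
    segments = []
    for raw in message.strip().split("'"):   # the apostrophe terminates a segment
        if not raw:
            continue
        parts = raw.split("+")               # '+' separates data elements
        segments.append({"tag": parts[0], "elements": parts[1:]})
    return segments

msg = "UNH+1+ORDERS'BGM+220+PO12345'DTM+137:20200101'"
for seg in parse_edi(msg):
    print(seg["tag"], seg["elements"])
```

The point of the sketch: both sides agree on the data standard up front, so the receiving application can process the message without human intervention.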

Middleware technology:
• Hides the complexity of source and target systems.
• Makes systems even more complex.
• Deals with protocols.
• Focus on sharing data between heterogeneous information systems.
→ Use architecture and modularity when implementing middleware technology to simplify the
system! (Need to gain an overview of the mess / avoid information redundancy / keep it understandable.)
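The "hides the complexity of source and target systems" role can be sketched as a tiny adapter. The field names and mapping below are hypothetical; real middleware would also handle protocols and transport, not just field translation.

```python
# Toy middleware adapter: translates records from a (hypothetical) source
# system's field layout to a target system's layout, hiding both from callers.

SOURCE_TO_TARGET = {          # illustrative field mapping, not a real schema
    "cust_nm": "customer_name",
    "pol_no": "policy_number",
}

def adapt(record: dict) -> dict:
    """Return the record re-keyed for the target system; unmapped fields are dropped."""
    return {SOURCE_TO_TARGET[k]: v for k, v in record.items() if k in SOURCE_TO_TARGET}

print(adapt({"cust_nm": "Ohra BV", "pol_no": "P-001", "legacy_flag": 1}))
```

Callers on either side only see their own layout, which is exactly the complexity-hiding the notes describe, and also why the middleware layer itself becomes another system to maintain.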

XML Extensible Markup Language (Need to know the concepts!!)


Content <-> Presentation <-> Structure <-> Content (Format transformation)
‘Separation of concerns’
Check for errors: <x><y> … </x></y> is invalid; tags must be properly nested, so </y> has to
close before </x>.
Difference between HTML and XML: HTML only describes the presentation, i.e. what you see in a
web browser. From XML, a stylesheet (XSL/XSLT) can be used to generate HTML, but the same
content can also be transformed to PDF, for example.
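The nesting check above can be done mechanically: an XML parser rejects improperly nested tags. A minimal sketch using Python's standard library:

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the document parses, i.e. all tags are properly nested."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<x><y>ok</y></x>"))   # True: properly nested
print(is_well_formed("<x><y>bad</x></y>"))  # False: </x> closes before </y>
```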

By splitting the structure:


• Modularization: compose systems and enterprises of readily available components.
• Adaptive enterprise: adapt to changing circumstances.
• Networks of organizations: enterprise architectures need to be interoperable among
organizations.

Namespaces: A collection of all element types and attributes names for a certain domain.
• Prevent naming conflicts
• Easier to assemble large schemata from smaller ones
• Each namespace is tied to a uniform resource identifier (URI) (=some sort of URL)
• The namespace name and the local name of the element together form a globally
unique name known as a qualified name
→ Always use namespaces to avoid collisions
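How a namespace plus local name forms a qualified name can be shown with a small sketch; the namespace URIs and element names below are illustrative. Python's ElementTree expands qualified names to `{namespace-URI}local-name`:

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an 'address' element; namespaces keep them apart.
doc = """
<order xmlns:cust="http://example.org/customer"
       xmlns:net="http://example.org/network">
  <cust:address>Delft</cust:address>
  <net:address>192.0.2.1</net:address>
</order>
"""

root = ET.fromstring(doc)
for child in root:
    # each tag is globally unique: namespace URI + local name
    print(child.tag, "=", child.text)
```

Without the namespaces, the two `address` elements would collide; with them, both can coexist in one document.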

Cloud computing (2012)


With just 1 Application Programming Interface (API):
• Infrastructure becomes less important
• Software as a Service (SaaS)
• Scalability is easier
• Saving costs
• Many issues.. : Long term sustainability, lock in, …
Lecture 2
• Knowledge of design science principles
• Understand relationship design and architecture
• Knowledge of various conceptualizations of architecture
• Knowledge about EA frameworks
§ Zachman
§ Tapscott
§ TOGAF
§ ArchiMate

What is architecture?
All the parts are connected to one another, and the architect ensures there is an overview of what
is going on. In ICT, the architecture is not tangible. 'Architecture' can refer to
a structure, a process, or a profession. Architecting is a process.

Check Hevner document – Design science


on rigor (taking theories and models into account) vs. relevance (addressing the practical problem).
Designing in a socio-technical setting sits in between rigor and relevance.

Prototyping is important to make designs tangible.

Levels of design
• Conceptual design
• Implementation design
• Implementation

Business process (re)design? Database design? (Laws, regulations, culture à rules)

Goals: Functions and specifications for process/product. (can be conflicting / Tradeoffs)


Design space: Options, Alternatives (Decision variables, values, attributes, and ranges) (DoF)
Test or models: Agreed to procedure, computer program à used to transform the values for
the decision variables into an evaluation of the proposed design alternatives.
Starting points: Existing solutions, goals and tests. (Path dependencies / ‘No green field’)

Herder & Stikkelman: Elements of design process model


Analyzing the design space

*often a maximum of three months is given for an architecture project. (Due to changes over
time)

Goals Enterprise Architecture (EA) frameworks


• Dealing with complexity
• Defines and interrelates the various elements from multiple (stakeholders’) views
• Related sub architectures
• Means to order architecture results
• A means to guard their completeness, both in terms of scoping and level of detail
• Insight into the interrelationships of architecture results, enabling the traceability of
decisions and their impact
• Refrain from technological details
• Helps to translate to implementation

Zachman Framework – the first architecture framework


Integration and coordination across enterprise: (Matrix)
• Rows define stakeholders' views. The rows address the perspectives of the
planner, owner, designer, builder, programmer, and those involved in operation.
• The columns define various abstractions of the system. Describes using interrogative
words, insight may be gained into different aspects of an enterprise (Actors, timing,
processes, functionality, … )
Disadvantages:
• Not relating the cells to each other. No relationships are being shown.
• Time horizon is also not shown.

Tapscott Framework – The five views

Business context: the business parts and their responsibilities; shows
the actual work processes. (Requirements and needs.)

IT context: the applications, the software, the infrastructure that support the actions and
processes. (IT solution, supply)

Dynamic Enterprise Architecture (DYA) framework – Show a process


BIT = Business, Information, Technical.

Model-driven architecture (OMG-MDA) – not a real framework. Independent of the


platform. Enables reuse.

IEEE1471 Framework – formal elements of software architecture


It takes a system level view – can be decomposed.
• System: A collection of components organized to accomplish a specific function or set
of functions.
• Architecture: The fundamental organization of a system embodied in its components,
their relationships to each other, and to the environment, and the principles guiding
its design and evolution.
• Architecture description (AD): A collection of products to document an architecture.
• View: A representation of a whole system from the perspective of a related set of
concerns.
Basic concepts IEEE1471

TOGAF – it’s the standard – The Open Group’s Architectural Framework

→ ADM: Architecture Development Method. → ADM cycle = process model.


Collection of best practices, models, and checklists. Architecture function central. IT
centered, slowly adding more business architecture.
Disadvantage: Too much to handle. Very bureaucratic. 'All is included'.

Archimate – a description language – Closely related to TOGAF.


ArchiMate connects architectural domains:
• Broader scope, but less detail than UML (software) and BPMN (processes)
• Does not replace more specialized languages such as UML, BPMN, and others.

ArchiMate layers and Aspects

Resource-based view
• Resources as organizational assets
• Resource attributes: Valuable, Rare, Inimitable, Non-substitutable (VRIN)
• à Human resources, budget, …
Dynamic capabilities
• To change the resources to comply with new environment
• Aspects: path dependencies, …

A business event is something that happens (externally) and may influence business
processes, functions, or interactions.
A business process represents a sequence of business behaviors that achieves a specific
outcome such as a defined set of products or business services.
Lecture 3
Enterprise ICT-architecture = to support the design (not the actual design itself)
(Meta) Framework:

• Understand architectural framework in this course


• Understand the core concepts of architecture (framework, layers, views, principles)
In the exam; elements and principles from certain architectures.

Principles; guide. Standards; can be used.

Typical balancing aspects:


• Reasonable level of abstraction
• Adequate coverage of the real world
• Reasonably familiar and assessable concepts
• Communication vehicle
• Link to both strategy and implementation
• Describes the current situation and evolution project, and prescribes desired situations
• Defines standards, principles and guidelines
• Entities differ while maintaining similarities in domains

Know the environment, drivers and developments; market, customers and segments; available
resources and expertise; distribution channels; products. → These are situational factors
influencing the architecture. Thereafter, make a set of business requirements.

Programme of Business Demands (PBD)


Bridge between the; business environment and strategic objectives – and the enterprise
architecture.
Serves as a guide. → Include a goal hierarchy.

MoSCoW analysis:
Must have, Should have, Could have, Won't have (sometimes 'Would').
→ What kind of tradeoffs are you expecting?

Layered-based engineering:
• Each layer can be used to represent one type of entities
• Reduce complexity and scope or understand relationships
• Design each layer independent of other layers
• Use of different views and objectives
• Reduce complexity
• One layer can be designed relatively independently of others
How are the layers connected to each other?
Layers can be split or merged, depends on what you want to show.
Grouping: element aggregates or composes concepts that belong together based on some
common characteristics.

Business architecture – the highest layer


• Architecture as strategic capability (as core competences), a vision to guide
development of information systems in a ‘complex’ organization.
o Single capability of the firm cannot provide a sustainable competitive
advantage to the firm
o Competitive capabilities of the firm should be “complementary” or
“synergistic”
• Business Architecture takes into consideration the businesses strategy of the firm, its
long-term goals and objectives, the technological environment, and the external
environment
• Business Architecture: the arrangements of the responsibilities around the most
important business activities (fe; production, distribution, marketing) or the
economic activities (fe; manufacturing, assembly, transport)
• Per business domain a different goal hierarchy is possible.

Business process architecture


• Collection of business processes triggered by events:
o Each customer interaction results often in a business process
o Periodically triggers
o Internal triggers
• Interdependencies among sequences of tasks
• Operational (primary) and control (secondary) business process
• Include human as well as automated tasks
• Process decomposition: from value chain to detailed tasks

Information architecture
Describes the relationship between the business processes, applications and information
sources aimed at storing, processing, reusing and distribution of information across
information resources.
→ Information architecture is the organization of information to aid information sharing
among actors. E.g. a vital records registry.

Application architecture
• Describes the software applications, components and objects, and the relationship
between these parts.
• Best-of-breed vs. frameworks
• Integration and middleware
• IT systems vs. enterprise architecture
o IT systems architecture: decompose into individual functional software
components
o Enterprise architecture: decompose into manageable parts

Technical architecture
• Technical architecture Is about generic facilities, used by many application systems. It
is about functionality that is a common need of many different systems.
• Topics include Next Generation Infrastructure (NGI), grids, wireless networks
• In a NGI no new hardware is bought for each new system, but a standard
infrastructure is provided.
→ This changes due to the cloud!

Implementation, Control, and Maintenance


• After architecture has been designed it needs to be implemented, controlled, and
maintained.
• Involves further development of systems (new releases)
• Often consumes most of the resources and is usually the bulk of the IT expenses
• Implementation will likely deviate from intentions and result in a revised architecture

Architectural guidelines principles and standards (!!)


• Design guidelines
o Supporting design, eg; use of open-source software
o Often cannot completely be followed and need a trade-off (access vs security)
o Direct design decisions and are based on experiences of other designers
• Architecture principles
o Rules one has to follow, eg; front office focus’ on customers while back office
focus’ on efficiency
o Emphasize ‘doing the right things’ or give direction to behavior and are often
based on proven practices
o Are expected to give significant improvement
• Implementation principles
o Helps to translate the architecture into implementation
o Can be used to develop prototypes
• Standards
o Technology standards, eg; HTTP, XML
o Data standards, eg; Name before address, address contains street, no, zip
o Application standards, eg; Oracle for databases

These can all be categorized using layers or other categories

Principle-based Design and Architecting


Principles are leading instead of models. Useful when solving ill-structured ‘complex’
problems, which cannot be formulated in explicit and quantitative terms.
→ Principles guide the designer in a certain direction, are generic by nature, and do not
constrain creativity or possible solutions.
Name of the principle
Statement; what we do
Rationale; why we do it
Implications; when we (don’t) do it
E1: Digital Sandbox, Bharosa
Building the next generation of public services requires a Digital Sandbox

The government is there to help and support the citizens. Yearly, 125 billion euros on public
services in the Netherlands, these costs are rising. While citizens expect these services to be
free, available, and well-established (personalized services, digital inclusion, proactive
services, responsible data sharing, life event support). To provide these services, data
exchange takes place. At the moment, most agencies use portals, with one-way vs. two-way
information flows. Companies do not have such portals yet for Standard Business Reporting.
There are also many calls for improvement; too difficult and vague to apply for subsidies or
services. Unfortunately, many recent big ICT projects have failed to improve this situation.
What to do?

GovTech: Startups and SMEs want to provide public services directly to citizens,
(HuurPaspoort, Cleverbase).

Barriers for public service innovation:


1. Lack of guidance
2. Access to data at government agencies
3. Building blocks from government agencies
4. No shared learning experimentation platform
5. Heterogeneous / non-interoperable public service building blocks
6. Complex mix of rules and regulations
7. Lack of funding
8. Unclear process for the transition from experiments to roll-out

Way of working: Innovation pipeline. Connecting research with policy making; the policy
cycle.

1. What is Digicampus?
The quadruple helix approach to public service innovation.
Combines: Government Agencies, User groups, Software providers (corporate & startups),
and Academia. The Digicampus is the Digital Sandbox for learning and experimentation.
Digicampus is helping to formulate the right agenda and policies for services. Combine
research with policy making.

2. Why do we need a digital sandbox?


Barriers 1-5 can be solved by using a digital sandbox.

3. What is a digital sandbox?


Goal; smoother transition from prototype to implementation.

4. What are the high-level requirements of the digital sandbox?


5. What are the use cases?
Over 100 calls for collaboration, e.g.: help the elderly with digital authorization, help citizens
with personal financial management, enable less tech-savvy citizens to use voice
authentication via the phone, identify the barriers to digital inclusion.
Lecture 4
Modular Architectures and Technology – makes it possible for re-use of certain
(sub)systems. Thereby it only has to be developed once. Thereafter, it can be provided to
many other institutions. You do need one well-working UI.

• Understand principles of modular architectures


• Being able to modularize
• Understand basics of web services
• Knowledge about the main web services protocols (XML, SOAP, REST, WSDL, UDDI,
BPEL4WS)
• Understand how web services can be used to create a loosely-coupled, modular
application architecture
• Learn about developments in the web services protocol stack

Why modular architecture?


• Reuse of “working and proven” modules – many ICT projects do fail
• Shorter development time by reuse
• Focus on integration and orchestrating of modules
• Dealing with complexity: higher reliable systems
• Flexibility to modify and alter systems
• Building for modularity looks easy, but is challenging
o Interface design and configuration is a key aspect
o Information hiding: high cohesion within modules and loose coupling
between module (Parnas, 1972)
§ Providing the intended user with all the information needed to use the
module correctly and nothing more
§ Providing the implementer with all the information needed to
implement the module correctly and nothing more
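Parnas's information hiding can be sketched in a few lines. This is a minimal, illustrative example (the class and method names are made up): the module exposes a small, stable interface, while the storage choice is a hidden design decision that can change without affecting callers.

```python
# Sketch of information hiding (Parnas, 1972): a narrow, stable interface
# in front of a hidden design decision. Names here are illustrative.

class PolicyRegistry:
    """Interface: add() and premium_for(). How policies are stored is hidden."""

    def __init__(self):
        self._by_number = {}      # hidden decision: a dict keyed by policy number

    def add(self, number: str, premium: float) -> None:
        self._by_number[number] = premium

    def premium_for(self, number: str) -> float:
        return self._by_number[number]

reg = PolicyRegistry()
reg.add("P-001", 19.95)
print(reg.premium_for("P-001"))
```

If the dict were later replaced by a database call, callers of `add()` and `premium_for()` would not need to change, which is exactly the "high cohesion within, loose coupling between" property.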

What is a module?
Objects and components
If something is developed and used as an object or component depends on your viewpoint

• Component-oriented programming focuses on interchangeable code modules that work independently
• Black box: components don't require you to be familiar with their inner workings to use them, but focus on the interface
• Components typically serve a specific purpose and functionality (fe; identification)
• Object-oriented (OO) programming focuses on the relationships between classes
• Objects are at a more granular (smaller) level and serve as building blocks of larger components/systems
• OO enables software reuse
• Once those classes are compiled, the result is monolithic binary code

A component
• is language independent
• A way of organizing and thinking about the runtime structures of a system
• Loose coupling: with components, you loosen the coupling between classes and between
the developers responsible for them
• Stateless: components can be replaced and substituted in near real time, depending
on the interface
• Self-contained: enabling black-box reuse
• Component-and-connector model
• Combinations of components can be new components

An object
• Can be viewed as a type of component
• Is an abstraction and needs to be given context
• An object’s class instance has specific attributes and behaviors
• Encapsulation: implementation details are hidden, and only methods are exposed
• Inheritance as a way to reuse – this requires knowledge about the implementation
details of the base class
• Polymorphism (many forms): subclasses can define its behaviors and attributes while
retaining some of the functionality of its parent class
• If multiple developers work on the same code base, they have to share source files.
In such an application, a change made to one class can trigger a massive re-linking of
the entire application and necessitate retesting and redeployment of all the other
classes (White-box reuse)
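The three object concepts listed above can be shown together in one small sketch. The class names and the surcharge are illustrative, not from the course:

```python
# Sketch of encapsulation (hidden attribute), inheritance (CarPolicy reuses
# Policy), and polymorphism (both classes answer premium() in their own way).
# Class names and figures are illustrative.

class Policy:
    def __init__(self, base: float):
        self._base = base               # encapsulated: accessed only via methods

    def premium(self) -> float:
        return self._base

class CarPolicy(Policy):                # inheritance: white-box reuse of Policy
    def premium(self) -> float:         # polymorphism: overrides parent behavior
        return super().premium() * 1.2  # surcharge on top of the base premium

policies = [Policy(100.0), CarPolicy(100.0)]
print([p.premium() for p in policies])  # each object answers in its own way
```

Note how `CarPolicy` needs to know how `Policy` works internally to override it safely; that is the white-box reuse the notes contrast with black-box components.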

Modularization Guidelines of Parnas


• The effectiveness of a “modularization” is dependent upon the criteria used in
dividing the system into modules
• Parnas (1972) recommends that systems should be decomposed along lines
encapsulating design decisions. Design decisions that are likely to change
need to be hidden.
1. Minimize the interactions with the environment and standardize the services
interfaces
2. Create a well-defined interface and make a set of service level agreements
3. Every component contains a logical cluster of business objects and information needs
that can be used to operate a business process autonomously
4. There should be clear interfaces describing the inputs, outputs and responsibilities to
ensure accountability
5. Establish governance mechanisms to integrate the components not only at the
technical level, but also at the organizational level

Design principles for software modules


• A module should capture a business function
• A module should be self-contained (information hiding)
• Communication between modules should be minimized (loosely coupled)
• A module should be reusable, this is determined by:
o The scalability
o Interface extendibility
o Ability to configure
o Ability to replace
• Number of interactions
• Don’t forget the system response to the actions of actors
• Alternate courses of action are important

Design principles for creating a modular architecture


• Information should be captured only once at the source and reused by other
modules (coordination)
• There should be a (central) process control component integrating business process
steps with functionality provided by modules
• The module should, whenever possible, be offered as reliable and proven
commercial-off-the-shelf (COTS) software products supplied by a vendor
• Be able to manage the quality of modules (QoS, performance, security, ..)
• A module should be reusable and capture a business function
• Use of versioning (extensibility, multiple instances)
• Develop domain-specific modules (use of namespaces)

Orchestration: a way of controlling the dependencies between the modules.
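A central process-control component integrating independent modules (as the design principles above describe) can be sketched as follows; the module names and data are illustrative:

```python
# Sketch of orchestration: a central controller calls independent modules in
# order and passes data between them, so the modules stay unaware of each
# other. Module names and values are illustrative.

def check_identity(request: dict) -> dict:
    return {**request, "identity_ok": True}

def calculate_premium(request: dict) -> dict:
    return {**request, "premium": 19.95}

def send_offer(request: dict) -> dict:
    return {**request, "offer_sent": True}

def orchestrate(request: dict, steps) -> dict:
    """The orchestrator owns the control flow; each module only sees its input."""
    for step in steps:
        request = step(request)
    return request

result = orchestrate({"customer": "Jansen"},
                     [check_identity, calculate_premium, send_offer])
print(result)
```

Because only the orchestrator knows the sequence, steps can be reordered or replaced without changing any module, which is the loose coupling the modular architecture aims for.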

Protocols
HTTP = Hypertext Transfer Protocol.

-) no service level support.

SOAP: Simple Object Access Protocol


• Platform-independent protocol
• Soap messages provide envelope in order to exchange structured data
o Header: meta-information to process its contents
o Body: data
-) not simple enough, therefore not a success

Web services
• Convergence of technology streams
o Ubiquitous infrastructure (IP, HTTP)
o Proven approaches (CORBA, RPC)
o XML
o Business standards (EDIFACT, X.12)
• Middleware for middleware
• Middleware agnostic
• RPC or messaging-based
• Access remote applications
• Accepted by most software vendors
Goal: To abstract business logic from implementation
• Web services perform encapsulated business functions
• Loosely coupled, self-contained, stateless properties (independent)

SOAP: provides an envelope around an XML message in order to exchange structured
information.
UDDI: a directory that offers a way to locate and register web services.
WSDL: an XML-based language that describes the capabilities of a web service.
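The envelope/header/body structure of SOAP can be made concrete by parsing a small example message. The envelope namespace is the standard SOAP 1.1 one; the payload elements (`getQuote`, `product`) are illustrative:

```python
import xml.etree.ElementTree as ET

# Sketch of the SOAP envelope: an Envelope with an optional Header
# (meta-information) and a Body (the payload). Payload names are illustrative.

envelope = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header><msgId>42</msgId></soap:Header>
  <soap:Body><getQuote><product>car</product></getQuote></soap:Body>
</soap:Envelope>
"""

NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}
root = ET.fromstring(envelope)
body = root.find("soap:Body", NS)
print(body.find("getQuote/product").text)
```

The receiver processes the header's meta-information first and then hands the body to the application, independent of the platform on either side.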

REST: Representational State Transfer;


Based on the underlying architecture of the WWW and its two core specifications: URIs and
HTTP. → Simple and scalable, yet when XML is used the payloads are not easy to read. More
governance is needed.
Difference SOAP request vs. REST request

SOAP
• Not bound to HTTP
• WSDL interface and contracting support
• Performance
• A POST statement is needed; a plain URL cannot be used

REST
• Simple
• Suitable for simple CRUD (Create, Read, Update, Delete) applications
• High performance
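The REST verb-to-CRUD mapping can be sketched without a real server. This toy in-memory dispatcher (all names and status codes follow common HTTP conventions; the resource URIs are illustrative) mirrors how resources identified by URIs respond to a uniform interface of verbs:

```python
# Toy in-memory sketch of the REST idea: resources identified by URIs and a
# uniform interface of HTTP verbs mapped to CRUD. No real HTTP server here.

store = {}

def handle(method: str, uri: str, payload=None):
    if method == "POST":                 # Create
        store[uri] = payload
        return 201, payload
    if method == "GET":                  # Read
        return (200, store[uri]) if uri in store else (404, None)
    if method == "PUT":                  # Update
        store[uri] = payload
        return 200, payload
    if method == "DELETE":               # Delete
        store.pop(uri, None)
        return 204, None
    return 405, None                     # method not allowed

handle("POST", "/policies/1", {"premium": 19.95})
print(handle("GET", "/policies/1"))
```

Because every resource answers the same four verbs, a REST client needs no WSDL-style contract, which is where the simplicity (and the need for extra governance) comes from.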

API: Application Programming Interface

API vs. Webservices


• “An API is a set of functions and procedures that allow the creation of applications
which access the features or data of an operating system, application, or other
service”
• An API acts as an interface to an application to enable communication
• A webservice exposes an API over HTTP
• The only difference with a webservice is that the latter facilitates interaction between
two machines over a network
• In general, all webservices are APIs but not all APIs are webservices
• API can use any style of communication
• API is often part of the application, whereas webservice is only a wrapper

Webservice vs. Micro service


• Both are language and platform independent
• Microservices often perform a single function
• Microservices often have a finer granularity and are used at the programming level
• Microservices are often used to break down a monolithic software application into
reusable components
• Webservices are often HTTP-based, whereas microservices might not be

Typical roles of an ICT-architect


• Create a library of reusable components
• Managing library of components
• Enabling reuse of components when projects develop a new component (which
comes at a price)
• Ensuring interoperability, adaptability, scalability, security, etc of components
• Stimulating reuse

How to determine services?


• Coarse or fine grained services?
o Business / Composite / Application
• Top-down vs. bottom-up
o TD: from strategy, processes and areas of business (DCE) (preferred, yet not always
possible because there is 'no green field')
o BU: from existing application services
• Functional vs. process based
o Function; derived using use-case diagrams
o Business processes; take interdependencies into account
o For both case and alternative scenarios to support the life-cycle of systems
Lecture 5
Hints and tips on mid-term presentations;
• Do not forget the societal side of the problem (not only the technology)
• Opposing and conflicting requirements (also between stakeholders)

Delft’s Architectural and Design Framework

Continues on presentations from other groups


Lecture 6
Big data quality and data architecture: quality is enhanced when the architecture is stable.
Big data increases the need for a better architecture.

• Understand Big Data characteristics and impact on information quality


• Being able to evaluate a dataset and an information architecture based on
information and systems quality
• Understand why various views need to be taken into account when designing an
information architecture, and understand its limitations and benefits
• Gain an overview of information quality improvement methods
• Know information architecture basics: decoupling point, information flow vs. store
approach, stewardship

Why is information quality such an issue?


→ Data glitches = systemic changes to data which are external to the recorded process
• Changes in data layout / data types
o Integer becomes a string, fields swap positions, etc
• Changes in scale / format
o Dollars vs. euros
• Temporary reversion to defaults
o Failure of a processing step
• Missing and default values
o Application programs do not handle NULL values well..
• Gaps in time series
o Especially when records represent incremental changes
• Missing data
o Match data specification against data – are all attributes present?
System data ≠ real data.
Discrepancies can be due to the system (crashing, sending, recovering) as well as due to human mistakes.
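The glitch types listed above (missing values, type changes) can be caught by matching the data against its specification. A minimal, illustrative checker; field names and rules are hypothetical:

```python
# Sketch of simple data-glitch detection: match records against a spec to
# find missing values and type changes (e.g. an integer becoming a string).
# Field names and the spec are illustrative.

def find_glitches(records, spec):
    """spec maps field name -> expected type; report missing/mistyped fields."""
    problems = []
    for i, rec in enumerate(records):
        for field, expected in spec.items():
            if field not in rec or rec[field] is None:
                problems.append((i, field, "missing"))
            elif not isinstance(rec[field], expected):
                problems.append((i, field, "wrong type"))
    return problems

spec = {"amount": float, "currency": str}
records = [{"amount": 10.0, "currency": "EUR"},
           {"amount": "10,00", "currency": "EUR"},  # number became a string
           {"currency": None}]                       # missing and NULL values
print(find_glitches(records, spec))
```

Such a check only finds deviations from the specification; glitches like a silent dollar-to-euro scale change pass it, which is why quality also needs process control.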

Definitions
Information quality (IQ) is the characteristic of information to meet the functional,
technical, cognitive, and aesthetic requirements of information producers, administrators,
consumers and experts.
Quality information is information that meets specifications or requirements.
Information quality: a set of dimensions describing the quality of the information produced
by the information system. Information quality is one of the six factors that are used to
measure information systems success.
→ Information quality is 'fitness for use'.

Not all information is always needed; think of the dimensions that are needed, e.g. accuracy,
completeness, timeliness.
It is subjective. Think of what is quality?
• Depends on the stakeholders’ view and context
• The dimensions (has many aspects)
• ‘how’ it has been measured.
IQ framework
Perspective          Criteria
Content              Relevance, obtainability, clarity of definition
Scope                Comprehensiveness, essentialness
Level of detail      Attribute granularity, precision of domains
Composition          Naturalness, identifiability, homogeneity, minimum unnecessary redundancy
View consistency     Semantic consistency, structural consistency, conceptual view
Reaction to change   Robustness, flexibility
Values               Accuracy, completeness, consistency, currency/cycle time

IQ dimensions
• Accuracy
• Timeliness
• Relevance
• Quantity
• Completeness

IQ issues (system)
• Format
• Security
• Consistency
• Availability

System quality issues


• Accessibility
• Response time
• Reliability
• Flexibility
• Integration (Inter-operability)

Quality improvement techniques


• Standardization
• Record linkage (connect data referring to the same object)
• Data and schema integration (master data management)
• Source trustworthiness (stewardship, selecting trustworthy data, recollecting of data
at source)
• Process control (checks and control procedures)
• Process redesign (reward accurate data entry)
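Of the techniques above, record linkage is the easiest to sketch: connect records that refer to the same real-world object by comparing a normalized key. The normalization rules and record layouts are illustrative; real linkage also uses fuzzy matching and multiple attributes:

```python
# Sketch of record linkage: match records referring to the same object by
# normalizing a key field. Rules and record layouts are illustrative.

def normalize(name: str) -> str:
    """Lowercase and strip punctuation/whitespace so near-duplicates match."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def link(records_a, records_b):
    """Pair each record in A with a record in B that has the same normalized name."""
    index = {normalize(r["name"]): r for r in records_b}
    return [(a, index[normalize(a["name"])])
            for a in records_a if normalize(a["name"]) in index]

crm = [{"name": "Jansen, J."}, {"name": "De Vries"}]
billing = [{"name": "jansen j"}, {"name": "Bakker"}]
print(link(crm, billing))
```

Here "Jansen, J." and "jansen j" both normalize to the same key, so the two systems' records can be linked; the duplicate-removal and merge/purge techniques above build on the same idea.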

Retrospective improvement techniques


• Data audits and reviews
• Cleaning focus (duplicate removal, merge/purge, name & address matching, field
value standardization)
• Acquisition of new data
• Error localization and correction
• Cost optimization (cost-benefits)

V’s of big data → these influence the quality


1. Volume
2. Velocity
3. Variety
4. Variability
5. Veracity (accuracy)
6. Validity
7. Volatility
8. Visibility
9. Viability
10. Vast resources
11. Value

General Data Protection Regulation (GDPR)


Data storage vs. deletion
Individual vs. aggregated data
Storage of transactions (proof)
Privacy-by-design
Data portability – how well can you transfer the data to other systems?

Information as an asset – it is the ‘glue’ in many processes and organizations.


→ The vision is to have an information infrastructure that is able to answer all kinds of
questions.

A variety of information needs


• States of a product request (operational)
• Improvement of service (operational)
• Making of special offers (marketing)
• Do your customers like your services (sentiment analysis)
• Who are the most beneficial customers (customer management)
• To gain an overview of all products bought by one user (customer management)
• To identify trends and developments (business intelligence)
• To determine if decisions are just and fair (control and accountability)
• And many more

Who is responsible for maintaining the data?


What is the information quality?
When there are conflicts between sources: which source has the right information?
How can redundancy be avoided?
Response time and speed?
Who has the authority to remove low-quality data or block hackers?

Chinese walls = not being allowed to pass data from one side (of the organization) to another
Always think about the system and information quality, it differs per context!
Opposing and complementary approaches
• The information systems solution
o Brings the information required to the persons using information systems
o Technology is used to solve the problem
o E.g.: create a multi-purpose portal
• Organizational redesign solution
o Requires the redesign of organization structures
o Decision-making is where the pertinent information is
o E.g.: decision-making as much as possible in the front office
• “information management” and “knowledge management”
o Information management: focuses on the business processes and functions
that create, manipulate, and manage information
o Knowledge management: focuses on how organizational units interact and
how organizational units add to the store of information

Information architecture

• Is a blueprint describing the relationship between the business processes,
applications and information sources aimed at storing, processing, reusing and
distributing information across information resources
• The organization of information to aid information sharing among actors
• The information architecture determines which information will be stored in which
database, application, software components and so on. It is a meta information
model
• Helps to navigate and find the right information. The art and science of structuring,
organizing and labeling information so people can find it
• High-level map of the information requirements of an organization
Information stewardship principle

Information needs and roles


• Managers
• Customers
• Administrative staff
• ICT staff
• …

What to do with the information?


• Compile a registry
• Develop a “yellow page” (library)
• Construct a proto ontology
• Map flows, sequences, and dependencies among organizational units and business
processes
• Identify:
o Knowledge stewards
o Gatekeepers
o Isolated islands
o Narrow communication channels
o Improvement points

Examples of information architecture principles

Information store vs. Information flow approach


• ‘Need to know’ principle
• Data is driven by incoming data instead of queries
Information decoupling point
Where is the information being stored?
Lecture 7
Process architecture: from business processes management to webservice orchestration.
Business process management tools – BPM tools
Know difference between BPM and webservice orchestration… Depends on application and
technology.

• Understand complexity of determining services


• Understand relationship between processes and services
• Know the terms; workflow, WFM and BPM, BPMN, XPDL, BPEL and webservice
orchestration
• Know typical trade-offs
• Being able to make trade-offs given a case study

Process architecture provides an abstract overview of all processes, the business


units/departments/persons involved and their relationship with other processes, businesses,
information and application architecture.
• Abstract: not too detailed, but detailed enough to determine the impact.
• A process: represents the sequence of activities performed to accomplish a certain
business function as a hierarchical representation of process steps, subprocesses,
activities, tasks and decisions.
• At least three types of processes:
o Product or primary process (“critical processes”)
o Supporting processes (finance, HRM, …) (secondary processes)
o Control and management processes (secondary processes)

Where to put the “needed information” (user, case, product?) in the overall process.

“True Business Process Management is an amalgam of traditional workflow and the ‘new’
BPM technology. It then follows that as BPM is a natural extension of – and not separate
technology to – workflow, BPM is in fact the merging of process technology covering 3
process categories: interactions between (i) people-to-people; (ii) systems-to-systems (iii)
systems-to-people – all from a process-centric perspective. This is what true BPM is all
about.” – Jon Pyke, CTO Staffware.

In combination with Lean Six Sigma – who is in control → process owner and authority?
Basic idea of Workflow Management Systems (WFMS)
• Separation of processes, resources and applications
• Focus on the logistics of work processes, not on the contents of individual tasks
• Process perspective (tasks and the routing of cases)
• Resource perspective (workers, roles)
• Case/data perspective (process instances and their attributes)
• Operation/application perspective (forms, application integration)
• Control perspective (progress, tracking and tracing)

Separation of Control and Execution


Control layer = WFM, Application layer = execution

Users in BPM

WSDL – Service presentation: functionality description
UDDI – Service registration and publication: publication of service interfaces
(service level)
BPMN – Service selection and composition: selection of service providers,
comparison, and customization
BPEL4WS – Service execution: combining existing and new services to execute a
process

Is this a service architecture or a service library?

A display of all processes that are being handled by several applications.


→ consider the Zachman framework: no time axis, no (or little) relation between cells, and
the as-is to to-be transition is not shown. Therefore, it is more a library / overview of the
services, not an architecture.
Web Service Orchestration (WSO) defines the control and data flow between web services
to achieve a business process. Orchestration defines an “executable process” or the rules for
a business process flow defined in an XML document which can be given to a business
process engine to “orchestrate” the process, from the viewpoint of one participant. (Carol
McDonald, SUN)

Business Process Execution Language for Web Services (BPEL4WS):

XML Process Definition Language (XPDL):


• An XML-based language to interchange business process definitions between different
BPM products
• Standardized and maintained by the Workflow Management Coalition (WfMC)
• XPDL defines an XML schema for specifying the declarative part of workflow /
business process
• XPDL is often used for the exchange of BPMN diagrams
• XPDL contains elements to hold graphical information and executable aspects,
whereas BPEL focuses exclusively on the executable aspects of the process (no
graphical aspect)

Business Process Modeling Notation (BPMN)


• is a standardized graphical notation for drawing business processes in a workflow.
• Maintained by the Object Management Group (OMG)
• Enables communication between business and ICT
• BPMN is constrained to support only the concepts of modeling that are applicable to
business processes
• Four basic element categories
o Flow objects (events, activities, gateways)
o Connecting objects (sequence flow, message flow, association)
o Swimlanes (Pool, lane)
o Artifacts (data objects, group, annotation)
• Can be executed in BPEL
www.bpmn.org

Who is the orchestrator (the one in charge)? → there can be more than one per process!

The one that is in charge, is also responsible. Who manages the flow? To whom?
Granularity of services:

Transparency of underlying processes: White-box / Grey-box / Black-box

Data and information flow

Balancing central and decentral management


Which roles should be executed centrally or decentrally?
Central IT: Facilitating reuse and sharing of business processes, support making of
agreements, overseeing funding and investments, initiating service portfolios, change
management initiation, mandate standards and processes.

Business units: business case, prioritization, service levels, change board, defining standards,
services, processes, policies.

Decision table / business rules are more applicable when there are many ‘if-then’
statements. It is better than a flow diagram because it will be simpler.
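As a hedged illustration (the rule names and discount values are invented), a decision table can be held as data and evaluated by one small function, instead of nesting many ‘if-then’ statements in a flow:

```python
# Hypothetical decision table: each rule maps a set of conditions to an action;
# the first rule whose conditions all hold wins.

DECISION_TABLE = [
    # (conditions, action)
    ({"member": True,  "order_over_100": True},  "discount_15"),
    ({"member": True,  "order_over_100": False}, "discount_5"),
    ({"member": False, "order_over_100": True},  "discount_5"),
    ({"member": False, "order_over_100": False}, "no_discount"),
]

def decide(facts):
    """Return the action of the first matching rule."""
    for conditions, action in DECISION_TABLE:
        if all(facts.get(key) == value for key, value in conditions.items()):
            return action
    return "no_rule"
```

Adding a business rule then means adding one row, not rewiring a diagram — which is why a table stays simpler as the number of conditions grows.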

Design Guidelines – Orchestration (SAME AS ‘rules modular design)


• Information should be captured only once at the source and reused by other
modules (coordination)
• There should be a (central) process control component integrating business process
steps with functionality provided by modules
• The module should, whenever possible, be offered as reliable and proven
commercial-off-the-shelf (COTS) software products supplied by a vendor
• Be able to manage the quality of modules (QoS, performance, security, ..)
• A module should be reusable and capture a business function
• Use of versioning (extensibility, multiple instances)
• Develop domain-specific modules (use of namespaces)
E2: Introduction to Data Management, Data Privacy & Data Security
Nick Martijn, CRANIUM

Data management is the practice of organizing and maintaining data processes to meet
ongoing information lifecycle needs.
Data privacy focuses on the identification and mitigation of the risk of non-compliance with
the requirements of the GDPR in Europe, ensuring proper handling of personal identifiable
information.
Data security focuses on the implementation of the technical measures of the GDPR and all
other activities necessary to safeguard confidentiality, integrity, and availability of
information.

1. CRANIUM
Belgium-based company, mainly focusing on data management in relation to data privacy
and data security (consultancy). Active in projects in 12 countries worldwide. KPMG did
many assessments: gathering the situation and giving out advice. CRANIUM is more involved
in the implementation of that advice (data management etc.).

Topics: GDPR Compliance, Information security, Data strategy, Test data management, Data
governance, Big data and advanced analytics, Privacy and security retainer, data retention
and deletion, internet of things. → data-driven enterprise.

Approach: First aid, support, control, improve.

2. Enterprise Data
Complexity due to the organization.
What data does the company have? What external data sources does a company have? How
do they need to manage, secure, and protect that data?

Consider the (main) data flow.


4 axes: Data / People & governance / Processes / IT
Many interlinked systems (APIs), much duplicated data, extremely hard to get an
overview.
Data management is more than just buying extra applications!! Must consider the human
aspect as well, since people interact with the processes and IT. The consulting work is often
more about ‘training and educating people’ than ‘fixing the environment’.

Structured vs unstructured
Internal vs external
Static vs non-static

The Case for data


Compliance: Comply with legal requirements in order to maintain the ‘license to operate’,
take care of personal data, make sure there is only one version of the truth (no duplicates).
→ GDPR, regulatory reporting, financial statements, other data quality regulations, risk
management, BCBS 239, Basel III, Solvency II. (CRO / CFO)
Efficiency: Accurate and timely management information, Less maintenance costs, Sharing
data and knowledge throughout the enterprise.
→ Improved process efficiency, improved decision making, decreased running costs (OpEx),
data-driven enterprise, knowledge management, management information, decreased
capital costs (CapEx), Business-IT alignment (through data). (CIO / COO)

Growth: Innovative revenue models, improved insights in the requirements of the customer.
→ Artificial intelligence, big data / data lakes, data-driven business models, advanced data
analytics, internet of things. (CEO)

Should a CDO (Chief Data Officer) be added within companies?

3. Data Management
Two parts: Data Life Cycle & Data Management.

To define the maturity of the DMM within a company.

Data governance: who is the owner of the data? How do the stakeholders within a company
work with the same data? → everyone is using the data, but nobody feels responsible.
Rules and roles need to be established to safeguard the privacy and security of the system.

Check: Data Lineage

4. Data Privacy
Start with Data Minimization.
Main concept: Principles, Legal ground, Subject rights, Processing, Data breaches, DPO.
→ Principles: transparency, accuracy, purpose limitation, data minimization, integrity and
confidentiality.

1. Collect only what you need


2. Do not make useless copies
3. Safeguard the quality of data
4. Discard data when obsolete

Privacy by default (and design) is the new advancement that is being considered.
Full back ups vs Incremental back-ups

5. Data Security
How to get a grip on the risk that a company has.
Who can access what data?

Check: DMB model

Block / disable hardware, or train people.


ISMS (based on ISO 27x) = Information Security Management System = need to have
everything in place

6. Client Cases (eg)


Main challenges:
• Compliance to GDPR
• Compliance to health & safety guidelines (regarding e.g. the use of alcohol and drugs)
• Data security concerns

Solutions:
• Information Security Management System (ISMS)
• Data privacy officer role
• Data privacy impact assessment
• Renewed Code of Conduct
• Privacy policies and measures
Lecture 9 (!!)
Enterprise Application Integration (EAI) and Middleware

• Learn types of EAI approaches


• Learn various classifications of middleware technology
• Learn and understand characteristics middleware technologies:
o RPC, MOM, Transaction monitors, brokers, database, Distributed objects
• Understand challenges of and solutions for distributed transactions (!)
• Be able to select integration approaches and middleware technology

Data integrity
• Means that the information stored in a system corresponds to what is being
represented in reality.
• Refers to aspects like consistency, security, reliability, timeliness, non-repudiation
and non-manipulation, which need to be warranted
• CIA Triad – these are conditions for information security:
o Confidentiality – Data should only be accessed by authorized persons
o Integrity – ensures that data is accurate and consistent. In other words, data
stored in a system should correspond to what is being presented in reality.
o Availability – authorization to make data available at the right time to fulfill a
need
• Ensuring data integrity requires data management/governance and middleware
Almost all data and system quality dimensions
Batch oriented systems are often not available all the time (maintenance, back-ups, update)

Objects can be programmed and interact with each other.


Database replicators = copy of a database (for faster response time)
Batch data extraction = not continuous
Legacy applications = a software program that is outdated or obsolete. Although a legacy
app can still work, it may be unstable because of compatibility issues with current OSes.
Wrappers = are used in front of a legacy system, they provide access to the system.

Application integration approaches


• Information-oriented
o Data replication
o Data federation
o Interface processing
o Semantic integration
• Service-oriented
• Portal-oriented
• Business process-oriented
Often mixed approaches are being used.
The point of departure (business, information, databases, applications) often influence the
outcomes.
The ‘no green field’ (=path dependencies) might block certain approaches.

Information-oriented integration
1. Interface processing: App A → API → App B
2. Data replication: Datab A ↔ Datab B
a. Very easy to replicate data; minimize risks and increase speed
3. Data federation: Virtual Datab 1 ↔ Datab A, Datab B, Datab C, Datab D
a. Often older databases; the virtual database is set up so that it feels like ‘one’
database
4. Semantic integration: A network of connected nodes
a. Bottom-up approach (no standardization…)
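The data federation idea (point 3 above) can be sketched as a virtual database that delegates lookups to several underlying sources so that it "feels like one" database. The source names and records here are invented for illustration:

```python
# Sketch of data federation: one virtual database, several real ones behind it.

class VirtualDatabase:
    def __init__(self, sources):
        self.sources = sources          # list of dicts acting as legacy databases

    def lookup(self, key):
        """Return the first hit across all federated sources."""
        for source in self.sources:
            if key in source:
                return source[key]
        return None                     # not found in any source

db_a = {"policy-1": "fire insurance"}   # e.g. an older legacy database
db_b = {"policy-2": "car insurance"}
federated = VirtualDatabase([db_a, db_b])
```

The caller queries only `federated`; which physical database answers stays hidden, which is exactly the appeal (and the performance risk) of federation.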

Service-oriented integration
• Using web-services
• Reusability
• Need for transaction support (if one of the Apps is not working!! Causes
inconsistencies)
• Limited view on Service-oriented Architectures (SOA)
→ A composite application is made that consists of App A, App B, App C for example.

Portal-oriented integration (Externalizing information)


• Web-services
• Common interface
• Heterogeneous content
• Externalizing information
• User integrates
Human → Web browser → Portal Server → Internet, Datab, App, Office
Very easy to use, but not always considered to be real Application Integration because ‘the
human’ is doing the integration job. Important to have a common user interface and
heterogeneous content in the back end.

Business Process-oriented integration


Example of the webservice orchestration
• Process and service oriented
• Control information (business process and the data)
• Combining middleware technology and business process automation (BPM)
• The future of EAI (according to some)
Highly intuitive form

Middleware classification
• Interaction patterns (1-1, 1-n, n-n)
• Synchronous or asynchronous
o Directly (in real time = the internet) or not directly (example = email)
• Connection-oriented or connectionless
• Language specific or independent
• Proprietary or standard-based
o Vendor-dependent or Object Management Group (OMG) standards
• Embedded vs. Enterprise
o Embedded = “hidden”, is more and more applicable (all IoT devices) – Lack of
security and lack of processing power often
Vree’s model on middleware

Asymmetric very important for Personal Data – not all data in the same place

Interaction patterns (directed)

Conversational mode is the ideal situation – a conversation can happen

Synchronous vs Asynchronous communication

For synchronous: immediate reaction is expected (seconds), not doing anything else till a
response is received, error is seen immediately. (Calling)
Asynchronous: Emailing.
Middleware language
Standard (e.g. Windows):
• Interoperability
• Easy replacement
• Economies of scale resulting in low costs
• Longer life-cycle
• Long-lasting standardization processes
• Support not dependent on one vendor

Proprietary (e.g. stock market):
• Cover areas standards do not address (yet)
• Differentiate from competitors
• Customer lock-in
• Support can be part of buying process

Levels of middleware (main focus is application)


• Application
• Domain-specific middleware services
• Common middleware devices
• Distribution middleware
• Host infrastructure middleware
• Hardware devices
Everything that is used to support this lecture reaching the listener

Approaches to integration

Most preferred to do it on the data level, yet most difficult. The internet, for example, is only
integrated at the UI level.

Data level integration


• Typically, relatively easy approach
• Extract data directly from databases
• Most applications make it possible to circumvent their business logic and access data
directly
• Transform data
• Frequency:
o Scheduled
o Instantaneous
o Triggers
Application level integration
• Method level: distributed computing
• Often integration based on accessing APIs
• API exposes application service to outside world
• API functionality dictates how an application can be accessed
o Business process
o Low level services
o Data
• Very wide variety of levels of services and quality of APIs, some are extremely
complex
• Wrappers: provide an interface based on some standard, e.g. Corba, Java, .net,
webservices
• Wrappers expose business services as methods in an interface
• Wrappers requires effort to build, test, and maintain (disadvantage)

User-interface integration
• Sometimes only way an application logic can be called
• Screen scraping (green screen)
o Application thinks it is interacting with users
• Primitive but often very necessary
o Not always efficient navigation
o Not generally scalable
o Most follow interface format, extract results and ignore formatting returned
from applications
o High maintenance costs
• Extremely relevant for internet (HTML)

7 types of middleware
• Remote Procedure Call (RPC)
o Client-server interaction that makes it possible for the functionality of an
application to be distributed across multiple platforms. Local program
requests a service from a program located on a remote computer, without
having network details. Used for synchronous data transfers, where client and
server need to be online for the communication.
• Message Oriented Middleware (MOM)
o Makes it less complicated to use applications spread over various platforms.
Enables the transmission of messages across distributed applications. Also has
a queuing mechanism that allows the interaction between the server and the
client to happen asynchronously. (Overlaps with message brokers)
• Message brokers
o Communication by using queues supporting asynchronous and synchronous
message passing. Validity check on data structures and completeness.
Database for supporting publish-subscribe models. (is the central / receiving
system of messages)
• Database middleware
o Between the databases and the applications. Call-level interface (CLI)
between databases (drivers) and applications.
• Transaction Processing (TP)
o Two major types: TP Monitors and Application servers. Transaction = unit of
work consisting of a number of interactions with a beginning and an end.
Generally: tightly coupled, method sharing, need to change source and
destination IS for transactions. → strict monitoring. Two phases: prepare and
commit.
• Application service (wrappers)
o TM becomes applications server by incorporating application logic, typically
web-enabled, more and more functionality of message brokers included:
messaging, transformation, intelligent routing.
• Distributed objects (very complicated)
o Middleware or application development? Creating distributed applications,
for cross-enterprise method sharing. E.g. Corba and DCOM. Client = stub,
server = skeleton.
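The MOM idea from the list above — a queue that decouples sender and receiver so they need not be online at the same time — can be sketched in a few lines (an in-process toy, not a real broker product):

```python
# Minimal sketch of Message-Oriented Middleware: an asynchronous FIFO queue
# between a sender and a receiver.

from collections import deque

class MessageQueue:
    def __init__(self):
        self._queue = deque()

    def send(self, message):
        self._queue.append(message)     # sender returns immediately (async)

    def receive(self):
        """Receiver picks up messages later, in FIFO order."""
        return self._queue.popleft() if self._queue else None

mq = MessageQueue()
mq.send({"type": "order", "id": 1})     # the receiver may be offline right now
mq.send({"type": "order", "id": 2})
first = mq.receive()                    # picked up later, in order
```

Contrast with RPC: there the caller blocks until the remote procedure answers, so both sides must be online — here the queue absorbs the timing mismatch.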

(Object) Transaction Monitors (OTM) (used as a safeguard against information/transaction
losses)
• Ensure that sequence of actions is committed or rolled back
• Incorporate application logic encapsulated in a transaction
• Often use of persistent message queues
• Creates overhead
• ACID requirements
o A: Atomic – number of tasks and interactions are executed in its entirely
o C: Consistent – state of all applications is similar
o I: Isolated – Other application only transact with the TM as they were alone
o D: Durable – Redoing and undoing of changes data is not lost

OTM vs. Message brokers


• OTM
o Synchronous communication
o CORBA, DCOM, RMI, …
• Message brokers
o Asynchronous communication
o Messages XML, SOAP
o Queuing
• Combinations
o COM+ = COM, MTS (Microsoft Transaction Server) and MSMQ (Microsoft
Message Queue)
Lecture 10 – Concurrency and transactions in distributed architectures
Difficult topic: requires understanding of the concepts.

• Understand the potential use of distributed ledger technology (blockchain), smart
contracts, and zero-knowledge proofs
• Be able to explain the problems and the need for transactions mechanisms for
distributed application architecture (due to the loss and manipulation of information)
• Be able to explain the working of concurrency (do things in parallel) and locking
mechanisms, 2PL, and deadlock control and blockchain
• Be able to explain transactions concepts, 2PC, ACID properties, transaction control,
transaction monitors, roll back (undo everything what you’ve done) and
compensation

Examples of Distributed architectures:


Insurance company (having data stored at multiple locations)
Multiple servers (S1 waits for S2 to process and proceed, S1 holds in between à deadlock)
Data replication (Users and data are distributed, how to process change?)

Transaction = a series of actions, carried out by a user or application, which accesses or


changes contents of database or other system.
Distributed systems = having everything decentralized, which makes it more secure and
faster, but also more difficult. Book a flight, hotel and rental car → get insurance policy
information which is distributed over a number of (legacy) systems.
Databases = a transaction is a logical unit of work on the database → it transforms data
from one consistent state to another, although consistency may be violated during the
transaction

Transaction can result in:


• Success – transaction commits and database reaches a new consistent state
• Failure – transaction aborts, and database must be restored to the consistent state
before it started → is rolled back or undone
• Committed transactions cannot be aborted
• Aborted transaction that is rolled back can be restarted later

Properties (ACID):
• A: Atomicity = ‘All or nothing’ property
• C: Consistency = Must transform database from one consistent state to another
• I: Isolation = Partial effects of incomplete transactions should not be visible to other
transactions
• D: Durable = effects of a committed transaction are permanent and must not be lost
because of later failure
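The ‘all or nothing’ (atomicity) property above can be sketched with a toy transaction that buffers its writes and applies them only on commit; an abort discards them, so the database is restored to its state before the transaction started (the balance example is invented):

```python
# Toy transaction: pending writes become visible only on commit (atomicity);
# abort rolls everything back.

class Transaction:
    def __init__(self, database):
        self.database = database
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value            # invisible to others (isolation)

    def commit(self):
        self.database.update(self.pending)   # all changes applied at once
        self.pending = {}

    def abort(self):
        self.pending = {}                    # roll back: nothing reaches the DB

db = {"balance": 100}
tx = Transaction(db)
tx.write("balance", 50)
tx.abort()            # the database keeps its old, consistent state
```

A real DBMS implements this with logs and locks rather than a buffer dict, but the observable contract is the same.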

Clearing transactions
A centralized ledger tracks asset movement within the financial system between institutions.
Users have a key to make transactions, which contain a timestamp. Ledger is for storing all
transactions and creates a list. Ledgers are maintained by banks or an intermediary and need
to be secured. The key issue is how to secure the ledgers so they cannot be manipulated.
The solution: Distributed ledger technology (= Block chain)
• Distributed autonomous ledger
o Timestamped blocks that hold batches of valid transactions
o Each block includes the hash of the prior block
o The linked blocks form a chain
o The blocks are distributed and synchronized (=distributed ledger)
o Creating new blocks is known as mining
• Integrity is created by distributed consent (majority voting)
• Every node in a decentralized system has a copy of the block chain
• Longest chain represents the truth
Opinion: Not suitable for personal information because the information is open and can be
accessed by everyone.
Problem: The blockchain gets longer and longer, so more processing power is needed.
Therefore, sometimes an older part of the blockchain is stored elsewhere. ‘Tangles’ can also
be an option.
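The hash-linking described above — each block includes the hash of the prior block — can be sketched directly; tampering with an old block then breaks every later link (transactions here are made-up strings):

```python
# Sketch of a hash-linked chain of blocks.

import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 hash of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def valid(chain):
    """Valid iff every block points at the real hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
add_block(chain, ["A pays B 5"])
add_block(chain, ["B pays C 2"])
ok_before = valid(chain)
chain[0]["transactions"] = ["A pays B 500"]   # tamper with history
ok_after = valid(chain)                       # the chain no longer validates
```

In a real distributed ledger every node holds a copy and integrity comes from majority consent; this sketch only shows why rewriting history is detectable at all.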

Old: 10 people → 1 Database


New: 10 people → X number of Ledgers (= distributed ledgers, many nodes: data integrity)

Possibilities of blockchain:
• Enabling tokenization
• Time proof sealing
• Data record immutability
• Viewing history
• Automatic execution of transactions (smart contracts)
• Various blockchain infrastructures have different properties

Types of blockchain infrastructure (Think of who has access?) (!!)


Try to minimize what you store inside the blockchain.

Evaluation of the blockchain technology

Smart contracts describing inputs needed resulting in actions.


• Self-executing contracts with the terms of agreement between buyer and seller being
directly written into lines of code
• Code and agreements are contained and stored in a distributed ledger
• The code controls the execution, the transactions are trackable and irreversible
• No central trusted third party (TTP) needed

Zero-knowledge and blockchain


• Need: Validate cryptocurrency transactions managed on a blockchain and combat
fraud without revealing data about which wallet a payment came from, where it was
sent, or how much currency changed hands?
• Why? Protection of personal data related to the identity of individuals (date of birth,
bank statements, transaction histories, education credentials)
• Zero-knowledge proof is a method by which one party (the prover) can prove to
another party (the verifier) that they know a value x
o Without disclosing any information apart from the fact that they know value x
o Statement being proved must include the assertion that the prover has such
knowledge to avoid fraud
• A zero-knowledge proof of knowledge is a special case when the statement consists
only of the fact that the prover possesses the secret information.

Properties:
Soundness – Everything that is provable is true, no cheating
Completeness – Everything that is true has a proof
Zero-knowledge – only the statement being proven is revealed
→ e.g. a client password when logging in on a system
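This is not a real zero-knowledge proof — real ZKPs rely on much stronger cryptography — but a loose challenge–response sketch in the spirit of the password example: the prover shows it knows the password without the password ever being transmitted or stored in plaintext:

```python
# Illustrative (NOT a real zero-knowledge proof): challenge-response login
# where only hashes travel between prover and verifier.

import hashlib

def h(text):
    return hashlib.sha256(text.encode()).hexdigest()

# Setup: the server stores only a hash of the password, never the password.
stored = h("hunter2")

def prover_response(password, challenge):
    """Prover combines its secret with the server's fresh challenge."""
    return h(challenge + h(password))

def verifier_check(response, challenge):
    """Verifier recomputes the expected response from the stored hash."""
    return response == h(challenge + stored)

challenge = "nonce-42"                   # fresh random value per login attempt
good = verifier_check(prover_response("hunter2", challenge), challenge)
bad = verifier_check(prover_response("wrong", challenge), challenge)
```

The fresh challenge prevents replaying an old response; a genuine ZKP would additionally guarantee that the verifier learns nothing beyond "the statement is true".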

The collection of technologies contributes to benefits; not only blockchain or only smart
contracts.
Governance OF blockchain (programmers) // Governance BY blockchain (zero-knowledge)

Concurrency Control
• Process of managing simultaneous operations on systems without having them
interfere with one another
• Prevents interference when two or more users are accessing database or shared
object simultaneously and at least one is updating data
• Although two transactions may be correct in themselves, interleaving of operations
may produce an incorrect result
• Potential problems:
o Lost update problem
§ Successfully completed update is overridden
o Uncommitted dependency problem
§ Occurs when one transaction can see intermediate results of another
transaction before it has committed (might be rolled back)
o Inconsistent analysis problem
§ Occurs when transaction reads several values but second transaction
updates some of them during execution of first transaction

Check the read(x) and write(x) !! these are important
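The lost update problem can be made concrete by spelling out the read(x)/write(x) interleaving (the amounts are invented). T1's committed update is overridden because T2 wrote based on a stale read:

```python
# Sketch of the lost update problem: interleaved read/write vs. serial execution.

def transfer_interleaved(balance):
    t1_x = balance            # T1: read(x)
    t2_x = balance            # T2: read(x)  -- stale copy
    balance = t1_x + 10       # T1: write(x = x + 10), commits
    balance = t2_x - 5        # T2: write(x = x - 5)  -- T1's update is lost
    return balance

def transfer_serial(balance):
    balance = balance + 10    # T1 runs to completion first
    balance = balance - 5     # then T2
    return balance

lost = transfer_interleaved(100)   # ends at 95 instead of the correct 105
correct = transfer_serial(100)
```

Both transactions are correct in isolation; only the interleaving of their operations produces the wrong result — which is what concurrency control must prevent.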

Solutions to the problem:


Serializability: Objective of a concurrency control protocol is to schedule transactions in such
a way as to avoid any interference.
• Could run transactions serially, but this limits degree of concurrency of parallelism in
system (speed!!)
• Serializability identifies those executions of transactions guaranteed to ensure
consistency:
o Schedule – Sequence of reads/writes by a set of concurrent transactions
o Serial Schedule – schedule where operations of each transaction are executed
consecutively without any interleaved operations from other transactions
• No guarantee that results of all serial executions of a given set of transactions will be
identical

Precedence graph to show ‘who waits for whom’.


If it contains a cycle, the schedule is not serializable.
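As an illustrative sketch (transaction names invented), the cycle test on a precedence graph is a standard depth-first search over a directed graph:

```python
# Serializability check: the schedule is conflict-serializable only if its
# precedence graph (edge T1 -> T2 means "T2 must come after T1") is acyclic.

def has_cycle(graph):
    """Detect a cycle in a directed graph given as an adjacency dict."""
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited / on current path / done
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph.get(node, []):
            if color.get(succ, WHITE) == GREY:
                return True             # back edge to the current path: cycle
            if color.get(succ, WHITE) == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

serializable_graph = {"T1": ["T2"], "T2": ["T3"], "T3": []}
cyclic_graph = {"T1": ["T2"], "T2": ["T1"]}   # T1 and T2 depend on each other
```

The same cycle test reappears later with the wait-for graph used for deadlock detection.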

Distributed Transaction Management


• Divided into a number of sub-transactions, one for each site that has to be accessed,
represented by an agent
• Systems must ensure indivisibility of each sub-transaction:
• Synchronization of sub-transactions with other local transactions executing
concurrently at a site
• Synchronizing of sub-transactions with global transaction running simultaneously
at same or different sites
Two-Phase Commit (2PC) Protocols
• Governs whether a transaction is to be aborted or carried out
• Can be used for nested transactions
• Two phases: Voting phase and Decision phase
• Coordinator asks all participants whether they are prepared to commit transaction:
o If one participant votes abort, or fails to respond within a timeout period,
coordinator instructs all participants to abort transaction (veto)
o If all vote commit, coordinator instructs all participants to commit
• All participants must adopt global decision

Two-phase Commit (2PC)


If a participant votes abort, it is free to abort the transaction immediately (on its own)
In bitcoin, it is ‘majority vote’, since it happens more often that one node does not respond,
which would otherwise abort the whole transaction
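The voting phase of 2PC can be sketched as follows (participant names are invented, in the spirit of the flight/hotel booking example): a single abort vote or timeout vetoes the transaction for everyone:

```python
# Sketch of 2PC: phase 1 collects votes, phase 2 imposes the global decision.

def two_phase_commit(votes):
    """votes: participant -> 'commit', 'abort', or None (timeout)."""
    # Phase 1 (voting): every participant must explicitly vote commit.
    if all(vote == "commit" for vote in votes.values()):
        decision = "commit"
    else:
        decision = "abort"          # one veto (or silence) aborts for everyone
    # Phase 2 (decision): all participants adopt the global decision.
    return {participant: decision for participant in votes}

unanimous = two_phase_commit({"flight": "commit", "hotel": "commit"})
veto = two_phase_commit({"flight": "commit", "hotel": None})   # hotel timed out
```

This is why 2PC keeps distributed sites consistent but is vulnerable to a single slow or unreachable participant.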

Summary Concurrency

Locking can be used to deny access to other transactions and so prevent incorrect updates
• Most widely approach to ensure serializability
• Generally, a transaction must claim a shared (read) or exclusive (write) lock on data
item before read or write
• Lock prevents another transaction from modifying item or even reading it, in case of
write lock
Locking rules:
1. If transaction has shared lock on item, can read but not update item
2. If transaction has exclusive lock on item, can both read and update item
3. Reads cannot conflict, so more than one transaction can hold shared locks
simultaneously on same item
4. Exclusive lock gives transaction exclusive access to that item
5. Some systems allow transaction to upgrade read lock to an exclusive lock, or
downgrade exclusive lock to a shared lock

à two-phase locking (2PL)

• A transaction follows the 2PL protocol if all locking operations precede the first unlock
operation in the transaction
• Two phases for a transaction:
o Growing phase – acquires all locks but cannot release any locks
o Shrinking phase – releases locks but cannot acquire any new locks
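The shared/exclusive locking rules above can be sketched as a tiny in-memory lock manager. This is a single-threaded toy, and the `LockManager` class is invented for this illustration; a real DBMS lock manager also handles waiting queues, lock escalation and deadlocks.

```python
# Sketch of the shared (S) / exclusive (X) locking rules.

class LockManager:
    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        """mode 'S' = shared (read), 'X' = exclusive (write).
        Returns True if granted, False if the transaction would have to wait."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        # Rule 3: reads cannot conflict, so many shared locks may coexist.
        if mode == 'S' and held_mode == 'S':
            holders.add(txn)
            return True
        # Rule 5: a sole holder may upgrade its shared lock to exclusive.
        if holders == {txn}:
            if mode == 'X':
                self.locks[item] = ('X', {txn})
            return True
        # Rule 4: exclusive access really is exclusive.
        return False

lm = LockManager()
print(lm.acquire('T1', 'x', 'S'))  # True: first shared lock
print(lm.acquire('T2', 'x', 'S'))  # True: shared locks are compatible
print(lm.acquire('T2', 'x', 'X'))  # False: T1 also holds a shared lock
```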

2PL to prevent the Lost Update Problem
2PL to prevent the Uncommitted Dependency Problem
2PL to prevent the Inconsistent Analysis Problem

• Deadlock is an impasse that may result when two (or more) transactions are each
waiting for locks held by the other to be released
• Deadlocks should be transparent to the user, so the DBMS should restart transactions
• Three general techniques for handling deadlock:
o Timeouts (monitoring)
§ abort one (or both) transaction(s); commonly used
§ Disadvantage: a transaction may be aborted without an actual deadlock,
and long-running transactions are penalized
o Deadlock prevention
§ Looks ahead to see if transaction would cause deadlock and never
allows deadlock to occur
• Wait-Die- only an older transaction can wait for younger one,
otherwise transaction is aborted (dies) and restarted with
same timestamp
• Wound-wait- only a younger transaction can wait for an older
one. If older transaction requests lock held by younger one,
younger one is aborted (wounded)
o Deadlock detection and recovery
§ Allows deadlocks to occur, detects them, and breaks them
§ The detector constructs a wait-for graph (WFG) showing transaction
dependencies:
• Create a node for each transaction
• Create edge T1 à T2 if T1 is waiting to lock an item locked by T2
§ Deadlock exists if and only if the WFG contains a cycle
§ The WFG is created at regular intervals
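The WFG check above is just cycle detection on a directed graph. A minimal sketch (the dictionary shape of the graph is invented for this example):

```python
# Deadlock detection on a wait-for graph: a deadlock exists iff the WFG
# contains a cycle. Depth-first search with a 'currently visiting' set.

def has_deadlock(wfg):
    """wfg: dict of txn -> list of txns it is waiting for."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:          # back edge: cycle found
            return True
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in wfg.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(t) for t in wfg)

# T1 waits for T2 and T2 waits for T1: the classic deadlock.
print(has_deadlock({'T1': ['T2'], 'T2': ['T1']}))  # True
print(has_deadlock({'T1': ['T2'], 'T2': []}))      # False
```

On detection, the DBMS would break the cycle by aborting one of the transactions in it.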

Timestamping: transactions are ordered globally so that older transactions (those with
smaller timestamps) get priority in the event of conflict.
à A conflict is resolved by rolling back and restarting the transaction.
No locks, so no deadlock.

Four main locking strategies:

1. Dirty read: apps may read data that has been updated but not yet
committed to the database
2. Committed read: apps may not read dirty data
3. Cursor stability: a row being read by T1 is not allowed to be changed by T2
4. Repeatable read: all data items are locked until a transaction reaches a commit point
Lecture 11 – Data architecture & governance in practice in a multi-actor domain
Guest lecture by Verdonck, Klooster & Associates (VKA)

• Data > information > knowledge (differences)
• Data fundament = data management
o Data architecture patterns
o BI architectures
• Data exchange and sharing
• Artificial Intelligence and data science
• Our experience in practice DO’s and DON’T’s
Architecting as a way of structuring reality for others.

The concept of data-information-knowledge (Boisot, 2004) (Data >Information> knowledge)

Data is subjective since it has been perceived: the same 'outside world' data can be
perceived differently. How does this affect AI? The same issue applies there.

Data fundament = Data management

The Data Management Body of Knowledge (DMBOK)
à a practitioner's guide: a knowledge base and a model of 10 data management areas.
Data governance is not the same as data management.
Data governance is at the heart of the DMBOK model and focuses on the questions: What to
do with data? And how to organize it? It is the exercise of authority and control over data
assets: ownership, governance of everyday business, and the organization's strategy for data.
Data management focuses on the implications of data governance on, for example, the
architecture.

Data governance
1. Data is an asset and production factor
2. Data has an owner and a steward (accountable+responsible)
3. Data has a lifecycle (metadata model)
4. Data governance is a responsibility of the CIO/CDO
5. Data is documented in a data dictionary
6. Data access is through authorization (need to know) OR data access is open (need to
share)
7. Master data is only altered at the source
8. Data is validated on CREATE and UPDATE
9. Data addition/ enrichment leaves master data intact
10. Data is “Open by design”

Breaking down Data Silos starts with Governance.

Silo = a collection of data that is isolated from other parts of the organization à silos prevent
the free flow of data within an organization.
Ownership is often fragmented.

Employees
• Chief Data Officer (CDO): oversees a range of data-related functions to ensure your
organization is getting the most from what could be its most valuable asset.
• Data Stewards: are accountable for the day-to-day management of data.
• Business Analysts / Data Translators: play a critical role in bridging the technical
expertise of data engineers and data scientists with 'the business'.
• Data scientists: are analytical data experts who have the technical skills to solve
complex problems.
• Data architects: conceptualize and visualize data frameworks; data engineers build
and maintain them.

Types of data
Structured:
• Master data: Company’s own data such as client’s info, personal info
• Reference data: common data such as NLD = Netherlands, from an external silo
• Transaction data: Reflection of a transaction
• reporting data: aggregated data, can be refined from Transaction data
Unstructured: (become increasingly important)
• Documents
• Media
• Photographs
Technology changes rapidly, but data is relatively stable. Master data forms the collective
memory of an organization. Focus shifts from internal à external data.
Master data Management
A wheel within DMBOK scheme
• External codes
• Internal codes
• Customer data
• Product data
• Dimension Mgmt
Data dictionary
Business object model

Data models
1. Semantical (dictionary) – What is the meaning of a certain word/object?
2. Conceptual (Business objects) – Shows relationships, e.g. between products and actors
3. Logical (object relations and attributes) – Customer has ID, address, account, IBAN
4. Technical (database design) – How can we put all these things in a database?
The 4th (technical) layer is the only one that differs per design à there you choose which
database to use (Oracle, for example).
Data architecture patterns
How to create value from data (data in itself is ‘dumb’)
Is also in the DMBOK scheme

3 patterns:
• Business intelligence (BI) – How you organize it into a systematic architecture
• Data science – analytics
• Data sharing – between organizations or departments

BI Architecture
CRISP-DM model
• Cross-Industry Standard Process for Data Mining
• Widely used and a standard for DS projects; finished: no longer maintained
• Non-proprietary
• Application/industry neutral
• Tool neutral
• Focus on business issues
o As well as technical analysis
• Framework for guidance
• Experience base
o Templates for analysis
• Focus on continuous evaluation
CRISP-DM
Damhof Model: 1&2 Combined
Hadoop for Data Analytics and Use: (a sort of data lake)
Data discovery:
• Keep data warehouse for operational BI and analytics
• Allow data scientists to gain new discoveries on raw data (no format or structure)
• Operationalize discoveries back into the warehouse

Data Exchange and Sharing
• Canonical model (predefined dictionary)
o Predictable data to exchange, closed
o Partners are known
o Design paradigm (design before actually knowing it)
• Linked data (RDF, semantic web)
o Flexible exchange, define local data context, open
o Partners/ users unknown beforehand
o Organic development paradigm (add new parts of the data at the moment)

Service delivery in chains

Artificial Intelligence and data science
• Artificial Intelligence = A system that is capable of coming up with a solution to a
problem on its own
• Machine learning = Programming computers to optimize a performance criterion
using example data or past experience
• Data science = combines multiple fields, including statistics, scientific
methods, and data analysis, to extract value from data

AI and algorithms
NLP = Natural language processing

Three types of algorithms:
1. Classification
2. Regression
3. Clustering

Examples named:
Predictive policing
Predictive maintenance
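The first two algorithm types can be illustrated in a few lines of pure Python (the toy data and helper names are invented for this sketch; real projects would typically use a library such as scikit-learn, and clustering, e.g. k-means, would group unlabeled points the same way):

```python
# Classification: predict a label; Regression: predict a continuous value.

def classify_1nn(train, point):
    """Classification: return the label of the nearest training point."""
    return min(train, key=lambda xy: abs(xy[0] - point))[1]

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Classification: a spam score of 0.8 is closest to the 'spam' example.
print(classify_1nn([(0.1, 'ham'), (0.9, 'spam')], 0.8))  # spam
# Regression: the points lie exactly on y = 2x.
print(fit_line([1, 2, 3], [2, 4, 6]))  # (2.0, 0.0)
```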
Do’s & Don’ts
Having one truth is very difficult.
Why data quality is important.

Don’ts:
Start BIG (think BIG, but act small)
Think you know what the business wants à analyze!
Look at the data without context to see if the correlation makes sense
Forget to assess quality; garbage in = garbage out

Rules of thumb:
Always start with a clear business question
Know and engage the business domain
Semantics count! Understand the data!
Solve something quick, harvest small successes
Remember the goal of the operation
Shared data must be used in accordance with the GDPR

Lecture 12
Project presentations
Lecture 14
Separate file on Brightspace with more Example Exam Questions.

Q1: What are characteristics of blockchain technology?
• Ledger for storing transactions
• Users have a key to make transactions
• Timestamped blocks that hold batches of valid transactions
• Each block includes the hash of the prior block
• Every node in a decentralized system has a copy of the block chain
• Longest chain represents the truth
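The hash-chaining characteristic above ("each block includes the hash of the prior block") can be sketched directly; the block structure here is simplified for illustration, and real chains also include timestamps, nonces and Merkle roots:

```python
# Each block stores the hash of the prior block, so tampering with any
# historical block breaks the chain from that point on.
import hashlib, json

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_chain(batches):
    chain, prev = [], '0' * 64          # dummy predecessor for the genesis block
    for txs in batches:
        block = {'prev_hash': prev, 'transactions': txs}
        prev = block_hash(block)
        chain.append(block)
    return chain

def is_valid(chain):
    for earlier, later in zip(chain, chain[1:]):
        if later['prev_hash'] != block_hash(earlier):
            return False
    return True

chain = make_chain([['A pays B 5'], ['B pays C 2']])
print(is_valid(chain))                       # True
chain[0]['transactions'] = ['A pays B 500']  # tamper with history
print(is_valid(chain))                       # False
```

This is why every node keeping a copy of the chain can cheaply verify that history was not rewritten.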

Figure of Orlikowski (1992): there is always an influence of the technology on the
governance à different meanings are given to the same technology (= duality of
technology).
Extra information from the internet
GDPR
The General Data Protection Regulation protects the consumer against the collection of data
by companies and governments. The collectors need to prove that you gave permission to
collect the data, and that they handle it carefully and delete it after a certain time.
Fines can be up to 4% of annual revenue. The right of access and the right to be forgotten
apply.

Remark: can be a burden to business, while also still too vague to be easily applicable.

Web service vs. API

Both serve as a means of communication. Yet a web service facilitates interaction between
two machines over a network, while an API serves as an interface between two different
applications so that they can communicate with each other.

Web services can send requests in the form of JSON, XML, an HTML file, images, audio, etc.

API: does not always need a network (works online & offline).
Web Service: always needs a network.

Not all APIs are web services, but all web services are APIs.

API: lightweight architecture, good for devices with limited bandwidth.
Web service: no lightweight architecture; SOAP web services require the SOAP protocol.

API: any style of communication.
Web Service: only three styles: SOAP, REST, XML-RPC.

API functions:
1. Access to data
2. Hide complexity
3. Extend functionality
4. Security (gatekeepers)

Middleware
“Software-glue”. Between operating system and user application for example.
Database – JDBC – Java application.

Meta information
Meta information is information about information. For example, if a document is
considered to be information, its title, location, and subject are examples of meta
information. This term is sometimes used interchangeably with the term metadata.

BPEL4WS
Is a standard executable language for specifying actions within business processes with web
services. Processes in BPEL export and import information by using web service interfaces
exclusively.
Data Lineage
Refers to the traceable path of a specific critical data element (CDE) from an end-user report
upstream to the ultimate source (that path includes aggregated sources such as data
warehouses and data marts, operational data stores, staging areas, and transactional
systems).

Data Lake
A system or repository of data stored in its natural/raw format. Usually a single store of data
including raw copies of source system data, sensor data, social data, and transformed data
used for tasks such as reporting, visualization, advanced analytics and machine learning. Can
include structured, semi-structured, unstructured, and binary data. Can be 'on premise' or
'in the cloud'.

A “data swamp” is a deteriorated and unmanaged data lake that is either inaccessible to its
intended users or is providing little value.

Code of Conduct
Is a set of rules outlining the norms, rules, and responsibilities of proper practices of an
individual party or an organization.

Wrapper (as middleware)
Batch extraction vs continuous extraction
Clearing transactions with a Ledger system
TP monitors
A control program that monitors the transfer of data between multiple local and remote
terminals to ensure that the transaction processes completely or, if an error occurs, to take
appropriate actions.

Distributed Ledger Technologies (DLT)

Machine learning (Classification, Regression, Clustering)
Supervised: input and output, labels are given by user.
Unsupervised: Input, no attached labels (training set/ real set).
Reinforcement learning: Feedback with reward.

Data marts
Subject oriented data assets
Example Exam Questions
Q1: What is the best definition of enterprise IT-architecture according to Ross (2003)?
A) Policies for using IT in the organization
B) The organizing logic for data, applications, information and business processes
C) A destination plan for the IT-landscape
D) A description of the relationship between business and IT

Q2: Which of the following is the reason why a layered approach is preferred in architecting?
a) Each layer can be used to represent similar types of entities
b) Layers create greater complexity and scope
c) Each layer can be designed dependent on each other
d) Layers avoid different views and objectives
e) None of above

Q3: What are the four types of communication presented in the figure below? (from top to
bottom)
A) A- User interface, B- Application method level, C- Application Interface Level, D- Data
Level
B) A- Software and Operational Systems, B- System applications, C- Automated Data
Collection, D- Database exchange
C) A- User interface applications, B- Logical Operational Systems, C- Application Integration
Level, D- Data Exchange Level
D) None of them

Q4: Which of the following answer(s) is (are) NOT TRUE about the differences between
architecting and engineering? (more than one answer can be correct)
a) Architecting takes place in ill-structured situation, meanwhile engineering takes place in
better defined environment
b) Engineering serves the client, whereas architecting serves the builder
c) Heuristics/synthesis is mostly used in architecting, whereas engineering uses equations
and analysis
d) Architecting focuses on components, whereas engineering focuses on misfits/interfaces
Q5: What are characteristics of block chain technology? (more than one answer can be
correct)
a) Ledger for storing transactions
b) Users have a (public/private) key to make transactions
c) Timestamped blocks that hold batches of valid transactions
d) Each block includes the hash of the prior block
e) Every node in a decentralized system has a copy of the block chain
f) Longest chain represents the truth
g) None of them

Q6: What characteristics belong to information stewardship? (more than one answer can be
correct)
a) Third parties can make changes
B) Third parties report changes and mistakes to the information stewardship
c) All (third) parties should reuse information from the steward
d) Third parties have the obligation to keep data actual
e) Information stewards have the obligation to keep data actual
f) None of them

Q7: What characteristics belong to Enterprise Data Management (EDM)?
a) data management strategy
b) data governance
c) data quality
d) platform & architecture
e) data maintenance
f) supporting process

Q8: Which of the following statements is (are) correct? (Governance)
a. Governance can deal with business and IT alignment
b. Governance can deal with translating Strategy into implementation
c. Architecture use should be governed
d. Architecture development should be governed
Web services Beginner tutorial
4 YouTube videos

Introduction – What is a Web Service
What is a web service?
• Service available over the web.
• Determine the criticality of the webservice.
• Enables communication between applications over the web.
• Provides a standard protocol/format for communication.
• Platform independent communication.
• Using web services, two different applications (implementation) can talk to each
other and exchange data/ information.

In example; In a restaurant, the waiter is the means of communication between ‘you’ and
‘the kitchen’. The waiter is the web service / API. He is communicating between two
applications and making sure the communication is successful.

How web services work (overview)
Client -- Request à Server
Client ß Response -- Server

Enables communication between applications over the web
Applications written in different languages, using different databases.

Server = Service Provider: develops and implements the application (web service) and makes it
available over the web (internet). There should be a client (service consumer).

Needs for communication in webservices:
Medium – HTTP/ Internet
Format – XML/ JSON

Two main types of webservices:
1. Simple Object Access Protocol (SOAP)
a. Medium: HTTP (Post)
b. Format: XML
2. Representational State Transfer (REST)
a. Medium: HTTP (Post, Get, Put, Delete)
b. Format: XML/ JSON/ TEXT..
REST is more flexible than SOAP.

What is WSDL and UDDI

Components of a web service

The Consumer / Client needs to know:

What services are available?
What are the request and response parameters?
How to call the web service?
The structure & description of the web service.

Web Service Description Language (WSDL)

Is an interface that the Service Provider publishes, describing all attributes and
functionalities of the web service. It is XML-based, so it can be easily read.

When the Service Provider and Service Consumer do not know each other, how is the WSDL
shared? >> A web Service Provider publishes his web service (through WSDL) in an online
directory where consumers can query and search for web services. This online
registry/directory is called Universal Description, Discovery and Integration (UDDI).

What are SOAP web services?

A web service that complies with the SOAP web service specifications is a SOAP web service.
• Defined by the W3C (World Wide Web Consortium) – an international community that
develops open standards for the world wide web.
• Service specifications:
o Basic
§ SOAP
§ WSDL
§ UDDI
o Extended
§ WS-security
§ WS-policy
§ WS-I
§ …
It is a protocol / set of rules / definitions on how two applications will talk to each other over
the web.
A SOAP message consists of an Envelope (the root element), a Header, and a Body.
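The Envelope/Header/Body structure can be sketched with the standard library. The element names use the SOAP 1.1 envelope namespace; the `GetPrice` payload is an invented example, not part of any real service:

```python
# Building the skeleton of a SOAP message: Envelope (root), Header, Body.
import xml.etree.ElementTree as ET

SOAP_NS = 'http://schemas.xmlsoap.org/soap/envelope/'
ET.register_namespace('soap', SOAP_NS)

envelope = ET.Element(f'{{{SOAP_NS}}}Envelope')   # root element
ET.SubElement(envelope, f'{{{SOAP_NS}}}Header')   # optional header (e.g. WS-Security)
body = ET.SubElement(envelope, f'{{{SOAP_NS}}}Body')
ET.SubElement(body, 'GetPrice').text = 'apple'    # application payload goes in the Body

print(ET.tostring(envelope, encoding='unicode'))
```

The printed XML is what would be sent as the payload of an HTTP POST to the SOAP endpoint.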

What are REST web services?

A web service that communicates / exchanges information between two applications using
REST architecture / principles is called a RESTful web service.

Representational State Transfer (REST)

• Unlike SOAP (which is a protocol), REST is an architectural style.
• There is no central body determining the standards; REST defines a set of principles
to be followed while designing a service for communication / data exchange between
two applications. When the principles are applied >> RESTful Web Service.

Constraints / Principles:
• Uniform interface
o Resource (nouns): everything is a resource (all modules/ databases etc are
available as resource when defined)
o URI: any resource/data can be accessed by a URI (=URL)
o HTTP (verbs): make explicit use of the HTTP methods (CRUD à Post, Get,
Put, Delete)
• Stateless
o All client-server communications are stateless (Server = stateless, request
from client must contain all of the necessary data to handle the request)
(Improves the web service performance)
• Cacheable
o Happens at client side (Cache-control and Last-modified, What information
should be saved?)
• Layered System
o Layers can exist between server and client (proxies / gateways)
• Code on Demand (optional)
o Ability to download and execute code on client side
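The uniform-interface principle above can be sketched as a tiny in-memory router that maps HTTP verbs onto CRUD operations against URI-identified resources. This is a toy with no real HTTP server, and the `handle()` helper and status tuples are invented for illustration (statelessness in REST concerns session state per request, not the resource state kept here):

```python
# HTTP verbs -> CRUD operations on resources identified by a URI.

store = {}  # URI -> current representation of the resource

def handle(method, uri, body=None):
    if method == 'POST':            # Create
        store[uri] = body
        return 201, body
    if method == 'GET':             # Read
        return (200, store[uri]) if uri in store else (404, None)
    if method == 'PUT':             # Update (replace the representation)
        store[uri] = body
        return 200, body
    if method == 'DELETE':          # Delete
        return 204, store.pop(uri, None)

print(handle('POST', '/customers/1', {'name': 'Ada'}))  # (201, {'name': 'Ada'})
print(handle('GET', '/customers/1'))                     # (200, {'name': 'Ada'})
print(handle('DELETE', '/customers/1'))                  # (204, {'name': 'Ada'})
print(handle('GET', '/customers/1'))                     # (404, None)
```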

“The key abstraction of information in REST is a resource. Any information that can be
named can be a resource: a document or image and so on… “ – Roy Fielding

Representation = description of the current state of the resource

Authorization vs Authentication
Authentication = Who you are.
Authorization = What authority you have.

Recap
A protocol can be considered a set of rules or a common agreement between two or more
parties (components) used for communicating with each other. Most of the time a protocol
includes the steps and/or procedures that should be used when communicating with each
other.
An API allows and defines how two applications can communicate with each other by using
the methodologies defined by the service-providing application. Compared to a protocol, an
API describes the programmatic ways to communicate between applications. The service-
calling application must properly adhere to the standards in order to get the required service.
Web services are very similar to APIs. The notable thing about web services is that
developing a web service expects users to access it over the internet. Therefore a web service
can be considered an online API.
Middleware allows distributed application components located on several computers to
communicate with each other (it simply links the components located on various machines in
order to get the full application's capabilities). Middleware minimizes the development effort
by overcoming heterogeneous factors (OS, hardware, network equipment, etc.). Middleware
sits between the application (application components) and the OS.
Cloud Computing Architecture
Why cloud computing?
Previous situation:
• On-premise is expensive
• Less scalability
• Huge space must be allotted for servers
• Less chance of data recovery
• Long deployment times
• Lack of flexibility
• Poor data security
• Less collaboration
• Data cannot be accessed remotely

With cloud computing:
• No server space required
• No experts required for hardware and software maintenance
• Better data security
• Disaster recovery
• Ease of deployment
• Cost-effective (pay as you go)
• Collaboration is efficient
• Management of services is easy

What is cloud computing?

"The delivery of on-demand resources (such as servers, databases, software, etc.) over the
internet."
Cloud providers – companies offering the cloud (AWS, Azure, Google Cloud)
Cloud Computing Service Providers – the vendors that provide services to manage
applications through a global network.

Benefits
Easily upgraded
Cost-efficient
Scalability
Automated
Highly available
Flexible
Better security
Customization
Cloud computing architecture

Front end:
• Cloud infrastructure consists of hardware and software components such as data
storage, servers, virtualization software, etc.
• It also provides a Graphical User Interface (GUI) to end users in order to perform their
respective tasks

Back end:
• Manages all the programs that run the application on the front end
• It has a large number of data storage systems and servers
• It can be software or a platform
• “task is to provide utility in the architecture”
• Eg; Amazon S3, Oracle Cloud-storage, Microsoft Azure Storage

Components:
1. Hypervisor
a. Virtual Operating Platform, for every user
b. Divide and allocate resources
2. Management software
a. Manage and monitor the cloud operations
b. Improving the performance of the cloud
3. Deployment software
a. SaaS (Gmail)
b. PaaS (Microsoft Azure)
c. IaaS (pay-as-you-go pricing model)
4. Network
5. Cloud server
6. Cloud storage
Data Management (Online-course)
Course objectives:
• Understand data management capabilities from the people, process and technology
perspective.
• Understand how each capability fits into overall Data Management Framework.

Introduction
Data management refers to the development and execution of architectures, policies,
practices, and procedures in order to manage the information lifecycle of an enterprise in an
effective manner.

>> All lecture titles are the capabilities of data management. Each capability has three
aspects: People, Process, and Technology.

L1; Metadata management
Data Element (DE) = a unit of data for which the definition, identification, representation,
and permissible values are specified by means of a set of attributes.
> Critical data elements (CDE) = data elements that are "critical to success" in a specific
business or process.
Criteria for a CDE: (list is not exhaustive)
• Business facts that are deemed critical to the organization
• Support Critical Business Processes across an organization and its components
• Data used to derive values that appear in key reports
• Unique identifiers of things important to the business (e.g. Customer ID)

Metadata management involves managing data about other data, whereby ‘other data’ is
generally referred to data models and structures, not the content. (e.g. Business terms in
glossary, attributes in logical data model, or tables and columns in the database). It is to see
how the data is being managed by, and through, the organization.

Roles & Responsibilities

Business owner: is ultimately accountable for the definition of all data and metadata.
Responsible for confirming that data is used in a fashion consistent with the overall strategy.
Also responsible for driving data management processes and activities. (Business role)
Data steward: responsible for operational oversight of assigned data and interactions with
subject matter experts across the organization, as well as identifying the approach to
standardize, measure, and monitor data quality. (Business role)
Technical owner: ultimately accountable that data from a particular data system is managed
and used according to the defined data standards. (Technical role)
Data custodian: technology specialist responsible for the secure storage and
management of the data for a particular system. (Technical role)

Operational metadata includes information about application runs: their frequency, record
counts, and component-by-component analysis.

Metadata management process:

1. Identify Critical Data Elements (CDE)
2. Collect CDE business metadata
3. Collect CDE technical metadata
4. Create CDE data standard (360 view)
5. Enforce CDE data standard

>> System Development Lifecycle (SDL) = Plan > Create > Test > Deploy > Plan

L2; Data Quality Management

Data quality refers to the methodical approach, policies and processes by which an
organization manages the accuracy, validity, timeliness, completeness, uniqueness, and
consistency (the dimensions) of its data in systems and data flows.

>> Is the data accurate? Is it valid? Is it on time? Is it complete? Is it unique? Is it consistent?

A data quality dimension refers to an aspect or feature of information that can be assessed
and used to determine the quality of data.

• Accuracy: data accurately represents "the real world". Example issue: incorrect
spelling of a name.
• Validity: data conforms to the syntax (format, type, range) of its definition. Example
issue: incorrect classification values for gender or customer type.
• Timeliness: data represents reality from the required point in time. Example issue:
customer address changes.
• Completeness: data is complete in terms of the required potential of the data.
Example issue: an address missing the zip code.
• Uniqueness: data is properly identified and recorded only once. Example issue: a
single customer is recorded twice.
• Consistency: data is represented consistently across the data set. Example issue: a
customer account is closed, but there is a new order to that account.

Data quality rules refer to business rules that are set up to protect the data quality.
Data quality process: Define DQ requirements > Conduct DQ assessment > Resolve DQ
issues > Monitor and control.
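Data quality rules like these can be expressed as executable checks over the dimensions above. The record layout, rule set and Dutch zip-code format are invented for this sketch; a real setup would run such rules in the "Conduct DQ assessment" and "Monitor and control" steps:

```python
# Data quality rules as small predicates over a customer record.
import re

RULES = {
    'completeness': lambda r: bool(r.get('zip')),                 # zip code present
    'validity':     lambda r: bool(re.fullmatch(r'\d{4}[A-Z]{2}', r.get('zip', ''))),
    'consistency':  lambda r: not (r.get('closed') and r.get('open_orders', 0) > 0),
}

def assess(record):
    """Return the names of the data quality rules the record violates."""
    return [name for name, ok in RULES.items() if not ok(record)]

good = {'zip': '2628CD', 'closed': False, 'open_orders': 0}
bad  = {'zip': '', 'closed': True, 'open_orders': 2}
print(assess(good))  # []
print(assess(bad))   # ['completeness', 'validity', 'consistency']
```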

L3; Data Governance
L4; Master and Reference data Management
L5; Data Integration
L6; Analytics
L7; Data Privacy
L8; Data Architecture
Data architecture refers to the models, policies, rules, or standards that govern which data
is collected, and how it is stored, arranged, and put to use in a database system and/or in
an organization.
Proof of work vs Proof of stake
Proof of work: requires all miners to attempt to solve a complex puzzle; the winner is
determined by whoever has the most powerful hardware / greatest quantity of it.
Proof of stake: the model randomly chooses the winner, weighted by the amount they have
staked.
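The "complex puzzle" in proof of work can be sketched as a nonce search. The difficulty and block content here are invented for illustration; real PoW (as in Bitcoin) hashes a binary block header with double SHA-256 and a much higher difficulty:

```python
# Proof of work: find a nonce so the block hash starts with N zeros.
import hashlib

def mine(block_data, difficulty=2):
    """Brute-force a nonce such that sha256(block_data + nonce) starts
    with `difficulty` hex zeros. Finding it is costly; checking is cheap."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f'{block_data}{nonce}'.encode()).hexdigest()
        if digest.startswith('0' * difficulty):
            return nonce, digest
        nonce += 1

nonce, digest = mine('block with transactions')
print(digest.startswith('00'))  # True: anyone can verify in one hash
```

The asymmetry (expensive to find, trivial to verify) is what makes rewriting history prohibitively costly, whereas proof of stake replaces the hardware race with stake-weighted random selection.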
