Necmi Gürsakal, Esma Birişçi and Sadullah Çelik

Synthetic Data for Deep Learning


Generate Synthetic Data for Decision Making and
Applications with Python and R
Necmi Gürsakal
Bursa, Turkey

Esma Birişçi
Bursa, Turkey

Sadullah Çelik
Aydın, Turkey

ISBN 978-1-4842-8586-2 e-ISBN 978-1-4842-8587-9


https://doi.org/10.1007/978-1-4842-8587-9

© Necmi Gürsakal, Sadullah Çelik, and Esma Birişçi 2022

Apress Standard

The use of general descriptive names, registered names, trademarks,


service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress


Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY
10004, U.S.A.
This book is dedicated to our mothers.
Introduction
“The claim is that nature itself operates in a way that is analogous to a
priori reasoning. The way nature operates is, of course, via causation:
the processes we see unfolding around us are causal processes, with
earlier stages linked to later ones by causal relations” [1]. Data is
extremely important in the operation of causal relationships and can be
described as the “sine qua non” of these processes. In addition, data
quality is related to quantity and diversity, especially in the AI
framework.
Data is the key to understanding causal relationships. Without data,
it would be impossible to understand how the world works. The
philosopher David Hume understood this better than anyone.
According to Hume, our knowledge of the world comes from our
experiences. Experiences produce data, which can be stored on a
computer or in the cloud. Based on this data, we can make predictions
about what will happen in the future. These predictions allow us to test
our hypotheses and theories. If our predictions are correct, we can have
confidence in our ideas. If they are wrong, we need to rethink our
hypotheses and theories. This cycle of testing and refinement is how we
make progress in science and life. This is how we make progress as
scientists and as human beings.
Many of today’s technology giants, such as Amazon, Facebook, and
Google, have made data-driven decision-making the core of their
business models. They have done this by harnessing the power of big
data and AI to make decisions that would otherwise be impossible. In
many ways, these companies are following in the footsteps of Hume,
using data to better understand the world around them.
As technology advances, how we collect and store data also changes.
In the past, data was collected through experiments and observations
made by scientists. However, with the advent of computers and the
internet, data can now be collected automatically and stored in a
central location. This has led to a change in the way we think about
knowledge. Instead of knowledge being stored in our minds, it is now
something that is stored in computers and accessed through
algorithms.
This change in the way we think about knowledge has had a
profound impact on the way we live and work. In the past, we would
have to rely on our memory and experience to make decisions.
However, now we can use data to make more informed decisions. For
example, we can use data about the past behavior of consumers to
predict what they might buy in the future. This has led to a more
efficient and effective way of doing business.
In the age of big data, it is more important than ever to have high-
quality data to make accurate predictions. However, it is not only the
quantity and quality of the data that is important but also the diversity.
The diversity of data sources is important to avoid bias and to get a
more accurate picture of the world. This is because different data
sources can provide different perspectives on the same issue, which can
help to avoid bias. Furthermore, more data sources can provide a more
complete picture of what is happening in the world.

Machine Learning
In recent years, a method has been developed to teach machines to see,
read, and hear via data input. The point of origin for this is the idea
that the brain produces output by passing inputs through a large
network of neurons. In this framework, we are trying to give
machines the ability to learn by modeling artificial neural networks.
Although some authors suggest that the brain does not work that way,
this is the path followed today.
Many machine learning projects in new application areas began
with the labeling of data by humans to initiate machine training. These
projects were categorized under the title of supervised learning. This
labeling task is similar to the structured content analysis applied in
social sciences and humanities. Supervised learning is a type of
machine learning that is based on providing the machine with training
data that is already labeled. This allows the machine to learn and
generalize from the data to make predictions about new data.
Supervised learning is a powerful tool for many machine learning
applications.
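To make this concrete, the following minimal sketch, assuming the
scikit-learn library is available, fits a model on labeled examples and
checks how well it generalizes to unseen data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled data: features X and human-provided labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train on labeled examples, then test generalization on held-out data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on unseen data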
The quality of data used in machine learning studies is crucial for
the accuracy of the findings. A study by Geiger et al. (2020) examined
how training data for machine learning models is labeled and showed
that poor-quality data leads to unfair and inaccurate models,
highlighting the importance of data quality in machine learning
research. The study also showed how data labeling practices impact
data quality: about half of the papers using original human annotation
had multiple annotators label the same items to some extent, and about
70% of those papers report metrics of inter-annotator agreement [2].
This suggests that the data used in many studies is unreliable and that
further research is needed to improve data quality.
As more business decisions are informed by data analysis, more
companies are built on data. However, data quality remains a problem.
Unfortunately, “garbage in, garbage out,” which was a frequently used
motto about computers in the past, is valid in the sense of data
sampling, which is also used in the framework of machine learning.
According to the AI logic most employed today, if qualified college
graduates have been successful in obtaining doctorates in the past, they
will continue to be so in the future. In this context, the way to
get a good result in machine learning is to include “black swans” in our
training data, and the absence of such rare cases is a common problem
with our datasets.
A “black swan” is a term used to describe outliers in datasets. It is a
rare event that is difficult to predict and has a major impact on a
system. In machine learning, a black swan event is not represented in
the training data but could significantly impact the results of the
machine learning algorithm. Including black swans in training datasets
helps train models to be more robust to unexpected inputs and avoids
biased results.
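As a rough illustration of this idea, the following sketch, with purely
illustrative numbers, deliberately mixes a few extreme “black swan”
samples into an otherwise typical synthetic training set:

import numpy as np

rng = np.random.default_rng(seed=2)
typical = rng.normal(0, 1, size=(990, 3))         # everyday observations
black_swans = rng.normal(0, 1, size=(10, 3)) * 8  # rare, extreme events

# The training set is mostly typical data plus a few extreme samples,
# so the model also sees inputs far outside the usual range.
training_data = np.vstack([typical, black_swans])
print(training_data.shape)  # (1000, 3)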
Over time, technological development has moved data-driven decision-
making from the hands of humans into the decision-making framework of
machines. Now, machines evaluate big data and
make decisions with algorithms written by humans. For example, a
driverless car can navigate toward the desired destination by
constantly collecting data on stationary and moving objects around it in
various ways. Autonomous driving is a very important and constantly
developing application area for synthetic data. Autonomous driving
systems should be developed at a capability level that can solve
complex and varied traffic problems in simulation. The scenarios in
these simulations are sometimes built with game engines such as Unreal
and Unity. Creating accurate and useful “synthetic data” with
simulations based on real data will be the preferred route for
companies when real data cannot easily be found.
Synthetic data is becoming an increasingly important tool for
businesses looking to improve their AI initiatives and overcome many
of the associated challenges. By creating synthetic data, businesses can
shape and form data to their needs and augment and de-bias their
datasets. This makes synthetic data an essential part of any AI strategy.
DataGen, Mostly, Cvedia, Hazy, AI.Reverie, Omniverse, and Anyverse can
be counted among the startups that produce synthetic data. Sample
images from synthetic outdoor datasets produced by such companies
can be seen in the given source.
In addition to the benefits mentioned, synthetic data can also help
businesses train their AI models more effectively and efficiently.
Businesses can avoid the need for costly and time-consuming data
collection processes by using synthetic data. This can help businesses
save money and resources and get their AI initiatives up and running
more quickly.
Who Is This Book For?
The book is meant for people who want to learn about synthetic data
and its applications. It will prove especially useful for people working in
machine learning and computer vision, as synthetic data can be used to
train machine learning models that can make more accurate
predictions about real-world data.
The book is written for the benefit of data scientists, machine
learning engineers, deep learning practitioners, artificial intelligence
researchers, data engineers, business analysts, information technology
professionals, students, and anyone interested in learning more about
synthetic data and its applications.
Book Structure
Synthetic data is not originally collected from real-world sources. It is
generated by artificial means, using algorithms or mathematical
models, and has many applications in deep learning, particularly in
training neural networks. This book, which discusses the structure and
application of synthetic data, consists of five chapters.
Chapter 1 covers synthetic data, why it is important, and how it can
be used in data science and artificial intelligence applications. This
chapter also discusses the accuracy problems associated with synthetic
data, the life cycle of data, and the tradeoffs between data collection
and privacy. Finally, this chapter describes some applications of
synthetic data, including financial services, manufacturing, healthcare,
automotive, robotics, security, social media, marketing, natural
language processing, and computer vision.
Chapter 2 provides information about different ways of generating
synthetic data. It covers how to generate fair synthetic data, as well as
how to use video games to create synthetic data. The chapter also
discusses the synthetic-to-real domain gap and how to overcome it
using domain transfer, domain adaptation, and domain randomization.
Finally, the chapter discusses whether a real-world experience is
necessary for training machine learning models and, if not, how to
achieve it using pretraining, reinforcement learning, and self-
supervised learning.
Chapter 3 explains the content and purpose of a generative
adversarial network, or GAN, a type of AI used to generate new data,
like training data.
Chapter 4 explores synthetic data generation with R.
Chapter 5 covers different methods of synthetic data generation
with Python.

Learning Outcomes of the Book


Readers of this book will learn about the various types of synthetic
data, how to create them, and their benefits and challenges. They will
also learn about its importance in data science and artificial
intelligence. Furthermore, readers will come away understanding how
to employ automatic data labeling and how GANs can be used to
generate synthetic data. Lastly, readers who complete this book will
know how to generate synthetic data using the R and Python
programming languages.

Source Code
The datasets and source code used in this book can be downloaded
from github.com/apress/synthetic-data-deep-learning.

References
[1] M. Rozemond, “The Nature of the Mind,” in The Blackwell Guide to
Descartes’ Meditations, S. Gaukroger, Ed. John Wiley & Sons, 2006.
[2] R. S. Geiger et al., “Garbage In, Garbage Out?,” in Proceedings of
the 2020 Conference on Fairness, Accountability, and Transparency, Jan.
2020, pp. 325–336. doi: 10.1145/3351095.3372862.
Preface
In 2017, The Economist wrote, “The world’s most valuable resource is
no longer oil, but data,” and this becomes truer with every passing day.
The gathering and analysis of massive amounts of data drive the
business world, public administration, and science, giving leaders the
information they need to make accurate, strategically-sound decisions.
Although some worry about the implications of this new “data
economy,” it is clear that data is here to stay. Those who can harness the
power of data will be in a good position to shape the future.
To use data ever more efficiently, machine and deep learning—
forms of artificial intelligence (AI)—continue to evolve. And every new
development in how data and AI are used impacts innumerable areas of
everyday life. In other words, from banking to healthcare to scientific
research to sports and entertainment, data has become everything. But,
for privacy reasons, it is not always possible to find sufficient data.
As the lines between the real and virtual worlds continue to blur,
data scientists have begun to generate synthetic data, with or without
real data, to understand, control, and regulate decision-making in the
real world. Instead of focusing on how to overcome barriers to data,
data professionals have the option of either transforming existing data
for their specific use or producing it synthetically. We have written this
book to explore the importance and meaning of these two avenues
through real-world examples. If you work with or are interested in data
science, statistics, machine learning, deep learning, or AI, this book is
for you.
While deep learning models’ huge data needs are a bottleneck for
such applications, synthetic data has allowed these models to be, in a
sense, self-fueled. Synthetic data is still an emerging topic, from
healthcare to retail, manufacturing to autonomous driving. It should be
noted that labeling processes start with real data; real data,
augmented data, and synthetic data all play a part in these deep
learning processes.
This book includes examples of Python and R applications for
synthetic data production. We hope that it proves to be as
comprehensive as you need it to be.
—Necmi Gürsakal
—Sadullah Çelik
—Esma Birişçi
Any source code or other supplementary material referenced by the
author in this book is available to readers on GitHub
(https://github.com/Apress). For more detailed information, please
visit http://www.apress.com/source-code.
Table of Contents
Chapter 1: An Introduction to Synthetic Data
What Is Synthetic Data?
Why Is Synthetic Data Important?
Synthetic Data for Data Science and Artificial Intelligence
Accuracy Problems
The Lifecycle of Data
Data Collection versus Privacy
Data Privacy and Synthetic Data
Synthetic Data and Data Quality
Applications of Synthetic Data
Financial Services
Manufacturing
Healthcare
Automotive
Robotics
Security
Social Media
Marketing
Natural Language Processing
Computer Vision
Summary
References
Chapter 2: Foundations of Synthetic Data
How to Generate Fair Synthetic Data?
Generating Synthetic Data in a Simple Way
Using Video Games to Create Synthetic Data
The Synthetic-to-Real Domain Gap
Bridging the Gap
Is Real-World Experience Unavoidable?
Pretraining
Reinforcement Learning
Self-Supervised Learning
Summary
References
Chapter 3: Introduction to GANs
GANs
CTGAN
SurfelGAN
Cycle GANs
SinGAN-Seg
MedGAN
DCGAN
WGAN
SeqGAN
Conditional GAN
BigGAN
Summary
References
Chapter 4: Synthetic Data Generation with R
Basic Functions Used in Generating Synthetic Data
Creating a Value Vector from a Known Univariate
Distribution
Vector Generation from a Multi-Level Categorical Variable
Multivariate
Multivariate (with correlation)
Generating an Artificial Neural Network Using Package “nnet”
in R
Augmented Data
Image Augmentation Using Torch Package
Multivariate Imputation via “mice” Package in R
Generating Synthetic Data with the “conjurer” Package in R
Create a Customer
Create a Product
Creating Transactions
Generating Synthetic Data
Generating Synthetic Data with “Synthpop” Package in R
Copula
t Copula
Normal Copula
Gaussian Copula
Summary
References
Chapter 5: Synthetic Data Generation with Python
Data Generation with Known Distribution
Data with Date Information
Data with Internet Information
A More Complex and Comprehensive Example
Synthetic Data Generation in Regression Problems
Applying Gaussian Noise to a Regression Model
Friedman Functions and Symbolic Regression
Make 3D Plot
Synthetic Data Generation for Classification and Clustering Problems
Classification Problems
Clustering Problems
Generating Tabular Synthetic Data by Applying GANs
Synthetic Data Generation
Summary
References
Index
About the Authors
Necmi Gürsakal
is a statistics professor at Mudanya
University in Turkey, where he shares his
experience and knowledge with his
students. Before that, he worked as a
faculty member in the Econometrics
Department at Bursa Uludağ University for
more than 40 years. Necmi has published
many Turkish books and English
and Turkish articles on data science,
machine learning, artificial intelligence,
social network analysis, and big data. In
addition, he has served as a consultant to
various business organizations.

Sadullah Çelik
is a mathematician, statistician, and data scientist who completed his
undergraduate and graduate education in mathematics and his
doctorate in statistics. He has written numerous Turkish and English
articles on big data, data science, machine learning, multivariate
statistics, and network science. He developed his programming and
machine learning knowledge while writing his doctoral thesis, Big Data
and Its Applications in Statistics. He has been working as a Research
Assistant at Adnan Menderes University, Aydın, for more than eight
years and has extensive knowledge and experience in big data, data
science, machine learning, and statistics, which he passes on to his
students.
Esma Birişçi
is a programmer, statistician, and
operations researcher with more than 15
years of experience in computer
program development and five years in
teaching students. She developed her
programming ability while studying for
her bachelor’s degree, and her knowledge of
machine learning during her master’s
degree program. She completed her
thesis on data augmentation and
supervised learning. Esma transferred to
Industrial Engineering and completed
her doctorate program on dynamic and
stochastic nonlinear programming. She
studied large-scale optimization and life
cycle assessment, and developed a large-scale food supply chain system
application using Python. She is currently working at Bursa Uludağ
University, Turkey, where she transfers her knowledge to students. In
this book, she is proud to be able to explain Python’s powerful
structure.
About the Technical Reviewer
Fatih Gökmenoğlu
is a researcher focused on synthetic data,
computational intelligence, domain
adaptation, and active learning. He also
likes reporting on the results of his
research.
His knowledge closely aligns with
computer vision, especially with
deepfake technology. He studies both the
technology itself and ways of countering
it.
When he’s not on the computer, you’ll
likely find him spending time with his
little daughter, whose development has
provided much inspiration for his work
on machine learning.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2022
N. Gürsakal et al., Synthetic Data for Deep Learning
https://doi.org/10.1007/978-1-4842-8587-9_1

1. An Introduction to Synthetic Data


Necmi Gürsakal1 , Sadullah Çelik2 and Esma Birişçi2

(1) Bursa, Turkey


(2) Aydın, Turkey

In this chapter, we will explore the concept of data and its importance
in today’s world. We will discuss the lifecycle of data from collection to
storage and how synthetic data can be used to improve accuracy in data
science and artificial intelligence (AI) applications. Next, we will
explore synthetic data applications in financial services,
manufacturing, healthcare, automotive, robotics, security, social media,
and marketing. Finally, we will examine natural language processing,
computer vision, understanding of visual scenes, and segmentation
problems in terms of synthetic data.

What Is Synthetic Data?


Despite 21st-century advances in data collection and analysis, there is
still a lack of understanding of how to properly utilize data to minimize
the perceived ambiguity or subjectivity of the information it represents.
This is because the same meaning can be expressed in a variety of ways,
and a single expression can have multiple meanings. As a result, it is
difficult to create a comprehensive framework for data interpretation
that considers all of the potential nuances and implications of the
information. One way to overcome this challenge is to develop
standardized methods for data collection and analysis. This will ensure
that data is collected consistently and that the results can be compared
across different studies, and synthetic data can help us do just that.
People generally view synthetic data as being less reliable than
data that is obtained by direct measurement. Put simply, synthetic
data is data that is generated by a computer program rather than being
collected from real-world sources. While synthetic data is often less
reliable than data that is collected directly from the real world, it is still
an essential tool for data scientists. This is because synthetic data can
be used to test hypotheses and models before they are applied to real-
world data. This can help data scientists avoid making errors that could
have negative consequences in the real world.
Synthetic data is data that is artificially generated by a computer
program or simulation rather than collected from real-world sources [11].
When we examine this definition, we see that it includes the following
concepts: annotated information, computer simulation, algorithms, and
“not measured in the real world.”
The key features of synthetic data are as follows (a brief sketch
illustrating them follows this list):
Not obtained by direct measurement
Generated via an algorithm
Associated with a mathematical or statistical model
Mimics real data
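A minimal sketch of these features, assuming NumPy and with invented
numbers standing in for measured data: an algorithm fits a simple
statistical model to “real” values and then generates synthetic values
that mimic them.

import numpy as np

rng = np.random.default_rng(seed=0)
real = rng.normal(loc=170, scale=8, size=1000)  # stand-in for measured data

# Associate a statistical model with the real data (here, a normal
# distribution), then generate synthetic values that mimic it.
mu, sigma = real.mean(), real.std()
synthetic = rng.normal(loc=mu, scale=sigma, size=1000)
print(round(synthetic.mean(), 1), round(synthetic.std(), 1))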
Now let’s explain why synthetic data is important.

Why Is Synthetic Data Important?


Humans have a habit of creating synthetic versions of expensive
products. Silk is an expensive product that began to be used thousands
of years ago, and rayon was created in the 1880s. So, it’s little surprise
that people would do the same with data, choosing to produce synthetic
data because it is cost-effective. As mentioned earlier, synthetic data
allows scientists to test hypotheses and models in a controlled
environment. It can also be used to create “what if” scenarios, helping
data scientists to better understand the outcomes of their models.
Likewise, synthetic data can be used in a variety of ways to improve
machine learning models and protect the privacy of real data. First,
synthetic data can be used to train machine learning models when real
data is not available. This is especially important for developing
countries, where data is often scarce. Second, synthetic data can be
used to test machine learning models before deploying them on real
data. This is important for ensuring that models work as intended and
won’t compromise the real data. Finally, synthetic data can be used to
protect the privacy of real data by generating data that is similar to real
data but does not contain any personal information.
Synthetic data also provides more control than real data. Actual
data comes from many different sources, which can result in datasets so
large and diverse that they become unwieldy. Because synthetic data is
created using a model whose data is generated for a specific purpose, it
will not be randomly scattered. In some cases, synthetic data may
even be of a higher quality than real data. Actual data may need to be
heavily preprocessed, or too much data may be processed; these actions
can reduce the quality of the data.
Synthetic data, on the other hand, can be of higher quality, thanks to the
model used to generate the data.
Overall, synthetic data has many advantages over real data.
Synthetic data is more controlled, of higher quality, and can be
generated in the desired quantities. These factors make synthetic data a
valuable tool for many applications. A final reason why synthetic data is
important is that it can be used to generate data for research purposes,
allowing researchers to study data that is not biased or otherwise not
representative of the real data.
Now let’s explain the importance of synthetic data for data science
and artificial intelligence.

Synthetic Data for Data Science and Artificial Intelligence
The use of synthetic data is not a new concept. In the early days of data
science and AI, synthetic data was used to train machine learning
models. However, synthetic data of the past was often low-quality and
not realistic enough to be useful for training today’s more sophisticated
AI models.
Recent advances in data generation techniques, such as Generative
Adversarial Networks (GANs), have made it possible to generate
synthetic data that is virtually indistinguishable from real-world data.
This high-quality synthetic data is often referred to as “realistic
synthetic data”.
The use of realistic synthetic data has the potential to transform the
data science and AI fields. Realistic synthetic data can be used to train
machine learning models without the need for real-world data. This is
especially beneficial in cases where real-world data is scarce, expensive,
or difficult to obtain.
In addition, realistic synthetic data can be used to create “virtual
environments” for testing and experimentation. These virtual
environments can be used to test machine learning models in a safe and
controlled manner, without the need for real-world data.
For example, a computer algorithm might be used to generate
realistic-looking images of people or objects. This could be used to train
a machine learning system to better recognize these objects in real-
world images. Alternatively, synthetic data could be used instead of
real-world data if the latter is not available or is too expensive to obtain.
Overall, the use of synthetic data is a promising new trend in data
science and AI. The ability to generate high-quality synthetic data
is opening new possibilities for training and experimentation. For
example, a synthetic data set could be created that contains 6000
words, instead of the usual 2000. This would allow the AI system to
learn from a larger and more diverse data set, which would in turn
improve its performance on real-world data. In the future, synthetic
data is likely to play an increasingly important role in the data science
and AI fields.
Let us now consider accuracy problems in terms of synthetic data.

Accuracy Problems
Supervised learning algorithms are trained with labeled data. In this
method, the training data is commonly called “ground truth”, and the test
data is called “holdout data”. There are three ways to compare accuracy
across algorithms [2] (a brief sketch follows this list):
Estimator score method: An estimator is a number that is used to
estimate, or guess, the value of something. The score method is a way
to decide how good an estimator is by analyzing how close the
estimator’s guesses are to the actual value.
Scoring parameter: Cross-validation is a model-evaluation technique
that relies on an internal scoring strategy.
Metric functions: The sklearn.metrics module provides functions for
assessing prediction error for specific purposes.
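The sketch below, assuming scikit-learn is installed and using a
synthetic classification dataset, shows all three approaches side by
side:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1. Estimator score method: every estimator exposes a default score()
print(model.score(X_test, y_test))

# 2. Scoring parameter: cross-validation with an internal scoring strategy
print(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())

# 3. Metric functions: sklearn.metrics assesses specific prediction errors
print(accuracy_score(y_test, model.predict(X_test)))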
It’s important to acknowledge that the accuracy of synthetic data
can be problematic for several reasons. First, the data may be generated
by a process that is not representative of the real-world process that
the data is meant to represent. This can lead to inaccuracies in the
synthetic data that may not be present in real-world data. Second, the
data may be generated with a specific goal in mind, such as training a
machine learning algorithm, that does not match the goals of the
synthetic data’s end user. This can also lead to inaccuracies in the
synthetic data. Finally, synthetic data may be generated using a
stochastic process, which can introduce randomness into the data that
may not be present in real-world data. This randomness can also lead
to inaccuracies in the synthetic data.
One way to overcome potential issues with accuracy in machine
learning is to use synthetic data. This can be done by automatically
tagging and preparing data for machine learning algorithms, which cuts
down on the time and resources needed to create a training dataset.
This also creates a more consistent dataset that is less likely to contain
errors. Another way to improve accuracy in machine learning is to use a
larger training dataset. This will typically result in better performance
from the machine learning algorithm.
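The automatic-tagging point above can be sketched as follows: because a
synthetic generator produces each sample from a known rule, every label
comes for free (the rule here is invented purely for illustration).

import numpy as np

rng = np.random.default_rng(seed=3)
X = rng.uniform(-1, 1, size=(1000, 2))

# The generating rule itself tags every sample; no human annotation needed.
y = (X[:, 0] * X[:, 1] > 0).astype(int)
print(X[:3], y[:3])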
Working on the recognition and classification of aircraft from
satellite photos, Airbus and OneView achieved 88% accuracy with
OneView’s simulated dataset, compared to 82% with a dataset consisting
only of real data. When real data and synthetic data were used
together, an accuracy of roughly 90% was obtained, an 8-point
improvement over real data alone [3].
This improved accuracy is due to the increased variety of data that is
available when both real and simulated data are used. The increased
variety of data allows the machine learning algorithm to better learn
the underlying patterns of the data. This improved accuracy is
significant and can lead to better decision-making in a variety of
applications.
Now let’s examine the life cycle of data in terms of synthetic data.

The Lifecycle of Data


Before leveraging the power of synthetic data, it’s important to
understand the lifecycle of data. First, it can help organizations to
better manage their data; by understanding the stages that data goes
through, organizations can more effectively control how data is used
and prevent unauthorized access. Additionally, the data lifecycle can
help organizations ensure that their data is of high quality. Finally, the
data lifecycle can help organizations plan for the eventual destruction
of data; by understanding when data is no longer needed, organizations
can ensure that they do not keep data longer than necessary, which
can both save space and reduce costs.
The data lifecycle is the process of managing data from its creation
to its eventual disposal. Figure 1-1 shows the five main phases of the
data lifecycle.
Figure 1-1 Data lifecycle
In the context of synthetic data, the data life cycle refers to the
process of generating, storing, manipulating, and outputting
synthetic data. This process is typically carried out by computers and
involves the use of algorithms and rules to generate data that resembles
real-world data.
Following are the five stages of the data lifecycle (a brief sketch
follows the list):
1. Data creation: The first stage of the data lifecycle is data creation.
This is the stage at which synthetic data is first generated, either
through direct input or through the capture of information from an
external source.

2. Data entry and storage: This stage involves the entry of synthetic
data into a computer system and its storage in a database. Data
entry and storage typically involve the use of algorithms or rules to
generate data that resembles real-world data.
3. Data processing: This stage covers the manipulation of synthetic
data within the computer system to convert it into a format that is
more useful for users. This may involve the use of algorithms and
the application of rules and filters.

4. Data output and dissemination: This stage is the process of
retrieving synthetic data from a computer system and making it
available to users. This may involve the generation of reports, the
creation of graphs and charts, or the output of data in a format that
can be imported into another system.

5. Data disposal: The final stage of the data lifecycle is data disposal.
This stage covers the disposal of synthetic data that is no longer
needed. This may involve the deletion of data from a database or
the physical destruction of storage media.
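The following sketch, a hypothetical end-to-end run using only Python’s
standard library (the table and column names are invented), walks
through all five stages in miniature:

import random
import sqlite3

# 1. Data creation: generate synthetic records from a simple rule.
random.seed(0)
records = [(i, random.gauss(50, 10)) for i in range(100)]

# 2. Data entry and storage: store the records in a database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", records)

# 3. Data processing: convert the data into a more useful form.
(avg,) = conn.execute("SELECT AVG(value) FROM readings").fetchone()

# 4. Data output and dissemination: report the result to users.
print(f"Average synthetic reading: {avg:.2f}")

# 5. Data disposal: delete data that is no longer needed.
conn.execute("DROP TABLE readings")
conn.close()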

Reinforcement learning algorithms are used to learn how to do


things by interacting with an environment. However, these algorithms
can be inefficient, meaning they need a lot of interactions to learn well.
To address this issue, some people are using external sources of
knowledge, such as data from demonstrations or observations. This
data can come from experts, real-world demonstrations, simulations, or
synthetic demonstrations.
Researchers at Google and DeepMind think that datasets also have a
lifecycle and they summarize the lifecycle of datasets in three stages:
Producing the data, consuming the data, and sharing the data [4].
In the producing the data phase, users record their interactions with
the environment and provide datasets. At this stage, users add
additional information by automatically or manually labeling
or filtering the data.
In the consuming the data phase, researchers analyze and visualize
datasets or use them to train algorithms for machine learning purposes.
In the sharing the data stage, researchers often share their data
with other researchers to help with their research. When researchers
share data, it makes it easier for other researchers to run and validate
new algorithms. However, the researchers who produced the data still
own it and should be given credit for their work.
Let’s now consider data collection and privacy issues in terms of
synthetic data.

Data Collection versus Privacy


Data can be collected in many ways. For example, data from radar,
LIDAR, and the camera systems of driverless cars can be taken and
fused into a format usable for decision making. Considering that fused
data is also virtual data, it is worth thinking in detail about the
importance of real and virtual data. Since real data can be converted
into virtual data, and augmented data can be used together with real
data in machine learning, both data types matter to us. This is
sometimes compared to the Ensemble Method.
In the Ensemble Method, a few base models are combined to
create an optimal predictive model; if models can be combined this way,
why can’t data be fused to obtain higher-quality data?
Labeling the data is tedious and costly, as machine learning models
require large and diverse data to produce good results. Therefore,
creating synthetic data, either by transforming real data with data
augmentation techniques or by generating synthetic data directly, and
using it as an alternative to real data can reduce transaction costs.
According to Gartner, by 2030, there will be more synthetic data than
real data in AI models.
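As a minimal sketch of the augmentation route, assuming NumPy and using
invented array shapes, each synthetic sample below is a real sample
plus a small amount of random noise:

import numpy as np

rng = np.random.default_rng(seed=42)
real = rng.uniform(0, 1, size=(100, 5))  # stand-in for real samples

# Each synthetic sample is a slightly perturbed copy of a real one.
synthetic = real + rng.normal(0, 0.01, size=real.shape)
augmented = np.vstack([real, synthetic])  # real and synthetic together
print(augmented.shape)  # (200, 5)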
Another issue that synthetic data can help overcome is that of data
privacy.

Data Privacy and Synthetic Data


Today, many institutions and organizations use large amounts of data to
forecast, create policies, plan, and achieve higher profit margins. By
using this data, they can better understand the world around them and
make more informed decisions. However, due to privacy restrictions
and guarantees given to personal data, only the personnel of the
institutions have full access to such data. Anonymization techniques are
used to prevent the identities of data subjects from being revealed.
Data collectors can maintain data privacy by using aggregation,
recoding, record exchange, suppression of sensitive values, and random
error insertion. However,
advances in computer and cloud technologies are likely to make such
measures insufficient to maintain data privacy. We’ll explore some
examples in the next section.
In today’s world, with the advances in information technology,
patient data, driver data from vehicles, and the data obtained by
research companies from public opinion surveys have reached enormous
volumes. However, most of the time, when this data
is used to find new solutions, the concept of “individual privacy” comes
up. This problem is overcome by anonymizing the data, which is the
process of modifying the data to eliminate any information that could
lead to privacy intrusion. Anonymizing data is important to protect
people’s privacy, as even without personal identifiers, the remaining
attributes in the data may still be used to re-identify an individual. In
the simplest form of data anonymization, all personal identifiers are
removed. However, it has been shown that this is not enough to protect
people’s privacy. Therefore, it is important to anonymize data as fully
as possible to protect people’s privacy rights.
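A minimal sketch of this simplest form of anonymization, assuming
pandas and with hypothetical column names, drops the direct identifiers
while leaving the remaining attributes, which, as noted, may still
allow re-identification:

import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Ben"],
    "national_id": ["123", "456"],
    "age": [34, 29],
    "diagnosis": ["A", "B"],
})

# Remove direct personal identifiers; quasi-identifiers such as age and
# diagnosis remain and may still be combined with other data sources.
anonymized = df.drop(columns=["name", "national_id"])
print(anonymized)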
Most people think that privacy is protected by anonymization, that is,
when no name, surname, or other identifying sign appears in the
database. However, this is not always accurate. Researchers have shown,
for example, that if you have an account on both Twitter and Flickr,
there’s a good chance that someone could identify you from the
anonymized Twitter graph. However, the error rate is only 12%, so the
chances are still pretty good that you won’t be identified [14]. Even
though the chances of being
identified are relatively low, it is still important to be aware of the
potential risks of sharing personal information online. Anonymity is not
a guarantee of privacy, and even seemingly innocuous information can
identify individuals in certain cases. Therefore, exercising caution is
required when sharing personal information online, even if it is
ostensibly anonymous.
Anonymization and labeling are two primary techniques in AI
applications. However, both techniques have their own set of problems.
Anonymization can lead to the loss of vital information, while labeling
can introduce bias and be costly to implement. In addition, hand-
labeled data might not be high quality because it is often mislabeled. To
overcome these problems, researchers have proposed various methods,
such as semi-supervised learning and active learning. However, these
methods are still not perfect, and further research is needed to improve
them.

The Bottom Line


The collection of more data from ever more data sources makes it
necessary for businesses to take security measures against information
attacks. In some cases, businesses need more data than is available to
innovate in certain areas; in others, more data may be necessary
due to a lack of practical research or the high cost of data collection.
Many businesses generate data programmatically to obtain information
that is otherwise unattainable in the real world. The use of synthetic data is
becoming increasingly popular as businesses attempt to collect more
data and test different scenarios. Synthetic data is created by computer
programs and is designed to mimic real-world data. This allows
businesses to gather data more efficiently and to test various scenarios
to see what may happen in the real world.
The world is becoming more data-centric, so businesses are starting
to use computer programs to create data similar to data gathered from
the real world. This is useful because it facilitates data collection and
helps businesses test different scenarios to see what will happen in the
real world.
Now let’s examine synthetic data and data quality.

Synthetic Data and Data Quality


When working on AI projects, it is important to focus on data quality. It
all starts there; if data quality is poor, the AI system will be fatally
compromised. Data cascades can occur when AI practitioners apply
conventional AI practices that don’t value data quality, and most AI
practitioners (92%) have reported experiencing one or more such
cascades. For this reason, it is important to use high-quality data
when training deep learning networks [5]. Andrew Ng has said that
“data is food for AI” and that attention should focus more on data
quality than on the model or algorithm [6].
The use of synthetic data can help to address the issue of data quality in
AI projects. This is because synthetic data can be generated to be of
high quality, and it can be generated to be representative of the real-
world data that the AI system will be used on. This means that the AI
system trained on synthetic data will be more likely to generalize well
to the real world.
AI technologies in particular use synthetic data intensively. Just a
few examples include medicine, where synthetic data is used
extensively to test specific conditions and cases for which real data is
not available; self-driving cars, such as the ones used by Uber and
Google, are trained using synthetic data; fraud detection and protection
in the financial industry is facilitated using synthetic data. Synthetic
data gives data professionals access to centrally stored data while
maintaining the privacy of the data. In addition, synthetic data
reproduces important features of real data without revealing its true
meaning, protecting confidentiality. In research departments, on the
other hand, synthetic data is used to develop and deliver innovative
products for which the necessary data may not be available [7]. Overall,
the use of synthetic data is extremely beneficial as it allows for the
testing of new products and services while maintaining the privacy of
the original data. Synthetic data is also incredibly versatile and can be
used in a variety of different industries and applications. In the future,
the use of synthetic data is likely to become even more widespread as
the benefits of using it become more widely known.
Let us now examine some synthetic data applications.

Applications of Synthetic Data


Synthetic data is often used in financial services, manufacturing,
healthcare, automotive, robotics, security, social media, and marketing.
Let’s first quickly explore how synthetic data can be used in finance.
Financial Services
The use of synthetic data is becoming increasingly important in
financial services as the industry moves towards more data-driven
decision-making. Synthetic data can be used to supplement or replace
traditional data sources, providing a more complete picture of the
underlying risk.
Financial services is an industry that is highly regulated and subject
to constant change. New rules and regulations are
constantly being introduced, and the industry is constantly evolving. As
a result, it can be difficult for financial institutions to keep up with the
changes and ensure that their data is compliant.
Synthetic data can be used to generate data that is compliant with
the latest rules and regulations. This can help financial institutions
avoid the costly fines and penalties that can be associated with non-
compliance. In addition, synthetic data can be used to test new
products and services before they are launched. This can help financial
institutions avoid the costly mistakes that can be made when launching
new products and services.
Synthetic data can also help to improve the accuracy of risk models
by providing a more complete picture of underlying risks. For example,
consider a portfolio of loans. Traditional data sources may only provide
information on the loan amount, interest rate, and term. However,
synthetic data can provide additional information on the borrower’s
credit score, employment history, and other factors that can impact the
risk of default. This additional information can help to improve the
accuracy of the risk model.
Another key benefit of synthetic data is that it can provide a way to
test and validate models before they are deployed in live environments.
This is because synthetic data can be generated with known values for
the inputs and outputs. This allows for the testing of models under a
variety of different scenarios, which can help to identify any potential
issues before the model is deployed in a live environment.
Synthetic data can be used in financial services in a variety of other
ways. For example, it can be used to:
Generate realistic scenarios for stress testing and risk
management: Generating synthetic data can help financial
institutions to identify potential risks and to develop plans for
dealing with them. This can be used to generate realistic scenarios
for stress testing and risk management purposes. Doing so can help
to improve the resilience of the financial system.
Train machine learning models: Synthetic data can help train
machine learning models for tasks such as fraud detection and credit
scoring. This can automate processes for financial institutions and
make them more efficient.
Generate synthetic transactions: Synthetic data can be used to
generate synthetic transactions, which can help financial institutions
test new products and services or simulate market conditions (a brief
sketch follows this list).
Generate synthetic customer data: Financial institutions can use
synthetic data to generate synthetic customer data. This can help
them to test new customer acquisition strategies or to evaluate
customer service levels.
Generate synthetic financial data: Synthetic data can be used to
generate synthetic financial data. This can help financial institutions
to test new financial products or to evaluate the impact of new
regulations.
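As an example of the synthetic-transactions item above, here is a
hypothetical sketch; the field names and distributions are invented for
illustration only:

import random
from datetime import datetime, timedelta

random.seed(0)

def synthetic_transaction(txn_id):
    """Generate one synthetic transaction record."""
    return {
        "id": txn_id,
        "timestamp": datetime(2022, 1, 1)
        + timedelta(seconds=random.randint(0, 365 * 24 * 3600)),
        "amount": round(random.lognormvariate(3.0, 1.0), 2),  # skewed amounts
        "channel": random.choice(["card", "transfer", "atm"]),
    }

for t in [synthetic_transaction(i) for i in range(3)]:
    print(t)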
Finally, synthetic data can help to reduce the cost of data acquisition
and storage. This is because synthetic data can be generated on-
demand, as needed. This eliminates the need to store large amounts of
data, which can save on both the cost of data acquisition and storage.
Now, let’s look at how synthetic data can be used in the
manufacturing field.

Manufacturing
In the world of manufacturing, data is used to help inform decision-
makers about various aspects of the manufacturing process, from
production line efficiency to quality control. In some cases, this data is
easy to come by: for example, data on production line outputs can be
gathered through sensors and other monitoring devices. However, in
other cases, data can be much more difficult to obtain. For example,
data on the performance of individual components within a production
line may be hard to come by or may be prohibitively expensive to
gather. In these cases, synthetic data can be used to fill in the gaps.
In many manufacturing settings, it is difficult or impossible to
obtain real-world data that can be used to train models. This is often
due to the proprietary nature of manufacturing processes, which can
make it difficult to obtain data from inside a factory. Additionally, the
data collected in a manufacturing setting may be too noisy or
unrepresentative to be useful for training models.
To address these issues, synthetic data can be used to train models
for manufacturing applications. However, it is important to consider
both the advantages and disadvantages of using synthetic data before
deciding whether it is the right choice for a particular application.
Synthetic data can be employed in manufacturing in several ways.
First, synthetic data can be used to train machine learning models that
can be used to automate various tasks in the manufacturing process.
This can improve the efficiency of the manufacturing process and help
to reduce costs. Second, synthetic data can be used to test and validate
manufacturing processes and equipment. This can help to ensure that
the manufacturing process is running smoothly, and that the equipment
is operating correctly. Third, synthetic data can be used to monitor the
manufacturing process and to identify potential problems. This can
help to improve the quality of the products being produced and to avoid
costly manufacturing defects.
Synthetic data can be used to improve the efficiency of data-driven
models. This is because synthetic data can be generated much faster
than real-world data. This is important because it allows
manufacturers to train data-driven models faster and get them to
market more quickly.
The use of synthetic data is widespread in the manufacturing
industry. It helps companies to improve product quality, reduce
manufacturing costs, and improve process efficiency. Some examples of
the use of synthetic data in manufacturing are as follows:
Quality Control: Synthetic data can be used to create models that
predict the likelihood of defects in products. This information can be
used to improve quality control procedures.
Cost Reduction: The use of synthetic data can help identify patterns
in manufacturing processes that lead to increased costs. This
information can be used to develop strategies for reducing costs,
thereby reducing the overall cost of production.
Efficiency Improvement: Synthetic data can be used to create
models that predict the efficiency of manufacturing processes. This
information can be used to improve process efficiency.
Product Development: Synthetic data can help improve product
development processes by predicting the performance of new
products. In this way, it can be decided which products to monitor
and how to develop them.
Production Planning: Production planning can be done by using
synthetic data to create models that predict the demand for products.
In this way, businesses can improve their production planning by
making better predictions about future demand.
Maintenance: Synthetic data can be used to create models that
predict the probability of equipment failure. In this way, preventive
measures can be taken, and maintenance processes can be improved
by predicting when equipment will fail (a brief sketch follows this
list).
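For the maintenance item, a hypothetical sketch, with invented sensor
names and a simulated failure rule, might look like this:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=7)
temperature = rng.normal(70, 10, size=2000)
vibration = rng.normal(0.5, 0.2, size=2000)

# Simulated rule: hot, high-vibration machines fail more often.
logit = 0.1 * (temperature - 80) + 5 * (vibration - 0.7)
failed = rng.random(2000) < 1 / (1 + np.exp(-logit))

X = np.column_stack([temperature, vibration])
model = LogisticRegression().fit(X, failed)
print(model.predict_proba([[95, 0.9]])[0, 1])  # failure risk, hot machine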
Now, let’s quickly explore how synthetic data can be employed in
the healthcare realm.

Healthcare
The most obvious benefit of utilizing synthetic data in healthcare is to
protect the privacy of patients. By using synthetic data, healthcare
organizations can create models and simulations that are based on real
data but do not contain any actual patient information. This can be
extremely helpful in situations where patient privacy is of paramount
concern, such as when developing new treatments or testing new
medical devices.
The use of synthetic data will evolve in line with the needs and
requirements of health institutions. However, the following are some of
the most common reasons why healthcare organizations use
synthetic data:
Machine learning models: One of the most common reasons why
healthcare organizations use synthetic data is to train machine
learning models. This is because synthetic data can be generated in a
controlled environment, which allows for more reliable results.
Artificial intelligence: Synthetic data can be used to identify patterns
in patient data that may be indicative of a particular condition or
disease. This can then be used to help diagnose patients more
accurately and to also help predict how they are likely to respond to
treatment. This is extremely important in terms of ensuring that
patients receive the most effective care possible.
Protect privacy: One of the biggest challenges in the healthcare
industry is the reliable sharing of data. Health data is very important
for doctors to diagnose and treat patients quickly. For this reason,
many hospitals and health institutions attach great importance to
patient data. Synthetic data helps provide the best possible treatment.
In addition, synthetic data is a technology that will help healthcare
organizations share information while protecting personal privacy.
Treatments: Another common reason why healthcare organizations
use synthetic data is to test new treatments. This is because synthetic
data can be used to create realistic simulations of real-world
conditions, which can help to identify potential side effects or issues
with a new treatment before it is used on real patients. Synthetic data
can also help design new drugs and test their efficacy.
Improve patient care: Healthcare organizations can also use
synthetic data to improve patient care. This is because synthetic data
can be used to create realistic simulations of real-world conditions,
which can help healthcare professionals to identify potential issues
and make better-informed decisions about patient care.
Reduce costs: Healthcare organizations can also use synthetic data to
reduce cost. This is because synthetic data can be generated
relatively cheaply, which can help to reduce the overall costs
associated with real-world data collection and analysis.
Several hospitals are now using synthetic data in the health sector to
improve the quality of care that they can provide. This is being done in
several different ways, but one of the most common is to use
computer simulations. This allows for a more realistic representation
of patients and their conditions, which can then be used to test
out new treatments or procedures. This can be extremely beneficial
in reducing the risk of complications and ensuring that patients
receive the best possible care.
Overall, the use of synthetic data in the health sector is extremely
beneficial. It is helping to improve the quality of care that is being
provided and is also helping to reduce the risk of complications. In
addition, it is also helping to speed up the process of diagnosis and
treatment.
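As a simple illustration of privacy-preserving simulation, the
hypothetical sketch below synthesizes patient-like records by sampling
from assumed summary statistics (the columns and prevalence are
invented), so no actual patient appears in the output:

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=11)
n = 500

# Assumed population statistics; no real patient records are used.
patients = pd.DataFrame({
    "age": rng.normal(55, 15, n).clip(18, 95).round(),
    "systolic_bp": rng.normal(130, 20, n).round(),
    "diabetic": rng.random(n) < 0.12,  # assumed 12% prevalence
})
print(patients.describe())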
Now let’s look at how synthetic data can be used in the automotive
industry.

Automotive
Another application of synthetic data in the automotive industry is
autonomous driving. A large amount of data is needed to train an
autonomous driving system. This data can be used to train a machine
learning model that can then be used to make predictions about how
the autonomous driving system should behave in different situations.
However, real-world data is often scarce, expensive, and difficult to
obtain.
Another important application of synthetic data in automotive is in
safety-critical systems. To ensure the safety of a vehicle, it is
essential to be able to test its systems in a variety of
scenarios. Synthetic data can be used to generate data for all the
different scenarios that need to be tested. This is important because it
allows for more thorough testing of the systems and helps ensure
the safety of the vehicle.
Overall, synthetic data has the potential to be a valuable tool for the
automotive industry. It can be used to speed up the development
process and to generate large quantities of data. However, it is
important to be aware of the challenges associated with synthetic data
and to ensure that it is used in a way that maximizes its benefits.
There are a few reasons why automotive companies need synthetic
data. The first has to do with the development of new technologies.
In order to create and test new features or
technologies, companies need a large amount of data. This data is used
to train algorithms that will eventually be used in the product. However,
collecting this data can be difficult, time-consuming, and expensive.
Another reason automotive companies need synthetic data is for
testing purposes. Before a new product is released, it needs to go
through rigorous testing. This testing often includes putting the
product through a range of different scenarios. However, it can be
difficult to test every single scenario in the real world. This is where
synthetic data comes in. It can be used to create realistic test scenarios
that would be difficult or impossible to re-create in the real world.
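One simple way to sketch such scenario generation, with invented
scenario parameters, is to enumerate combinations of conditions,
including ones rarely seen on real roads:

import itertools
import random

weather = ["clear", "rain", "fog", "snow"]
time_of_day = ["day", "dusk", "night"]
pedestrian = ["none", "crossing", "jaywalking"]

# Every combination becomes a candidate test scenario, including rare
# and risky ones that are hard to capture in real-world driving.
scenarios = list(itertools.product(weather, time_of_day, pedestrian))
random.seed(0)
print(random.sample(scenarios, 3))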
Synthetic data can also be used for marketing purposes. Automotive
companies often use data to create marketing materials such as ads or
website content, but this data can be difficult to obtain. Synthetic
data can stand in for it, creating realistic marketing scenarios against
which different marketing strategies can be tested.
In conclusion, synthetic data is needed in the automotive industry for a
variety of reasons. It can be used to create realistic test scenarios, train
algorithms, and create marketing materials.
Now let’s look at how synthetic data is used in the robotics field.

Robotics
Robots are machines that can be programmed to do specific tasks.
Sometimes these tasks are very simple, like moving a piece of paper
from one place to another. Other times, the tasks are more complex, like
moving around in the world and doing things that humans can do, like
solving a Rubik’s Cube. Creating robots that can do complex tasks is a
challenge because the robots need a lot of training data to behave like
humans. This data can be generated by simulations, which is a way of
creating a model of how the robot will behave.
There are several reasons why synthetic data is needed in robotics.
Scarcity: Real-world data is often scarce, especially the data needed
to train the machine learning models that are a key component of
robotics. Synthetic data can supplement real-world data and, in some
cases, replace it entirely.
Noise: Real-world data is often noisy, with noise introduced by
sensors, actuators, and the environment. Synthetic data can be
generated noise-free, which can be helpful for training machine
learning models.
Cost: Collecting real-world data is often expensive. Synthetic data is
much cheaper to produce.
Bias: Real-world data is often biased, again because of sensors,
actuators, and the environment. Synthetic data can be generated
bias-free.
Representativeness: Real-world data is often unrepresentative.
Synthetic data can be crafted to better represent the situations a
robot will actually face.
The sketch after this list illustrates the noise point in particular.
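The short Python sketch below simulates a robot's range sensor. Because the ground-truth distances are known exactly in simulation, the same dataset can be emitted with noise or completely noise-free; the noise level used here is an arbitrary assumption.

import numpy as np

rng = np.random.default_rng(seed=0)
n_readings = 500

# Ground-truth distances in meters, known exactly in simulation.
true_distance = rng.uniform(0.2, 5.0, n_readings)
# Assumed Gaussian sensor noise with a 3 cm standard deviation.
sensor_noise = rng.normal(0.0, 0.03, n_readings)
measured = true_distance + sensor_noise

# In simulation the noise can also be dialed to zero, something that is
# impossible with a physical sensor.
noise_free = true_distance.copy()

print(f"Mean absolute error of noisy readings: "
      f"{np.abs(measured - true_distance).mean():.4f} m")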
Robots can also learn to identify and respond to different types of
objects and behavior by using synthetic data. For example, a robot
might be given a set of synthetic examples covering a variety of human
behaviors and the appropriate responses to them; by learning from this
data, the robot can better identify and respond to the behavior it
encounters.
Now let’s look at how synthetic data can be used in the security field.

Security
Synthetic data can play a vital role in enhancing security, both by
training machine learning models to better detect security threats and
by providing a means of testing security systems and measuring their
effectiveness.
Machine learning models trained on synthetic data can be more effective
at detecting security threats because they are not limited by the
real-world data that happens to be available: synthetic data can be
generated to match any desired distribution, including distributions
that are not present in the real world. This allows the models to learn
more about the underlying distribution of the data and to better
identify outliers that may represent security threats, as the sketch
below illustrates.
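The Python sketch below, which assumes scikit-learn is available, trains an anomaly detector on synthetic "benign" traffic and then injects synthetic threats drawn from a distribution absent from the training data to measure detection. The feature meanings and all distributions are invented for illustration.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=1)

# Benign behavior: requests per minute and payload size cluster tightly.
benign = rng.normal(loc=[60, 500], scale=[10, 80], size=(2000, 2))

# Synthetic threats drawn from a distribution not seen in benign data.
threats = rng.normal(loc=[300, 5000], scale=[50, 800], size=(50, 2))

model = IsolationForest(contamination=0.02, random_state=1).fit(benign)

# predict() returns -1 for outliers, so this is the detection rate.
flagged = (model.predict(threats) == -1).mean()
print(f"Share of injected threats flagged: {flagged:.0%}")

Because the threat distribution is under our control, the same setup can be rerun with subtler and subtler threats to map out exactly where the detector starts to fail.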
Testing security systems with synthetic data is important because it
provides a controlled environment for measuring the system’s
performance. Synthetic data can be generated to match any desired
distribution of security threats, making it possible to test how well a
security system detects and responds to a wide variety of threats. This
matters because real-world data is often limited in scope and may not
represent the full range of security threats a system may encounter.
Overall, the use of synthetic data is essential both for training
machine learning models to detect security threats and for testing the
performance of security systems. Synthetic data provides a more
complete picture of the underlying distribution of the data, which
improves the detection of security threats. Additionally, synthetic
data can be used to create controlled environments for testing,
making it possible to measure the effectiveness of a security system
more accurately.
Now, let’s quickly explore how synthetic data can be employed in
the social media realm.

Social Media
Social media has become an integral part of our lives. It is a platform
where we share our thoughts, ideas, and feelings with our friends and
family. However, social media has also become a breeding ground for
fake news and misinformation. This is because anyone can create a fake
account and spread false information.
To combat this problem, many social media platforms are now using
AI to detect fake accounts and flag them. However, AI can only be as
effective as the data it is trained on. If the data is biased or inaccurate,
the AI will also be biased or inaccurate. This is where synthetic data
comes in. Synthetic data can be used to train AI algorithms to be more
accurate in detecting fake accounts. Synthetic data can help reduce the
spread of fake news and misinformation on social media.
One way to generate synthetic data is to use generative models. For
example, a generative model could be trained on a dataset of real
images of people. Once trained, the model could then generate new
images of people that look real but are fake. This is important because
it allows us to create data that is representative of the real world.
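Training an image GAN is beyond a short example, but the same generative idea can be sketched with a Gaussian mixture model: fit it on "real" numeric user features (simulated here), then sample brand-new synthetic records that mimic them. Everything about the features is an assumption for illustration, and scikit-learn is assumed to be available.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(seed=7)

# Stand-in for real user features: posts per day and log follower count.
real = np.column_stack([
    rng.gamma(shape=2.0, scale=1.5, size=1000),
    rng.normal(loc=6.0, scale=1.2, size=1000),
])

# Fit a generative model to the "real" data, then sample new records.
gm = GaussianMixture(n_components=3, random_state=7).fit(real)
synthetic, _ = gm.sample(500)

print(synthetic[:3])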
Simulation is another way of generating synthetic data. For
example, we could create a simulation of a social media platform. This
simulation would include all the same features as the real social media
platform. However, it would also allow us to control what data is
generated. This is important because it allows us to test different
scenarios. For example, we could test what would happen if a certain
percentage of accounts were fake. This would allow us to see how
our AI algorithms would react in the real world.
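A minimal Python sketch of this kind of simulation: populate a toy platform with a controlled share of fake accounts, then measure how a naive detector performs against the known ground truth. All behavioral parameters are invented assumptions.

import numpy as np

rng = np.random.default_rng(seed=3)
n_accounts, fake_share = 10_000, 0.05   # we control the ground truth

is_fake = rng.random(n_accounts) < fake_share
# Assume fake accounts post far more often than genuine ones.
posts_per_day = np.where(is_fake,
                         rng.normal(40, 5, n_accounts),
                         rng.normal(5, 3, n_accounts))

flagged = posts_per_day > 25            # a deliberately naive rule
recall = (flagged & is_fake).sum() / is_fake.sum()
precision = (flagged & is_fake).sum() / max(flagged.sum(), 1)
print(f"recall={recall:.0%}, precision={precision:.0%}")

Because the fake share is a parameter, the simulation can be rerun at different contamination levels to see how detector performance degrades.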
Some social media platforms that have been known to use synthetic
data include Facebook, Google, and Twitter. Each of these platforms has
used synthetic data in different ways and for different purposes.
Facebook has been known to use synthetic data to train its
algorithms. For example, Facebook has used synthetic data to train its
facial recognition algorithms, because it is difficult to obtain a
large enough dataset of real-world faces to train these algorithms
effectively. Facebook has also used synthetic data to generate fake
user profiles, to test how effective the platform’s algorithms are at
detecting fake profiles.
In addition to using real data, Google has been known to use synthetic
data, generated data that is designed to mimic real data. For example,
Google has used synthetic data to train its machine learning algorithms
to better understand natural language. Google has also used synthetic
data to generate fake reviews, to test how effective the platform’s
algorithms are at detecting fake reviews.
Twitter is also known to use synthetic data. The platform has used
synthetic data to generate fake tweets and fake user profiles to test
how effective its algorithms are at detecting them.
Now, let’s quickly explore how synthetic data can be employed in
the marketing realm.

Marketing
There are many benefits to using synthetic data in marketing. Perhaps
the most obvious benefit is that it can be used to generate data that
would otherwise be unavailable. This is especially useful for marketing
research, because it can be used to generate data about consumer
behavior that would be difficult or impossible to obtain through
traditional means.
The use of synthetic data in marketing is important for several
reasons. First, it allows marketing researchers to study behavior in a
controlled environment. This is important because it allows for the
isolation of variables and the testing of hypotheses in a way that would
not be possible with real-world data. Second, synthetic data can be
used to generate new insights into consumer behavior. By analyzing
how consumers behave in a simulated environment, marketing
researchers can develop new theories and models that can be applied
to real-world data. Finally, synthetic data can be used to evaluate
marketing campaigns and strategies. By testing campaigns and strategies
on simulated consumers before launch, marketers can compare
alternatives and discard weak ideas without spending real budgets on
them.
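As a simple illustration, the Python sketch below simulates an A/B test of two campaigns on synthetic consumers. The conversion rates and uplift are invented assumptions; the point is that the "true" effect is known, so the evaluation procedure itself can be validated.

import numpy as np

rng = np.random.default_rng(seed=11)
n_consumers = 20_000

# Assume campaign A converts at 3.0% and campaign B at 3.6%.
group = rng.random(n_consumers) < 0.5            # random assignment
converted = np.where(group,
                     rng.random(n_consumers) < 0.036,   # campaign B
                     rng.random(n_consumers) < 0.030)   # campaign A

rate_a = converted[~group].mean()
rate_b = converted[group].mean()
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  uplift: {rate_b - rate_a:.2%}")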