
Combining studies with Research Databases

Do you have to change your database often in your study or project? Exploration means you may not know where you will end up. Do you have to combine multiple studies, but keep track of each piece of data separately – where it came from, what conditions, what equipment? Do you have to combine data from different types of databases, like XML or spreadsheets or database tables? These are all reasons for using a Research Database system.

Datura now offers Legume Research DB for SQL Server. This product turns Microsoft SQL Server into a Research Database system. Legume Research DB comes with tools to import the contents of spreadsheets, database tables, XML files, and more, so that you can combine the data from different sources into one common database. Each piece of data can be tagged with the name and source file and the system automatically adapts to the data being imported. Use SQL to analyze data within a study and across studies. To learn more, click here.


A Master Reference to control discrepancies

We’ve seen engineers who keep their bills of materials and equipment lists in spreadsheets. We’ve seen engineers purchase expensive analysis software just for its built-in input database. We’ve seen engineers try to build databases in Access so they can control the database, but never finish the task in time for it to be helpful. We’ve seen engineers make expensive mistakes because of the discrepancies between their drawings and analysis files and equipment lists. These are all reasons for using a Master Reference system.

Datura now offers Legume Master Reference for SQL Server. This product turns Microsoft SQL Server into a Master Reference system. Legume Master Reference comes with tools to import the contents of spreadsheets, database tables, XML files, and more, so that you can combine the data from different sources into one common database. Each piece of data can be tagged with the name and source file and the system automatically adapts to the data being imported. Using the data, you can look for discrepancies and generate equipment lists or bills of material. To learn more, click here.


Design Libraries – ready for anything

Every time you start a new project, there is a good chance that you will re-enter the technical data for all of the equipment and materials in the project, even though someone probably already entered the same data in a previous project. There are several reasons for this:

  • You don’t know the source and validity of the data in the previous project.
  • The previous project had different customer requirements, standards and units, which changed the column content.
  • New equipment brings new options and capabilities.

These difficulties have made engineering more expensive than it needs to be, forcing engineers into using spreadsheets as databases to give them flexibility, but repeating a lot of work over and over. This is the reason for using a Design Library system.

Datura now offers Legume Design Library for SQL Server. This product turns Microsoft SQL Server into a Design Library system. Legume Design Library comes with tools to import the contents of spreadsheets, database tables, XML files, and more, so that you can combine the data from past projects into one common database. Each piece of data can be tagged with the project name and source file and the system automatically adapts to the data being imported. The data is time-stamped and given identifiers that make it easier to distribute to other team members. Each change is archived to make it easier to audit your changes or reverse them. To learn more, click here.


B-52s and Repositories for Long-term Archival (LTA)

Can you get at the data from a project you worked on two years ago? How about 5 years ago? How about an analysis file from 20 years ago? Before you think that’s unlikely, the Space Shuttle program operated over a 30-year lifespan, and the B-52 is still in service after 60 years. It is not alone among critical manufactured goods: ships, buildings and facilities last even longer.

One of the reasons it becomes difficult to view engineering data from the past is that most of it is stored in a proprietary format that contains a variety of structures, such as records inside of records inside of records. To read the data, you must use the original software, which may cost thousands of dollars or may not be available any longer. This is the reason for using a Long-Term Archival system.

Datura now offers Legume Repository for SQL Server. This product turns Microsoft SQL Server into a Long-Term Archival (LTA) system. Legume Repository comes with tools to import the contents of spreadsheets, database tables, XML files, and more. Each piece of data can be tagged with the project name and source file and the system automatically adapts to the data being imported. To learn more, click here.

The Memex has been Missing a Database – and it’s finally here!

The Memex is a machine that was described in Vannevar Bush’s July 1945 Atlantic Monthly article, “As We May Think”. Before the end of World War II, President Truman’s Director of the Office of Scientific Research and Development wrote this article to describe how we could use technologies and ideas developed during the war years to improve scientific research and collaboration. He describes how scientists were forced into collaboration for the war effort, and the benefits of the accelerated development through collaboration were so great that he envisioned tools that would help them work faster and share faster. Part of this vision included what we would now describe as stereo-headcams and microphones on each scientist to record their observations. All information would be transferred to another part – a microfilm viewing system. Another part was the Memex, short for “memory-index”, which allowed scientists to link together an index of related microfilm pages in the way you link things in your memory, using what we now call hypertext links. These links would be created by researchers of a particular subject, and could be followed in order on a thread of connections. A scientist could share the links with another researcher, who could add more. Sound familiar? No wonder the Memex is considered an ancestor of the World Wide Web.

As useful as the Web is, pages of text don’t really store knowledge in an easily processed way. The Semantic Web has helped somewhat, using simple sentences stored in XML text to bind pages or concepts together with more knowledgeable links. It can be used to store data, but the simple sentences it uses, called triples, have difficulty handling data models of any complexity efficiently.

Last year, the article “Still Building the Memex” (Communications of the ACM, February 2011) outlined the different technologies that are attempting to fulfill the rest of the promise of the Memex – a flexible, personal database that lets us “remember” and organize things in as flexible and interconnected a manner as our memory does. The authors review the products and how they organize data: into trees and graphs, spatially, by category, as “zigzags” (link threads), and through transclusions, which resemble object linking and embedding (such as a picture in a bar chart in a spreadsheet embedded in a word processing document). The authors identify what’s missing – a database to store it in and the ability to handle all of the organization types simultaneously and seamlessly.

Welcome the Semantic-Relational Database to the Memex. It has exactly the characteristics needed for the full Memex. Using any database in which you can store a table, row, or record, you can store any set of data as sentences that include pictures and documents as “word-phrases”. This is the same flexible conceptual method we use to design any data model and structure, but which is always mapped to a table or object. Storing the sentences themselves had not yet been implemented directly, short of the first steps taken with the simpler Semantic Web design and its semantic “triples” of Subject-Verb-Object. The Semantic-Relational Database uses full sentences, which makes it multi-dimensional and self-referential, and this flexibility lets it handle any organizational structure needed.

The Data Chain – Combining Data with Knowledge

A semantic-relational database (SRDB) can be used to store the following style of nested information to any depth you want:

  • Ringo Starr said [John Lennon said [Brian Jones said […]]]
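As a rough illustration of this kind of nesting, the sketch below models a sentence whose object may itself be a sentence. The `Sentence` class and the `render` and `depth` helpers are hypothetical names chosen for this example, not part of any product described here, and the innermost "..." is only a placeholder for the elided statement.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sentence:
    """One Subject-Verb-Object sentence; the object may itself be a sentence."""
    subject: str
    verb: str
    obj: Union[str, "Sentence"]


def render(s: Union[str, Sentence]) -> str:
    """Render a sentence, wrapping each nested sentence in square brackets."""
    if isinstance(s, str):
        return s
    inner = render(s.obj)
    if isinstance(s.obj, Sentence):
        inner = f"[{inner}]"
    return f"{s.subject} {s.verb} {inner}"


def depth(s: Union[str, Sentence]) -> int:
    """Count how many sentences are nested along the object chain."""
    return 0 if isinstance(s, str) else 1 + depth(s.obj)


# "..." stands in for whatever the innermost speaker said (left elided).
nested = Sentence("Ringo Starr", "said",
                  Sentence("John Lennon", "said",
                           Sentence("Brian Jones", "said", "...")))
print(render(nested))
print(depth(nested))
```

Because the object slot accepts another sentence, no extra structure is needed to go one level deeper; the same three fields repeat at every level.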

Combining this capability with the flexibility of an SRDB, one can create a new kind of data system – one that tracks data like products in a supply chain – and more. Let’s take an engineering example:

Several years ago, a new occupational safety rule required electrical workers to wear protective apparel when working near live equipment, and required other workers to be kept back at the necessary safe distance. This new Arc Flash requirement forced facilities to re-analyze their electrical systems, calculate the effects of Arc Flash, and post labels at each panel that inform workers of the necessary apparel and distances. During the early phase of this rule, it was realized that many analyses were being done with data that was outdated, invalid, or chosen for use in the proper sizing of the equipment by a previous engineer who did the design – not for the proper analysis of Arc Flash safety. Furthermore, when the previous engineer did their analysis, much of the data used in the analysis would be “default” data that the analysis program’s data entry form fills in automatically. When this happens, there is rarely a record of whether the data is “default” or from the nameplate or the manufacturer’s technical specifications. Even if the data has been entered from the technical specifications, it may have been converted from horsepower (mechanical power) to amps (electrical current) because that is how the analysis program requests it, while the original values may have been lost over time. Another complication comes when the user selects one of several values to give a worst case for that analysis, but not the worst case for another analysis, and the other possible values are lost along the way. Additionally, analyses aren’t done alone – they occur in groups, based on configurations and alternative designs.

If the data is stored in an SRDB, the situation improves tremendously. We can build the information in layers, connected from source data to analysis to knowledge, like the chain from factory to distributor to customer. As the data is collected and processed, it can be tagged or annotated automatically or can be prompted from the user. Let’s see how this works in our Arc Flash example:

  1. We start our site data collection with drawings, loading the documents as document phrases, and we use sentences to annotate the documents with source information (what the document is about, its date and time, the user who created it, the user who loaded it, etc.).
  2. We walk through the site and take pictures of the equipment and nameplates, load those documents, and annotate them.
  3. We collect the technical documents of each piece of equipment, and load and annotate those documents.
  4. We collect strip charts and other data collections, and load and annotate those documents.
  5. We collect any previous analysis data, and load and annotate those documents.

Now that we have the RAW DATA, we can develop the INFORMATION we need from that raw data for our analysis.

  1. Any image can be scanned with optical character recognition (OCR) software into chunks of searchable text, each of which can be loaded as a text phrase and connected to the picture with a sentence indicating it as an OCR scan of that picture. This chunk will be tagged or annotated as created by the user who performed this OCR scan at a particular date-time.
  2. Any spreadsheet or standard data file can be imported into the database as phrases and sentences, and each piece is tagged as coming from that imported file by that user. If an error is made in the import, one can find and remove all of those sentences and try again.
  3. At every level, the data can be checked by a supervisor, who can sign off on it. One can annotate the data with a confidence level or probability (80% sure), an error value (+4%, -2%), or a “guess-timate” range, which can carry forward into the analyses.
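The tag-on-import and undo-by-tag idea in the steps above can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration under assumed names: the `sentence` table, its columns, the file names, the users, and the equipment values are all invented for the example and do not describe the actual schema of any product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sentence (
           id INTEGER PRIMARY KEY,
           subject TEXT, verb TEXT, object TEXT,
           source_file TEXT, imported_by TEXT
       )"""
)


def import_rows(rows, source_file, user):
    """Store each (subject, verb, object) row, tagged with its source file and user."""
    conn.executemany(
        "INSERT INTO sentence (subject, verb, object, source_file, imported_by) "
        "VALUES (?, ?, ?, ?, ?)",
        [(s, v, o, source_file, user) for s, v, o in rows],
    )


def undo_import(source_file):
    """Remove every sentence that came from one import; return how many were removed."""
    cur = conn.execute("DELETE FROM sentence WHERE source_file = ?", (source_file,))
    return cur.rowcount


# Hypothetical sample data from two imported spreadsheets.
import_rows([("Motor M-101", "rated power", "50 hp"),
             ("Motor M-101", "rated voltage", "480 V")], "motors.xlsx", "alice")
import_rows([("Panel P-7", "bus rating", "800 A")], "panels.xlsx", "bob")

# The motors import turned out to be wrong: remove it and leave the rest alone.
removed = undo_import("motors.xlsx")
remaining = conn.execute("SELECT subject, verb, object FROM sentence").fetchall()
print(removed, remaining)
```

Because every sentence carries its source tag, a bad import can be isolated with a single filter instead of a manual cleanup.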

Now we have scanned, imported, and converted all of our data as far as we can automatically. The analysis program we want to use this data with will often require the data to be processed into a specific format – amps, not milliamps – volts, not kilovolts – etc.

  1. Using a variety of tools, from data entry forms to SQL calculated views of existing data, we can create the set of sentences that represent the input data for an analysis.
  2. Results and logs from an analysis can be put back into the database, and tagged with the date-time. Thanks to the incremental, non-destructive nature of an SRDB, a snapshot view can be seen for the point-in-time at which that analysis was done.
  3. Using processes and triggers, we can make the database automatically update its content as changes are made.
  4. Configuration states can be defined on the fly and used to tag data, and those tags can be used to filter the selection to be used for analysis.
  5. Since the data changes are incremental, you can define a set of projects that are incremental changes of a base project frozen at a particular point-in-time, lessening the need to duplicate projects or publish snapshots.
  6. Over time, as the same equipment is used from one project to the next, time is saved as the user gets to use data that has been validated already by a known entity.
  7. As experience is generated with equipment, recommendations for usage and other notes can be added, turning the database into a knowledgebase.
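The point-in-time snapshot mentioned in the list above can be illustrated with an append-only log in which nothing is ever overwritten. This is a simplified sketch, not the product's mechanism: the `record` and `snapshot` helpers, the dates, and the motor values are assumptions made for the example.

```python
from datetime import datetime

# Append-only log of (timestamp, subject, verb, object). Nothing is overwritten.
log = []


def record(when, subject, verb, obj):
    """Add a new sentence to the log; earlier entries stay untouched."""
    log.append((when, subject, verb, obj))


def snapshot(as_of):
    """Return the latest object for each (subject, verb) recorded at or before as_of."""
    view = {}
    for when, subject, verb, obj in sorted(log, key=lambda row: row[0]):
        if when <= as_of:
            view[(subject, verb)] = obj
    return view


# Hypothetical values: an initial entry and a later correction.
record(datetime(2012, 1, 10), "Motor M-101", "full-load current", "65 A")
record(datetime(2012, 3, 5), "Motor M-101", "full-load current", "62 A")

before = snapshot(datetime(2012, 2, 1))  # the view an analysis run in February saw
after = snapshot(datetime(2012, 4, 1))   # the view after the correction
print(before, after)
```

Both values remain in the log, so the earlier analysis can be reproduced and the change can be audited.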

With this type of system, you can generate a list of any data entered and calculated by any user or program, and you can get a list of data based on its document source or on whether it has been signed off by the supervisor. With this type of system, you can point at any value and burrow down to see where it came from, and even get the notes from the engineer on why they used a particular value. When changes are made, the old values are still there so an audit trail is maintained. The ultimate engineering database.

Expanding the Syntax

So far, we’ve only considered three syntax parts to a sentence – Subject, Verb, and Object. We are not limited to these three parts. By using a Syntax Table, we can expand sentences to include more syntax parts.


By adding more syntax positions, we can handle more subtleties of communication, like using a Passive Verb or Punctuation.
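One way to picture a Syntax Table is as a lookup of named positions, with each part of a sentence stored as its own row. The sketch below is an assumption about how such a table could work, not the layout used by the author; `syntax_table`, `parts`, and the helper functions are hypothetical names.

```python
# Hypothetical syntax table: position name -> order within a sentence.
syntax_table = {"Subject": 1, "Verb": 2, "Object": 3}

# Each sentence part is one row: (sentence id, syntax position, phrase).
parts = [
    (1, "Subject", "Motor M-101"),
    (1, "Verb", "is rated at"),
    (1, "Object", "50 hp"),
]


def add_syntax_position(name):
    """Extend the syntax table without touching the rows already stored."""
    syntax_table[name] = max(syntax_table.values()) + 1


def ordered_phrases(sentence_id):
    """Return the phrases of one sentence in syntax-table order."""
    rows = [(syntax_table[pos], phrase)
            for sid, pos, phrase in parts if sid == sentence_id]
    return [phrase for _, phrase in sorted(rows)]


# Adding a new position is a data change, not a schema change.
add_syntax_position("Punctuation")
parts.append((1, "Punctuation", "?"))
print(ordered_phrases(1))
```

An application that only knows the original three positions can still read the first three phrases and ignore the rest.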

Subsentences With Extended Syntax

Expanding the syntax of a sentence is another way to put metadata into a sentence. Using syntax for metadata can save space, but not all applications may know what to do with the new syntax positions.

Why use a Semantic-Relational Database Design?

OK – this is all very clever (or not), you may say, but WHY? SO WHAT? Why do all of this? What are the advantages to doing it this way instead of the traditional way?

In a previous post, we defined situations that caused the rejection of traditional relational database design. The relational database is a powerful data engine with many available tools that can be very useful, but as we have seen, it can have difficulty with certain situations. If we can find a way to consolidate the data world…

Our goal – A relational database design that can do one or more of the following:

  1. It is one, pre-designed, relational database schema, that can handle any combination of data models and meta-data models, without foreknowledge of what it will have to store in the future, so the schema is not changing every update;
  2. Where each column and value can be given a universal, time-sensitive key, making it easier to share vocabularies, data, metadata, and control its flow across an enterprise, and where each key can be used to match translations in different languages;
  3. Where data can be built and modified in a non-destructive, incremental manner, where a new project can be defined to be based on another project’s dataset, at a given point-in-time, even though the base project is still changing – non-destructively, incrementally changing;
  4. Where each piece of data can have many items of meta-data – enough to be able to track the history and use of the data in a “workflow” – where did the data come from, who collected it, what device collected it, what location, who validated it, who checked it, was it a default value from an ANSI or ISO standard or the manufacturer’s specification or a measured value;
  5. Where data in a common library can be used by reference as instances in a project, carrying the library data, yet be able to override individual values used in the instances;
  6. Where users can create new relations, new attributes, new fields, new ways of organizing, based on the data entered and not on the data structure;
  7. That accommodates and records the conceptual structure and information of any data model, without losing vital information in a transformation.
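Item 5 in the list above, library data used by reference with per-instance overrides, can be sketched with `collections.ChainMap` from the Python standard library. This is only an analogy for the behavior described; the library entry, the instance names, and the values are hypothetical.

```python
from collections import ChainMap

# Hypothetical library entry shared by every project that uses this motor model.
library = {"Motor type X": {"rated power": "50 hp", "rated voltage": "480 V"}}


def make_instance(library_key, overrides=None):
    """Return a view that reads overrides first and falls back to the library entry."""
    return ChainMap(dict(overrides or {}), library[library_key])


m101 = make_instance("Motor type X")
m102 = make_instance("Motor type X", {"rated voltage": "460 V"})

# A later correction to the library reaches instances that did not override the value.
library["Motor type X"]["rated power"] = "60 hp"

print(m101["rated power"], m102["rated power"], m102["rated voltage"])
```

The instances hold a reference to the library entry rather than a copy, so only the overridden values are stored per project.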

What kinds of applications are these characteristics useful for?

Engineering Libraries including

  1. Manufacturer’s Catalog Data with Engineering Information and Specifications, Curves, and Graphics
  2. ANSI and ISO Standards and Tables
  3. Site collection and metering data
  4. Geometry and Geography data
  5. Internationalized and translated data

Engineering Project Management and project data

Medical and Bioinformatics Databases

Data Warehousing Applications for Business Intelligence

Databases for Multi-dimensional Analysis, CAD, CAM, and CAE

Consolidating proprietary data, drawing data, and other databases, for project error-checking and bill of materials matching.

Transferring or migrating data from other applications’ datasets, such as legacy data.

We have discussed some of this in other posts so far, and we will discuss more in the future.

When are traditional relational database designs rejected?

As you may know, in engineering and design applications, object databases and XML-based designs are being chosen over traditional relational database designs, because of the need for the flexibility of having thousands of combinations of records, grouped and inter-related with each other in many different and unpredictable ways. Each value may come in different units of measurement, or based on different standards. All of the variation and variety of the world leads to many possible options of form and precision that are important to capture. This is difficult to do in a traditional relational database design, because the rules of normalization will direct the creation of thousands of tables and subtables, with many columns to handle all of the possible attributes one might need, even if rarely.

As flexible as these systems are, they have limitations of various types, such as speed or portability or problems inherent in encapsulation of objects. Besides objects, there are other data models to consider – a knowledgebase of rules, mathematical formulas, control diagrams, and more. Multiple inheritances and interfaces are also important, as well as multiple platforms and the long-term sustainability of the data.

The fracturing of data systems adds to the difficulties by applying different search methods and querying languages across multiple data domains.

A Convergence of Data Models in Semantic-Relational Databases

One of the best things about Semantic-Relational Databases is how easy it is to map your data model to a sentence. Think about it – how does all of your training for a particular data model start? To consolidate: “Start with the sentences that describe the conceptual processes, things, actions, etc … and convert the sentences into entities, relations, objects, etc.”

Let’s survey how different models map to semantic-relational databases:

  1. Entity-Relational Data Modeling – the entities become the nouns (subjects and objects) of the sentences. The relations become the verbs. An attribute can be considered as a relationship to a value, so the attribute or column heading becomes a verb, while the cell value becomes the object of the sentence. Multiple primary keys become subjects with modifiers. Multiple columns that work together become verbs or objects with modifiers.
  2. Object Data Modeling – the primary key becomes the subject. Each field becomes a verb, and the value of each field becomes the object of the verb. Fields that are arrays or collections become verbs with modifiers.
  3. Semantic Data Modeling, Semantic Network, or Semantic Web – this one is simple because subjects become subjects, verbs (predicates) become verbs, and predicate objects (direct or indirect objects) become objects of verbs. Not to be confusing, but the terms are used interchangeably, and the semantic “object” is not the same as the object-programming “object”. This is also called the “semantic triple”.
  4. Mathematical Modeling – as anyone knows who has done story problems in math, the sentences are converted to formulas and formula trees.
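The first two mappings in the list above can be sketched in a few lines: the key value becomes the subject, each column heading or field name becomes a verb, and each cell value becomes the object. The function name, the sample row, and the choice to skip empty cells are assumptions made for this illustration.

```python
def row_to_sentences(key_column, row):
    """Turn one table row into (subject, verb, object) sentences.

    The primary-key value becomes the subject, each other column heading
    becomes a verb, and each cell value becomes the object.
    """
    subject = row[key_column]
    return [(subject, column, value)
            for column, value in row.items()
            if column != key_column and value is not None]


# Hypothetical equipment row with one empty cell.
row = {"tag": "M-101", "rated_power_hp": 50, "voltage_v": 480, "notes": None}
sentences = row_to_sentences("tag", row)
print(sentences)
```

Skipping empty cells means a rarely used attribute costs nothing until a value is actually entered, which is one way to avoid the wide, mostly empty tables described earlier.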

Formulas in Sentences