
Combining studies with Research Databases

Do you have to change your database often in your study or project? Exploration means you may not know where you will end up. Do you have to combine multiple studies, but keep track of each piece of data separately – where it came from, what conditions, what equipment? Do you have to combine data from different types of databases, like XML or spreadsheets or database tables? These are all reasons for using a Research Database system.

Datura now offers Legume Research DB for SQL Server. This product turns Microsoft SQL Server into a Research Database system. Legume Research DB comes with tools to import the contents of spreadsheets, database tables, XML files, and more, so that you can combine the data from different sources into one common database. Each piece of data can be tagged with the name and source file and the system automatically adapts to the data being imported. Use SQL to analyze data across studies and between studies. To learn more, click here.
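The product's own import tools aren't shown here. The underlying idea can still be sketched in Python: tag every imported value with its study and source file, then query across studies with SQL. This is a minimal sketch that assumes SQLite in place of SQL Server, a made-up one-value-per-row table (`observation`) and a hypothetical helper (`import_csv`). It is not the Legume Research DB API.

```python
import csv
import io
import sqlite3

# SQLite stands in for SQL Server here; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE observation (
           study TEXT, source_file TEXT,   -- provenance tags kept on every row
           sample TEXT, measure TEXT, value REAL)"""
)

def import_csv(study, source_file, text):
    """Load CSV text with columns sample,measure,value and tag each row."""
    rows = csv.DictReader(io.StringIO(text))
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
        [(study, source_file, r["sample"], r["measure"], float(r["value"]))
         for r in rows],
    )

# Two made-up studies that measured the same quantity.
import_csv("Study A", "a_2011.csv", "sample,measure,value\nS1,ph,6.5\nS2,ph,7.1\n")
import_csv("Study B", "b_2012.csv", "sample,measure,value\nT1,ph,6.9\n")

# One query spans both studies, yet every value still knows where it came from.
per_study = conn.execute(
    """SELECT study, source_file, COUNT(*), AVG(value)
       FROM observation WHERE measure = 'ph'
       GROUP BY study, source_file ORDER BY study"""
).fetchall()
print(per_study)
```

The study and file columns travel with each value. Filtering to one study or comparing studies is therefore just a change to the WHERE or GROUP BY clause.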


A Master Reference to control discrepancies

We’ve seen engineers keep their bills of materials and equipment lists in spreadsheets. We’ve seen engineers buy expensive analysis software just for the input database built into it. We’ve seen engineers try to build their own databases in Access so they could control the data, but never finish them in time for them to be useful. We’ve seen engineers make expensive mistakes because of discrepancies between their drawings, analysis files, and equipment lists. These are all reasons for using a Master Reference system.

Datura now offers Legume Master Reference for SQL Server. This product turns Microsoft SQL Server into a Master Reference system. Legume Master Reference comes with tools to import the contents of spreadsheets, database tables, XML files, and more, so that you can combine the data from different sources into one common database. Each piece of data can be tagged with the name and source file and the system automatically adapts to the data being imported. Using the data, you can look for discrepancies and generate equipment lists or bills of material. To learn more, click here.
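Once every value is stored alongside its source, a discrepancy check like the one described above can be sketched as a single SQL query. The table `item`, the equipment tags and the values are hypothetical, and SQLite stands in for SQL Server. This illustrates the idea only; it does not describe how Legume Master Reference is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (equipment tag, attribute, source) value.
conn.execute(
    "CREATE TABLE item (tag TEXT, attribute TEXT, value TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?, ?)",
    [
        ("MTR-101", "hp", "50", "drawing E-201"),
        ("MTR-101", "hp", "50", "equipment list rev 3"),
        ("MTR-102", "hp", "75", "drawing E-201"),
        ("MTR-102", "hp", "60", "equipment list rev 3"),
        ("MTR-102", "hp", "75", "analysis file"),
    ],
)

# A discrepancy is any tag/attribute pair that has more than one distinct value.
# GROUP_CONCAT lists every conflicting value with its source for review.
discrepancies = conn.execute(
    """SELECT tag, attribute, GROUP_CONCAT(value || ' (' || source || ')', '; ')
       FROM item
       GROUP BY tag, attribute
       HAVING COUNT(DISTINCT value) > 1"""
).fetchall()
for row in discrepancies:
    print(row)
```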


Design Libraries – ready for anything

Every time you start a new project, there is a good chance that you will re-enter the technical data for all of the equipment and materials in the project, even though someone probably already entered the same data in a previous project. There are several reasons for this:

  • You don’t know the source and validity of the data in the previous project.
  • The previous project had different customer requirements, standards and units, which changed the column content.
  • New equipment brings new options and capabilities.

These difficulties have made engineering more expensive than it needs to be. They force engineers to use spreadsheets as databases for flexibility, while repeating a lot of work over and over. This is the reason for using a Design Library system.

Datura now offers Legume Design Library for SQL Server. This product turns Microsoft SQL Server into a Design Library system. Legume Design Library comes with tools to import the contents of spreadsheets, database tables, XML files, and more, so that you can combine the data from past projects into one common database. Each piece of data can be tagged with the project name and source file, and the system automatically adapts to the data being imported. The data is time-stamped and given identifiers that make it easier to distribute to other team members. Each change is archived, so you can audit your changes or reverse them. To learn more, click here.
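One common way to get time-stamped, globally identifiable, archived changes is an append-only versions table. The sketch below shows this in Python with SQLite. The table, the helpers (`set_value`, `current`, `revert`) and the use of random UUIDs as identifiers are assumptions made for illustration; the product may work differently.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: an edit never overwrites a row, it adds a
# new version, so older versions remain available for audit or rollback.
conn.execute(
    """CREATE TABLE library_value (
           item_id TEXT, attribute TEXT, value TEXT,
           version INTEGER, changed_by TEXT, changed_at TEXT)"""
)

def set_value(item_id, attribute, value, user):
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM library_value "
        "WHERE item_id = ? AND attribute = ?", (item_id, attribute)
    ).fetchone()
    conn.execute(
        "INSERT INTO library_value VALUES (?, ?, ?, ?, ?, ?)",
        (item_id, attribute, value, latest + 1, user,
         datetime.now(timezone.utc).isoformat()),
    )

def current(item_id, attribute):
    return conn.execute(
        "SELECT value FROM library_value WHERE item_id = ? AND attribute = ? "
        "ORDER BY version DESC LIMIT 1", (item_id, attribute)
    ).fetchone()[0]

def revert(item_id, attribute, user):
    """Undo the latest change by re-asserting the value that preceded it."""
    rows = conn.execute(
        "SELECT value FROM library_value WHERE item_id = ? AND attribute = ? "
        "ORDER BY version DESC LIMIT 2", (item_id, attribute)
    ).fetchall()
    if len(rows) == 2:
        set_value(item_id, attribute, rows[1][0], user)

pump = str(uuid.uuid4())  # globally unique id, safe to share across teams
set_value(pump, "rated_kw", "37", "alice")
set_value(pump, "rated_kw", "45", "bob")
revert(pump, "rated_kw", "alice")
history = conn.execute(
    "SELECT version, value, changed_by FROM library_value ORDER BY version"
).fetchall()
print(current(pump, "rated_kw"), history)
```

Reverting adds a new version rather than deleting one. The audit trail therefore shows the change and its reversal, along with who made each.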


B-52s and Repositories for Long-term Archival (LTA)

Can you get at the data from a project you worked on two years ago? How about five years ago? How about an analysis file from 20 years ago? Before you decide that’s unlikely, consider that the Space Shuttle program operated over a 30-year lifespan, and the B-52 is still in service after 60 years. They are not alone among critical manufactured goods, and ships, buildings and facilities last even longer.

One reason it becomes difficult to view engineering data from the past is that most of it is stored in proprietary formats containing a variety of structures, such as records inside of records inside of records. To read the data, you must use the original software, which may cost thousands of dollars or may no longer be available. This is the reason for using a Long-Term Archival system.

Datura now offers Legume Repository for SQL Server. This product turns Microsoft SQL Server into a Long-Term Archival system (LTA). Legume Repository comes with tools to import the contents of spreadsheets, database tables, XML files, and more. Each piece of data can be tagged with the project name and source file, and the system automatically adapts to the data being imported. To learn more, click here.

The Memex has been Missing a Database – and it’s finally here!

The Memex is a machine that Vannevar Bush described in his July 1945 Atlantic Monthly article, “As We May Think”. Shortly before the end of World War II, President Truman’s Director of Scientific Research wrote this article to describe how technologies and ideas developed during the war years could improve scientific research and collaboration. He describes how the war effort forced scientists into collaboration. The benefits of the resulting accelerated development were so great that he envisioned tools that would help scientists work faster and share faster. Part of this vision included what we would now describe as stereo head-mounted cameras and microphones on each scientist to record their observations. All of this information would be transferred to another part, a microfilm viewing system. Another part was the Memex, short for “memory-index”. It let scientists link related microfilm pages together into an index, the way you link things in your memory, using what we now call hypertext links. Researchers of a particular subject would create these links, which could be followed in order as a thread of connections. A scientist could share the links with another researcher, who could add more. Sound familiar? No wonder the Memex is considered an ancestor of the World Wide Web.

As useful as the Web is, pages of text don’t store knowledge in a form that is easy to process. The Semantic Web has helped somewhat by using simple sentences stored as XML text to bind pages or concepts together with more meaningful links. It can be used to store data, but its simple sentences, called triples, have difficulty handling complex data models efficiently.

Last year, the article “Still Building the Memex” in the February 2011 issue of Communications of the ACM outlined the different technologies that are attempting to fulfill the rest of the Memex’s promise: a flexible, personal database that lets us “remember” and organize things in as flexible and interconnected a manner as our own memory. The article reviews the products and how they organize data. They use trees and graphs, spatial layouts, categories, “zigzags” (link threads) and transclusions, which resemble object linking and embedding (such as a picture in a bar chart in a spreadsheet embedded in a word-processing document). It also identifies what is missing: a database to store it all in, and the ability to handle all of these organization types simultaneously and seamlessly.

Welcome the Semantic-Relational Database to the Memex. It has exactly the characteristics the full Memex needs. Using any database in which you can store a table, row, or record, you can store any set of data as sentences, with pictures and documents included as “word-phrases”. We already use sentences as a flexible conceptual method to design any data model and structure, but that design is always mapped to a table or object in the end. Storing the data directly as sentences had not yet been implemented, apart from the first steps taken by the simpler Semantic Web design with its semantic “triples” of Subject-Verb-Object. The Semantic-Relational Database uses full sentences, which makes it multi-dimensional and self-referential, and this flexibility lets it handle any organizational structure needed.
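The following minimal sketch makes the idea of storing data as sentences in ordinary tables concrete. It assumes a two-table layout invented for this example: a `term` table holding both phrases and sentences, and a `part` table mapping each sentence's syntax positions to terms. Because sentences are terms too, a sentence can be a part of another sentence, which is one way to get the self-referential behavior described above. This is not the schema of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema. Phrases and sentences share one id space ("term"),
# so a sentence can use another sentence as one of its parts.
conn.executescript(
    """
    CREATE TABLE term (
        id      INTEGER PRIMARY KEY,
        kind    TEXT NOT NULL CHECK (kind IN ('phrase', 'sentence')),
        text    TEXT,          -- word-phrase text, NULL for sentences
        content BLOB           -- optional picture or document bytes
    );
    CREATE TABLE part (
        sentence_id INTEGER REFERENCES term(id),
        position    TEXT,      -- e.g. 'subject', 'verb', 'object'
        term_id     INTEGER REFERENCES term(id),
        PRIMARY KEY (sentence_id, position)
    );
    """
)

def phrase(text, content=None):
    cur = conn.execute(
        "INSERT INTO term (kind, text, content) VALUES ('phrase', ?, ?)",
        (text, content))
    return cur.lastrowid

def sentence(**parts):
    sid = conn.execute("INSERT INTO term (kind) VALUES ('sentence')").lastrowid
    conn.executemany("INSERT INTO part VALUES (?, ?, ?)",
                     [(sid, pos, tid) for pos, tid in parts.items()])
    return sid

def render(tid):
    """Print a term as text, recursing into sentences used as parts."""
    kind, text = conn.execute(
        "SELECT kind, text FROM term WHERE id = ?", (tid,)).fetchone()
    if kind == "phrase":
        return text
    parts = dict(conn.execute(
        "SELECT position, term_id FROM part WHERE sentence_id = ?",
        (tid,)).fetchall())
    order = ["subject", "verb", "object"]
    return "[" + " ".join(render(parts[p]) for p in order if p in parts) + "]"

photo = phrase("nameplate photo", content=b"\x89PNG...")  # stand-in bytes
motor = phrase("Motor MTR-101")
s1 = sentence(subject=photo, verb=phrase("shows"), object=motor)
# A sentence about a sentence: this is what "self-referential" means here.
s2 = sentence(subject=phrase("J. Smith"), verb=phrase("verified"), object=s1)
print(render(s2))
```

Parts are keyed by a position name rather than fixed subject/verb/object columns. This also leaves room for more syntax positions later.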

The Data Chain – Combining Data with Knowledge

A semantic-relational database (SRDB) can be used to store the following style of nested information to any depth you want:

  • Ringo Starr said [John Lennon said [Brian Jones said […]]]
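Nesting like this can be stored by letting a sentence's object point at another sentence. A recursive query can then unwind it to any depth. The sketch below assumes a simplified single-table layout and a hypothetical `say` helper. The innermost statement is left as "..." just as in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified table: the object of a sentence is either plain
# text (object_text) or a reference to another sentence (object_sentence).
conn.execute(
    """CREATE TABLE sentence (
           id INTEGER PRIMARY KEY, subject TEXT, verb TEXT,
           object_text TEXT, object_sentence INTEGER REFERENCES sentence(id))"""
)

def say(subject, verb, obj):
    """obj is either a string or the id of an existing sentence."""
    if isinstance(obj, int):
        row = (subject, verb, None, obj)
    else:
        row = (subject, verb, obj, None)
    return conn.execute(
        "INSERT INTO sentence (subject, verb, object_text, object_sentence) "
        "VALUES (?, ?, ?, ?)", row).lastrowid

inner = say("Brian Jones", "said", "...")
middle = say("John Lennon", "said", inner)
outer = say("Ringo Starr", "said", middle)

# Walk the chain from the outermost statement inward, at any depth.
chain = conn.execute(
    """WITH RECURSIVE walk(id, depth) AS (
           SELECT ?, 0
           UNION ALL
           SELECT s.object_sentence, w.depth + 1
           FROM walk w JOIN sentence s ON s.id = w.id
           WHERE s.object_sentence IS NOT NULL)
       SELECT w.depth, s.subject, s.verb, s.object_text
       FROM walk w JOIN sentence s ON s.id = w.id
       ORDER BY w.depth""",
    (outer,),
).fetchall()
for depth, subject, verb, text in chain:
    print("  " * depth + f"{subject} {verb}" + (f" '{text}'" if text else ":"))
```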

Combining this capability with the flexibility of an SRDB, one can create a new kind of data system – one that tracks data like products in a supply chain – and more. Let’s take an engineering example:

Several years ago, a new occupational safety rule required electrical workers to wear protective apparel when working near live equipment, and required other workers to be kept back the necessary safe distance. This new Arc Flash requirement forced facilities to re-analyze their electrical systems, calculate the effects of Arc Flash, and post labels at each panel that inform workers of the necessary apparel and distances. In the early phase of this rule, it became clear that many analyses were being done with outdated or invalid data. Other analyses used data that the previous engineer had chosen to size the equipment properly during design, not to analyze Arc Flash safety properly. Furthermore, when the previous engineer did their analysis, much of the data was often “default” data that the analysis program’s data entry form filled in automatically. When this happens, there is rarely a record of whether a value is a default or came from the nameplate or the manufacturer’s technical specifications. Even if the data was entered from the technical specifications, it may have been converted from horsepower (mechanical power) to amps (electrical current) because that is what the analysis program requests, and the original values may have been lost over time. Another complication arises when the user chose one of several candidate values to give a worst case for that analysis. That value may not be the worst case for another analysis, and the other candidate values were lost along the way. Additionally, analyses aren’t done in isolation; they occur in groups, based on configurations and alternative designs.

If the data is stored in an SRDB, the situation improves tremendously. We can build the information in layers, connected from source data to analysis to knowledge, like the chain from factory to distributor to customer. As the data is collected and processed, it can be tagged or annotated automatically, or the user can be prompted for annotations. Let’s see how this works in our Arc Flash example:

  1. we start our site data collection with drawings, loading the documents as document phrases, and we use sentences to annotate the documents with source information (what the document is about, its date and time, the user who created it, the user who loaded it, etc.)
  2. we walk through the site and we take pictures of the equipment and nameplates, load those documents, and annotate them
  3. we collect the technical documents of each piece of equipment, and load and annotate those documents
  4. we collect strip charts and other data collections, and load and annotate those documents
  5. we collect any previous analysis data, and load and annotate those documents
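Steps 1 through 5 share one pattern: load a document as a phrase, record some source information automatically, and let the user add more. The following minimal sketch shows that pattern. It assumes a hypothetical `load_document` helper and a simplified layout in which annotation objects are plain text rather than phrases. The file names and bytes are placeholders.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical minimal layout: documents are stored as phrases (with their
# bytes), and every annotation is a subject-verb-object sentence about them.
conn.executescript(
    """CREATE TABLE phrase (id INTEGER PRIMARY KEY, name TEXT, content BLOB);
       CREATE TABLE sentence (id INTEGER PRIMARY KEY,
                              subject INTEGER, verb TEXT, object TEXT);"""
)

def load_document(name, content, loaded_by, **annotations):
    doc = conn.execute("INSERT INTO phrase (name, content) VALUES (?, ?)",
                       (name, content)).lastrowid
    # Source information recorded automatically at load time...
    facts = {"loaded by": loaded_by,
             "loaded at": datetime.now(timezone.utc).isoformat()}
    # ...plus whatever the user supplies (what it is about, who created it).
    facts.update({k.replace("_", " "): v for k, v in annotations.items()})
    conn.executemany(
        "INSERT INTO sentence (subject, verb, object) VALUES (?, ?, ?)",
        [(doc, verb, obj) for verb, obj in facts.items()])
    return doc

drawing = load_document("E-201 one-line.pdf", b"%PDF...", "alice",
                        is_about="main switchgear", created_by="previous engineer")
load_document("MTR-101 nameplate.jpg", b"\xff\xd8...", "bob",
              is_about="Motor MTR-101")

# Which documents describe Motor MTR-101, and who loaded them?
rows = conn.execute(
    """SELECT p.name, w.object
       FROM sentence a
       JOIN phrase p ON p.id = a.subject
       JOIN sentence w ON w.subject = p.id AND w.verb = 'loaded by'
       WHERE a.verb = 'is about' AND a.object = ?""",
    ("Motor MTR-101",),
).fetchall()
print(rows)
```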

Now that we have the RAW DATA, we can develop the INFORMATION we need from that raw data for our analysis.

  1. Any image can be scanned with optical character recognition (OCR) software into chunks of searchable text. Each chunk can be loaded as a text phrase and connected to the picture with a sentence identifying it as an OCR scan of that picture. Each chunk will be tagged or annotated as created by the user who performed the OCR scan at a particular date-time.
  2. Any spreadsheet or standard data file can be imported into the database as phrases and sentences, and each piece is tagged as coming from that imported file by that user. If an error is made in the import, one can find and remove all of those sentences and try again.
  3. At every level, a supervisor can check the data and sign off on it. One can annotate the data with a confidence level (80% sure), an error value (+4%, -2%), or a “guess-timate” range, which can carry forward into the analyses.
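Two of these ideas can be sketched together: removing a bad import as a unit, and attaching confidence, error and sign-off annotations to individual sentences. The `batch` and `annotation` tables and the sample values are assumptions made for illustration. OCR itself is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: every sentence records the import batch that created
# it, and annotations (confidence, error, sign-off) point at single sentences.
conn.executescript(
    """CREATE TABLE batch (id INTEGER PRIMARY KEY, source_file TEXT, user_name TEXT);
       CREATE TABLE sentence (id INTEGER PRIMARY KEY, batch_id INTEGER,
                              subject TEXT, verb TEXT, object TEXT);
       CREATE TABLE annotation (sentence_id INTEGER, kind TEXT,
                                value TEXT, user_name TEXT);"""
)

def import_rows(source_file, user, rows):
    """Record the import as a batch and tag each new sentence with it."""
    b = conn.execute("INSERT INTO batch (source_file, user_name) VALUES (?, ?)",
                     (source_file, user)).lastrowid
    conn.executemany(
        "INSERT INTO sentence (batch_id, subject, verb, object) VALUES (?, ?, ?, ?)",
        [(b, *r) for r in rows])
    return b

import_rows("panels.xlsx", "alice", [("PNL-1", "bus rating A", "400")])
bad = import_rows("panels_v2.xlsx", "alice",
                  [("PNL-1", "bus rating A", "4000")])  # wrong column mapped

# The bad import is found and removed as a unit, and can then be re-run.
conn.execute("DELETE FROM sentence WHERE batch_id = ?", (bad,))

(sid,) = conn.execute("SELECT id FROM sentence WHERE subject = 'PNL-1'").fetchone()
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", [
    (sid, "confidence", "0.8", "alice"),
    (sid, "error", "+4%/-2%", "alice"),
    (sid, "signed off", "yes", "supervisor"),
])

# Only supervisor-approved data, with the file it came from.
signed = conn.execute(
    """SELECT s.subject, s.verb, s.object, b.source_file
       FROM sentence s JOIN batch b ON b.id = s.batch_id
       JOIN annotation a ON a.sentence_id = s.id
       WHERE a.kind = 'signed off' AND a.value = 'yes'"""
).fetchall()
print(signed)
```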

Now we have scanned, imported, and converted all of our data as far as we can automatically. The analysis program that will use this data often requires it in a specific format: amps, not milliamps; volts, not kilovolts; and so on.
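A calculated view is one way to give an analysis program the units it expects without overwriting the values as collected. The sketch assumes a hypothetical `unit_factor` lookup table. It covers only the simple multiplicative conversions mentioned above. A conversion such as horsepower to amps also depends on voltage, efficiency and power factor, so it would need more than a single factor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw values are kept exactly as collected, with their original units.
    CREATE TABLE measurement (tag TEXT, quantity TEXT, value REAL, unit TEXT);
    -- Hypothetical conversion table to the units an analysis program expects.
    CREATE TABLE unit_factor (unit TEXT PRIMARY KEY, base_unit TEXT, factor REAL);
    INSERT INTO unit_factor VALUES
        ('A', 'A', 1.0), ('mA', 'A', 0.001),
        ('V', 'V', 1.0), ('kV', 'V', 1000.0);
    -- A calculated view: the analysis reads this; the raw table is untouched.
    CREATE VIEW analysis_input AS
        SELECT m.tag, m.quantity, m.value * f.factor AS value, f.base_unit AS unit,
               m.value AS original_value, m.unit AS original_unit
        FROM measurement m JOIN unit_factor f ON f.unit = m.unit;
    """
)
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", [
    ("PNL-1", "voltage", 13.8, "kV"),
    ("PNL-1", "leakage", 250.0, "mA"),
])
rows = conn.execute("SELECT * FROM analysis_input ORDER BY quantity").fetchall()
for row in rows:
    print(row)
```

The view also returns `original_value` and `original_unit`, so the values as collected are never lost. Losing them is the problem described in the Arc Flash example.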

  1. Using a variety of tools, from data entry forms to SQL calculated views of existing data, we can create the set of sentences that represent the input data for an analysis.
  2. Results and logs from an analysis can be put back into the database and tagged with the date-time. Thanks to the incremental, non-destructive nature of an SRDB, a snapshot view can be seen for the point in time when that analysis was done.
  3. Using stored procedures and triggers, we can make the database update its content automatically as changes are made.
  4. Configuration states can be defined on the fly and used to tag data, and those tags can be used to filter the data selected for an analysis.
  5. Since the data changes are incremental, you can define a set of projects that are incremental changes of a base project frozen at a particular point-in-time, lessening the need to duplicate projects or publish snapshots.
  6. Over time, as the same equipment is used from one project to the next, time is saved as the user gets to use data that has been validated already by a known entity.
  7. As experience is gained with equipment, recommendations for usage and other notes can be added, turning the database into a knowledge base.
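Items 2, 4 and 5 can be illustrated with a non-destructive fact table. Each change closes the previous row instead of overwriting it, so any point in time can be reconstructed. An alternative configuration is layered over a base project, which may be frozen at an earlier time. The sketch uses an integer clock instead of real timestamps, and the table and the `assert_fact` and `snapshot` functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical non-destructive store: a change never overwrites a value; it
# closes the old row (sets retracted_at) and adds a new one.
conn.execute(
    """CREATE TABLE fact (
           subject TEXT, verb TEXT, object TEXT,
           config TEXT,            -- 'base' or an alternative design
           asserted_at INTEGER,    -- simple integer clock for the sketch
           retracted_at INTEGER)"""
)

def assert_fact(subject, verb, obj, t, config="base"):
    conn.execute(
        "UPDATE fact SET retracted_at = ? WHERE subject = ? AND verb = ? "
        "AND config = ? AND retracted_at IS NULL", (t, subject, verb, config))
    conn.execute("INSERT INTO fact VALUES (?, ?, ?, ?, ?, NULL)",
                 (subject, verb, obj, config, t))

def snapshot(t, config="base", base_frozen_at=None):
    """Facts valid at time t. A non-base config layers its own facts over
    the base project, optionally frozen at an earlier time."""
    def valid(cfg, when):
        return conn.execute(
            """SELECT subject, verb, object FROM fact
               WHERE config = ? AND asserted_at <= ?
                 AND (retracted_at IS NULL OR retracted_at > ?)""",
            (cfg, when, when)).fetchall()
    base_t = t if base_frozen_at is None else base_frozen_at
    view = {(s, v): o for s, v, o in valid("base", base_t)}
    if config != "base":
        view.update({(s, v): o for s, v, o in valid(config, t)})
    return view

assert_fact("XFMR-1", "impedance %", "5.75", t=1)
assert_fact("MTR-101", "fla A", "65", t=1)
assert_fact("XFMR-1", "impedance %", "6.0", t=5)           # later correction
assert_fact("MTR-101", "fla A", "80", t=6, config="alt-B")  # alternative design

print(snapshot(3))                                # what a run at t=3 saw
print(snapshot(10, "alt-B"))                      # current base + alt-B
print(snapshot(10, "alt-B", base_frozen_at=3))    # alt-B over frozen base
```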

With this type of system, you can list any data entered or calculated by any user or program. You can also list data by its document source, or by whether a supervisor has signed it off. You can point at any value and burrow down to see where it came from, and even read the engineer’s notes on why they used a particular value. When changes are made, the old values are still there, so an audit trail is maintained. The ultimate engineering database.
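Burrowing down to a value's origin amounts to walking derivation links backwards. The sketch below assumes a hypothetical `derived_from` link table and made-up items. It uses a recursive query to list everything upstream of an analysis input, along with any notes. It also assumes the links contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, label TEXT, note TEXT);
    -- Each link points from a value back to something it was derived from.
    CREATE TABLE derived_from (item_id INTEGER, source_id INTEGER);
    """
)
conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
    (1, "Nameplate photo IMG_0412.jpg", None),
    (2, "OCR text: '50 HP 460 V'", "OCR run by alice"),
    (3, "MTR-101 rated 50 hp", "checked against photo"),
    (4, "MTR-101 FLA 65 A", "looked up from a table, not the nameplate"),
    (5, "Arc flash input: MTR-101 65 A", None),
])
conn.executemany("INSERT INTO derived_from VALUES (?, ?)",
                 [(2, 1), (3, 2), (4, 3), (5, 4)])

def trace(item_id):
    """Return item_id and every upstream item, nearest first (acyclic links)."""
    return conn.execute(
        """WITH RECURSIVE up(id, depth) AS (
               SELECT ?, 0
               UNION ALL
               SELECT d.source_id, u.depth + 1
               FROM up u JOIN derived_from d ON d.item_id = u.id)
           SELECT u.depth, i.label, i.note
           FROM up u JOIN item i ON i.id = u.id
           ORDER BY u.depth""", (item_id,)).fetchall()

chain = trace(5)
for depth, label, note in chain:
    print("  " * depth + label + (f"  [{note}]" if note else ""))
```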

Expanding the Syntax

So far, we’ve only considered three syntax parts to a sentence – Subject, Verb, and Object. We are not limited to these three parts. By using a Syntax Table, we can expand sentences to include more syntax parts.


By adding more syntax positions, we can handle more subtleties of communication, like using a Passive Verb or Punctuation.
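A Syntax Table can be as simple as a list of position names with a display order. This sketch assumes a hypothetical `syntax_table` and `part` layout. A sentence may use any position defined there, including a passive verb and punctuation. Adding a new position takes a single INSERT rather than a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical Syntax Table: each row defines a position a sentence may use.
    CREATE TABLE syntax_table (position TEXT PRIMARY KEY, sort_order INTEGER);
    INSERT INTO syntax_table VALUES
        ('subject', 1), ('verb', 2), ('passive verb', 3),
        ('object', 4), ('punctuation', 5);
    CREATE TABLE part (sentence_id INTEGER, position TEXT, phrase TEXT);
    """
)
conn.executemany("INSERT INTO part VALUES (?, ?, ?)", [
    (1, "subject", "Breaker CB-3"), (1, "verb", "feeds"),
    (1, "object", "Panel PNL-1"), (1, "punctuation", "."),
    (2, "subject", "Panel PNL-1"), (2, "passive verb", "is fed by"),
    (2, "object", "Breaker CB-3"), (2, "punctuation", "?"),
])

def render(sentence_id):
    """Join a sentence's parts in Syntax Table order; punctuation attaches last."""
    words, punct = [], ""
    for position, phrase in conn.execute(
            """SELECT position, phrase FROM part JOIN syntax_table USING (position)
               WHERE sentence_id = ? ORDER BY sort_order""", (sentence_id,)):
        if position == "punctuation":
            punct = phrase
        else:
            words.append(phrase)
    return " ".join(words) + punct

print(render(1))
print(render(2))
```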

Subsentences With Extended Syntax

Expanding the syntax of a sentence is another way to put metadata into a sentence. Using syntax for metadata can save space, but not all applications may know what to do with the new syntax positions.
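The trade-off can be illustrated without a database. The sketch below uses a made-up `Sentence` class and sample values. It stores the same confidence value once as an extra syntax position and once as a subsentence about a plain triple. It then reads both with a hypothetical application that only understands subject, verb and object.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    parts: dict  # position name -> phrase text or a nested Sentence

# 1) Metadata as an extended syntax position: one compact sentence.
extended = Sentence({"subject": "MTR-101", "verb": "has FLA", "object": "65 A",
                     "confidence": "80%"})

# 2) Metadata as a subsentence: the base fact stays a plain triple and a
#    second sentence talks about it.
fact = Sentence({"subject": "MTR-101", "verb": "has FLA", "object": "65 A"})
meta = Sentence({"subject": fact, "verb": "has confidence", "object": "80%"})

def triple_reader(s):
    """An application that only knows subject/verb/object."""
    return tuple(s.parts.get(p) for p in ("subject", "verb", "object"))

def part_count(*sentences):
    return sum(len(s.parts) for s in sentences)

print(triple_reader(extended))      # the confidence position is silently dropped
print(triple_reader(meta)[1:])      # the confidence survives as an ordinary triple
print(part_count(extended), part_count(fact, meta))  # 4 vs 6 stored parts
```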