Data Sources

Data Migration Services and Information works with many different data sources, formats and schemas. In general, as long as your data is encoded properly and accessible programmatically (meaning it is not encrypted by a proprietary algorithm, and not stored in an inaccessible binary format), we can work with it. We have the capability to write custom algorithms and programs to ingest your data, analyze it, and understand it. We can then offer services such as data migrations (migrating your records or files, or converting them to a different format), as well as expert-level analysis and reports for your data.

Below are summaries of some of the most common data sources we work with:


CSV, XML and JSON Data

CSV
Description: CSV files are Comma-Separated Values files; however, the delimiter doesn't necessarily have to be a comma, as we often see semicolons or other delimiters used.
Important Considerations: Data within CSV files should be properly enclosed and properly escaped. For example, if the delimiter is a comma but a value within the file contains a comma (e.g., color = red, orange, blue), the value "red, orange, blue" should be enclosed, typically in double quotes. And because a value could itself contain a double quote (e.g., 1/8" meaning 1/8 of an inch), the embedded quote needs to be escaped, typically by doubling it ("") per RFC 4180, or with a backslash in some dialects.

Note that even if your data doesn't conform to this, we can still work with it. Typically we would perform a cleansing operation to fix the missing enclosures or escape characters via an algorithm.

Example:
id,product code,name,qty
1,5689,Hat,1
2,1358,Shirt,2
3,9956,Jeans,1
4,4645,Shoes,1
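As a minimal sketch of the enclosure and escaping rules above, Python's standard csv module handles both automatically (the column names here are illustrative only):

```python
import csv
import io

# Write a row containing an embedded delimiter and an embedded quote.
# csv.writer encloses the values and doubles the quote (RFC 4180 style).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "colors", "size"])
writer.writerow([1, "red, orange, blue", '1/8"'])

encoded = buf.getvalue()

# Reading it back recovers the original values intact.
reader = csv.reader(io.StringIO(encoded))
header, row = list(reader)
```

The same module also accepts a `delimiter=";"` argument for semicolon-delimited files.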
XML
Description: XML files are eXtensible Markup Language files. Many systems will import and export XML files. If a system's preferred method for data interaction is XML, that system will also typically have an XSD (XML Schema Definition) file, which describes the structure of the XML files: which elements are allowed, where they are allowed and how many times, which attributes they may carry, and so on.
Important Considerations: It's important to ensure that the XML is both well-formed (meaning it is a complete XML document without any syntax errors) and valid (if there is an XSD to validate the data against). Believe it or not, we have seen rare cases where exports from enterprise systems produce XML documents that are invalid, or not even well-formed. While many service providers would overlook these outliers, we always perform an analysis on all data sets provided to us, so we can locate such cases and handle them properly during any service work we provide for our clients.
Example:
<items>
  <item id="0001" type="donut">
    <name>Cake</name>
    <ppu>0.55</ppu>
    <batters>
      <batter id="1001">Regular</batter>
      <batter id="1002">Chocolate</batter>
      <batter id="1003">Blueberry</batter>
    </batters>
    <topping id="5001">None</topping>
    <topping id="5002">Glazed</topping>
    <topping id="5005">Sugar</topping>
    <topping id="5006">Sprinkles</topping>
    <topping id="5003">Chocolate</topping>
    <topping id="5004">Maple</topping>
  </item>
</items>
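A well-formedness check is easy to sketch with Python's standard library (validating against an XSD would additionally require a third-party package such as lxml or xmlschema; this catches syntax errors only):

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<items><item id='0001'><name>Cake</name></item></items>"
bad = "<items><item>unclosed</items>"  # mismatched closing tag
```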
JSON
Description: JSON files are JavaScript Object Notation files. Although originally invented for interacting with JavaScript, JSON has become a very important data format in its own right. Most APIs that used to be SOAP and XML based have been converting over to REST and JSON.
Important Considerations: While XML files can have an XSD file to describe the structure and format of the data, JSON's counterpart, JSON Schema, is newer and far less universally adopted, so JSON data often arrives with no formal schema at all. Also note that strict JSON forbids trailing commas and comments, which lenient producers sometimes emit.
Example:
{
  "colors": [
    {
      "color": "black",
      "category": "hue",
      "type": "primary",
      "code": {
        "rgba": [0,0,0,1],
        "hex": "#000"
      }
    },
    {
      "color": "white",
      "category": "value",
      "code": {
        "rgba": [255,255,255,1],
        "hex": "#FFF"
      }
    }
  ]
}
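As a small sketch, Python's standard json module parses strict JSON and rejects non-strict constructs such as trailing commas, a common source of "invalid JSON" in real-world exports:

```python
import json

strict = '{"colors": [{"color": "black", "code": {"hex": "#000"}}]}'
sloppy = '{"colors": [{"color": "black"},]}'  # trailing comma

def parses(text: str) -> bool:
    """Return True if the text is valid strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

data = json.loads(strict)
```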



Database (SQL, MySQL, MSSQL, etc)

SQL
Description: SQL is the acronym for Structured Query Language. There are many different kinds of databases, and most of them implement some dialect of SQL. SQL is one of the best languages for analyzing datasets.
Important Considerations: When working with SQL, it's important to consider reserved words. For example, key is a reserved word, so a column named key must be quoted whenever it is referenced: backticks (`key`) in MySQL, double quotes ("key") in standard SQL, or square brackets ([key]) in SQL Server.
Example: SQL data is tabular / table data. Any table could be considered SQL data.

MySQL
Description: MySQL is a free, open-source database that uses a dialect of SQL. Most SQL is very similar across databases, although there can be slight differences, such as between MySQL's SQL and MSSQL's SQL.
Important Considerations: It's important to consider whether your tables should use the MyISAM or InnoDB storage engine. To learn more about these engines, check out MyISAM to InnoDB.
Example: SQL data is tabular / table data.

MSSQL
Description: MSSQL is Microsoft's proprietary SQL Server database.
Example: SQL data is tabular / table data.

Oracle
Description: Oracle is Oracle's proprietary SQL database.
Example: SQL data is tabular / table data.

MS Access
Description: MS Access is a long-standing desktop database from Microsoft.
Important Considerations: If you are currently running an MS Access database, we would highly recommend upgrading to a newer, more capable database.
Example: MS Access data is tabular / table data.

Other
Are you using a different database that isn't listed here, and need migration, analysis or reporting services? If so, feel free to contact us for a free consultation.
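A quick sketch of reserved-word quoting, using Python's built-in sqlite3 for illustration (MySQL would use backticks instead of the double quotes shown; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "key" is quoted as an identifier so it cannot be mistaken for the keyword.
conn.execute('CREATE TABLE settings ("key" TEXT, value TEXT)')
conn.execute('INSERT INTO settings ("key", value) VALUES (?, ?)',
             ("theme", "dark"))
row = conn.execute('SELECT value FROM settings WHERE "key" = ?',
                   ("theme",)).fetchone()
```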



Data on Hard Drives

Data on a Hard Drive
Description: Data on a hard drive is essentially any data stored on a hard drive. It can live inside a database, or simply be distributed across many different files in many different folders.
Important Considerations: When working with hard drive data, it's important to consider what data is useful and what data should be omitted. Whenever we receive an external hard drive, it is very rarely the case that all of the data on it needs to be migrated or worked with. One of our first tasks is usually to ingest, store and profile all of the data on the drive, and then work with our client to determine which data is meaningful and which should be omitted. This can be based on a variety of conditions: file type or extension, which directories the data is in, the schema of the files, etc.
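A first-pass profile like the one described above can be sketched in a few lines: walk the drive and count files per extension (the root path is a placeholder for wherever the drive is mounted):

```python
import os
from collections import Counter

def profile_extensions(root: str) -> Counter:
    """Count files under root, keyed by lowercased extension."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "<none>"
            counts[ext] += 1
    return counts
```

The resulting counts give the client a quick inventory to decide which file types matter.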



ERP Data

SAP
Description: SAP is a Systems, Applications and Products ERP system.
Important Considerations: One important thing to consider when working with SAP data is how easily accessible the data is. We are usually not provided direct access to the database, so the data must be accessed either through Remote Function Calls (RFCs) or through some other API. Sometimes an API must be developed specifically to provide us access to the SAP data.
Example: SAP data is table data.

Sage MAS 90/200
Description: MAS 90/200 is Sage's ERP system.
Important Considerations: As with SAP, it's important to consider how easily the data can be accessed programmatically.
Example: MAS 90/200 data is table data.



Parts Catalog Systems

Catbase
Description: Catbase was a parts catalog publishing and maintenance program written by a company called Nova. The company has since been disbanded, and its technology has been purchased by a newer parts catalog authoring provider.
Important Considerations: Catbase data consists of both data within an MS Access database and pseudo-encrypted image files. The image files are not really encrypted; rather, the developers of Catbase used a security-through-obscurity scheme so that other developers could not easily work with or migrate the data.
Example: Catbase data contains tables with names such as:

SSE
SSEDRAW
DICT
etc.

The image files might look like random filenames with random extensions, such as:

KRIL0usiosoi.ikl
KRILu109soi.9so

Once decrypted, though, the image files should look like:

000001.TIF
000002.TIF
000003.TIF
...etc

Mincom/Ventyx Link One
Description: Link One was a 1990s parts catalog software built mostly for the mining industry. Companies such as Joy Global (now Komatsu) and others would use Link One to author their parts catalogs and then distribute them to their customers/mines via CD.
Important Considerations: Link One data is generally stored in either .LDF (List Definition File) files or .BLI files. The .BLI files are proprietary binary files that cannot be worked with by an outside service provider, while the .LDF files are the text-readable source files that can be worked with programmatically. If you are using Link One and need your data migrated, you must first check that you have the data in .LDF files.
Example:
Intro.LDF
10012.LDF
20013.LDF
30044.LDF
...etc

SparePort2
Description: SparePort2 is a parts catalog system that was developed overseas (we believe in Europe). The data we have worked with is generally XML data that follows some kind of DML model/specification.
Important Considerations: We have found that when data is exported out of SparePort2 into XML format, if the language or other export options are chosen incorrectly, the exported data can have issues.
Example: SparePort2 generally has XML and RDF files. The RDF files are also XML files, but they serve more as a metadata file for each corresponding assembly XML file.

Documoto
Description: Documoto is a Software as a Service (SaaS) provider that offers online authoring solutions for parts catalogs.
Example:
10012.XML
20013.XML
...etc



PLM / PDM Data

PLM
Description: Product Lifecycle Management systems go beyond CAD data and specs; they encapsulate the entire lifecycle of a product.
Example: Some examples are: Teamcenter, Propel, Oracle Agile, Autodesk Fusion Lifecycle, Dassault Enovia, etc.

PDM
Description: Product Data Management systems are typically used to store CAD files and other information related to product design.
Example: Some examples are: ActiVault, CMPRO, Plytix Index, PDXpert, Salsify, etc.



Zip or Archive Packages

Zip or Archive File
Description: Any zip or archive file that contains files and folders (and possibly more sub-files and sub-folders) containing data.
Important Considerations: When working with data that exists within files within an archive, it is important to consider what data is pertinent versus what data can be ignored.
Example:
Data.zip
Data.tar
Data.gz
Data.7z
Data.rar



Other Data Source Not Listed

Below are some other data sources that we can work with:

MIF
Description: MIF files are FrameMaker interchange files. Although they aren't really structured data files (or at least not structured neatly), we can still programmatically interact with them and extract, ingest, analyze and migrate data from MIF files.
Important Considerations: MIF files can be difficult to work with; a good understanding of how each MIF file relates to the other MIF files is required.
Example:
<MIFFile 8.00>
<Para
 <ParaLine
  <String `Hello, world!'>
 >
>

RDF
Description: RDF files are Resource Description Framework files. An RDF file is an XML file, but a specific type of XML meant to describe data for consumption by computers rather than for reading by humans.
Important Considerations: It's important to verify the namespaces being used within RDF XML documents / files.
Example:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:si="https://www.w3schools.com/rdf/">
  <rdf:Description rdf:about="https://www.w3schools.com">
    <si:title>W3Schools</si:title>
    <si:author>Jan Egil Refsnes</si:author>
  </rdf:Description>
</rdf:RDF>

Other?
Do you have some other data format that is not mentioned here? Let us know and we'll see if we can work with it! We can ingest, analyze and migrate almost any data that we can access programmatically.
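Reading namespaced values out of RDF/XML can be sketched with Python's standard library; note that every element and attribute lookup must carry its full namespace URI (the sample document and URIs are illustrative):

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SI_NS = "https://www.w3schools.com/rdf/"

doc = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:si="https://www.w3schools.com/rdf/">
  <rdf:Description rdf:about="https://www.w3schools.com">
    <si:title>W3Schools</si:title>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)

# ElementTree spells a qualified name as {namespace-uri}localname.
desc = root.find(f"{{{RDF_NS}}}Description")
title = desc.find(f"{{{SI_NS}}}title").text
about = desc.get(f"{{{RDF_NS}}}about")
```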
