How to Upload Pre-Labeled Data | CrowdAI Knowledge Base

You may have labeled data from an open-source dataset or from your own work elsewhere, and want to upload it to our platform to train a model or to boost model performance with more data, without the legwork of labeling it yourself. That’s great! Here is a step-by-step guide to uploading a pre-labeled dataset.

Before you start this process there are a few things you will need ready to go:

A dataset
Your labels in MSCOCO or PASCAL format, each in its respective file type:
1. MSCOCO labels must be in ONLY 1 JSON file
2. PASCAL labels will be in multiple XML files (1 per media file)
Take note of the categories within your labels – you will need to replicate the categories in your project
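If you are not sure which categories your labels contain, a short script can list them for you. The following is a minimal sketch, assuming a standard COCO JSON layout (a top-level "categories" list with a "name" per entry) and standard PASCAL VOC XML files (one "name" element per "object"). The function names and file paths are hypothetical and are not part of the platform.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def coco_category_names(coco_json_path):
    """Return the sorted category names found in a COCO JSON file."""
    with open(coco_json_path, encoding="utf-8") as f:
        coco = json.load(f)
    # Only the "name" field is collected; "supercategory" is ignored on purpose.
    return sorted({cat["name"] for cat in coco.get("categories", [])})


def pascal_category_names(xml_dir):
    """Return the sorted object names found across PASCAL VOC XML files."""
    names = set()
    for xml_path in Path(xml_dir).glob("*.xml"):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                names.add(name.strip())
    return sorted(names)
```

For example, `coco_category_names("labels.json")` or `pascal_category_names("pascal_labels/")` (both paths are placeholders) returns the list of names to recreate in your project.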

Steps to upload the labeled data:

Create a dataset as normal
Upload your media to the dataset as normal
1. Label ingest currently works with imagery only; full-motion video label ingest is in the works!
Create a project & add your dataset to the project
1. Note: Make sure the model architecture of your project matches the type of labels you have. For example, if you have segmentation/polygon labels, you must create a segmentation project. If you have bounding box labels, you must create an object detection project.
Navigate to the annotate tab in your project
Create the EXACT same categories for your project that appear in your labels
1. If your project is object detection, choose a rectangle label type. If your project is segmentation, choose a polygon label type.
2. Category nesting is not allowed here, so just create the individual class and ignore the hierarchy in your JSON or XML files.
3. Highlighted in Yellow are the categories in the JSON – these are the EXACT same names you will need to use when building categories in the platform.
4. Highlighted in Red are the "supercategories", which we refer to as category nesting – DO NOT use these; category nesting is ignored for labeled data ingest.
Add the categories to your project
1. Note: if you do not add the categories first, then when you ingest the labels, nothing will happen.
Navigate back to the overview tab and select the “Ingest Labels into Project” link under the “Resources” box.
Choose COCO or PASCAL from the drop-down menu
Then upload your respective JSON (for COCO) or XML (for PASCAL) file(s)
Once you select the files, they will upload automatically. You will see a green notification at the top.
From here you can navigate to the annotate tab and see that your labels have been imported!
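Before creating the project, it can help to confirm which kind of labels your COCO file actually holds, since that decides the project type and label type in the steps above. The sketch below is a rough heuristic, not the platform's own logic: it assumes standard COCO annotation fields ("segmentation" and "bbox") and prefers segmentation whenever any polygon data is present, even though such files usually carry bounding boxes as well. The function name and the mapping dictionary are hypothetical.

```python
# Label type to choose in the platform for each project type (from the steps above).
LABEL_TYPE_FOR_PROJECT = {
    "segmentation": "polygon",
    "object detection": "rectangle",
}


def suggest_project_type(coco):
    """Guess the project type for an already-loaded COCO dict.

    Returns "segmentation", "object detection", or None if there are
    no usable annotations. This is a heuristic, not a platform rule.
    """
    annotations = coco.get("annotations", [])
    # Any non-empty "segmentation" entry means polygon labels exist.
    if any(ann.get("segmentation") for ann in annotations):
        return "segmentation"
    if any(ann.get("bbox") for ann in annotations):
        return "object detection"
    return None
```

If the result is "segmentation", create polygon categories; if it is "object detection", create rectangle categories. When a file holds both polygons and boxes, pick the project type that matches the labels you intend to use.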

A few important things to note:

Pre-labeled media will be ingested into Phase 3 for one round of QA (Phases 1 & 2 are skipped since that work has already been done).
1. From here you can edit the labels in Phase 3 or fast-forward all Phase 3 tasks to “complete workflow” and start training your model. More details on how to do this are covered in another article.
2. If you need to do more rounds of QA, you can rewind your tasks back to Phase 3 and keep the existing labels. This will save any changes you made in the first round of Phase 3 and allow you to edit more on a second round of Phase 3.
The file names of the media you upload and the file names listed for that media in the JSON or XML file(s) must be the SAME! This is crucial, as this is how we match up labels to media in the platform.
1. Here the file name is highlighted in Yellow – this should correspond exactly to a file name in your media, since this is how we pair the label with the correct media.
This workflow works for both instance and semantic segmentation as well as object detection.
To reiterate, we do not allow category nesting for label ingest. Your JSON or XML files might contain a hierarchy of categories, but for label ingest to work in the platform, please create a separate category for each class and DO NOT create nested categories in the platform.
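Because labels are paired with media by file name, it is worth checking for mismatches before you ingest. The following is a minimal sketch, assuming the standard COCO "images" list with a "file_name" field and the standard PASCAL VOC "filename" element. It compares names as exact strings, which is an assumption about how matching works and not a documented platform rule. All function names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def coco_file_names(coco):
    """Collect the file names listed in an already-loaded COCO dict."""
    return {img["file_name"] for img in coco.get("images", [])}


def pascal_file_names(xml_dir):
    """Collect the <filename> values from PASCAL VOC XML files in a folder."""
    names = set()
    for xml_path in Path(xml_dir).glob("*.xml"):
        name = ET.parse(str(xml_path)).getroot().findtext("filename")
        if name:
            names.add(name.strip())
    return names


def compare_file_names(label_names, media_names):
    """Report names that appear on only one side (exact string comparison)."""
    label_names, media_names = set(label_names), set(media_names)
    return {
        "labels_without_media": sorted(label_names - media_names),
        "media_without_labels": sorted(media_names - label_names),
    }
```

Pass the names from your label file(s) as `label_names` and the names of the media you uploaded as `media_names`. Anything listed under "labels_without_media" points to a label whose file name has no matching media file.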

Let’s take a look at COCO labels in a JSON file for context:

In Green: you will see some information about the JSON file, including licenses (which are needed to use the intended file – you do not have to input these anywhere on the platform)
In Yellow: you will see the names of the labeled categories
In Blue: you will see information on each image/media in your JSON – this is where the file names must match the file names you uploaded in your dataset.
In Pink: you will see information about each label. Here you can see the labels are Segmentation labels. In this case we would need to use a segmentation model architecture when building the project and then create polygon categories to correspond to the pre-existing segmentation labels.
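To make the four parts described above concrete, here is a small hand-written example of the standard COCO layout, written as a Python dictionary. Every value in it is made up for illustration and does not come from a real dataset. The helper function is hypothetical; it simply counts labels per category name so you can sanity-check a file before ingest.

```python
from collections import Counter

# Hypothetical example data, not taken from a real dataset.
example_coco = {
    # File-level information and licenses (nothing to enter in the platform).
    "info": {"description": "Example dataset"},
    "licenses": [{"id": 1, "name": "Example license"}],
    # Category names to recreate in the project; "supercategory" is ignored.
    "categories": [
        {"id": 1, "name": "car", "supercategory": "vehicle"},
        {"id": 2, "name": "truck", "supercategory": "vehicle"},
    ],
    # One entry per image; "file_name" must match the uploaded media.
    "images": [
        {"id": 10, "file_name": "street_001.jpg", "width": 640, "height": 480},
    ],
    # One entry per label; a non-empty "segmentation" means polygon labels.
    "annotations": [
        {
            "id": 100,
            "image_id": 10,
            "category_id": 1,
            "segmentation": [[100, 100, 200, 100, 200, 180, 100, 180]],
            "bbox": [100, 100, 100, 80],  # x, y, width, height
            "area": 8000,
            "iscrowd": 0,
        },
    ],
}


def labels_per_category(coco):
    """Count annotations per category name, including categories with none."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return {name: counts.get(name, 0) for name in id_to_name.values()}
```

Since the annotation here carries polygon points in "segmentation", a file like this would call for a segmentation project with polygon categories named "car" and "truck".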