Automatic metadata extraction and authorization of documents with Microsoft Flow
Microsoft SharePoint has become a valuable asset for many companies around the world, which use Office 365 for being more productive, creative, and secure, no matter whether their members are in the office, at a conference in another country or simply at home. Documents concerning the same project or client are often stored in the same SharePoint library. However, employees have diverse permissions and authorizations in terms of the different files. Furthermore, as the number of documents in a library grows, the need for filters rises, sorting documents by specific parameters and even special views to cope with the vast amount of data. Instead of enabling these functionalities by manually processing every new document – inserting its metadata in the library and setting its viewing permissions, the whole procedure can be automated via Microsoft Flow.
How does that process work?
We, at Veroo Consulting, have implemented three different ways to trigger the automated procedure by using:
- an automated trigger “SharePoint – When a file is created (properties only)”. Afterwards, before the processes is started, it is checked, if the file is new and in the right folder.
- a manual trigger, which starts the Flow, when a button in the library is pressed (see Figure 1). The used trigger is “SharePoint – For a selected item”, from which the ID is received and used to get the file properties and to proceed with the process.
- a scheduled trigger, which executes the Flow every hour, for example. In this case, the Flow processes all files from ID x up to ID y. In order to support parallelism, every hour or so, 10 files are updated with toUpdate parameter set to true. After that another Flow with “SharePoint – When a file is created or modified (properties only)” checks for the true value of the toUpdate parameter and performs the procedure. Another way to do this step is by using the HTTP connection to call the second Flow, which, however, is a Premium Connector and should be paid for.
In this way, the best option according to the concrete situation can be selected.
Metadata Extraction
Generally, first a SharePoint library with the necessary parameters (columns) is created, where the files’ metadata will be inserted. For each different authorization a folder is created and it´s permissions are set accordingly, so that only specific accounts can view and edit the contained documents. This folder structure can afterwards be hidden by using a View. Further, the files that will be stored in this library should follow a specific naming structure. For example:
Company/Person; Subject
In this way specific metadata can be extracted from the names of the files and inserted in the columns of the library. This information should be separated with the “;” symbol and should follow a specific order in the name of the document. In the beginning of the Flow, it should be checked if the file is a folder or not, so that mistake cannot occur after a creation of a new folder. That can be easily implemented with a Condition action and an IsFolder SharePoint dynamic content from the trigger action. In case of one “;” the following expression can be used to find its position in the file name: indexOf(outputs(‚File_Name‘),‘;‘). In case of multiple “;”-separated data, the substring expression can be used to extract the characters up to the index and then, the index expression can be used once again on the output. A condition follows – If the index expression outputs -1, then no “;” is found and therefore, no special information is stored in the file name. This is done until all “;” symbols are located and their enclosing information is saved in variables.
Moreover, other important information can also be included in a natural sentence manner in the Subject part of the name. In this case, it is checked with a Condition if the file name contains (i.e. the contains expression) a specific word, if so a library column of the file can be filled with yes/no or a specific word.
In the same manner, calendar dates can also be searched for in the file name and inserted in the library columns. Unfortunately, Microsoft Flow does not support regular expressions, but this can be done by going through every character in the filename and checking for the d(d).d(d).dddd pattern (see Figure 2). The DoUntil action is used with the
or(equals(length(outputs(‚File_Name‘)), variables(‚count‘)), variables(‚isReady‘))
expression, which checks if the whole length of the file name has been gone through or if the date patter has already been detected. The count variable is incremented on every checked character. Important to note is that the DoUntil action in Microsoft Flow is limited on its number of runs and time running, therefore these settings should be carefully selected. Keep in mind that depending on the number of parallel executions of the Flow the time needed for it to finish will also increase!
For the pattern search, it is firstly deduced that characters at specific indexes should be either an integer or a “.” and the last two are integers or empty (see Figure 3).
- First it is checked, if the following character in the pattern should be an integer
contains(variables(‚valuesInteger‘),string(length(variables(‚date‘)))),
In this case it is afterwards checked if it is an integer with the int expression. In case it is, the character is added to the date variable.
- If this is false, a condition is performed. If the following character in the pattern should be an integer or a dot, a
contains(variables(‚valuesIntegerOrDot‘),string(length(variables(‚date‘))))
condition is executed.
-
- If true and it is not an integer, it is checked if it is a dot and added to the date, if yes and there is no dot preceding. If true and it is an integer, it is checked, if there are enough dots in the date and if yes, it is added to the variable.
- If false and the current index of the pattern is not in the valuesIntegerOrDot variable, it is checked if
contains(variables(‚valuesIntegerOrEmpty‘),string(length(variables(‚date‘)))).
-
-
- If true and it is an integer, it is added to the date. If not, the pattern is found and isReady is set to true.
- If false and the length of the date variable is 10, the pattern is found and isReady is set to true. If not, the pattern is not found and isReady is set to false.
-
Before the file properties are updated with the date, the date string should be converted to ISO 8601 in order to be saved in SharePoint. Therefore, first the year length is checked:
length(substring(variables(‚date‘),add(lastIndexOf(variables(‚date‘),‘.‘),1),sub(length(variables(‚date‘)),add(lastIndexOf(variables(‚date‘),‘.‘),1)))).
First, the date string is modified with the following expression:
concat(split(variables(‚date‘),‘.‘)[2],‘-‚,split(variables(‚date‘),‘.‘)[1],‘-‚,split(variables(‚date‘),‘.‘)[0]).
Then, if the length is equal to 2,
formatDateTime(outputs(‚From_string_to_ISO_8601_datetime_value1′),’yy-MM-dd‘)
is used. Otherwise, it is updated with:
formatDateTime(outputs(‚From_string_to_ISO_8601_datetime_value‘),’yyyy-MM-dd‘).
In this way, the metadata is automatically extracted from the document names and added in the library as parameters so that it can be used for filtering or sorting the files.
Authorization
In addition, the previously extracted metadata can also include the authorized person or group for the file. For this reason, a Flow can be ran with the trigger “When a file is created or modified (properties only)”. Checks are performed, if the file is not a folder, if it is in the correct folder, if it is
new and if it is not an init file, created with a new folder. If all is good, the file is moved in the corresponding authorization folder. Hence, the authorization functionality is also supported.
This procedure is a beautiful example for that how via Office 365 trivial working processes can be automated and, in this way, time and money can be saved for more creative ideas and projects.