Virtual Machines

Authors

Ahmed Al-Hindawi

Tom Frost

Published

March 4, 2023

As a user of FlowEHR you will need to create a virtual machine (VM) within an Azure Workspace in order to access FlowEHR assets within a secure Trusted Research Environment (TRE). The following instructions describe the process of VM creation and access, as well as some initial setup to access a Jupyter notebook once inside the VM.

This documentation relies heavily upon the official AzureTRE docs which should be consulted before performing the below steps.

1 Creating a new VM

  1. Visit the FlowEHR Azure TRE landing page URL (please request from a member of the team)
  2. Select an appropriate Workspace from the list provided
  3. Under Workspace Services, select the VMs option
  4. Under Resources, you will see any previously created VMs. If there are none, select Create New in the upper right
  5. Select Linux Machine > Create (only option available as of 05/12/2022)
  6. Choose an approriate name and description for the VM and select the desired image.
    • For linux VMs, there are two Ubuntu 18.04 images available. The Data Science variant may have a more relevant selection of packages installed.
  7. Select an approriate VM size from the dropdown menu. If you’re not sure of your requirements, opt for the 2 CPU | 8GB RAM option. You can scale this up later if needed.
  8. Check the box marked Shared storage to enable access to workspace shared storage.
  9. Hit Submit. Your VM will start provisioning and you may need to wait a few minutes.

2 Stopping a running VM

  1. Select the three dots at the upper right of the VM entry under Resources
  2. Select Action > Stop
  3. The VM will be stopped momentarily

3 Starting a stopped VM

  1. Select the three dots at the upper right of the VM entry under Resources
  2. Select Action > Start

4 Connecting to a created VM

  1. Visit the FlowEHR Azure TRE landing page URL
  2. Select an appropriate Workspace from the list provided
  3. Under Workspace Services, select the VMs option
  4. Under Resources, you will see any previously created VMs
  5. Select Connect under the VM you wish to connect to. This launches a separate browser window or tab.

5 Deleting a VM

  1. Select the three dots at the upper right of the VM entry under Resources
  2. Select Action > Stop
  3. When the VM has been deallocated, repeat step 1. and select Delete

6 Installing software within the VM

As the VM is located within a TRE, most outbound internet access is restricted.

6.1 GitHub

Access to repositories hosted on GitHub is restricted. Repositories can be mirrored on an ad-hoc basis to a TRE-accessible Gitea instance, where this documentation can also be found.

You should be supplied with the URL for the Gitea instance during your onboarding. If you have not received the Gitea URL, please request it from member of the team.

6.2 APT packages

The VM has access to software packages via a Nexus mirror

6.3 Python packages via Conda and PyPI

The Nexus mirror provides mirrored access to packages available via conda forge and PyPI via the standard pip install and conda install command line interfaces.

7 Accessing EHR and DICOM data within the TRE

This documentation is accompanied by Jupyter notebook files which provide detailed examples of accessing and viewing EHR and DICOM data from within a Jupyter Lab environment within a TRE VM.

Before launching the notebooks and after following the steps in the above example, install required dependencies within your created virtual environment with

pip install -r requirements.txt

7.1 A note on the data accessible from the TRE

Data provided has gone through an anonymisation step and will not entirely look like the original data due to the removal of PII and consequent structural changes that may appear as a result, such as line break removals.

As such, structural elements of reports such as line breaks or sentence lengths should not be used to generate features as inputs to machine learning models.