Virtual Machines
As a user of FlowEHR, you will need to create a virtual machine (VM) within an Azure Workspace in order to access FlowEHR assets within a secure Trusted Research Environment (TRE). The following instructions describe how to create and access a VM, as well as some initial setup to access a Jupyter notebook once inside the VM.
This documentation relies heavily upon the official AzureTRE docs, which should be consulted before performing the steps below.
1 Creating a new VM
- Visit the FlowEHR Azure TRE landing page URL (please request from a member of the team)
- Select an appropriate Workspace from the list provided
- Under Workspace Services, select the VMs option
- Under Resources, you will see any previously created VMs. If there are none, select Create New in the upper right
- Select Linux Machine > Create (only option available as of 05/12/2022)
- Choose an appropriate name and description for the VM and select the desired image.
  - For Linux VMs, there are two Ubuntu 18.04 images available. The Data Science variant may have a more relevant selection of packages installed.
- Select an appropriate VM size from the dropdown menu. If you’re not sure of your requirements, opt for the 2 CPU | 8GB RAM option. You can scale this up later if needed.
- Check the box marked Shared storage to enable access to workspace shared storage.
- Hit Submit. Your VM will start provisioning and you may need to wait a few minutes.
2 Stopping a running VM
- Select the three dots at the upper right of the VM entry under Resources
- Select Action > Stop
- The VM will stop after a short wait
3 Starting a stopped VM
- Select the three dots at the upper right of the VM entry under Resources
- Select Action > Start
4 Connecting to a created VM
- Visit the FlowEHR Azure TRE landing page URL
- Select an appropriate Workspace from the list provided
- Under Workspace Services, select the VMs option
- Under Resources, you will see any previously created VMs
- Select Connect under the VM you wish to connect to. This launches a separate browser window or tab.
5 Deleting a VM
- Select the three dots at the upper right of the VM entry under Resources
- Select Action > Stop
- When the VM has been deallocated, select the three dots again (as in the first step) and select Delete
6 Installing software within the VM
As the VM is located within a TRE, most outbound internet access is restricted.
6.1 GitHub
Access to repositories hosted on GitHub is restricted. Repositories can be mirrored on an ad-hoc basis to a TRE-accessible Gitea instance, where this documentation can also be found.
You should be supplied with the URL for the Gitea instance during your onboarding. If you have not received the Gitea URL, please request it from a member of the team.
6.2 APT packages
The VM has access to APT software packages via a Nexus mirror.
6.3 Python packages via Conda and PyPI
The Nexus mirror also provides access to packages from conda-forge and PyPI through the standard pip install and conda install command line interfaces.
7 Accessing EHR and DICOM data within the TRE
This documentation is accompanied by Jupyter notebook files which provide detailed examples of accessing and viewing EHR and DICOM data from within a Jupyter Lab environment within a TRE VM.
Before launching the notebooks, and after following the steps in the above example, install the required dependencies within your created virtual environment with:
pip install -r requirements.txt
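Because packages are installed through a mirror, it can be worth confirming that everything listed in the requirements file was actually installed before opening the notebooks. The following is a minimal sketch of such a check, not part of FlowEHR itself. The function names `requirement_names` and `missing_requirements` are hypothetical, the parsing only handles plain name-based requirement lines (pip options and URL-based requirements are skipped), and it assumes Python 3.8 or later for `importlib.metadata`.

```python
import re
from importlib import metadata
from typing import Iterable, List

# A distribution name at the start of a requirement line, e.g. "numpy" in "numpy>=1.20".
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract distribution names from simple requirements-file lines."""
    names = []
    for line in lines:
        # Skip URL-based requirements; this sketch does not try to parse them.
        if "://" in line:
            continue
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(lines: Iterable[str]) -> List[str]:
    """Return the listed distributions that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Passing the lines of requirements.txt to `missing_requirements` from within the activated virtual environment returns the names that still need installing; an empty list suggests the install completed. Note that the check only looks at names and does not verify version constraints.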
7.1 A note on the data accessible from the TRE
The data provided has been through an anonymisation step and will not look exactly like the original data: PII has been removed, and this can cause structural changes such as the removal of line breaks.
As such, structural elements of reports such as line breaks or sentence lengths should not be used to generate features as inputs to machine learning models.
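One way to follow this advice is to normalise whitespace before deriving any features, so that the presence or absence of line breaks cannot influence the result. The following is a minimal sketch under that assumption, not a prescribed FlowEHR preprocessing step. The function names `normalise_whitespace` and `word_tokens` are hypothetical, and the report text used to try it out should be your own.

```python
import re
from typing import List

# Any run of whitespace: spaces, tabs, line breaks.
_WHITESPACE = re.compile(r"\s+")


def normalise_whitespace(report: str) -> str:
    """Collapse every run of whitespace to a single space and trim the ends."""
    return _WHITESPACE.sub(" ", report).strip()


def word_tokens(report: str) -> List[str]:
    """Lower-cased alphanumeric tokens taken from the whitespace-normalised text."""
    return re.findall(r"[a-z0-9]+", normalise_whitespace(report).lower())
```

Features built from `word_tokens` depend only on the words in a report, so two reports that differ only in where their line breaks fall produce identical tokens. Features such as line counts or sentence lengths do not have this property and are best avoided.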