Developing with Kinect for Azure - Understanding the human body by Andreas Erben
12K views
Nov 16, 2023
Kinect for Azure is the latest iteration of the widely successful Kinect depth sensor that enables understanding human beings. Kinect is all grown up with a modern powerful set of APIs allowing development on multiple operating system platforms. This session will introduce the developer experience and how to develop for Kinect for Azure, from working with the sensor streams of the device, to accessing skeletal tracking of human bodies. Attendees will learn about scenarios they can implement with skeletal tracking. Through specific examples developers and decision makers will understand how to leverage the technology in real applications. Conference Website: https://www.2020twenty.net/lightup #lightup #2020twenty
Video Transcript
0:00
Alright, thanks everybody for joining us today on July 14th, 2020, for the Light-Up Virtual Conference, 24 hours of Microsoft Technology Learning and Fundraising to Fight COVID-19
0:14
We'll be talking about Kinect for Azure for developers. My name is Andreas Erben, and you can..
0:21
you can, that's important, take a picture of this slide, you can donate in support of UNICEF to fight this COVID-19 crisis
0:34
Every contribution matters. We will probably go back to the slide at the very end and you can also go to the light up conference website to get this donation link
0:44
We have a bunch of great sponsors. I won't read all the names individually, but without sponsors
0:53
an event like this would be impossible. So thanks a lot to every sponsor
0:58
Thanks also to C# Corner and to the tech platform for facilitating and setting up this event
1:04
My name again: Andreas Erben. I'm the CTO for Applied AI and Mixed Reality and CEO of the Americas at daenet
1:11
I have 25 years of experience in this field and have been dabbling with VR and interactive user
1:18
interfaces since the mid-1990s, though for a lot of years in between not much was happening. I have worked
1:23
with companies of all sizes and types, from startups to big enterprises, and I love geeky
1:29
technology. My contact information is there; if you have questions, you can also contact me after this
1:35
event if we don't cover everything or if you want to learn more. Now, talking about Kinect, the Kinect sensor, a
1:42
little bit, right? I believe my camera feed is shared, and if not, then I will go back to it
1:50
again. This is the Kinect sensor I'm holding into the camera. It's a little depth camera
1:59
that has, as you can see, some light emitters, and you probably see some speckles and bright
2:05
pixels on the camera sensor. It has a few cameras in there, has a connector for
2:12
USB-C, has an additional connector for power, and if you would unscrew this, it would have
2:18
another connector for synchronization. But I have a lot of content to cover, so let's go back
2:25
to my slides first and let's look at the history of Kinect. Initially it was
2:33
released for the Xbox 360. It was a completely new gaming device for motion-controlled video games and so
2:40
on, but hackers, good hackers, started to dabble with it and started to write applications for it,
2:46
and Microsoft had a question to solve: what do we do, do we support the hackers or do we fight them?
2:51
They decided to support the hackers and decided to make the API to develop Kinect software public,
2:58
and developed a product out of it, Kinect for Windows version 1, which had
3:04
skeletal tracking enabled, what that is we will see later, with 20 joints and landmarks
3:09
which was groundbreaking at that time. Then in 2013, Kinect came as part of Xbox One, and following that, in 2014, Kinect for Windows 2
3:19
was released, which improved body tracking and a lot of other things, and the core sensor technology
3:26
was also changed. Finally, 2018 was a mixed year, because we learned that Kinect for
3:35
Xbox One didn't really have a future, but then the project Kinect for Azure was announced,
3:41
and in 2019 it launched as the Azure Kinect Developer Kit at the Mobile World Congress, with a new body
3:48
tracking SDK and a skeleton composed of 28 joints. Well, this is again how it looks like
3:58
if you would blow it up. I believe you can see my mouse cursor. This is the main board and
4:07
all the different components and lenses. Here you see the RGB camera, and you see the depth camera
4:16
with its light-emitting module, which emits laser light, modulated laser light.
4:24
And you also have a microphone array here on top, with which Kinect can locate sound
4:33
in space, theoretically in a specific direction, or you can use it for noise cancellation of ambient sound
4:40
A little bit of extra nerdy content that you can read about in this paper,
4:48
the public information: how does it actually work? We have a so-called time-of-flight based depth camera in there.
4:56
It means it modulates the light coming out of its light emitters, and it knows how that light is modulated.
5:04
So by knowing how it is modulated and knowing the exact timing of the modulation, it can calculate and disambiguate, for every pixel that it sees of this light signal, and it's an infrared light signal,
5:19
how far something is away from the camera, at a specific spatial depth resolution
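(As a rough textbook reference, and not something the talk goes into at this level of detail: for an amplitude-modulated time-of-flight camera, the measured phase shift of the returning infrared light maps to distance roughly as below.)

```latex
% c            = speed of light
% f_mod        = modulation frequency of the emitted infrared light
% \Delta\varphi = measured phase shift between emitted and received signal
d = \frac{c}{4\pi f_{\mathrm{mod}}}\,\Delta\varphi ,
\qquad
d_{\max} = \frac{c}{2 f_{\mathrm{mod}}}
% d_max is the unambiguous range for a single modulation frequency;
% using several frequencies lets the camera disambiguate beyond that range.
```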
5:33
There are two resolution modes in terms of depth, where you either get a little bit more detail in a narrow field of view
5:40
or you get a little less detail but a wider field of view
5:45
This is another slide about how some of the math inside works
5:57
There are a few other interesting factors also. Those are the sensor streams that you get from it
5:57
It has an active IR image. What does it mean, active IR image
6:02
It means that it's driven by the infrared light emitters of that device
6:09
It's infrared, but it's not heat. It cannot, this camera cannot see heat specifically
6:15
But because of this fancy light modulation, it actually can remove ambient light from the picture
6:21
So if I would shine like another infrared source on the same spot here, you wouldn't see a bright spot
6:27
coming here. because it will only show the light that it knows about
6:33
from the modulation from the camera. This is a so-called depth map
6:37
It's a visualization of how far something is away; it just shows it with this color scheme.
6:42
Don't worry about it. And it has a full color image. It has an IMU, an inertial measurement unit,
6:49
that you would also find in phones. That's very useful if you would want to mount this on a robot,
6:53
for example, or get an idea at which angle the camera is positioned
6:58
And here's the microphone array, the raw data. It has, well, eight channels, no, seven channels,
7:05
sorry, seven microphones built in. As for the software stack, very, very roughly:
7:12
You have the Azure Kinect DK device that connects to your PC, whatever PC you use.
7:17
And the PC accesses this via just the regular USB protocol. There's no specific driver;
7:24
you are really working with the sensor SDK, which accesses the device pretty directly. Speech can be processed through the microphone stream it presents, or you can access the microphone stream yourself.
7:38
And on top of the sensor SDK, there are a bunch of things built completely on top of that.
7:42
What the sensor SDK provides we will look at in a little bit of detail, but you could use Azure vision services,
7:49
like Cognitive Services Custom Vision, to just look at the camera images from the camera
7:54
and get information out of it. And, well, the Body Tracking SDK, we will look at what that is;
8:03
it's the magic with which you can understand what a body does, how that's possible, and so on.
8:10
And then any type of usable skills that you would want to build on top of those SDKs,
8:16
those then are the building blocks to build your applications. So the sensor SDK, and that's also something
8:24
that the old Microsoft would never have done: it's completely open source. So it's cross-platform from the low-level access up.
8:32
You can write software on Linux for that. Currently, as far as I know, it's Linux and Windows,
8:39
and it's not yet on macOS. I don't know if that will ever happen.
8:45
I believe one of the challenges is that the component that is being used to calculate
8:54
the 3D location, so basically the mathematical magic that works on the sensor stream and
9:03
gets you the depth, that's the only part that's not open source. I don't believe they have ported that,
9:09
because that's also accelerated by the GPU. So for now, all the other things
9:16
are accessible on Linux and on Windows. The vision is to have it eventually
9:23
even on Raspberry Pi type devices, which have enough processing power, so as to not be constrained
9:30
to purely desktop systems. As for the languages available: you can develop most natively in C,
9:37
the whole thing is written in C, there's a C++ wrapper on top of it, and then a C# wrapper,
9:43
and all my samples, to set expectations, are going to be in the C# wrapper
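(To give an idea of what that C# wrapper looks like, here is a minimal sketch of enumerating and opening the device; it assumes the Microsoft.Azure.Kinect.Sensor NuGet package and a single attached device.)

```csharp
using System;
using Microsoft.Azure.Kinect.Sensor;

class OpenDeviceSample
{
    static void Main()
    {
        // How many Azure Kinect devices are attached to this PC?
        int count = Device.GetInstalledCount();
        Console.WriteLine($"Installed Azure Kinect devices: {count}");

        if (count == 0)
        {
            return;
        }

        // Open the first device; remember that only one process can own it at a time.
        using (Device device = Device.Open(0))
        {
            Console.WriteLine($"Serial number: {device.SerialNum}");
        }
    }
}
```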
9:49
By the way, I will not be able to monitor the chat myself, but the chat will be monitored by the moderators of the session, and we will have questions
10:01
and answers. You don't necessarily need to hold your questions, but maybe it's better to hold them a little bit and let the moderators
10:11
handle that. So there are tools, right? I mean the sensor SDK comes with tools. You can download it,
10:17
you can install it, and then you have it on your hard drive, somewhere here in Program Files under
10:26
Azure Kinect SDK. As for tools: you have a tool to update the firmware, which I will show. You have
10:37
a recorder to record a data stream and then the viewer component
10:46
So the viewer component, if you just launch it, looks like this
10:50
And when you start it, it shows you a list of devices. It looks at the USB bus: where are my devices?
10:57
and then you open the device. When you open the device, no other software can access it
11:03
That's a difference compared to Kinect for Windows version 2, where you could have multiple pieces of software accessing the device at the same time, at the expense of having a centralized driver component installed on the computer.
11:21
So they wanted to get rid of that. There's a bunch of settings you can choose here, something called binning, which basically combines neighboring pixels to get better information,
11:35
less noise and better depth precision in that sense. And you can choose between a wide view and a narrow view, and you have a passive IR
11:47
camera. You can choose the color format, the resolution of the color
11:54
stream, and the frame rate, how many frames per second you want to take, enable the microphone, and so on and so on.
12:01
Let's start it quickly. So this is a view similar to what you've seen on the slide, but how it looks
12:07
live. Let me walk into the picture really quickly. That's me
12:14
And yeah, and you can choose also between a 2D mode and a 3D mode
12:20
So you see here that it actually is able to map all this video
12:27
stream into 3D space. Okay, so let me not forget to close this device
12:33
Otherwise, if I show you the code later on, I will get an error. because my demo code will not be able to access it
12:41
Alright. So Azure Kinect, again, it's open source, it's on GitHub, and the best way to report issues or to ask questions
12:50
is to just raise them via GitHub. And you can also use UserVoice to propose new features.
12:59
And Stack Overflow is also a platform that the Kinect team actively monitors.
13:04
So rather than trying to chase down people inside of Microsoft,
13:08
who are they, why don't they help me: use GitHub. And they really are pretty responsive on that
13:14
So let's go to body tracking. Body tracking has been designed from the ground up
13:20
new for Azure Kinect. It provides what they call an instance segmentation map.
13:26
It means basically: give me the pixels of the image, the locations of those pixels, that belong to a body.
13:33
It gives you 3D joint positions for a person and gives you a unique ID for each of those
13:40
persons to track them temporally. Meaning, if you have frame one and the next frame, frame two,
13:47
you will get the same identifier, the same number, for the same skeleton even as they move around, which is important if you want to write software.
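(That temporally stable ID is what lets an application keep per-person state across frames. A minimal sketch of the pattern, kept independent of the exact SDK types, which are only approximated here:)

```csharp
using System.Collections.Generic;
using System.Numerics;

// Keeps a short history of joint positions per tracked body, keyed by the
// temporally stable body ID that the Body Tracking SDK reports each frame.
class BodyHistory
{
    private readonly Dictionary<uint, List<Vector3>> _trailsById = new();

    // Call this once per body and frame, e.g. with the pelvis position.
    public void Record(uint bodyId, Vector3 jointPosition)
    {
        if (!_trailsById.TryGetValue(bodyId, out var trail))
        {
            trail = new List<Vector3>();
            _trailsById[bodyId] = trail;   // first time we see this person
        }
        trail.Add(jointPosition);
    }

    // Because the ID is stable across frames, the trail belongs to one person.
    public IReadOnlyList<Vector3> GetTrail(uint bodyId) => _trailsById[bodyId];
}
```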
13:53
Quality has been improved, and it's anatomically more
14:00
correct compared to Kinect version 2. It has higher joint accuracy and precision, and
14:07
improved robustness for side views, bending, and lying down. That is actually a big thing, because Kinect for Windows
14:13
and for the Xbox was designed primarily to interact with the Xbox console
14:19
And so it was trained on people facing the camera, facing the screen and interacting with the screen
14:26
So it would still track you relatively well if you look at it from the side
14:31
or if you bend in a weird angle or lying down, but it was never designed for that
14:36
And it can track up to 15 skeletons, which is not a hard limit, but a practical limit
14:42
Also for Kinect for Azure you can do cross-platform development. I think Windows and Linux work right now;
14:51
it's not preview anymore, actually. And the SDKs are available in C, C++, and C#. It runs on top of what they call the ONNX Runtime. The ONNX Runtime is a cross-platform engine that Microsoft uses for the Open Neural Network Exchange format to do AI on arbitrary hardware.
15:15
But supported right now is the NVIDIA hardware platform, and you need at least an NVIDIA GTX 1050 Ti or better for the hardware acceleration.
15:25
Yes, it also will work with the NVIDIA 2080s and newer devices.
15:31
So it's not constrained to just the 1000-series of NVIDIA devices
15:37
So how did they do it? They trained it with synthetic data. So you have this great device here
15:46
And what they did is basically those are not real people here
15:50
These are artificially rendered people, based on real motions of real people, and they put basically fake clothing on them,
16:02
and so they know exactly, for each of those artificial synthetic people, synthetic people
16:08
like in a Pixar movie, think about it that way, they know exactly where each point on those bodies is,
16:14
because it's all computer simulated. And then they projected what they're rendering here
16:21
into how the Kinect sensor would see it. Artificially, right? They tried to approximate it as well as possible:
16:31
how would a Kinect sensor see that exact same thing? And then they trained the AI on top of that, right?
16:38
So they get all the perfect positions. They know it's the perfect labeling
16:42
They know exactly where's the leg of the artificial model. And it solves a lot of problems
16:48
For example, human diversity, like all body shapes and sizes, all skin colors, all types of humans that we have in reality can be simulated in that platform
17:01
And so they can train their AI better. And they can do that also in multiple simulated environments, not just in a gym or in a living room, but also in a patient's room or in an OR operating room and so on
17:15
Yeah, well, this is the fundamental approach. Currently, it is trained really just for humans
17:20
No, you cannot track your dog with it. That's a question that I get almost every time I talk about it.
17:27
But yeah, theoretically they could enable that, but they would need to build models of dogs, or other animals, and retrain the whole thing for them
17:39
If you compare it with Kinect version 2, I will not talk too much about this slide
17:45
It's in the video recording for later on. Fundamentally, it's newer:
17:53
it now uses a neural network compared to the decision forest before, has better synthetic ground truth,
18:00
has improved robustness, is supported on more software platforms, not just Windows, and has
18:09
the C++ and C# wrappers. It, however, needs a little bit more powerful hardware. For Kinect for Windows
18:17
version 2, the original Surface Pro was enough to really run it, so that's a good example of a
18:26
notebook. But now you need, well, to really run it sufficiently, a seventh-generation
18:33
Intel Core i5, four gigabytes of memory, and this NVIDIA graphics card right here. So as for the limit of how many skeletons there are to be tracked, compared to the hard limit of six before,
18:46
15 is a practical limit. You cannot really put more than 15 people into the field of view of
18:51
Kinect and fit them all. That's the practical limit of it. So again, the C API builds
19:03
on top of the sensor SDK, is easier to wrap for other languages, and there's a C# API
19:09
A body frame, so how is a body represented? It's a collection of body structs. What is a body
19:14
struct? It has the ID of the body that's tracked, has joint positions in 3D, which are vectors,
19:22
and has joint orientations as so-called quaternions. I have a slide later on about what a quaternion is,
19:29
but it's very common in game development and 3D development because it's very easy computationally
19:36
to represent rotations in a quaternion. The math behind it is a little bit funky
19:43
Think about it as complex numbers on steroids. This is the index map I mentioned before. It gives
19:50
you the outline of the body that the Kinect recognizes, and you can use that, for example, for
19:54
a green screen effect, to show a virtual background and so on. And with input capture, you can just capture
20:05
everything the camera sees, and use it however you want. The skeletons are in 3D, so you can
20:11
rotate that image around and get 3D information for all those coordinates. The architecture
20:19
how does the whole thing work with body tracking? It starts with the Azure Kinect Developer
20:25
Kit. On top of that is the sensor SDK, and then we have the camera streams. The active infrared camera
20:31
stream, which is just a black-and-white image, is used with a convolutional neural network in the
20:36
ONNX runtime to then get two-dimensional joint coordinates from that information. There's
20:43
also an ONNX model for regular web cameras; you can just Google for it if you want, or Bing for it.
20:53
That gives you 2D joints if you want to write vision apps that don't need the Kinect device,
21:00
and it gives you segmentation maps. Now, this is two-dimensional. Well, here the depth image comes to the rescue: it knows all about the distance from the camera
21:12
in space. Then they do a step called model fitting. Model fitting basically combines those joint
21:18
coordinates from 2D with a real distance in space and uses knowledge about the typical human anatomy
21:28
about how far something would be. So even if the sensor stream might be a little bit messed up
21:34
because it knows how a body is, that the arm length doesn't suddenly change between two images, it
21:40
can use that to get good approximations of 3D joints even with what's called occlusion,
21:46
or partial occlusion, of a specific joint, where the camera cannot see everything
21:52
The processing: well, you get the capture object from the SDK and then you enqueue the capture to be
22:03
processed on the GPU with a convolutional neural network, and then the CPU takes over that result
22:08
and does some model fitting, and that result ends up in a queue, and the application
22:15
developer would pop the result of that operation from the queue. The skeleton in a little bit more detail: those are the coordinates that you get. So the set of landmarks has been extended and improved, the positions and so on and so on.
22:39
I can show here, by the way, one little quirk that Kinect for Windows version 2 had:
22:43
if you would look at a body and look at the hip joint and that body would raise their arms from
22:49
low to high, then the hip joint suddenly would also move up as the arms are raised, even though
22:57
the body in reality never changed. This is something that's also been fixed in Kinect for Azure.
23:05
A joint definition, this is roughly how it looks: it's a float,
23:09
a three-dimensional vector, for the position, and a quaternion for the orientation, which is four floats, and that gives you the joint
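(In the C# body tracking wrapper, reading a joint looks roughly like the sketch below; the type and member names from Microsoft.Azure.Kinect.BodyTracking are assumed here, so treat them as approximate.)

```csharp
using System.Numerics;
using Microsoft.Azure.Kinect.BodyTracking;

static class JointReader
{
    // Reads one joint from a skeleton returned by the Body Tracking SDK.
    // Position is a 3D vector (millimeters), orientation a quaternion (x, y, z, w).
    public static void PrintWrist(Skeleton skeleton)
    {
        Joint wrist = skeleton.GetJoint(JointId.WristRight);

        Vector3 position = wrist.Position;         // three floats: x, y, z
        Quaternion orientation = wrist.Quaternion; // four floats: x, y, z, w

        System.Console.WriteLine(
            $"Wrist at ({position.X:F0}, {position.Y:F0}, {position.Z:F0}) mm, " +
            $"confidence {wrist.ConfidenceLevel}");
    }
}
```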
23:19
So what can you do? And I'm sorry that I'm rushing over the content,
23:23
literally having barely any time. It enables you to do different things.
23:29
For example, kinematic analysis is analyzing all those motions and all those things you can see, from the perspective
23:37
of understanding what's going on with a human body. Analyze the posture, then use it for physical rehabilitation,
23:44
to see whether people do what they're supposed to do, and calculate
23:49
mathematically what it means in terms of the exercises that were observed by the camera
23:59
can be used for fitness apps or for sports instruction. A lot of people ask about the golf swing, can I improve my golf swing
24:07
It's theoretically possible. I would not recommend it purely because part of the golf swing is a very, very fast motion that matters
24:16
and that might not fit completely into the 30-frames-per-second maximum frame rate the camera has.
24:23
But for other types of sports, where you want to instruct people about form, meaning the body posture during
24:32
exercises, it's very good. It can be used for patient monitoring: are people where they are supposed to be, are they in bed,
24:39
Do they move or do they breathe? Does their chest move when they're lying down
24:44
so we know breathing didn't stop. Or it can be used for fall detection. I can talk about fall detection a little bit later, maybe towards the end.
24:52
Human understanding is a different scenario as we're still looking at human bodies
24:58
but we want to not understand something intrinsic the body does, but their behavior
25:03
How long do they stand in front of a specific shop display?
25:08
How often do they go into a specific area? We can detect that, and we can
25:13
track people as they move through the space. Maybe you want to know where they are or where they shouldn't be.
25:21
A very common scenario is also in the industry. If you have a machine and nobody should be near that machine
25:29
but the machine is running, you could have a Kinect sensor that, for example,
25:33
only enables the machine if bodies are outside a certain radius, but not inside that radius.
25:41
And you can use it for smart spaces interactions, like, yeah, it's similar to the next slide
25:49
which is human interaction. You could interact with information signage and video walls:
25:54
wave to the video wall, and it does something back to you. Interactive art and performance
25:59
you could transform all that sensor data into something creative; you can use it for interactive
26:05
screens, exhibits, or customizing and fitting, machine safety. So the two last topics are pretty related in what you can do with them.
26:16
Like a little walkthrough about the body tracking 3D viewer. Yeah, I mean, you would..
26:26
Well, actually, let me use that chance actually to show you this 3D body tracking
26:33
Similar to how we had the installation of the SDK, we also have the installation of the body tracking SDK
26:40
And it's a command line app here that you install in that folder
26:46
When you launch that, it's starting up. Yeah, it's up and starts with the visualization.
26:56
And you can move that around, as I said before. Let me maximize it.
27:07
Let me go into the picture. So the camera starts seeing me, and it starts tracking my joints as I move around.
27:14
I can change that angle a little bit, so it now sees me from the side, but of course
27:22
it doesn't know the pixels on my side. So it doesn't also know what's behind me because it only sees me from the front
27:32
So from a coding perspective, how does it all start? The first operation is really, hey, give me the device configuration, verify and start
27:43
the camera. So let's start with a little demo of sensor basics: images.
27:52
We have a break point here. So you see that it tries to open the device, and I hope actually that I closed the device from my earlier demo.
28:48
So it tries to open the device; it would throw an error if it cannot open the device here.
28:54
Then it starts the camera with the device configuration settings, which are a bunch of parameters in the API.
29:02
Again, this is the C# wrapper. It's an operation that can take a second.
29:08
Yeah, so it started the camera. Now this app owns the camera,
29:15
and we can get the metadata from the camera. So we know the resolution is like 1920 by 1080.
29:21
And we initialize a bitmap. And then we let the component run.
29:59
Thank you
30:29
Okay, yeah, so it received the capture object and it has a device timestamp, which you get
30:34
from the capture. This timestamp you can use in applications to establish math based on time.
30:40
For example, to calculate velocity, how fast something moves: take the difference between
30:45
timestamps and then calculate the difference in movement. And
30:51
that device timestamp also identifies that image uniquely. And then what it does, it basically writes the buffer it gets back from the API
31:05
into the bitmap. So this is something we'll just show briefly, but I will not
31:11
I mean, you cannot really debug into it. It marshals the bits into the bitmap and then displays that.
31:16
So let's remove the break point so you see it happening. In this case here, it's just, I mean, it's pretty boring
31:23
It's actually just a video camera showing the video. Yeah, that's, but the fundamentals are the same for everything
31:32
It's pretty straightforward to access the sensor basics. That's the takeaway from this.
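(To make that takeaway concrete, here is a minimal sketch of the flow just shown, assuming the Microsoft.Azure.Kinect.Sensor C# wrapper; the configuration values are only examples.)

```csharp
using System;
using Microsoft.Azure.Kinect.Sensor;

class SensorBasics
{
    static void Main()
    {
        // Open the device (throws if the viewer or another app still owns it).
        using Device device = Device.Open(0);

        // Start the cameras with a device configuration: color format/resolution,
        // depth mode, frame rate, and whether color and depth must be synchronized.
        device.StartCameras(new DeviceConfiguration
        {
            ColorFormat = ImageFormat.ColorBGRA32,
            ColorResolution = ColorResolution.R1080p,
            DepthMode = DepthMode.NFOV_Unbinned,
            CameraFPS = FPS.FPS30,
            SynchronizedImagesOnly = true,
        });

        // Grab a few captures and look at their metadata.
        for (int i = 0; i < 30; i++)
        {
            using Capture capture = device.GetCapture();
            Image color = capture.Color;
            Console.WriteLine(
                $"{color.WidthPixels}x{color.HeightPixels} color image, " +
                $"device timestamp {color.DeviceTimestamp}");
            // color.Memory holds the raw BGRA pixels you would copy into a bitmap.
        }

        device.StopCameras();
    }
}
```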
31:37
Let's look at body tracking as a more advanced example because we don't have
31:42
time for all things that are possible. Again, the process is similar
31:47
We start the camera, we create a body tracker object, and we acquire a capture,
31:55
which is the object you get back from the camera. Remember, here you need both the depth stream and the infrared data stream.
32:04
Then you check: is there a new capture? If there is a new capture,
32:08
you queue that capture to the body tracker, and then after that you try to pop the latest
32:16
result from the body tracker, which could happen only in the next frame of processing.
32:23
So this runs in the background and then you process that result
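(Roughly, that loop looks like the sketch below in the C# wrappers; the Microsoft.Azure.Kinect.Sensor and Microsoft.Azure.Kinect.BodyTracking member names are assumptions here, modeled on the public samples.)

```csharp
using System;
using Microsoft.Azure.Kinect.Sensor;
using Microsoft.Azure.Kinect.BodyTracking;

class BodyTrackingLoop
{
    static void Main()
    {
        using Device device = Device.Open(0);

        // Body tracking needs the depth and infrared streams; color can stay off.
        var config = new DeviceConfiguration
        {
            DepthMode = DepthMode.NFOV_Unbinned,
            ColorResolution = ColorResolution.Off,
            CameraFPS = FPS.FPS30,
        };
        device.StartCameras(config);

        // The tracker needs the calibration to lift 2D joints into 3D space.
        Calibration calibration = device.GetCalibration(config.DepthMode, config.ColorResolution);
        var trackerConfig = new TrackerConfiguration
        {
            ProcessingMode = TrackerProcessingMode.Gpu,
            SensorOrientation = SensorOrientation.Default,
        };
        using Tracker tracker = Tracker.Create(calibration, trackerConfig);

        while (true)   // sketch only: no exit condition, no error handling
        {
            // 1) get a capture, 2) queue it to the tracker (GPU runs the neural network),
            // 3) pop the result once model fitting on the CPU has finished.
            using Capture capture = device.GetCapture();
            tracker.EnqueueCapture(capture);

            using Frame frame = tracker.PopResult();
            for (uint i = 0; i < frame.NumberOfBodies; i++)
            {
                uint bodyId = frame.GetBodyId(i);
                Skeleton skeleton = frame.GetBodySkeleton(i);
                Console.WriteLine(
                    $"Body {bodyId}: pelvis at {skeleton.GetJoint(JointId.Pelvis).Position}");
            }
        }
    }
}
```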
32:27
Yeah, the demo will come in a little bit later. So again: input capture, you get the collection of body structs and the 2D body index map.
32:40
This is an enumeration you get as an application developer, which is very helpful to let you
32:48
do operations based on the structure of the body. So you can say, hey, give me the distance between the elbow and the wrist, or the vector
32:59
from the elbow to the wrist, and you know where that's pointing based on the
33:07
3D positions of those joints. Again, that's the joint definition I showed before.
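(A tiny sketch of that kind of joint math, assuming the same C# body tracking types as above:)

```csharp
using System.Numerics;
using Microsoft.Azure.Kinect.BodyTracking;

static class JointMath
{
    // Vector from the elbow to the wrist: its length is the forearm length,
    // its normalized form is the direction the forearm is pointing in 3D space.
    public static (float lengthMm, Vector3 direction) Forearm(Skeleton skeleton)
    {
        Vector3 elbow = skeleton.GetJoint(JointId.ElbowRight).Position;
        Vector3 wrist = skeleton.GetJoint(JointId.WristRight).Position;

        Vector3 elbowToWrist = wrist - elbow;   // joint positions are in millimeters
        return (elbowToWrist.Length(), Vector3.Normalize(elbowToWrist));
    }
}
```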
33:14
Now let's do the demo application. You see here, it's the same as above: I start the camera, I do the device calibration,
33:36
then I initialize the tracker, which is the component for body tracking,
33:42
and then I do the sensor capture right here, so my break point starts right there,
33:48
and after that we try to get the result from the tracker
33:54
So you see the tracker is initialized and we got the capture object.
34:12
If you look into it, you see it has a depth image with all the goodness and the IR image
34:19
which you can just use in your code as you want. And then we enqueue that.
34:24
Yeah, so at one point we get a result from that. So let me remove this breakpoint here
34:29
so we can actually get a look at the results that we get back from the tracker. Yeah, so we got a frame back, and the frame, actually,
34:38
well, sorry. Here, the frame might be zero, null, right? So that means we were asking the tracker
34:48
but didn't get anything back yet. So at some point, the processing will deliver it.
34:55
See the frame, we got the frame back finally. I have a timestamp, but there's no body to be found
35:01
That's for a simple reason: because I haven't been in front of the camera. So we keep the break point. And here, this is the video stream based on the depth image; it shows the depth image here.
35:19
And now it tracks. And in the visualizer, it sets the frame data.
35:30
Now let's look at the visualizer briefly. The visualizer is basically, checking the data of the tracker
35:46
Let me actually move into the picture, and I hope I didn't forget the break point.
35:51
Okay, I didn't forget the break point. Yeah, so here's a bunch of math that's happening in the render window.
35:59
And in the render window, we looked at, hey, did I get any number of bodies
36:04
Here we got a number of bodies: it recognized one body. So we need to actually unwrap that buffer and get the specific joint we want to have access to.
36:39
I mean, the API lets you do it, but again, the wrapper itself doesn't provide a good inspector experience
36:49
So that's one of the downsides. It's the body with ID 1. So let's get one of the joints.
36:55
It's the first joint here. No, my mouse says it's running low on batteries; I have to fix that later on. So we get a joint, and
37:20
that's the joint position, as a vector, the vector with specific coordinates, and this orientation
37:32
as a quaternion, which has x, y, z and w as parameters. And then we process
37:41
all those joints that we get, so these three-dimensional vectors and orientations.
37:49
What we do is basically, for every joint, check: hey, do we also have a value for the parent?
37:54
Meaning, for the wrist, I need the elbow to be able to draw
37:59
the lower part of the arm. And then we render those two joint positions in 3D space.
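(That parent lookup is essentially a small table. A hedged sketch of the idea, with a hand-picked and deliberately incomplete parent map rather than the SDK's official joint hierarchy:)

```csharp
using System.Collections.Generic;
using System.Numerics;
using Microsoft.Azure.Kinect.BodyTracking;

static class BoneRenderer
{
    // Partial, hand-written parent map for illustration only; the real joint
    // hierarchy is defined in the Body Tracking SDK documentation.
    private static readonly Dictionary<JointId, JointId> Parent = new()
    {
        { JointId.WristRight, JointId.ElbowRight },
        { JointId.ElbowRight, JointId.ShoulderRight },
        { JointId.WristLeft,  JointId.ElbowLeft },
        { JointId.ElbowLeft,  JointId.ShoulderLeft },
    };

    // For every joint that has a parent, draw a line (a "bone") between the two.
    public static void DrawBones(Skeleton skeleton, System.Action<Vector3, Vector3> drawLine)
    {
        foreach (var (child, parent) in Parent)
        {
            Vector3 childPos = skeleton.GetJoint(child).Position;
            Vector3 parentPos = skeleton.GetJoint(parent).Position;
            drawLine(childPos, parentPos);   // hand off to your 3D renderer
        }
    }
}
```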
38:09
We walk into the image. And of course, I forgot to remove that breakpoint
38:16
Apologies for that. Yeah, again, I walk into the image. That's the problem when you need to step away from the keyboard. And you see,
38:27
it renders what I mentioned before. So this is the wrist joint,
38:31
this is the elbow joint, and it renders those lines based on
38:39
the data we're getting back from the body tracking SDK. Let's go back to the slides.
38:49
Let's discuss briefly an example. What can you do when you would have a system like that
38:57
On an abstract level, you have a system that's, and the slides are not completely updated, I have to say
39:03
unfortunately, but on an abstract level, you have a sensor system and you have high-level sensor processing,
39:08
and then you can do specific things: you can do posture classification on it, and you can have activity and fall detection.
39:21
How do you do all this? You have a concept of the world. You have physical formulas
39:26
you have a model, an understanding of the body, parameters, and there's math you're doing on that body,
39:31
like calculating velocity and acceleration, and then you visualize something. Let's look at body joint considerations.
39:41
What you want to do: you want to look at the tracking state of each of those joints,
39:46
so you get some level of confidence of how well it is tracking.
39:50
You want to do math on it, so you want to maybe convert it to a numerical form
39:57
that's convenient for the math you're doing. You want to understand the joint hierarchy,
40:01
as I mentioned, the wrist and elbow, for example, and then do all that math and so on.
40:06
In general, you want to read about gaming engines and physics engines
40:11
and game development. Those have the best references for doing math on 3D data.
40:19
As guidance, what I used before is a model from NASA,
40:24
and I got a stereotypical body and the distribution of weight in a body. But to do math on, let's say, the energy being used at the elbow joint,
40:37
you need to understand the length and how heavy the arm is, the upper and lower part of the arm.
40:44
And I got that by looking at models they use to calculate forces on astronauts,
40:49
for astronaut uniforms and spacesuits and so on. And then you make some assumptions, like the bone being a rod shape,
40:58
and then you can calculate a moment of inertia from this. You can use muscle efficiency assumptions to do more advanced estimations.
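(A hedged sketch of that kind of segment-energy math, under the simplifying assumptions just described: the forearm modeled as a uniform rod rotating about the elbow. The mass and length numbers are placeholders, not values from the talk.)

```csharp
using System;
using System.Numerics;

static class SegmentEnergy
{
    // Rotational kinetic energy of the forearm, modeled as a uniform rod of mass m
    // and length L rotating about one end (the elbow): I = (1/3) m L^2, E = 1/2 I w^2.
    public static double RotationalEnergyJoules(double massKg, double lengthM, double angularVelocityRadPerSec)
    {
        double inertia = massKg * lengthM * lengthM / 3.0;
        return 0.5 * inertia * angularVelocityRadPerSec * angularVelocityRadPerSec;
    }

    // Linear kinetic energy of a segment whose center of mass moved between two frames.
    // Positions are in meters, dt in seconds (e.g. derived from the device timestamps).
    public static double LinearEnergyJoules(double massKg, Vector3 previousPos, Vector3 currentPos, double dtSeconds)
    {
        double speed = (currentPos - previousPos).Length() / dtSeconds;
        return 0.5 * massKg * speed * speed;
    }
}

// Example with placeholder numbers: a 1.5 kg forearm, 0.27 m long, swinging at 4 rad/s,
// whose center of mass moved 3 cm during one 33 ms frame.
// double rot = SegmentEnergy.RotationalEnergyJoules(1.5, 0.27, 4.0);
// double lin = SegmentEnergy.LinearEnergyJoules(1.5, Vector3.Zero, new Vector3(0.03f, 0, 0), 0.033);
```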
41:07
This is a slide about quaternions, really overloaded. But this is a fundamental definition of quaternion
41:14
It enables you to project things onto a sphere as a rotation
41:22
to simplify this, but there are wonderful YouTube videos about it, about what quaternions mean for rotation. In 2020, the best thing I can tell you is:
41:33
look it up on YouTube, quaternions and rotations, and your eyes will be opened.
41:38
Yeah. Now about the math, you can do linear motion and angular motion
41:42
Those are formulas you can use, from velocity to force, to work and energy,
41:47
and it matters, as I will show in a second. This is a video of an app I did with the old Kinect sensor.
41:56
And you see that those little bubbles represent energy calculations I did based on motion.
42:02
Like the green one is the angular motion and the yellow one is the absolute linear motion
42:12
You see they behave differently. My shoulder doesn't move when I lower the arm and raise the arm like this
42:21
but there's still energy being spent because I move my arm. So you want to do math, maybe considering both
42:32
those different vectors. The lines show the acceleration and velocity. One thing: it is a sensor-based system.
42:40
So you need to handle something called jitter. Every sensor system in the world, including computer systems, they do not give you consistently super precise readings all the time
42:52
There might be a difference in one pixel or the other between two frames
42:57
And if you do math on that, like acceleration, you take the difference of velocity between two frames, you get errors that get worse over time
43:09
Here I smoothed: I used a so-called double exponential filter, and you see those lines stay relatively
43:17
stable, even as I move slowly. Now, if I remove that filtering here
43:27
and do the same motion again (it's a recording, I'm talking over it), you see how everything
43:33
jumps around. This error and jitter would be amplified pretty much exponentially
43:41
if you do math on it, so think about jitter.
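(The double exponential filter mentioned here, Holt smoothing, is simple to implement. A minimal sketch, with the smoothing factors as assumptions you would tune per joint:)

```csharp
using System.Numerics;

// Holt double exponential smoothing for a stream of 3D joint positions.
// Tracks a smoothed value plus a trend, so it lags less than simple averaging.
class DoubleExponentialFilter
{
    private readonly float _alpha;   // smoothing of the value, 0..1
    private readonly float _beta;    // smoothing of the trend, 0..1
    private Vector3 _value, _trend;
    private bool _initialized;

    public DoubleExponentialFilter(float alpha = 0.5f, float beta = 0.3f)
    {
        _alpha = alpha;
        _beta = beta;
    }

    public Vector3 Update(Vector3 raw)
    {
        if (!_initialized)
        {
            _value = raw;
            _trend = Vector3.Zero;
            _initialized = true;
            return raw;
        }

        Vector3 previousValue = _value;
        _value = _alpha * raw + (1 - _alpha) * (previousValue + _trend);
        _trend = _beta * (_value - previousValue) + (1 - _beta) * _trend;
        return _value;   // smoothed position for this frame
    }
}
```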
43:48
Again, the last slide, I will just run through it. You can calculate activity, you can classify activity levels, if you first calculate the
43:53
energy expenditure based on the approaches and formulas below. You can do fall detection in
43:59
different scenarios, thinking about, for example, how fast does my head move towards the floor:
44:05
if it has a sudden fast movement towards the floor, and we know the position of the floor, and it suddenly
44:10
stops moving, maybe something happened. Or it could be a slow fall; maybe you just want to see, hey, a body
44:16
has been on the floor for quite a while, something might be wrong. In general, fall detection is not
44:22
easy, as a warning, because a lot of the time somebody might just go down to the floor to pick
44:27
something up, or take a nap, and that's difficult to distinguish from an actual fall. Or somebody
44:33
might jump on the sofa or the bed because they're tired, and that jump might be classified as a violent
44:41
fall. Posture detection can also be very simple or very complex: you can use advanced machine learning, or you can make very
44:47
simple assumptions. You want to detect the posture? Well, look at the angle of your leg towards the
44:55
floor, the angle of your leg towards your hip, and you get some information. It's relative:
45:03
if your upper leg is like this relative to the floor, and your lower leg is like this relative to the
45:07
floor, and your spine is like this relative to the floor, maybe you're sitting, as an example.
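(A hedged sketch of that simple angle heuristic, using the same C# joint types as before; the joint choices and angle thresholds are illustrative assumptions, not values from the talk.)

```csharp
using System;
using System.Numerics;
using Microsoft.Azure.Kinect.BodyTracking;

static class PostureHeuristic
{
    // Angle in degrees between a body segment and the floor plane,
    // assuming you have already estimated the floor normal separately.
    static double AngleToFloorDegrees(Vector3 from, Vector3 to, Vector3 floorNormal)
    {
        Vector3 segment = Vector3.Normalize(to - from);
        double cosToNormal = Vector3.Dot(segment, Vector3.Normalize(floorNormal));
        return 90.0 - Math.Acos(Math.Clamp(cosToNormal, -1.0, 1.0)) * 180.0 / Math.PI;
    }

    // Very naive "sitting" check: upper leg roughly parallel to the floor,
    // spine roughly upright. Thresholds are placeholders to tune.
    public static bool LooksLikeSitting(Skeleton s, Vector3 floorNormal)
    {
        double upperLeg = AngleToFloorDegrees(
            s.GetJoint(JointId.HipRight).Position, s.GetJoint(JointId.KneeRight).Position, floorNormal);
        double spine = AngleToFloorDegrees(
            s.GetJoint(JointId.Pelvis).Position, s.GetJoint(JointId.SpineChest).Position, floorNormal);

        return Math.Abs(upperLeg) < 30.0 && Math.Abs(spine) > 60.0;
    }
}
```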
45:13
Posture templates, I'm skipping over that. And then this is a summary of all
45:19
the math I did in a previous application. I validated energy expenditure estimation
45:25
with a real heat sensing and motion sensing arm band. The gold standard would be
45:32
actually measure the CO2 that I produced when I exercised, but I didn't have that
45:37
And I found that what I was doing with the Kinect is similar in profile to what those armbands capture, but there are some differences.
45:51
It depends heavily on the motion type. So if you write an application, what you want to do eventually
45:56
is to correct it, and not just with this one formula, which is what I did in my case:
46:01
I corrected my formula and made it a little bit better. But in the end, what you want to do
46:06
you want to have formulas for those energy expenditure estimations or detecting motion in general
46:13
That's dependent on the type of motion we expect. That's the guidance
46:18
Well that was a lot of content to cover, a lot of ground to cover, and I hope it helped you to
46:24
understand a little bit how to develop software, where to ask questions for it and what you can do
46:29
with it. Yeah, this is all I have of the content. If there are any questions
46:36
I rely on the moderators to ask them now. I want to thank the sponsors again
46:43
Really, really thank the sponsors. And there's also feedback you can provide
46:50
All right, I went two minutes over so far. If there are questions, please address them now
47:01
Hi, Andreas, it's Aro here. Your session was really interesting, very futuristic kinds of things we were able to see, and it's very good, especially the body
47:13
tracking was very amazing. I just want to ask you, for people who want to learn
47:19
more, how do we get some learning resources, how can we do some experimenting, how to use the camera, use
47:26
cameras and all, so what kind of things can we do to learn more? Well, the good thing, as I mentioned,
48:02
Okay, I'm showing you my browser right now. So this is the GitHub repository where you can start
48:20
You can understand even how the sensor SDK was developed, and there are examples in that repository that you can use.
48:59
And there's also a samples repository available on GitHub. All those are connected.
49:09
Yes, I mean, all this content I gave you besides of the stuff that I developed myself
49:14
it's all available publicly, and beyond that you really need to look into gaming
49:20
engines, 3D math, quaternions, and those things. Is that a good summary? I hope
49:28
that helps some of the audience get started. Yeah, yeah, it's very important, because if they want to learn more
49:33
about the mixed reality and how it works behind the scenes, it's a springboard for them to learn new stuff
49:40
that they can do themselves. It's good stuff. This is the most important starting page here;
49:45
you will be able to get to pretty much everything through that page.
50:21
Yes, Andreas, I think your stuff is amazing, and now we'll just wait for the other speakers. Thanks a lot for your amazing demo today. Yeah, yeah, thanks for having me, and thank you for your patience through some break point and bug issues.
50:39
So, things happen. Thank you so much, Andreas. Thank you. I'll talk to you all soon; email me or find me on social media.
50:47
Yes, sure. Yeah, thank you. Thank you so much. Bye bye
#Augmented & Virtual Reality
#Consumer Electronics
#Programming
#Reference
#Virtual Reality Devices