DenseNet Paper Walkthrough: All Connected

When we try to train a very deep neural network model, one issue that we might encounter is the vanishing gradient problem. This is essentially a problem where the weight update of a model during training slows down or even stops, causing the model not to improve. When a network is very deep, the gradient computation during backpropagation involves multiplying many derivative terms together through the chain rule. Remember that if we multiply small numbers (typically less than 1) too many times, the result becomes extremely small. In the case of neural networks, these numbers are used as the basis of the weight update. So, if the gradient is very small, then the weight update will be very slow, causing the training to be slow as well.

To address this vanishing gradient problem, we can use shortcut paths so that the gradients can flow more easily through a deep network. One of the most popular architectures that attempts to solve this is ResNet, which implements skip connections that jump over several layers in the network. This idea is adopted by DenseNet, where the skip connections are implemented much more aggressively, making it better than ResNet at handling the vanishing gradient problem. In this article I would like to talk about how exactly DenseNet works and implement the architecture from scratch.
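To get a feel for how quickly repeated multiplication shrinks a number, here is a minimal sketch in plain Python. It assumes, purely for illustration, that every layer contributes the same derivative factor of 0.25 (the maximum slope of the sigmoid function); real networks have different factors per layer, and gradient_scale is a hypothetical helper name.

```python
# Illustrative assumption: each layer multiplies the gradient by the same
# factor of 0.25 (the largest possible slope of the sigmoid function).
def gradient_scale(num_layers, factor=0.25):
    """Return the product of `num_layers` identical derivative factors."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= factor
    return scale


# The deeper the chain, the smaller the factor that reaches the early layers.
for depth in (5, 20, 50):
    print(f"{depth:>2} layers -> gradient scaled by {gradient_scale(depth):.3e}")
```

Even with this generous factor, the scale at 50 layers is far below anything that would produce a meaningful weight update.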

The DenseNet Architecture

Dense Block

DenseNet was originally proposed in a paper titled “Densely Connected Convolutional Networks” written by Gao Huang et al. back in 2016 [1]. The main idea of DenseNet is indeed to solve the vanishing gradient problem. The reason it performs better than ResNet is the shortcut paths branching out from a single layer to all subsequent layers. To better illustrate this idea, you can see in Figure 1 below that the input tensor x₀ is forwarded to H₁, H₂, H₃, H₄, and the transition layer. We do the same thing for all layers within this block, making all tensors densely connected, hence the name DenseNet. With all these shortcut connections, information can flow seamlessly between layers. Not only that, but this mechanism also enables feature reuse, where each layer can directly benefit from the features produced by all previous layers.

Figure 1. The structure of a single dense block [1].

In a standard CNN, if we have L layers, we will also have L connections. Assuming that the above illustration is just a traditional 5-layer CNN, we basically only have the 5 straight arrows coming out of each tensor. In DenseNet, if we have L layers, we will have L(L+1)/2 connections. So in the above case we get 5(5+1)/2 = 15 connections in total. You can verify this by manually counting the arrows one by one: 5 red arrows, 4 green arrows, 3 purple arrows, 2 yellow arrows, and 1 brown arrow.

Another key difference between ResNet and DenseNet is how they combine information from different layers. In ResNet, we combine information from two tensors by element-wise summation, which is defined mathematically in Figure 2 below. Instead of performing element-wise summation, DenseNet combines information by channel-wise concatenation as expressed in Figure 3. With this mechanism, the feature maps produced by all previous layers are concatenated with the output of the current layer before eventually being used as the input of the subsequent layer.
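The connection count can be checked with a few lines of plain Python. This is only a sketch of the arithmetic above; the function names are hypothetical helpers, not part of any library.

```python
def plain_connections(num_layers):
    """A plain CNN has one connection per layer."""
    return num_layers


def dense_connections(num_layers):
    """A dense block with L layers has L(L+1)/2 connections."""
    return num_layers * (num_layers + 1) // 2


# Tally the arrows per source tensor for the 5-layer example:
# x0 feeds 5 layers, the next tensor feeds 4, and so on down to 1.
arrows_per_tensor = [5 - i for i in range(5)]

print(plain_connections(5))    # 5
print(dense_connections(5))    # 15
print(sum(arrows_per_tensor))  # 15
```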

Figure 2. The mathematical notation of a residual block in ResNet [1].

Figure 3. The mathematical notation of the last layer inside a dense block in DenseNet [1].

Performing channel-wise concatenation like this actually has a side effect: the number of feature maps grows as we get deeper into the network. In the example I showed you in Figure 1, we initially have an input tensor of 6 channels. The H₁ layer processes this tensor and produces a 4-channel tensor. These two tensors are then concatenated before being forwarded to H₂. This essentially means that the H₂ layer accepts 10 channels. Following the same pattern, the H₃, H₄, and transition layers will later accept tensors of 14, 18, and 22 channels, respectively. This is an example of a DenseNet that uses a growth rate of 4, meaning that each layer produces 4 new feature maps. Later on, we will use k to denote this parameter as suggested in the original paper.
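For readers without access to the figures, the two formulas can be written out as follows, using the notation of the paper [1]: H_ℓ is the transformation applied by layer ℓ, x_ℓ is its output, and the square brackets denote channel-wise concatenation.

```latex
% ResNet: the layer output is added element-wise to its input
x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1}

% DenseNet: the layer receives the concatenation of all preceding feature maps
x_\ell = H_\ell\big([x_0, x_1, \ldots, x_{\ell-1}]\big)
```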
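The channel growth described above follows a simple arithmetic pattern, sketched here in plain Python. The helper name dense_block_inputs is hypothetical and only serves this illustration.

```python
def dense_block_inputs(in_channels, growth_rate, num_layers):
    """Channel count received by each layer of a dense block.

    The last entry is what leaves the block, i.e. what the
    transition layer receives.
    """
    return [in_channels + i * growth_rate for i in range(num_layers + 1)]


# Figure 1 example: 6 input channels, growth rate k = 4, four layers H1..H4
print(dense_block_inputs(6, 4, 4))  # [6, 10, 14, 18, 22]
```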

Despite having such complex connections, DenseNet is actually much more efficient than a traditional CNN in terms of the number of parameters. Let’s do a little bit of math to prove this. The structure given in Figure 1 consists of 4 conv layers (let’s ignore the transition layer for now). To compute how many parameters a convolution layer has, we can simply calculate input_channels × kernel_height × kernel_width × output_channels. Assuming that all these convolutions use a 3×3 kernel, the layers in the DenseNet architecture would have the following number of parameters:

H₁ → 6×3×3×4 = 216
H₂ → 10×3×3×4 = 360
H₃ → 14×3×3×4 = 504
H₄ → 18×3×3×4 = 648

By summing these four numbers, we get 1,728 params in total. Note that this number does not include the bias terms. Now if we try to create the exact same structure with a traditional CNN, where each layer has to produce all the channels that the next layer receives, we will require the following number of params for each layer:

H₁ → 6×3×3×10 = 540
H₂ → 10×3×3×14 = 1,260
H₃ → 14×3×3×18 = 2,268
H₄ → 18×3×3×22 = 3,564

Summing these up, a traditional CNN reaches 7,632 params, which is over 4× higher! With this parameter count in mind, we can clearly see that DenseNet is indeed much more lightweight than traditional CNNs. The reason DenseNet can be so efficient is the feature reuse mechanism: instead of computing all feature maps from scratch, each layer only computes k feature maps and concatenates them with the existing feature maps from the previous layers.
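The two totals can be reproduced with a short plain-Python sketch. It counts convolution weights only (no biases), exactly as in the tables above; conv_params is a hypothetical helper name.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weight count of a conv layer without bias."""
    return in_ch * kernel * kernel * out_ch


layer_inputs = [6, 10, 14, 18]  # channels received by H1..H4 in Figure 1
k = 4                           # growth rate of the example

# Dense block: every layer outputs only k new feature maps.
dense_total = sum(conv_params(c, k) for c in layer_inputs)

# Plain CNN: every layer must output all channels the next layer receives.
plain_total = sum(conv_params(c, c + k) for c in layer_inputs)

print(dense_total)                          # 1728
print(plain_total)                          # 7632
print(round(plain_total / dense_total, 2))  # 4.42
```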

Transition Layer

The structure I showed you earlier is actually just the main building block of the DenseNet model, which is called the dense block. Figure 4 below shows how these building blocks are assembled, where three of them are connected by the so-called transition layers. Each transition layer consists of a convolution followed by a pooling layer. This component has two main responsibilities: first, to reduce the spatial dimension of the tensor, and second, to reduce the number of channels. The reduction in spatial dimension is standard practice when constructing CNN-based models, where the deeper feature maps should typically have a lower spatial dimension than the shallower ones. Meanwhile, reducing the number of channels is necessary because they can increase drastically due to the channel-wise concatenation performed in each layer of the dense block.

To understand how the transition layer reduces channels, we need to look at the compression factor parameter. This parameter, which the authors refer to as θ (theta), should have a value somewhere between 0 and 1. Suppose we set θ to 0.2, then the number of channels forwarded to the next dense block will only be 20% of the total number of channels produced by the current dense block.
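As a small sketch of the compression step, the function below rounds the compressed channel count down, which is how the paper defines it (⌊θm⌋ for m incoming feature maps). The function name and the range check are assumptions made for this illustration.

```python
import math


def compressed_channels(channels, theta):
    """Number of channels a transition layer passes on for a given theta."""
    if not 0 < theta <= 1:
        raise ValueError("theta is expected to lie in (0, 1]")
    return math.floor(channels * theta)


print(compressed_channels(136, 0.5))  # 68
print(compressed_channels(136, 0.2))  # 27
print(compressed_channels(136, 1.0))  # 136, i.e. no compression
```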

The Complete DenseNet Architecture

Now that we have understood the dense block and the transition layer, we can move on to the complete DenseNet architecture shown in Figure 5 below. It initially accepts an RGB image of size 224×224, which is then processed by a 7×7 conv and a 3×3 maxpooling layer. Keep in mind that these two layers use a stride of 2, causing the spatial dimension to shrink to 112×112 and 56×56, respectively. At this point the tensor is ready to be passed through the first dense block, which consists of 6 bottleneck blocks (I’ll talk more about this component very soon). The resulting output will then be forwarded to the first transition layer, followed by the second dense block, and so on until we eventually reach the global average pooling layer. Finally, we pass the tensor to the fully-connected layer, which is responsible for making class predictions.

Figure 5. The complete DenseNet architecture [1].

There are actually several more details I need to explain regarding the architecture above. First, the number of feature maps produced in each step is not explicitly mentioned. This is essentially because the architecture adapts to the k and θ parameters. The only layer with a fixed number is the very first convolution layer (the 7×7 one), which produces 64 feature maps (not displayed in the figure). In the paper this layer is specified as 2k feature maps, which equals 64 for the k = 32 setting used by the ImageNet models. Second, it is also important to note that every convolution layer shown in the architecture follows the BN-ReLU-conv-dropout sequence, except for the 7×7 convolution, which does not include the dropout layer. Third, the authors implemented several DenseNet variants, which they refer to as DenseNet (the vanilla one), DenseNet-B (the variant that uses bottleneck blocks), DenseNet-C (the one that uses the compression factor θ), and DenseNet-BC (the variant that employs both). The architecture given in Figure 5 is the DenseNet-B (or DenseNet-BC) variant.

The so-called bottleneck block itself is a stack of 1×1 and 3×3 convolutions. The 1×1 conv is used to reduce the number of channels to 4k before they are eventually shrunk further to k by the subsequent 3×3 conv. The reason is that a 3×3 convolution is computationally expensive on tensors with many channels. So to make the computation faster, we need to reduce the channels first using the 1×1 conv. Later in the coding section we are going to implement this DenseNet-BC variant. However, if you want to implement the standard DenseNet (or DenseNet-C) instead, you can simply omit the 1×1 conv so that each dense block only contains 3×3 convolutions.
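The spatial sizes quoted for the first two layers can be verified with the usual output-size formula for convolution and pooling layers. The sketch below is plain Python; the padding values of 3 and 1 are assumptions here, chosen to match the ones used in the implementation later in this article.

```python
def out_size(size, kernel, stride, padding):
    """Output length along one spatial axis of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# 7x7 conv, stride 2, padding 3 (padding is an assumption, see lead-in)
after_conv = out_size(224, kernel=7, stride=2, padding=3)

# 3x3 max pooling, stride 2, padding 1 (padding is an assumption, see lead-in)
after_pool = out_size(after_conv, kernel=3, stride=2, padding=1)

print(after_conv)  # 112
print(after_pool)  # 56
```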
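To see when the 1×1 reduction pays off, the sketch below compares the weight count of a single 3×3 conv against the 1×1 + 3×3 bottleneck for a few input widths, using k = 12. It counts convolution weights only (batch norm is ignored) and uses the weight count as a rough proxy for compute at a fixed spatial size; all function names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weight count of a conv layer without bias."""
    return in_ch * kernel * kernel * out_ch


def direct_params(in_ch, k):
    """A single 3x3 conv mapping in_ch channels straight to k."""
    return conv_params(in_ch, k, 3)


def bottleneck_params(in_ch, k):
    """A 1x1 conv to 4k channels followed by a 3x3 conv to k."""
    return conv_params(in_ch, 4 * k, 1) + conv_params(4 * k, k, 3)


k = 12
for in_ch in (64, 136, 389):
    print(in_ch, direct_params(in_ch, k), bottleneck_params(in_ch, k))
```

With k = 12 the bottleneck is slightly larger for a 64-channel input (8,256 vs 6,912 weights) and only becomes cheaper once the input exceeds 86 channels. Since concatenation keeps widening the input, most layers in a deep dense block fall on the side where the bottleneck helps.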

Some Experimental Results

As can be seen in the paper, the authors conducted numerous experiments comparing DenseNet with other models. In this section I am going to show you some interesting things they discovered.

Figure 6. DenseNet achieves better accuracy than ResNet with fewer parameters and lower computational cost across different network depths [1].

The first experimental result I found interesting is that DenseNet actually has much better performance than ResNet. Figure 6 above shows that it consistently outperforms ResNet across all network depths. When comparing variants with similar accuracy, DenseNet is actually much more efficient. Let’s take a closer look at the DenseNet-201 variant. Here you can see that the validation error is nearly the same as ResNet-101. Despite being 2× deeper (201 vs 101 layers), it is roughly 2× smaller in terms of both parameters and FLOPs (floating point operations).

Figure 7. How the bottleneck layer and compression factor affect model performance [1].

Next, the authors also conducted an ablation study on the use of the bottleneck layer and the compression factor. We can see in Figure 7 above that employing both the bottleneck layer within the dense block and the channel count reduction in the transition layer allows the model to achieve higher accuracy (DenseNet-BC). It might seem a bit counterintuitive that reducing the number of channels through the compression factor improves the accuracy. In fact, in deep learning, too many features might instead hurt accuracy due to information redundancy. So, reducing the number of channels can be perceived as a regularization mechanism that prevents the model from overfitting, allowing it to obtain higher validation accuracy.

DenseNet From Scratch

Now that we have understood the underlying theory behind DenseNet, we can implement the architecture from scratch. What we need to do first is to import the required modules and initialize the configurable variables. In Codeblock 1 below, the k and θ we discussed earlier are denoted as GROWTH and COMPRESSION, whose values are set to 12 and 0.5, respectively. Both values appear in the paper’s experiments (k = 12 is one of the growth rates used for the CIFAR models, while the ImageNet models in Figure 5 use k = 32), and we can definitely change them if we want to. Next, I also initialize the REPEATS list to store the number of bottleneck blocks within each dense block.

# Codeblock 1
import torch
import torch.nn as nn

GROWTH      = 12
COMPRESSION = 0.5
REPEATS     = [6, 12, 24, 16]

Bottleneck Implementation

Now let’s take a look at the Bottleneck class below to see how I implement the stack of 1×1 and 3×3 convolutions. Previously I mentioned that each convolution layer follows the BN-ReLU-Conv-dropout structure, so here we need to initialize all these layers in the __init__() method.

The two convolution layers are initialized as conv0 and conv1, each with its corresponding batch normalization layer. Don’t forget to set the out_channels parameter of the conv0 layer to GROWTH*4 because we want it to return 4k feature maps (see the line marked with #(1)). This number of feature maps will then be shrunk even further by the conv1 layer to k by setting out_channels to GROWTH (#(2)). Once all layers have been initialized, we can define the flow in the forward() method. Just keep in mind that at the end of the process we have to concatenate the resulting tensor (out) with the original one (x) to implement the skip connection (#(3)).

# Codeblock 2
class Bottleneck(nn.Module):
    def __init__(self, in_channels):
        super().__init__()

        # ReLU and dropout hold no learnable parameters, so one instance of each is reused
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.2)

        # 1×1 conv: maps the incoming channels to 4k
        self.bn0   = nn.BatchNorm2d(num_features=in_channels)
        self.conv0 = nn.Conv2d(in_channels=in_channels,
                               out_channels=GROWTH*4,          #(1)
                               kernel_size=1,
                               padding=0,
                               bias=False)

        # 3×3 conv: maps 4k channels to k new feature maps
        self.bn1   = nn.BatchNorm2d(num_features=GROWTH*4)
        self.conv1 = nn.Conv2d(in_channels=GROWTH*4,
                               out_channels=GROWTH,            #(2)
                               kernel_size=3,
                               padding=1,
                               bias=False)

    def forward(self, x):
        print(f'original\t: {x.size()}')

        # BN-ReLU-conv-dropout, applied twice
        out = self.dropout(self.conv0(self.relu(self.bn0(x))))
        print(f'after conv0\t: {out.size()}')

        out = self.dropout(self.conv1(self.relu(self.bn1(out))))
        print(f'after conv1\t: {out.size()}')

        # Append the k new feature maps to the input along the channel axis
        concatenated = torch.cat((out, x), dim=1)              #(3)
        print(f'after concat\t: {concatenated.size()}')

        return concatenated

In order to check whether our Bottleneck class works properly, we will now create one that accepts 64 feature maps and pass a dummy tensor through it. The bottleneck layer I instantiate below essentially corresponds to the very first bottleneck inside the first dense block (refer back to Figure 5 if you’re unsure). So, to simulate the actual flow of the network, we are going to pass a tensor of size 64×56×56, which is essentially the shape produced by the 3×3 maxpooling layer.

# Codeblock 3
bottleneck = Bottleneck(in_channels=64)

x = torch.randn(1, 64, 56, 56)
x = bottleneck(x)

Once the above code is run, the following output appears on our screen.

# Codeblock 3 Output
original     : torch.Size([1, 64, 56, 56])
after conv0  : torch.Size([1, 48, 56, 56])    #(1)
after conv1  : torch.Size([1, 12, 56, 56])    #(2)
after concat : torch.Size([1, 76, 56, 56])

Here we can see that our conv0 layer successfully reduced the feature maps from 64 to 48 (#(1)), where 48 is 4k (remember that our k is 12). This 48-channel tensor is then processed by the conv1 layer, which reduces the number of feature maps even further to k (#(2)). This output tensor is then concatenated with the original one, resulting in a tensor of 64+12 = 76 feature maps. And this is where the pattern starts. Later in the dense block, if we repeat this bottleneck several times, each layer will produce:

second layer → 64+(2×12) = 88 feature maps
third layer → 64+(3×12) = 100 feature maps
fourth layer → 64+(4×12) = 112 feature maps
and so on …

Dense Block Implementation

Now let’s actually create the DenseBlock class to store the sequence of Bottleneck instances. Look at Codeblock 4 below to see how I do that. The approach is pretty straightforward: we just initialize a module list (#(1)) and then append the bottleneck blocks one by one (#(3)). Note that we need to keep track of the number of input channels of each bottleneck using the current_in_channels variable (#(2)). Finally, in the forward() method we simply pass the tensor through the bottlenecks sequentially.

# Codeblock 4
class DenseBlock(nn.Module):
    def __init__(self, in_channels, repeats):
        super().__init__()

        self.bottlenecks = nn.ModuleList()    #(1)

        for i in range(repeats):
            # Each bottleneck adds GROWTH channels to what the next one receives
            current_in_channels = in_channels + i*GROWTH    #(2)
            self.bottlenecks.append(Bottleneck(in_channels=current_in_channels))  #(3)

    def forward(self, x):
        for i, bottleneck in enumerate(self.bottlenecks):
            x = bottleneck(x)
            print(f'after bottleneck #{i}\t: {x.size()}')

        return x

We can test the code above by simulating the first dense block in the network. You can see in Figure 5 that it contains 6 bottleneck blocks, so in Codeblock 5 below I set the repeats parameter to that number (#(1)). We can see in the resulting output that the input tensor, which initially has the shape of 64×56×56, is transformed to 136×56×56. The 136 feature maps come from 64+(6×12), which follows the pattern I gave you earlier. Note that the output shown here does not include the lines printed inside Bottleneck, so comment out those print statements if you want the same concise log.

# Codeblock 5
dense_block = DenseBlock(in_channels=64, repeats=6)    #(1)
x = torch.randn(1, 64, 56, 56)

x = dense_block(x)

# Codeblock 5 Output
after bottleneck #0 : torch.Size([1, 76, 56, 56])
after bottleneck #1 : torch.Size([1, 88, 56, 56])
after bottleneck #2 : torch.Size([1, 100, 56, 56])
after bottleneck #3 : torch.Size([1, 112, 56, 56])
after bottleneck #4 : torch.Size([1, 124, 56, 56])
after bottleneck #5 : torch.Size([1, 136, 56, 56])

Transition Layer

The next component we are going to implement is the transition layer, which is shown in Codeblock 6 below. Similar to the convolution layers in the bottleneck blocks, here we also use the BN-ReLU-conv-dropout structure, yet this one comes with an additional average pooling layer at the end (#(1)). Don’t forget to set the stride of this pooling layer to 2 to reduce the spatial dimension by half.

# Codeblock 6
class Transition(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()

        self.bn   = nn.BatchNorm2d(num_features=in_channels)
        self.relu = nn.ReLU()
        # 1×1 conv: reduces the channel count to out_channels
        self.conv = nn.Conv2d(in_channels=in_channels,
                              out_channels=out_channels,
                              kernel_size=1,
                              padding=0,
                              bias=False)
        self.dropout = nn.Dropout(p=0.2)
        # 2×2 average pooling with stride 2: halves height and width
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)    #(1)

    def forward(self, x):
        print(f'original\t: {x.size()}')

        out = self.pool(self.dropout(self.conv(self.relu(self.bn(x)))))
        print(f'after transition: {out.size()}')

        return out

Now let’s take a look at the testing code in Codeblock 7 below to see how a tensor transforms as it is passed through the above network. In this example I am trying to simulate the very first transition layer, i.e., the one right after the first dense block. This is essentially the reason I set this layer to accept 136 channels. Previously I mentioned that this layer is used to shrink the channel dimension through the θ parameter, so to implement it we can simply multiply the number of input feature maps by the COMPRESSION variable for the out_channels parameter.

# Codeblock 7
transition = Transition(in_channels=136, out_channels=int(136*COMPRESSION))

x = torch.randn(1, 136, 56, 56)
x = transition(x)

Once the above code is run, we should obtain the following output. Here you can see that the spatial dimension of the input tensor shrinks from 56×56 to 28×28, while the number of channels also reduces from 136 to 68. This essentially indicates that our transition layer implementation is correct.

# Codeblock 7 Output
original         : torch.Size([1, 136, 56, 56])
after transition : torch.Size([1, 68, 28, 28])

The Complete DenseNet Architecture

Now that we have successfully implemented the main components of the DenseNet model, we are going to assemble the entire architecture. Here I separate the __init__() and the forward() methods into two codeblocks as they are quite long. Just make sure that you put Codeblock 8a and 8b within the same notebook cell if you want to run it on your own.

# Codeblock 8a
class DenseNet(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.first_conv = nn.Conv2d(in_channels=3, 
                                    out_channels=64, 
                                    kernel_size=7,    #(1)
                                    stride=2,         #(2)
                                    padding=3,        #(3)
                                    bias=False)
        self.first_pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  #(4)
        channel_count = 64
        

        # Dense block #0
        self.dense_block_0 = DenseBlock(in_channels=channel_count,
                                        repeats=REPEATS[0])          #(5)
        channel_count = int(channel_count+REPEATS[0]*GROWTH)         #(6)
        self.transition_0 = Transition(in_channels=channel_count, 
                                       out_channels=int(channel_count*COMPRESSION))
        channel_count = int(channel_count*COMPRESSION)               #(7)
    

        # Dense block #1
        self.dense_block_1 = DenseBlock(in_channels=channel_count, 
                                        repeats=REPEATS[1])
        channel_count = int(channel_count+REPEATS[1]*GROWTH)
        self.transition_1 = Transition(in_channels=channel_count, 
                                       out_channels=int(channel_count*COMPRESSION))
        channel_count = int(channel_count*COMPRESSION)

        # Dense block #2
        self.dense_block_2 = DenseBlock(in_channels=channel_count, 
                                        repeats=REPEATS[2])
        channel_count = int(channel_count+REPEATS[2]*GROWTH)
        
        self.transition_2 = Transition(in_channels=channel_count, 
                                       out_channels=int(channel_count*COMPRESSION))
        channel_count = int(channel_count*COMPRESSION)

        # Dense block #3
        self.dense_block_3 = DenseBlock(in_channels=channel_count, 
                                        repeats=REPEATS[3])
        channel_count = int(channel_count+REPEATS[3]*GROWTH)
        
        
        self.avgpool = nn.AdaptiveAvgPool2d(output_size=(1,1))       #(8)
        self.fc = nn.Linear(in_features=channel_count, out_features=1000)  #(9)

What we do first in the __init__() method above is to initialize the first_conv and the first_pool layers. Keep in mind that these two layers belong neither to the dense block nor to the transition layer, so we need to manually initialize them as nn.Conv2d and nn.MaxPool2d instances. In fact, these two initial layers are quite unique. The convolution layer uses a very large kernel of size 7×7 (#(1)) with a stride of 2 (#(2)). So, this layer not only captures information from a large area but also performs spatial downsampling at the same time. Here we also need to set the padding to 3 (#(3)) to compensate for the large kernel so that the spatial dimension doesn’t get reduced too much. Next, the pooling layer is different from the ones in the transition layer: we use 3×3 maxpooling rather than 2×2 average pooling (#(4)).

With the first two layers done, what we do next is to initialize the dense blocks and the transition layers. The idea is pretty simple: we initialize dense blocks consisting of multiple bottleneck blocks (the number of bottlenecks is passed through the repeats parameter (#(5))). Remember to keep track of the channel count at every step (#(6,7)) so that we can match the input shape of the subsequent layer with the output shape of the previous one. We then basically do the exact same thing for the remaining dense blocks and transition layers.

Once we have reached the last dense block, we initialize the global average pooling layer (#(8)), which is responsible for taking the average value across the spatial dimension, before eventually initializing the classification head (#(9)). Finally, as all layers have been initialized, we can now connect them all inside the forward() method below.
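The channel bookkeeping described above can be traced without building any layers. The following plain-Python sketch mirrors the arithmetic of the channel_count updates in Codeblock 8a; the constants repeat the values from Codeblock 1, and trace_channels is a hypothetical helper introduced only for this illustration.

```python
GROWTH = 12
COMPRESSION = 0.5
REPEATS = [6, 12, 24, 16]


def trace_channels(stem_channels, repeats, growth, compression):
    """Return (stage name, channel count) pairs for every block and transition."""
    trace = []
    channels = stem_channels
    for i, num_bottlenecks in enumerate(repeats):
        # Every bottleneck appends `growth` channels to its input.
        channels += num_bottlenecks * growth
        trace.append((f"dense_block_{i}", channels))
        # A transition follows every dense block except the last one.
        if i < len(repeats) - 1:
            channels = int(channels * compression)
            trace.append((f"transition_{i}", channels))
    return trace


for name, channels in trace_channels(64, REPEATS, GROWTH, COMPRESSION):
    print(f"{name:<14}: {channels}")
```

The last value, 389, is the number of input features the fully-connected layer needs.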

# Codeblock 8b
    def forward(self, x):
        print(f'original\t\t: {x.size()}')

        # Stem: 7×7 conv followed by 3×3 max pooling
        x = self.first_conv(x)
        print(f'after first_conv\t: {x.size()}')

        x = self.first_pool(x)
        print(f'after first_pool\t: {x.size()}')

        # Dense blocks alternate with transition layers
        x = self.dense_block_0(x)
        print(f'after dense_block_0\t: {x.size()}')

        x = self.transition_0(x)
        print(f'after transition_0\t: {x.size()}')

        x = self.dense_block_1(x)
        print(f'after dense_block_1\t: {x.size()}')

        x = self.transition_1(x)
        print(f'after transition_1\t: {x.size()}')

        x = self.dense_block_2(x)
        print(f'after dense_block_2\t: {x.size()}')

        x = self.transition_2(x)
        print(f'after transition_2\t: {x.size()}')

        x = self.dense_block_3(x)
        print(f'after dense_block_3\t: {x.size()}')

        # Classification head: global average pooling, flatten, linear layer
        x = self.avgpool(x)
        print(f'after avgpool\t\t: {x.size()}')

        x = torch.flatten(x, start_dim=1)
        print(f'after flatten\t\t: {x.size()}')

        x = self.fc(x)
        print(f'after fc\t\t: {x.size()}')
        
        return x

That’s basically the entire implementation of the DenseNet architecture. We can test whether it works properly by running Codeblock 9 below. Here we pass the x tensor through the network, which simulates a batch containing a single 224×224 RGB image.

# Codeblock 9
densenet = DenseNet()
x = torch.randn(1, 3, 224, 224)

x = densenet(x)

And below is what the output looks like. Here I intentionally print out the tensor shape after each step so that you can clearly see how the tensor transforms throughout the entire network (the lines printed inside Bottleneck and Transition are left out to keep the log readable). Despite having so many layers, this is actually the smallest DenseNet variant, i.e., the DenseNet-121 layout. Keep in mind that the channel counts below come from k = 12, whereas the paper’s ImageNet models use k = 32. You can make the model even larger by changing the values in the REPEATS list according to the number of bottleneck blocks within each dense block given in Figure 5.

# Codeblock 9 Output
original             : torch.Size([1, 3, 224, 224])
after first_conv     : torch.Size([1, 64, 112, 112])
after first_pool     : torch.Size([1, 64, 56, 56])
after bottleneck #0  : torch.Size([1, 76, 56, 56])
after bottleneck #1  : torch.Size([1, 88, 56, 56])
after bottleneck #2  : torch.Size([1, 100, 56, 56])
after bottleneck #3  : torch.Size([1, 112, 56, 56])
after bottleneck #4  : torch.Size([1, 124, 56, 56])
after bottleneck #5  : torch.Size([1, 136, 56, 56])
after dense_block_0  : torch.Size([1, 136, 56, 56])
after transition_0   : torch.Size([1, 68, 28, 28])
after bottleneck #0  : torch.Size([1, 80, 28, 28])
after bottleneck #1  : torch.Size([1, 92, 28, 28])
after bottleneck #2  : torch.Size([1, 104, 28, 28])
after bottleneck #3  : torch.Size([1, 116, 28, 28])
after bottleneck #4  : torch.Size([1, 128, 28, 28])
after bottleneck #5  : torch.Size([1, 140, 28, 28])
after bottleneck #6  : torch.Size([1, 152, 28, 28])
after bottleneck #7  : torch.Size([1, 164, 28, 28])
after bottleneck #8  : torch.Size([1, 176, 28, 28])
after bottleneck #9  : torch.Size([1, 188, 28, 28])
after bottleneck #10 : torch.Size([1, 200, 28, 28])
after bottleneck #11 : torch.Size([1, 212, 28, 28])
after dense_block_1  : torch.Size([1, 212, 28, 28])
after transition_1   : torch.Size([1, 106, 14, 14])
after bottleneck #0  : torch.Size([1, 118, 14, 14])
after bottleneck #1  : torch.Size([1, 130, 14, 14])
after bottleneck #2  : torch.Size([1, 142, 14, 14])
after bottleneck #3  : torch.Size([1, 154, 14, 14])
after bottleneck #4  : torch.Size([1, 166, 14, 14])
after bottleneck #5  : torch.Size([1, 178, 14, 14])
after bottleneck #6  : torch.Size([1, 190, 14, 14])
after bottleneck #7  : torch.Size([1, 202, 14, 14])
after bottleneck #8  : torch.Size([1, 214, 14, 14])
after bottleneck #9  : torch.Size([1, 226, 14, 14])
after bottleneck #10 : torch.Size([1, 238, 14, 14])
after bottleneck #11 : torch.Size([1, 250, 14, 14])
after bottleneck #12 : torch.Size([1, 262, 14, 14])
after bottleneck #13 : torch.Size([1, 274, 14, 14])
after bottleneck #14 : torch.Size([1, 286, 14, 14])
after bottleneck #15 : torch.Size([1, 298, 14, 14])
after bottleneck #16 : torch.Size([1, 310, 14, 14])
after bottleneck #17 : torch.Size([1, 322, 14, 14])
after bottleneck #18 : torch.Size([1, 334, 14, 14])
after bottleneck #19 : torch.Size([1, 346, 14, 14])
after bottleneck #20 : torch.Size([1, 358, 14, 14])
after bottleneck #21 : torch.Size([1, 370, 14, 14])
after bottleneck #22 : torch.Size([1, 382, 14, 14])
after bottleneck #23 : torch.Size([1, 394, 14, 14])
after dense_block_2  : torch.Size([1, 394, 14, 14])
after transition_2   : torch.Size([1, 197, 7, 7])
after bottleneck #0  : torch.Size([1, 209, 7, 7])
after bottleneck #1  : torch.Size([1, 221, 7, 7])
after bottleneck #2  : torch.Size([1, 233, 7, 7])
after bottleneck #3  : torch.Size([1, 245, 7, 7])
after bottleneck #4  : torch.Size([1, 257, 7, 7])
after bottleneck #5  : torch.Size([1, 269, 7, 7])
after bottleneck #6  : torch.Size([1, 281, 7, 7])
after bottleneck #7  : torch.Size([1, 293, 7, 7])
after bottleneck #8  : torch.Size([1, 305, 7, 7])
after bottleneck #9  : torch.Size([1, 317, 7, 7])
after bottleneck #10 : torch.Size([1, 329, 7, 7])
after bottleneck #11 : torch.Size([1, 341, 7, 7])
after bottleneck #12 : torch.Size([1, 353, 7, 7])
after bottleneck #13 : torch.Size([1, 365, 7, 7])
after bottleneck #14 : torch.Size([1, 377, 7, 7])
after bottleneck #15 : torch.Size([1, 389, 7, 7])
after dense_block_3  : torch.Size([1, 389, 7, 7])
after avgpool        : torch.Size([1, 389, 1, 1])
after flatten        : torch.Size([1, 389])
after fc             : torch.Size([1, 1000])
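The number in a variant’s name can be recovered from its REPEATS list by counting the layers that carry weights. The sketch below assumes the bottleneck design used in this article (two convolutions per bottleneck, one convolution per transition). The block configurations for DenseNet-169 and DenseNet-201 are the ones listed in the paper’s architecture table [1]; densenet_depth is a hypothetical helper name.

```python
def densenet_depth(repeats):
    """Count weighted layers: stem conv, bottleneck convs, transition convs, fc."""
    stem_conv = 1
    bottleneck_convs = 2 * sum(repeats)   # 1x1 conv + 3x3 conv per bottleneck
    transition_convs = len(repeats) - 1   # one 1x1 conv between dense blocks
    classifier = 1
    return stem_conv + bottleneck_convs + transition_convs + classifier


print(densenet_depth([6, 12, 24, 16]))  # 121
print(densenet_depth([6, 12, 32, 32]))  # 169
print(densenet_depth([6, 12, 48, 32]))  # 201
```

Swapping one of these lists into REPEATS is enough to build the deeper layouts with the classes defined above.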

Ending

I think that’s pretty much everything about the theory and the implementation of the DenseNet model. You can also find all the code above in my GitHub repo [2]. See ya in my next article!

References

[1] Gao Huang et al. Densely Connected Convolutional Networks. arXiv. https://arxiv.org/abs/1608.06993 [Accessed September 18, 2025].

[2] MuhammadArdiPutra. DenseNet. GitHub. https://github.com/MuhammadArdiPutra/medium_articles/blob/main/DenseNet.ipynb [Accessed September 18, 2025].
