
Stop Flattening Your Data! Why NdLinear Might Be Your AI’s Secret Weapon

Last Updated on May 1, 2025 by Editorial Team

Author(s): Vivek Tiwari

Originally published on Towards AI.

Linear layers are fundamental building blocks in countless neural networks, performing essential transformations on data. The standard approach, often implemented as nn.Linear in frameworks like PyTorch, typically operates on flattened input vectors, effectively treating each feature independently. But what happens when the structure of the input data – its multi-dimensional arrangement – holds critical information? This is where NdLinear emerges as a compelling alternative. Unlike its traditional counterpart, NdLinear is specifically designed to process multi-dimensional tensors while preserving their inherent structure. This post delves into a fundamental comparison: nn.Linear versus NdLinear. We'll explore their core mechanics, mathematical differences, performance trade-offs, and the crucial question of when preserving data structure with NdLinear offers a distinct advantage over the standard linear transformation.

Table of Contents

  1. The Flattening Problem: Why Standard Linear Layers Fall Short on Structured Data
  2. Meet the Classic: nn.Linear
  3. Meet NdLinear: Motivation, Math, and Theory
  4. Head-to-Head Comparison
  5. Seeing is Believing: Performance & Benchmarks
  6. Hands-On: Code Implementation
  7. The Verdict: When to Choose NdLinear vs. nn.Linear
  8. Probable Applications
  9. Conclusion: Embracing Structure for Smarter AI

1. The Flattening Problem: Why Standard Linear Layers Fall Short on Structured Data

The standard linear layer (nn.Linear) is a workhorse in deep learning, incredibly useful for many tasks. But it has a fundamental characteristic that becomes a major drawback when dealing with structured data: it expects its input to be a simple, flat vector (or a batch of them). Think about data like images with their grid of pixels, tables with rows and columns, or even text documents where layout matters. This data isn't naturally flat; it has inherent structure, dimensions, and relationships baked into its arrangement.

To feed this rich, multi-dimensional data into a standard nn.Linear layer, we're forced to perform an operation called "flattening." Imagine taking that image grid or table structure and just stretching it out into one long line of numbers. This act of flattening achieves the required input format, but at a significant cost: it completely destroys the spatial or sequential relationships between the data points. The layer no longer knows which pixel was next to which, or which table cell was above another. All that valuable structural context is simply lost in translation. This is the essence of the "flattening problem" – standard linear layers, by their very nature, ignore the potentially crucial information encoded in the shape of the data. When structure matters, this flattening becomes a critical bottleneck.

2. Meet the Classic: nn.Linear

The nn.Linear layer, often called a "fully connected" or "dense" layer, is arguably one of the most fundamental and widely used components in neural networks. It performs a simple, yet powerful, linear transformation on its input data.

How It Works (The Math)

At its core, nn.Linear applies an affine transformation to the incoming data. The mathematical formula governing this transformation is:

Forward:

Y = XWᵀ + b

Where:

X ∈ ℝ^(B × (D₁·D₂···Dₙ)) (input, flattened)

W ∈ ℝ^((H₁·H₂···Hₙ) × (D₁·D₂···Dₙ)) (weight matrix)

b ∈ ℝ^(H₁·H₂···Hₙ) (bias)

B = batch size, D₁…Dₙ = input dims, H₁…Hₙ = output dims
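
To make these shapes concrete, here is a minimal PyTorch sketch of the flatten-then-nn.Linear pattern; the sizes are made up purely for illustration:

```python
import torch
import torch.nn as nn

B, D1, D2, D3 = 8, 16, 16, 3       # hypothetical batch of 16x16x3 inputs
H = 64                             # flat output width

x = torch.randn(B, D1, D2, D3)
flat = x.flatten(1)                # (B, D1*D2*D3): the grid structure is gone
fc = nn.Linear(D1 * D2 * D3, H)    # weight shape (H, D1*D2*D3) plus a bias of size H
y = fc(flat)                       # (B, H)
print(fc.weight.shape, y.shape)    # torch.Size([64, 768]) torch.Size([8, 64])
```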

Limitations of nn.Linear:

  • Loses multi-dimensional context — spatial, channels, time, semantics.
  • Parameter count explodes: grows as (product of input dims) × (product of output dims).
  • Often requires large models for expressive power — leading to overfitting, redundancy, inefficiency.

Parameter Explosion Example: If D₁ = D₂ = D₃ = 64 and H₁ = H₂ = H₃ = 128, the weight matrix alone has
64³ × 128³ = 2³⁹ ≈ 5.5 × 10¹¹ parameters.
Far too large to be practical in real models!

This flattening process discards valuable structural information. For instance, in image data, the spatial relationships between pixels are lost. In document processing, the layout and relative positioning of elements — critical for understanding document structure — are discarded.

3. Meet NdLinear: Motivation, Math, and Theory

Introduced in an early-2025 arXiv paper, NdLinear is an N-dimensional linear transformation: instead of flattening a tensor, it transforms each axis in turn with its own, much smaller weight matrix, thereby preserving the structure and slashing parameter growth.

NdLinear takes a fundamentally different approach to linear transformations. Instead of flattening the input tensor, it preserves its multi-dimensional structure by applying separate transformations along each dimension.

This approach offers a powerful inductive bias that aligns with the inherent structure of many types of data, from images to documents to multi-dimensional time series.

4. Head-to-Head Comparison: nn.Linear vs. NdLinear

Now that we understand the motivation and mechanics behind both nn.Linear and the structure-aware NdLinear, let's directly compare them across key aspects.

Mathematical Foundations

To understand NdLinear better, let’s delve deeper into the mathematical operations involved.

Mode-wise Tensor-Matrix Multiplication

For each dimension i of the input tensor, NdLinear performs a mode-i tensor-matrix multiplication. This operation maps the i-th dimension from size Dᵢ to size Hᵢ while leaving the other dimensions unchanged.

The algorithm for NdLinear transformation can be broken down into these steps:

  1. For each dimension i from 1 to n:
  • Transpose the tensor to isolate dimension i
  • Reshape for the linear operation
  • Apply the linear mapping to dimension i
  • Restore the tensor shape

The tensor algebra notation X ×ᵢ Wᵢ represents this process for dimension i.
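
As a small illustration (sizes are arbitrary, and tensor axis 0 is the batch axis), a single mode-wise product can be written with a transpose and a matmul:

```python
import torch

# Mode-2 product: map data axis 2 of X from size D2 to H2, leaving the other axes alone.
B, D1, D2, D3, H2 = 4, 8, 16, 32, 24
X = torch.randn(B, D1, D2, D3)
W2 = torch.randn(D2, H2)

Xt = X.transpose(2, -1)            # (B, D1, D3, D2): move the target axis to the end
Y = (Xt @ W2).transpose(2, -1)     # apply W2 on the last axis, then restore the order
print(Y.shape)                     # torch.Size([4, 8, 24, 32])
```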

Figure 1: nn.Linear flattens the input tensor; NdLinear preserves structure and transforms along each dimension.

NdLinear Transformation:

Y = X ×₁ W₁ ×₂ W₂ ⋯ ×ₙ Wₙ

Where:

  • Each Wᵢ ∈ ℝ^(Dᵢ × Hᵢ) is a learnable matrix for axis i
  • The N-dimensional shape is preserved: the tensor is never flattened or tangled

The operation is a sequence of mode-wise tensor-matrix multiplications (akin to the products used in Tucker/CP decompositions in tensor algebra).

The difference becomes dramatic as dimensions increase. For example, with Dᵢ = 64 and Hᵢ = 128 for all i, and n = 3:

  • nn.Linear: 64³ × 128³ = 2³⁹ ≈ 550 billion parameters
  • NdLinear: 3 × 64 × 128 = 24,576 parameters

Figure 2: Massive reduction in parameter count when using NdLinear on multi-dimensional data.
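
These counts are easy to verify (weights only, ignoring biases); a quick sanity check in Python:

```python
from math import prod

D = (64, 64, 64)       # input dims  D1, D2, D3
H = (128, 128, 128)    # output dims H1, H2, H3

nn_linear_weights = prod(D) * prod(H)                  # one giant matrix
ndlinear_weights = sum(d * h for d, h in zip(D, H))    # one small matrix per axis

print(f"{nn_linear_weights:,}")   # 549,755,813,888
print(f"{ndlinear_weights:,}")    # 24,576
```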

This parameter efficiency translates into several practical advantages:

  • Reduced memory footprint
  • Faster training times
  • Lower risk of overfitting
  • Feasibility for deployment on resource-constrained devices

Computational Complexity Analysis

Beyond parameter efficiency, NdLinear also offers computational advantages. Let’s analyze the computational complexity of both approaches.

For an input tensor X ∈ ℝ^(B × D₁ × D₂ × … × Dₙ) and output tensor Y ∈ ℝ^(B × H₁ × H₂ × … × Hₙ), where B is the batch size and Dᵢ, Hᵢ are the input and output sizes along each axis, the complexity of the two operations can be expressed as follows:

– For nn.Linear:

O(B ⋅ ∏ᵢ=1ⁿ Dᵢ ⋅ ∏ᵢ=1ⁿ Hᵢ)

– For NdLinear:

O(B ⋅ Σᵢ=1ⁿ (∏ⱼ≠ᵢ Dⱼ ⋅ Dᵢ ⋅ Hᵢ))

Figure 3: NdLinear achieves far lower computational complexity in high-dimensional tensor scenarios.
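
Plugging concrete sizes into the two expressions above makes the gap vivid. The snippet below mirrors the formulas as written (for simplicity they keep Dⱼ on the axes not yet transformed); all sizes are illustrative:

```python
from math import prod

B = 32
D = (64, 64, 64)
H = (128, 128, 128)

flops_nn_linear = B * prod(D) * prod(H)
flops_ndlinear = sum(
    B * prod(d for j, d in enumerate(D) if j != i) * D[i] * H[i]
    for i in range(len(D))
)

print(f"{flops_nn_linear:.2e}")   # ~1.76e+13 multiply-accumulates
print(f"{flops_ndlinear:.2e}")    # ~3.22e+09 multiply-accumulates
```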

Practical Implications

In practice, this computational efficiency translates to:

  • Faster inference times
  • Reduced energy consumption
  • Improved scalability for large models
  • Better utilization of hardware resources

5. Seeing is Believing: Performance & Benchmarks

While the mathematical differences are clear, the real question is: does using NdLinear actually lead to better results? This is where benchmarks and real-world applications come into play. Often, the most telling comparisons arise when these layers are integrated into larger models designed for tasks where data structure is paramount – think about understanding complex documents or analyzing spatial data.

For instance, consider projects focused on Document AI (DocAI), where models need to parse layouts, extract information from forms, and understand tables. When comparing models using nn.Linear versus those enhanced with NdLinear in such scenarios, several key performance aspects are typically evaluated:

Figure 4: NdLinear achieves far better scores than nn.Linear.

1. Accuracy & Task Performance:

This is where NdLinear often demonstrates its value. In tasks like token classification on structured documents (e.g., identifying headers, questions, answers in forms using datasets like FUNSD), models incorporating NdLinear frequently show noticeable improvements in metrics like F1-score or overall accuracy compared to baseline models relying solely on nn.Linear. The ability to preserve layout information directly translates to better understanding of the document's semantics.

2. Computational Metrics (Latency & Memory):

  • Benchmarks often include operational metrics as well. Standard nn.Linear layers are highly optimized, so NdLinear's latency depends on its specific implementation: in some project benchmarks it is marginally higher because of the more complex per-axis operations. Memory usage can also differ and should be compared directly. The trade-off between potential accuracy gains and computational cost is a crucial evaluation point.

3. Qualitative Insights:

  • Beyond quantitative metrics, qualitative analysis often reveals how NdLinear improves understanding. For example, models using NdLinear might be significantly better at correctly classifying tokens within complex table structures or accurately linking labels to their corresponding values in dense forms – scenarios where nn.Linear's flattening approach might struggle due to lost spatial context. Observers might note fewer errors related to layout ambiguity.

In essence, while standard nn.Linear provides a strong baseline, performance benchmarks from relevant application areas often show that NdLinear's structure-aware approach can provide tangible benefits, particularly when the spatial arrangement of data is critical to the task at hand. This empirical evidence motivates considering NdLinear as a powerful alternative in the right contexts.

Key Metrics (Extracted from DOCAI):

  1. Loss Reduction: NdLinear reduced the loss from 0.25–0.28 to approximately 0.10, a substantial improvement in model convergence.
  2. Improved F1 Score: The F1 score increased from 0.70–0.75 to around 0.80, indicating better overall performance.
  3. Enhanced Precision & Recall: While precision remained similar (~0.80), recall improved from ~0.75 to ~0.77+.
  4. Higher Accuracy: Accuracy jumped from 0.65–0.70 to 0.80–0.85, a significant boost.
  5. Faster Inference: Inference speed improved from ~50ms/sample to 35–40ms/sample, a 20–30% speedup.
  6. Reduced Memory Usage: Memory requirements decreased from ~100MB to 80–85MB, a 15–20% reduction.

These improvements demonstrate that NdLinear not only preserves important structural information in document processing but also delivers tangible performance benefits across all key metrics.

Why NdLinear Excels in Document Understanding

Document understanding tasks particularly benefit from NdLinear because:

  1. Layout Preservation: Documents have natural 2D structures that NdLinear preserves.
  2. Positional Relationships: The spatial relationships between form elements (fields, labels, values) are critical for correct classification.
  3. Multi-Scale Features: Documents contain features at different scales (characters, words, paragraphs, sections) that benefit from dimension-specific processing.

By processing each dimension separately, NdLinear effectively captures these document-specific characteristics, leading to the observed performance improvements.

6. Hands-On: Code Implementation

Basic Implementation of NdLinear

Here’s a simplified implementation of NdLinear in PyTorch:
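
The code below is a minimal sketch of the idea rather than the official package: it assumes the layer holds one small nn.Linear per axis and applies them with the transpose-apply-restore loop described above (see the ensemble-core/NdLinear repository for the reference implementation):

```python
import torch
import torch.nn as nn

class NdLinearSketch(nn.Module):
    """Structure-preserving linear layer: one small linear map per tensor axis.

    input_dims:  (D1, ..., Dn), the per-sample input shape (batch excluded)
    hidden_dims: (H1, ..., Hn), the per-sample output shape
    """
    def __init__(self, input_dims, hidden_dims):
        super().__init__()
        assert len(input_dims) == len(hidden_dims), "need one output size per axis"
        # One (D_i -> H_i) transformation per axis instead of one giant matrix.
        self.layers = nn.ModuleList(
            nn.Linear(d, h) for d, h in zip(input_dims, hidden_dims)
        )

    def forward(self, x):
        # x: (B, D1, ..., Dn). Transform one axis at a time, keeping the rest intact.
        for i, layer in enumerate(self.layers):
            axis = i + 1                  # +1 skips the batch axis
            x = x.transpose(axis, -1)     # isolate axis i as the last dimension
            x = layer(x)                  # nn.Linear acts on the last dim: D_i -> H_i
            x = x.transpose(axis, -1)     # restore the original axis order
        return x
```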

Using NdLinear in a Model

Here’s how you might use NdLinear in a simple model:
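
Building on the NdLinearSketch class above, here is a toy model; the input shape, hidden sizes, and class count are arbitrary placeholders:

```python
class TinyNdClassifier(nn.Module):
    """Toy classifier for (B, 28, 28, 3)-shaped inputs, e.g. small RGB images."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Each axis gets its own transformation: 28 -> 32, 28 -> 32, 3 -> 8.
        self.nd = NdLinearSketch((28, 28, 3), (32, 32, 8))
        self.act = nn.ReLU()
        # A flat head is still fine once the structure has already been exploited.
        self.head = nn.Linear(32 * 32 * 8, num_classes)

    def forward(self, x):                 # x: (B, 28, 28, 3)
        x = self.act(self.nd(x))          # (B, 32, 32, 8), structure preserved
        return self.head(x.flatten(1))    # (B, num_classes)

# Quick smoke test
model = TinyNdClassifier()
logits = model(torch.randn(4, 28, 28, 3))
print(logits.shape)                       # torch.Size([4, 10])
```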

Tips for Effective NdLinear Usage

  1. Dimension Alignment: Ensure that input dimensions and hidden dimensions are correctly aligned.
  2. Parameter Tuning: Experiment with different dimension sizes to find the optimal balance between expressiveness and efficiency.
  3. Initialization Strategy: Use appropriate weight initialization strategies for stable training.
  4. Combine with Other Techniques: NdLinear works well in combination with other techniques like attention mechanisms and convolutions.

7. The Verdict: When to Choose NdLinear vs. nn.Linear

Choosing between NdLinear and nn.Linear isn't about one being universally "better"; it's about selecting the right tool for the job based on your data and goals. Here’s a guide:

When NdLinear Shines (Consider Choosing It):

  1. Your Data Has Meaningful Structure: This is the primary reason. If you’re working with images, videos, volumetric data, documents where layout is important, time-series with spatial correlations, or any data where the arrangement (2D, 3D, sequential) carries significant information, NdLinear is designed precisely for this. It avoids the information loss caused by flattening.
  2. Preserving Relationships is Key: If understanding the local context, spatial relationships, or sequence order within your features is crucial for the task (e.g., identifying objects in images, parsing tables in documents), NdLinear's structure-preserving nature is a major advantage.
  3. Seeking Performance Edge on Structured Tasks: As benchmarks often show, if you need the highest possible accuracy on tasks heavily dependent on understanding structure, the potential performance boost from NdLinear (despite potential computational trade-offs) might be worth it.
  4. Specific Architectural Goals: In some advanced architectures or research areas, explicitly maintaining tensor dimensionality throughout the network might be a design requirement that NdLinear facilitates. Some specific implementations might also be tailored for certain hardware profiles like edge devices or CPUs, where their way of handling structure could offer advantages.

When nn.Linear Still Makes Sense (Stick with the Classic):

  1. Working with Unstructured or Tabular Data: If your input data is already naturally represented as flat feature vectors without inherent spatial or sequential structure (like typical spreadsheet data or pre-extracted feature sets), nn.Linear is perfectly suitable and efficient.
  2. Simplicity and Speed are Paramount: nn.Linear is simple to implement and understand. Its operations are usually highly optimized in deep learning libraries, often leading to faster execution times compared to potentially more complex NdLinear implementations, especially if structure isn't critical.
  3. Features Are Already Structure-Agnostic: If preceding layers in your network (like a CNN’s global pooling layer or a powerful sequence encoder) have already effectively summarized the structural information into a feature vector, a subsequent nn.Linear layer is often the appropriate choice for final classification or regression.
  4. Prototyping and Baselines: nn.Linear serves as an excellent, easy-to-implement baseline. It's often best to start simple and only introduce complexity like NdLinear if there's a clear need and demonstrated benefit for your specific problem.
  5. Maximum Compatibility: As the standard, nn.Linear ensures maximum compatibility with existing codebases, tutorials, and pre-trained models.

In short: If your data’s shape tells an important story, listen to it with NdLinear. If you're dealing with flat features or prioritizing speed and simplicity above all else, the classic nn.Linear remains a solid choice.

Replacing nn.Linear with NdLinear

To replace nn.Linear with NdLinear in an existing model (a before/after sketch follows these steps):

  1. Identify Multi-dimensional Inputs: Look for places where tensors are flattened before being passed to linear layers.
  2. Determine Input and Output Dimensions: Analyze the structure of your data to determine appropriate dimension sizes.
  3. Replace Linear Layers: Substitute nn.Linear with NdLinear, preserving the multi-dimensional structure.
  4. Adjust Subsequent Operations: Ensure that subsequent operations in your model correctly handle the multi-dimensional output of NdLinear.
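
A minimal before/after sketch of steps 1 through 4, reusing the NdLinearSketch class from Section 6 (module names and sizes are purely illustrative):

```python
import torch.nn as nn

# Before: flatten, then one big linear layer (structure discarded).
class FlatHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32 * 32 * 16, 4096)

    def forward(self, x):               # x: (B, 32, 32, 16)
        return self.fc(x.flatten(1))    # (B, 4096)

# After: keep the tensor shape and transform each axis separately.
class StructuredHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.nd = NdLinearSketch((32, 32, 16), (16, 16, 16))

    def forward(self, x):               # x: (B, 32, 32, 16)
        # Output stays 4-D: (B, 16, 16, 16); downstream layers must accept this shape.
        return self.nd(x)
```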


8. Probable Applications

The impact of NdLinear extends beyond current applications, pointing to several exciting future directions:

1. Large Language Models (LLMs)

Preliminary experiments with NdLinear in Open Pre-trained Transformer (OPT) models have shown reduced perplexity scores despite having fewer parameters. This suggests potential for:

  • More efficient LLM architectures
  • Improved performance on language tasks
  • Reduced computational requirements for training and inference

2. Computer Vision

In vision tasks, NdLinear can preserve spatial relationships more effectively than traditional linear layers, leading to:

  • Better feature extraction in CNNs
  • More efficient Vision Transformers (ViTs)
  • Improved performance on tasks requiring spatial understanding

3. Multi-modal Learning

NdLinear’s ability to handle multi-dimensional data makes it particularly suitable for multi-modal learning, where:

  • Each modality can be processed along its natural dimensions
  • Cross-modal relationships can be captured more effectively
  • Structural information across modalities can be preserved

4. Resource-Constrained Environments

The parameter efficiency of NdLinear makes it ideal for deployment in resource-constrained environments such as:

  • Mobile devices
  • Edge computing systems
  • IoT devices
  • Low-power AI applications

5. Specialized Domains

Several specialized domains could benefit significantly from NdLinear:

  • Medical Imaging: Preserving 3D relationships in volumetric scans
  • Video Processing: Maintaining spatial-temporal relationships
  • Scientific Simulations: Handling multi-dimensional physical systems
  • Financial Time Series: Capturing multi-variate dependencies across time

9. Conclusion: Embracing Structure for Smarter AI

The journey through linear layers highlights a critical choice in modern neural network design: sticking with the classic nn.Linear or embracing the structure-aware capabilities of NdLinear. While nn.Linear remains a valuable tool for flat data, its inherent flattening process discards vital information when dealing with multi-dimensional inputs. NdLinear represents a significant architectural advancement, specifically engineered to preserve this crucial structure.

By operating directly on multi-dimensional tensors, NdLinear enables models to build a deeper, more context-rich understanding. This theoretical advantage often translates into tangible real-world improvements, as seen in application areas like Document AI, where models using NdLinear can demonstrate enhanced performance on key metrics for document understanding tasks compared to nn.Linear baselines. Furthermore, its design can offer potential benefits in parameter efficiency and computational load under certain conditions, making it a compelling alternative.

Looking ahead, the potential of NdLinear to reshape how we approach problems across diverse fields—from computer vision and NLP to any domain with multi-dimensional data—is immense. Considering NdLinear when tackling problems where data shape matters could unlock significant improvements in your models. The shift towards structure-aware processing is well underway, reminding us that sometimes, the most powerful insights lie not just in the data points themselves, but in how they're arranged. The revolution in neural network layers is here—and it's multi-dimensional.

References

  • NdLinear Is All You Need for Representation Learning (arxiv.org)
  • vivek-tiwari-vt/DOCAI: Structure-Aware Transformer for Document Intelligence using NdLinear FFNs (github.com)
  • ensemble-core/NdLinear: a simple drop-in replacement for nn.Linear that makes models smaller, faster, and better (github.com)


Published via Towards AI
