SmolVLMs: Hugging Face Releases the World's Smallest Vision-Language Models

Jan 26, 2025 at 12:21 am

Recent years have seen a massive increase in the capabilities of machine learning algorithms, which can now perform a wide range of tasks, from making predictions to matching patterns to generating images from text prompts. To take on such diverse roles, these models have been given a broad spectrum of abilities, but one thing they rarely are is efficient.

In the present era of exponential growth in the field, rapid advancements often come at the expense of efficiency. It is faster, after all, to produce a very large kitchen-sink model filled with redundancies than it is to produce a lean, mean inferencing machine.

But as these algorithms continue to mature, more attention is being directed at slicing them down to smaller sizes. Even the most useful tools are of little value if they require so much computational power that they are impractical for real-world applications. As you might expect, the more complex an algorithm is, the more challenging it is to shrink it down. That is what makes Hugging Face's recent announcement so exciting: the company has taken an axe to vision-language models (VLMs), resulting in the release of new additions to the SmolVLM family, including SmolVLM-256M, the smallest VLM in the world.

SmolVLM-256M is an impressive example of optimization done right, with just 256 million parameters. Despite its small size, this model performs very well in tasks such as captioning, document-based question answering, and basic visual reasoning, outperforming older, much larger models like the Idefics 80B from just 17 months ago. The SmolVLM-500M model provides an additional performance boost, with 500 million parameters offering a middle ground between size and capability for those needing some extra headroom.
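For readers who want to try this themselves, here is a minimal captioning sketch using the Hugging Face transformers library. It is illustrative rather than definitive: it assumes the instruction-tuned checkpoint is published under the HuggingFaceTB/SmolVLM-256M-Instruct model ID and that the usual chat-template plus image-processor workflow applies, so check the model card before relying on it.

```python
# Minimal SmolVLM-256M captioning sketch (assumes the
# "HuggingFaceTB/SmolVLM-256M-Instruct" checkpoint; verify against the model card).
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("photo.jpg")  # any local image

# Chat-style prompt: one image placeholder followed by the instruction.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Swapping in the 500M variant's model ID should work the same way if you want the extra headroom.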

Hugging Face achieved these advancements by refining its approach to vision encoders and data mixtures. The new models adopt the SigLIP base patch-16/512 encoder, which, though smaller than its predecessor, processes images at a higher resolution. This choice aligns with recent trends seen in Apple and Google research, which emphasize higher resolution for improved visual understanding without drastically increasing parameter counts.
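To make the resolution trade-off concrete, here is a quick back-of-the-envelope sketch of the patch arithmetic. The 512-pixel input size and 16-pixel patch size come straight from the encoder's name; the 384-pixel, patch-14 configuration used for comparison is an assumed lower-resolution baseline, and real pipelines typically add a token-compression step on top, so treat the counts as illustrative.

```python
# Illustrative patch-count arithmetic for a ViT-style image encoder.
def patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a square image is split into."""
    per_side = image_size // patch_size
    return per_side * per_side

print(patch_tokens(512, 16))  # SigLIP base patch-16/512 -> 32 x 32 = 1024 patches
print(patch_tokens(384, 14))  # assumed lower-res baseline -> 27 x 27 = 729 patches
```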

The team also employed innovative tokenization methods to further streamline their models. By improving how sub-image separators are represented during tokenization, the models gained greater stability during training and achieved better quality outputs. For example, multi-token representations of image regions were replaced with single-token equivalents, enhancing both efficiency and accuracy.
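As a concrete, purely hypothetical illustration of that tokenization trick: a textual sub-image marker such as <row_1_col_1> is normally split into several sub-word pieces, but registering it as a special token turns it into a single ID. The marker string and the GPT-2 tokenizer below are stand-ins for the demonstration, not the exact strings or tokenizer SmolVLM uses.

```python
# Illustrative only: a sub-image separator as several sub-word tokens vs. one special token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any standard tokenizer works for this demo
marker = "<row_1_col_1>"                     # hypothetical separator string

print(tok.tokenize(marker))                  # several sub-word pieces before registration

tok.add_special_tokens({"additional_special_tokens": [marker]})
print(tok.tokenize(marker))                  # a single atomic token afterwards
# (in a real model, the embedding matrix would also be resized to cover the new token)
```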

In another advance, the data mixture strategy was fine-tuned to emphasize document understanding and image captioning while maintaining a balanced focus on essential areas like visual reasoning and chart comprehension. These refinements are reflected in the models' improved benchmarks, which show both the 256M and 500M models outperforming Idefics 80B in nearly every category.

By demonstrating that small can indeed be mighty, these models pave the way for a future where advanced machine learning capabilities are both accessible and sustainable. If you want to help bring that future into being, go grab these models now. Hugging Face has open-sourced them, and with only modest hardware requirements, just about anyone can get in on the action.

Disclaimer: info@kdj.com

The information provided is not trading advice. kdj.com does not assume any responsibility for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile and it is highly recommended that you invest with caution after thorough research!
